As many of you are aware, a few months ago I changed jobs (more like positions) and with that change has come a shift in the focus of much of my work. I’m spending less time in the abstract (i.e. “how could we solve problem X in the cloud?”) and working more in the concrete (“we have agency Y that has problem X that needs to be deployed in the cloud yesterday… what do you think?”). One unfortunate aspect (at least in my way of thinking) is that many of these projects have legislative requirements that prohibit the use of public cloud platforms (at least currently). While I’m always looking for ways to avoid building individual, private “Cirrus” [1] clouds, and I remain hopeful that the day will come when we all can leave the infrastructure build-outs to someone else, the reality is that in today’s world there is some data that simply cannot live outside of an organization’s boundaries – leaving us to look at “private cloud” approaches.
We have been experimenting with a few different approaches for some of the projects I’ve been working on and are learning a number of interesting things. I want to be quick to say that I don’t think we’ve “arrived” by any stretch of the imagination and our work (and this space in general) is changing/evolving rapidly. Our initial requirements included using free/OSS solutions wherever possible and to, as much as possible, avoid vendor lock-in (always great in theory, frequently falls apart in application). The hardware used for this exercise includes 45 physical nodes spread over two racks with a total of 1160 cores, 3.8 TB of RAM, 200 TB (RAW) of local node storage and 288 TB (RAW) of iSCSI-connected storage. Our plan is to provide 8 VLANS across the environment (Management, Dev, Test, Prod + corresponding disk-traffic networks for each). The physical nodes/nics would all default to the management network and all of the actual compute resources (VMs used by researchers) would live in one of the other three zones (Dev/Test/Prod). For reasons that I won’t get to in this post, we standardized on CentOS for the primary OS and Xen as the target hypervisor. While some of the uses of the platform include data serving (i.e. web sites, data services, etc.), most of the workloads will tend to be heavy data analytics. The above scenario is complicated by the fact that the entire environment needed to be air-gapped (i.e. not connected to the Internet or other networks in any way).
While we have a good bit of experience using Eucalyptus on the Ubuntu stack, our initial plan was to go with the current wisdom of the crowds and deploy OpenStack as our cloud orchestration layer. The intent was to buy in fully and deploy their storage, image, and compute services. Unfortunately, while there are some very interesting things happening in this realm, we didn’t find this toolset to be at the level we needed it to be given our platform selections. After a few weeks of trying to get this working, we bailed due to simply needing to get something working [2]. We are hopeful that this situation will evolve in the future and we will reconsider down the road, but for the present we had to scrap it and move on.
Having burned a bit of our grace period, we were faced with the need to get something running fast and spent a weekend digging through our options. Where we ended up is using the free XenServer (via Citrix Systems) as the host OS in combination with XenCenter for managing the nodes. This – while not a cloud or cloud orchestration layer – allowed us to quickly meet some of our sponsor’s needs while buying some time to fill in the gaps. Our team is currently evaluating CloudStack as the cloud/orchestration layer to sit on top of XenServer and be the researcher-facing interface to the platform. Our hope is that as the story evolves (Citrix, XenServer, XenCenter, CloudStack, OpenStack, etc) that the deployment of future platforms will become easier and the “best” approach will become clearer.
Notes
1. I spent a good bit of time looking on the web for the official name of the tiniest cloud but didn’t come up with anything better than “Cirrus” which is defined as a “thin, wispy cloud.” Not exactly what I was looking for, but I’ll use it for now.
2. Our issues included (among other things) Python version conflicts between what OpenStack needed and the version of CentOS (5.6) we were running, the lack of a good Web UI/self-service portal, getting the VLANs talking properly between hosts, vhosts, storage, etc., and a number of smaller miscellaneous items.
I had the privilege of speaking at DevLink 2011 a few weeks ago in Downtown Chattanooga, TN. I have been a bit OBE (overcome by events) since I left the conference and have been unable to post my slides until now. I hope to get the videos and other materials up in the coming week or so. If you came to one of these sessions – thanks – the attendance at both was great and I appreciated the questions from the audience.
[updated 6/1/2011 with embedded video]
I have the opportunity to talk at StirTrek today and wanted to make the slides available from today’s session. I’ll update this post a bit more following the session.
I just finished reading a book from the Microsoft Patterns & Practices group called Moving Applications to the Cloud on the Microsoft Windows Azure Platform. I’ve had the book for a few months, and when I first received it, I read the first chapter or two, decided it wasn’t worth the read, and set it aside.
Lately, however, I picked it up again – finished the book, and am glad I did. Don’t get me wrong, it didn’t magically morph into a superb spectacle of literary greatness, but I did find that as I read further, the authors moved further from the very basics of the Windows Azure platform and the content became increasingly interesting.
If you are new (or relatively so) to the Windows Azure platform and contemplating the moving of existing applications to the cloud, this is a worthwhile discussion of a fictitious scenario that did just that. The scenario is slightly on the cheesy side, but realistic enough to help you think through issues you may be facing in your business.
If you are well experienced with the platform, you will likely find this a bit dry – especially the first portions. You’ll also likely be distracted or bothered by the not-so-covert marketing that takes place. That said, the book covers some more complex topics such as multiple tasks/threads sharing the same physical worker role, various optimization topics, and more. In the end, I’m glad I read it and feel that I learned some things from the book.
My last thought has nothing to do specifically with the book, but rather a growing frustration of mine with the Windows Azure platform – the design of the table storage platform. Upon reading books such as this I’m reminded (they stress it *many* times) how important your partition key/row key strategy is, and how literally hosed you are if you get it wrong. This compares with my recent experiences with Amazon’s SimpleDB product, and the delta couldn’t be more striking. Both platforms solve essentially the same problem, but in the case of SDB, it is effortless (at least by comparison). I don’t have to think of partition keys, or be overly concerned with how the underlying storage platform works… I just put data in it. Additionally, *every* column is indexed and performs reasonably under queries. I can’t shake the feeling that the Azure team is missing it here – there has to be a way to get a well-designed, horizontally scaling table structure without placing such a design burden on the users.
I’m pleased to announce that the excellent utility – the Azure GAC Viewer – is once again online and available for general use. You can access it at http://gacviewer.cloudapp.net. This tool shows you a dynamically generated list of all of the assemblies present in the GAC for an Azure instance. Additionally, it also allows you to upload your project file (*.csproj or *.vbproj) to have the references scanned and let you know if there are any discrepancies between what you are using and what is available (by default) in Azure. You can then adjust your project file (copy-local=true) to ensure your application can run successfully.
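For readers curious what that kind of reference scan amounts to, here is a minimal sketch in Python. It is not the tool's actual code: the sample GAC list, the sample project, and the function names are all made up for illustration. It only assumes the standard MSBuild XML namespace found in Visual Studio project files.

```python
import xml.etree.ElementTree as ET

# MSBuild XML namespace used by Visual Studio 2010-era project files.
MSBUILD_NS = "{http://schemas.microsoft.com/developer/msbuild/2003}"

# Hypothetical sample of assemblies present in the target GAC; the real
# list would come from the running Azure instance.
SAMPLE_GAC = {"System", "System.Core", "System.Xml"}


def referenced_assemblies(csproj_xml):
    """Return the simple names of all <Reference> items in a project file."""
    root = ET.fromstring(csproj_xml)
    names = []
    for ref in root.iter(MSBUILD_NS + "Reference"):
        include = ref.get("Include", "")
        # "Foo, Version=1.0.0.0, Culture=neutral" -> "Foo"
        name = include.split(",")[0].strip()
        if name:
            names.append(name)
    return names


def missing_from_gac(csproj_xml, gac_names):
    """List referenced assemblies that are not in the given GAC set."""
    return [n for n in referenced_assemblies(csproj_xml) if n not in gac_names]


# Made-up project file used only to exercise the sketch.
SAMPLE_PROJECT = """<Project xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
  <ItemGroup>
    <Reference Include="System" />
    <Reference Include="System.Xml" />
    <Reference Include="Contoso.Widgets, Version=1.0.0.0, Culture=neutral" />
  </ItemGroup>
</Project>"""

if __name__ == "__main__":
    for name in missing_from_gac(SAMPLE_PROJECT, SAMPLE_GAC):
        print(name + " is not in the GAC; consider Copy Local = true")
```

Anything reported as missing is a candidate for the copy-local adjustment mentioned above.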
If you are familiar with the tool, you may be thinking “Wait! you aren’t Wayne Berry, and besides, the URL has changed!” – and you would be correct on both counts. Wayne developed the tool and posted about it back in September of last year. Since that time, however, Wayne has accepted a position on the Windows Azure team and is unable to continue maintaining the site full time. As a gesture of kindness to the community, he has passed the source code to me and given me his blessing to re-launch the tool.
As it stands today, the tool is nearly exactly as Wayne developed it, with a few tweaks to have it use Guest OS 2.1 rather than 1.6. I’ve also added a contributors page to give credit to Wayne and to the organizations that are allowing me to maintain and keep the site online.
In the future, I hope to make the source code available on CodePlex as well as to add to the list of tools that live on the site. If you have any bugs with the current site or ideas for future changes, please feel free to contact me.
I’m thrilled to be speaking at the CodeMash Precompiler next week. I’m going to be joined by Mike Wood and helped by Brian Prince and Michael Collier. Together, we’ll have nearly 8 hours of instruction and hands on labs covering both the Amazon and Microsoft cloud computing platforms. Below I’ve listed the abstracts for each of the sessions as well as the prerequisites for those planning on joining us. If you are going to be in Sandusky next Wednesday, be sure to drop by.
AWS has been in the cloud computing space longer than most anyone, and they are the de facto standard when it comes to Infrastructure as a Service. While most developers are comfortable with the notion of virtual machines, reviewing the AWS offering can sometimes look like alphabet soup (EC2, S3, SNS, SDB, SQS). Join us to learn the power behind these acronyms and the tools that they can provide your next project. We’ll discuss the major components, some of the trade-offs between different implementation choices (i.e. boot from S3/boot from EBS, etc.) and provide you with the opportunity to work through some labs, deploy some code, and begin to experience the Amazon cloud for yourself.
Examples are in .NET, but fundamental concepts apply to all platforms.
Steve Ballmer has made it very clear that Microsoft is "all in" when it comes to the cloud and by now most have heard about Microsoft’s Windows Azure platform… but what does that mean for you? Whether you are an experienced .NET developer who is wondering what all this cloud stuff means for how you write code, or maybe you are a traditional *nix developer looking to understand how to integrate your existing code with the Microsoft version of the cloud, join us for an in-depth discussion on what Platform as a Service is, how Microsoft has implemented it, what scenarios it best addresses, and a collection of hands-on-labs to get you started.
Examples are in .NET, but fundamental concepts apply to all platforms.
The sessions will be part presentation, part hands on labs. While you aren’t required to bring a laptop, you’ll get much more out of the sessions if you have one available to work through the labs with (but, there might be some people willing to pair as well!). Please make sure to bring your power cord!
Here are the prerequisites to have loaded:
An Introduction to Windows Azure
· Operating Systems Supported: Windows 7 (Ultimate, Professional, and Enterprise Editions); Windows Server 2008; Windows Server 2008 R2; Windows Vista (Ultimate, Business, and Enterprise Editions) with either Service Pack 1 or Service Pack 2
· Microsoft Visual Studio 2010 (full version or the free trial).
· SQL Server 2005 Express Edition (or above) (this is usually installed with Visual Studio)
· Install the Windows Azure Tools for Microsoft Visual Studio (and some hotfixes)
· Install the Windows Azure Platform Training Kit
An Introduction to Amazon Web Services
· Requires Microsoft .NET Framework 2.0 or later.
· Use the AWS SDK for .NET with any of the following Visual Studio editions:
o Microsoft Visual Studio 2008 Professional Edition or later
o Microsoft Visual C# 2008 Express Edition (free!)
o Microsoft Visual Web Developer 2008 Express Edition (free!)
You might be thinking, "Hey, wait a second! This is CodeMash, you just listed all Microsoft tools there!". Just like CodeMash, both Windows Azure and Amazon AWS are happy to mix in multiple development stacks. Our labs and demos will be shown using Visual Studio, but don’t let that stop you from following along or trying out the cloud platforms from your Mac, or using Java, PHP and Ruby on Windows. Below are links to other SDKs for each cloud platform. Please, feel free to explore your options and load these SDKs or libraries up if you prefer them.
For Windows Azure
o AppFabric: http://www.jdotnetservices.com/
o AppFabric: http://dotnetservicesphp.codeplex.com/
o and tools http://azurephptools.codeplex.com/
o and Companion http://www.interoperabilitybridges.com/projects/windows-azure-companion
o Oh, and some love for Eclipse via a plug in: http://www.windowsazure4e.org/
· Windows Azure AppFabric SDK For Ruby
For Amazon AWS
[NOTE: Updated 9/23/2010. See bottom of this post for an explanation of the changes]
This is the second in a series of posts I’m writing while working on a paper dealing with the issue of maximizing data throughput when interacting with the Windows Azure compute cloud. You can read the first part here. I’m still running some different test scenarios so I expect there to be another post or two in the series.
Summary: Our tests confirmed that while within the context of the Azure datacenter (intra-datacenter transfers), sub-file parallelization for downloads (Azure blob storage to Azure Compute) is not recommended (overhead is too high), whole-file-level parallelization (parallelizing the transfer of multiple complete files) does provide a significant increase in overall throughput when compared to transferring the same number of files sequentially. Also, consistent with our prior tests, the size of the VM has a direct correlation to the realized throughput.
Detail: During the testing described in Part 1, we saw that attempts to parallelize at the sub-file level for downloads within the Azure datacenter were significantly more expensive (on average 76.8% lower throughput) as compared to direct transfers. As such, sub-file parallelization is not recommended for downloads within the Azure datacenter.
As I considered these results and thought through where the bottleneck might be, I went back through and re-instrumented the test so I could get a time snap in the midst of the parallel download routine, at the spot after the file blocks have been downloaded and prior to reassembling the file. What I found was that roughly 50% of the time of an individual operation was consumed in network transfer while the other half was spent assembling the individual blocks into a single file. While this gave me some ideas for further optimization, the 50% of time spent on transfer was still significantly longer than the entire non-blocked operation (by almost 50%). As such, it seemed beneficial to take a completely different approach to improving the transfer speed.
What we came up with was to parallelize at the whole file level rather than at the sub-file level. This effectively eliminated half of the prior parallelization effort cost (no reassembly) and wouldn’t involve the overhead of querying the storage platform for the size, and then issuing a collection of range-gets.
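To make the whole-file idea concrete, here is a rough Python sketch. It is not the test harness used for these results; the download function is a stand-in and the worker count is an arbitrary assumption. The point is only that each task moves one complete file, so there is no size query, no range-gets, and no reassembly step.

```python
from concurrent.futures import ThreadPoolExecutor


def transfer_all(names, download, max_workers=4):
    """Download every file in `names` concurrently, one whole file per task.

    `download` is a caller-supplied function (hypothetical here) that fetches
    a single complete file and returns its size in bytes.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order, so sizes line up with names.
        sizes = list(pool.map(download, names))
    return dict(zip(names, sizes))


def fake_download(name):
    """Stand-in for a real blob download; returns a made-up size."""
    return len(name) * 1024


if __name__ == "__main__":
    print(transfer_all(["a.bin", "bb.bin", "ccc.bin"], fake_download))
```

Swapping `fake_download` for a real single-file download call is all that would change in practice; the sequential baseline is simply the same loop without the pool.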
As you can see from the chart above, even in the worst case, there is a significant improvement in the overall throughput when files are transferred simultaneously rather than sequentially. While the individual-file transfer rate dropped (average 40.1% worse), the overall transfer rate averaged 86.21% better.
Consistent with our prior results, instance size plays a role in the bandwidth. Our tests showed an average improvement in realized throughput per step increase in instance size of 14.46% (please see the following note).
Note: A review of the chart hints that the small instance size takes a significant hit in the area of total network throughput and, while this accurately reflects the data collected, the third run took abnormally longer than the first two and pulled down the total results. This can be explained by a number of different factors (e.g. heavy contention on the host for network resources). I ran the test for that scenario a 4th time to satisfy my curiosity as to whether or not the third run was reflective of a larger trend and the results of the fourth run were much closer to that of the first two. So much so that if I were to substitute the fourth run results for the third run results, the overall improvement due to parallelism rises to 89.46% and the average improvement in throughput by step increase in node size goes to 10.75%. It is my belief that if I were to have run these tests/scenarios more, the outliers would have reduced and the results would be closer to those ignoring the 3rd run rather than including it.
Approach: Rather than doing a parameter sweep on a number of file sizes, I selected a specific file size (500 MB) of randomly generated data and executed my tests with that. For each parameter set, I executed 3 runs of 50 transfers each (150 total per parameter set). While the transfer time of each file was tracked, the total transfer time (for all 50 files in the run) was the primary value being collected and represented in the charts above. It should be noted that this total time includes a little bit of time per file for tracing data so, in a scenario wherein that tracing activity was not present, the numbers above might be slightly better. I also tore down and re-published my platform between each run to increase my chances of being provisioned to different hardware nodes within the Azure datacenter and, theoretically, a different contention ratio with other instances on the same physical host. Also, I performed a run for all parameter sets before starting subsequent runs to decrease the likelihood that one parameter set would be inappropriately benefited (or harmed) by the time of day in which it was executed. In each test, a single worker role instance was run targeting a single storage account. There were no other applications or activities targeting that storage account during the test runs. All of these tests were performed in the Windows Azure US North Central region between the dates of August 27, 2010 and September 2, 2010.
Related Resources
NOTE: This post was updated on 9/23/2010. The changes are both substantial and not at the same time. While working on the other posts in the series, I became concerned that there were too many calculations being performed ad-hoc in Excel to get from the raw data to the charts and conclusions described here. A key goal of mine is for someone who questions my results to be able to re-run them and analyze my analysis of the data. Therefore I stepped back and generated the charts using code that shows each calculation and query. The links to the code are posted above as are links to the raw data. The charts are identical to what were here originally with the exception of some formatting changes due to the differences in generation engines. The charts are also higher-resolution and clicking on them will open the full-size version of the chart.
[NOTE: Updated 9/23/2010. See the bottom of this post for an explanation of the changes]
I’m working on writing a paper dealing with the issue of maximizing data throughput when interacting with the Windows Azure compute cloud and am drafting some of that work as a couple of blog posts to help me work through my thoughts. I’m still working through some test scenarios and will have more to post later, but I wanted to get this out while it was still fresh.
I’ve posted before that utilizing parallelized file transfers is a great way to increase your overall throughput when externally interacting with Windows Azure, and the unsaid but possibly inferred thought was that it worked well for internal-to-Azure data movements as well. At the time I wrote the initial post I had done some testing of this scenario and had mixed results. A couple of recent papers I’ve read got me thinking about the topic again and so I started testing further with a slightly different approach and a different take on the variables.
Summary: Within the context of the Azure datacenter (intra-datacenter transfers), sub-file parallelization is not always as beneficial as it is outside the datacenter (local to azure or azure to local). Further, the size of the VM host has a significant impact on the realized throughput.
Detail: The key point I pulled from a paper I was reading (I’m sorry, I don’t have the reference at this time) was that another researcher had been doing tests in the Amazon cloud and indicated they were seeing significant deltas in throughput based on the instance size/type they selected. Neither Microsoft nor Amazon lists bandwidth as a variable associated with instance types (with the possible exception of the Amazon Cluster Compute Instance which boasts a 10Gbps network) but it stands to reason that given a physical host of a fixed size, an increase in the number of virtual hosts on that box (smaller instances) will result in a decrease in available throughput per virtual host. The inverse (scenarios with larger instances) also follows. This got me to thinking about Azure and whether or not the same would hold true, and, if so, how that would impact our recommended approach of splitting your files, transferring them in parallel, and then reassembling them on the other side.
Approach: Rather than doing a parameter sweep on a number of file sizes, I selected a specific file size (500 MB) of randomly generated data and executed my tests with that. For each parameter set, I executed 3 runs of 50 transfers each (150 total per parameter set). I also tore down and re-published my platform between each run to increase my chances of being provisioned to different hardware nodes within the Azure datacenter and, theoretically, a different contention ratio with other instances on the same physical host. Also, I performed a run for all parameter sets before starting subsequent runs to decrease the likelihood that one parameter set would be inappropriately benefited (or harmed) by the time of day in which it was executed. In each test, a single worker role instance was run targeting a single storage account. There were no other applications or activities targeting that storage account during the test runs. All of these tests were performed in the Windows Azure US North Central region between the dates of August 27, 2010 and September 2, 2010.
Results: The first sweep was aimed at identifying the impact of VM size on transfer rate using the standard MS-provided storage client library (no modifications). What we found was that, for the most part, there was a clear relationship between the VM size and the realized throughput.
The second sweep had a similar objective as the first, with the only change being that rather than using the standard/single-threaded API calls, we used the parallelized version that we developed for our external-to-Azure tests. The results were similar to the above in that the node size showed (mostly) a consistent impact on the realized throughput (keep reading past the charts if you review the following and think I’m out of my mind).
If you are still with me, you are probably wondering why the numbers for the Parallel Upload by Node Size chart look so off from the assumed behavior… The fact of the matter is that similarly to the small node standard download tests, the third run for the small node parallel upload tests experienced a radically different performance (>75% better) than the prior two runs. This was so jolting to the numbers that I actually prepared another chart showing only the first two runs of this test to illustrate the difference that the last run made in the average results:
As you can tell from the above, these results are much closer to what you might expect (based on the values from the other tests above). The key take-away at this point, and the reason I am belaboring this aberration, is that in an environment where you are not in complete control, the performance you obtain from shared services (networks, storage clusters, etc) may vary widely in actual use.
The real question of interest was how the two approaches (standard library vs. parallelized) compare, so one could select the best one for a given scenario. The first chart showed exactly what I expected – the parallelized version was significantly better than the standard approach for all node sizes, although the benefit waned as the node size increased.
The second chart initially caught me off guard as it illustrated that the work being done to block/download/reassemble in parallel was far less efficient than simply downloading the data.
My initial thought was that I was simply using an inefficient mechanism for reassembling the file, and that the parallelized transfer itself was still likely faster than the stock approach; some additional instrumentation invalidated that thought. For the parallelized version, roughly 50% of the total time per file was spent reassembling it. However, even considering just the 50% spent in network transfer, that portion alone was roughly 50% longer than the entire stock approach (I’ll dig into that a bit more in later posts).
Therefore, from the data and tests we’ve run so far, using a blocked or chunked approach and parallelized transfers works well for external-to-Azure uploads and downloads as well as uploads (compute to blob storage) for internal-to-Azure movements. Internal-to-Azure downloads (blob storage to compute targets) should be performed using the standard/non-parallelized approach.
This last chart is designed to give an idea of the realized throughput by node for both upload and downloads using the “optimal” approach as determined via the tests detailed above.
As you can imagine, the results listed here triggered a number of other questions and tests. Some of these will be addressed in the next post on this topic which should be available soon.
Related Resources
NOTE: This post was updated on 9/23/2010. The changes are both substantial and not at the same time. While working on the other posts in the series, I became concerned that there were too many calculations being performed ad-hoc in Excel to get from the raw data to the charts and conclusions described here. A key goal of mine is for someone who questions my results to be able to re-run them and analyze my analysis of the data. Therefore I stepped back and generated the charts using code that shows each calculation and query. The links to the code are posted above as are links to the raw data. The charts are identical to what were here originally with the exception of some formatting changes due to the differences in generation engines. The charts are also higher-resolution and clicking on them will open the full-size version of the chart.
As part of our cloud computing initiative we have been investigating the use of containerized computing and exploring if and where it might play a role in the computational environment where I work. In the context of this effort I have had the privilege of visiting a few different locations and seeing the containers first hand – an experience which has both answered and generated a number of questions for me.
We have a unique opportunity in that the SGI ICE Cube demo truck is going to be on-site here Thursday and Friday September 9th and 10th. During that time the container will be available both for walk-in traffic as well as pre-scheduled, in-depth presentations. While there are a handful of different container vendors and approaches, seeing one in person will give you a baseline and framework by which to analyze others.
For those of you not familiar with containerized computing, it (as an approach/concept, not necessarily this particular design) is being used in some of the largest datacenters being built and is a key component in Microsoft’s 3rd and 4th generation datacenter designs.
Some interesting characteristics of the SGI design (other vendors have other distinguishing features although there are some common threads such as high density, energy efficiency, etc):
SGI has recently added to their suite of designs a totally air cooled unit that simply requires a garden hose for intake water (read: “massive energy savings”).
More information on the SGI container can be found here: http://www.sgi.com/products/data_center/ice_cube/ and a PDF datasheet is here: Datasheet PDF
If any of you live near where I work, and are interested in seeing this in person, contact me and I’ll see what I can work out (note: you must be a US citizen).
I’m wrapping up a bit of the work we’ve been doing on data movement optimizations for cloud computing and the latest set of data yielded some interesting points I thought I’d share. The work done here is not really rocket science but may, in some ways, be slightly counter-intuitive and therefore seemed worthy of posting.
Summary: for those who don’t like to read detailed posts or don’t have time, the synopsis is that if you are uploading data to Azure, block your data (even down to 1MB) and upload in parallel. Set your block size based on your source file size, but if you must choose a fixed value, use 1MB. Following the above will result in significant performance gains… upwards of 10x-24x and a reduction in overall file transfer time of upwards of 90% (eg, uploading a 1GB file averaged 46.37 minutes prior to optimizations and averaged 1.86 minutes afterwards).
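As a quick sanity check on the 1GB example, the two averages quoted above can be turned into a speedup and a percentage reduction with a couple of lines of Python. Only the two timings come from our data; the variable names are mine.

```python
baseline_min = 46.37   # average 1GB upload time before optimization (from the post)
optimized_min = 1.86   # average 1GB upload time after optimization (from the post)

# Speedup is the ratio of old time to new time.
speedup = baseline_min / optimized_min
# Reduction is the share of the original time that was eliminated.
reduction_pct = (1 - optimized_min / baseline_min) * 100

print(f"speedup: {speedup:.1f}x")
print(f"time reduction: {reduction_pct:.1f}%")
```

For the 1GB case this works out to a little under 25x and roughly a 96% reduction, which lines up with the upper end of the ranges quoted above.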
Detail: For those of you who want more detail, or think that the claims at the end of the preceding paragraph are over-reaching, what follows is information and code supporting these claims. As the title would indicate, these tests were run from our research facility pointing to the Azure cloud (specifically US North Central as it is physically closest to us) and do not represent intra-cloud results. We have performed intra-cloud tests and the overall results are similar in notion, but the data rates are significantly different, as are the tipping points for the various block sizes (this will be detailed separately).
We started by building a very simple console application that would loop through a directory and upload each file to Azure storage. This application used the shipping storage client library from the 1.1 version of the azure tools. The only real variation from the client library is that we added code to collect and record the duration (in ms) and size (in bytes) for each file transferred. The code is available here.
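The linked code is the real harness; what follows is only a small Python sketch of the same shape, for readers who want the idea without opening the project. The `upload` callable is a stand-in for the storage client call, and the record layout is my own assumption.

```python
import os
import tempfile
import time


def upload_directory(directory, upload):
    """Upload each file in `directory`, recording duration (ms) and size (bytes).

    `upload` is a caller-supplied function taking a file path; it stands in
    for the storage client call and is an assumption of this sketch.
    """
    results = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue
        start = time.perf_counter()
        upload(path)
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        results.append({"file": name, "bytes": os.path.getsize(path), "ms": elapsed_ms})
    return results


def fake_upload(path):
    """Stand-in upload that just reads the file."""
    with open(path, "rb") as handle:
        handle.read()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "sample.bin"), "wb") as handle:
            handle.write(b"\0" * 2048)
        for row in upload_directory(tmp, fake_upload):
            print(row)
```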
We then created a directory that had a collection of files for the following sizes: 2KB, 32KB, 64KB, 128KB, 512KB, 1MB, 5MB, 10MB, 25MB, 50MB, 100MB, 250MB, 500MB, 750MB, and 1GB (50 files for each size listed). These files contained randomly-generated binary data and do not benefit from compression (a separate discussion topic). Our file generation tool is available here.
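If you would rather not download the tool, generating comparable test data is straightforward. The Python sketch below is not our generation tool; the file naming scheme and chunk size are arbitrary choices of mine. It writes random bytes from the operating system, which, like our files, will not compress.

```python
import os
import tempfile

KB = 1024
MB = 1024 * KB


def write_random_file(path, size_bytes, chunk_bytes=MB):
    """Write `size_bytes` of random binary data to `path` in chunks."""
    remaining = size_bytes
    with open(path, "wb") as handle:
        while remaining > 0:
            step = min(chunk_bytes, remaining)
            handle.write(os.urandom(step))
            remaining -= step


def generate_set(directory, sizes, copies):
    """Create `copies` random files for each size; returns the paths written."""
    paths = []
    for size in sizes:
        for index in range(copies):
            path = os.path.join(directory, f"{size}_{index:02d}.bin")
            write_random_file(path, size)
            paths.append(path)
    return paths


if __name__ == "__main__":
    # Small sizes and two copies keep the demo quick; the post used sizes
    # from 2KB to 1GB with 50 files each.
    with tempfile.TemporaryDirectory() as tmp:
        for path in generate_set(tmp, [2 * KB, 32 * KB], copies=2):
            print(os.path.basename(path), os.path.getsize(path))
```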
The baseline was established by running the application described above against the directory containing all of the data files. This application uploads the files in a random order so as to avoid transferring all of the files of a given size sequentially, thereby spreading the effects of periodic Internet delays across the collection of results. We then ran some scripts to split the resulting data and generate some reports. The raw data collected for our non-optimized tests is available via the links in the Related Resources section at the bottom of this post.
For each file size, we calculated the average upload time (and standard deviation) and the average transfer rate (and standard deviation). As you likely are aware, transferring data across the Internet is susceptible to many transient delays which can cause anomalies in the resulting data. It is for this reason that we randomized the order of source file processing as well as executed the tests 50x for each file size. We expect that these steps will yield a sufficiently balanced set of results.
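For anyone re-running the numbers from the raw data, the per-size summary boils down to something like the following Python sketch. The sample timings are invented, and the use of the sample (rather than population) standard deviation and of 1 MB = 1024 * 1024 bytes are assumptions on my part, so check them against the linked worksheet before comparing figures.

```python
from statistics import mean, stdev


def summarize(samples):
    """Summarize (bytes, milliseconds) samples for one file size.

    Returns average and sample standard deviation of the upload time in
    seconds and of the transfer rate in MB/s (1 MB = 1024 * 1024 bytes here).
    """
    seconds = [ms / 1000.0 for _, ms in samples]
    rates = [(size / (1024 * 1024)) / (ms / 1000.0) for size, ms in samples]
    return {
        "avg_seconds": mean(seconds),
        "sd_seconds": stdev(seconds),
        "avg_mb_per_s": mean(rates),
        "sd_mb_per_s": stdev(rates),
    }


if __name__ == "__main__":
    # Made-up timings for three 1MB uploads, purely for illustration.
    demo = [(1048576, 1000.0), (1048576, 2000.0), (1048576, 4000.0)]
    print(summarize(demo))
```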
Once the baseline was collected and analyzed, we updated the test harness application with some methods to split the source file into user-defined block sizes and then to upload those blocks in parallel (using the PutBlock() method of Azure storage). The parallelization was handled by simply relying on the Parallel Extensions to .NET to provide a Parallel.For loop (see linked source for specific implementation details in Program.cs, line 173 and following… less than 100 lines total). Once all of the blocks were uploaded, we called PutBlockList() to assemble/commit the file in Azure storage. For each block transferred, the MD5 was calculated and sent, ensuring that the bits that arrived matched what was intended. The timer for the blocked/parallelized transfer method wraps the entire process (source file splitting, block transfer, MD5 validation, file committal). A diagram of the process is as follows:
We then tested the effects of blocking & parallelizing the transfers by running the updated application against the same source set and did a parameter sweep on the block size including 256KB, 512KB, 1MB, 2MB, and 4MB (our assumption was that anything lower than 256KB wasn’t worth the trouble and 4MB is the maximum size of a block supported by Azure). The raw data for the parallel tests is available via the links in the Related Resources section at the bottom of this post.
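The split / upload / commit flow described above can be sketched in a few dozen lines. The Python below is not the .NET code from the linked project, and the `FakeBlobStore` class is an in-memory stand-in I made up so the example runs anywhere; its two methods only mimic the roles that PutBlock() and PutBlockList() play. The block IDs, worker count, and sample payload are likewise illustrative assumptions.

```python
import base64
import hashlib
from concurrent.futures import ThreadPoolExecutor


class FakeBlobStore:
    """In-memory stand-in for block blob storage (hypothetical, for illustration)."""

    def __init__(self):
        self.staged = {}
        self.committed = b""

    def put_block(self, block_id, data, md5_b64):
        # Reject the block if the received bytes do not match the sent hash.
        actual = base64.b64encode(hashlib.md5(data).digest()).decode("ascii")
        if actual != md5_b64:
            raise ValueError("MD5 mismatch for block " + block_id)
        self.staged[block_id] = data

    def put_block_list(self, block_ids):
        # Commit the staged blocks in the order given.
        self.committed = b"".join(self.staged[b] for b in block_ids)


def upload_in_blocks(data, store, block_size, max_workers=4):
    """Split `data` into blocks, upload them in parallel, then commit the list."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    # Fixed-width IDs keep every block ID the same length.
    ids = [base64.b64encode(f"{n:06d}".encode("ascii")).decode("ascii")
           for n in range(len(blocks))]

    def send(pair):
        block_id, chunk = pair
        digest = base64.b64encode(hashlib.md5(chunk).digest()).decode("ascii")
        store.put_block(block_id, chunk, digest)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        list(pool.map(send, zip(ids, blocks)))  # list() surfaces any exception
    store.put_block_list(ids)
    return len(blocks)


if __name__ == "__main__":
    payload = bytes(range(256)) * 40  # 10,240 bytes of sample data
    store = FakeBlobStore()
    count = upload_in_blocks(payload, store, block_size=4096)
    print(count, store.committed == payload)
```

Because the commit step takes the ordered list of block IDs, the blocks themselves can arrive in any order, which is what makes the parallel upload safe.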
This data was processed and then compared against the single-threaded / non-optimized transfer numbers and the results were encouraging. The Excel version of the results is available here.
Two semi-obvious points need to be made prior to reviewing the data. The first is that if the block size is larger than the source file size you will end up with a “negative optimization” due to the overhead of attempting to block and parallelize. The second is that as the files get smaller, the clock-time cost of blocking and parallelizing (overhead) is more apparent and can tend towards negative optimizations. For this reason (and is supported in the raw data provided in the linked worksheet) the charts and dialog below ignore source file sizes less than 1MB.
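The first point is easy to guard against in code. Here is a small Python sketch, with function names of my own invention, that counts the blocks a file would produce and skips the blocked/parallel path when there would be only one.

```python
import math

MB = 1024 * 1024


def block_count(file_size, block_size):
    """Number of blocks needed to cover a file of `file_size` bytes."""
    return max(1, math.ceil(file_size / block_size))


def should_parallelize(file_size, block_size=MB):
    """True only when the file spans more than one block.

    A file no larger than the block size yields a single block, so the
    blocking overhead cannot be offset by any parallel transfer.
    """
    return block_count(file_size, block_size) > 1


if __name__ == "__main__":
    for size in (512 * 1024, 1 * MB, 5 * MB):
        print(size, block_count(size, MB), should_parallelize(size))
```

This only covers the single-block case; where the overhead stops paying off for small multi-block files is something the raw data answers better than a rule of thumb.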
The chart above illustrates some interesting points about the results:
The above is another view of the same data as the prior chart just with the axis changed (x-axis represents file size and plotted data shows improvement by block size). It again highlights the fact that the 1MB block size is probably the best overall size but highlights the benefits of some of the other block sizes at different source file sizes.
This last chart shows the change in total duration of the file uploads based on different block sizes for the source file sizes. Nothing really new here other than this view of the data highlights the negative effects of poorly choosing a block size for smaller files.
Summary
What we have found so far is that blocking your file uploads and uploading them in parallel results in significant performance improvements. Further, utilizing extension methods and the Task Parallel Library (.NET 4.0) makes short work of altering the shipping client library to provide this functionality while minimizing the amount of change to existing applications that might be using the client library for other interactions.
Related Resources