random musings and walks through code

05 Sep 11 Slides from DevLink 2011

I had the privilege of speaking at DevLink 2011 a few weeks ago in downtown Chattanooga, TN. I have been a bit OBE (overcome by events) since the conference and haven't been able to post my slides until now. I hope to get the videos and other materials up in the next week or so. If you came to one of these sessions, thanks! Attendance at both was great, and I appreciated the questions from the audience.

 

Source code for GPGPU Talk

 

Source code for AWS Guest Book Demo

Source code for Azure Guest Book Demo

09 Jun 11 Using Nevron Controls in Azure

I’ve been playing around with the Nevron Controls for an Azure application I’m building (hopefully more on that soon) and have been fighting with a simple problem that I’m posting here for my own future reference and, hopefully, to help a few others.

The problem was that the Nevron controls worked fine when I was testing the web app directly, but would cause the dev fabric to blow up when I tried to run it there. I even tried simply deploying it to Azure, assuming that it was possibly a “feature” of the dev fabric – no dice.

Well, today I had some time to get to the bottom of it and found that it was a simple problem with the way the HTTP handlers were registered. By default, I had the handlers registered like this:

<system.web>
    <httpHandlers>
        <add verb="*" path="NevronDiagram.axd" type="Nevron.Diagram.WebForm.NDiagramImageResourceHandler" validate="false"/>
        <add verb="GET,HEAD" path="NevronScriptManager.axd" type="Nevron.UI.WebForm.Controls.NevronScriptManager" validate="false"/>
    </httpHandlers>
</system.web>

However, as Shan points out in this post: http://social.msdn.microsoft.com/Forums/en/windowsazure/thread/0103ca2d-e952-4c28-8733-47630535c05c, you need to register the handlers for the newer IIS 7 integrated pipeline. A closer look at the official Nevron samples shows that they accounted for this and I simply missed it. The setup should be something like this:

<system.webServer>
    <validation validateIntegratedModeConfiguration="false"/>
    <handlers>
        <add name="NevronDiagram" preCondition="integratedMode" verb="*" path="NevronDiagram.axd" type="Nevron.Diagram.WebForm.NDiagramImageResourceHandler"/>
        <add name="NevronScriptManager" preCondition="integratedMode" verb="*" path="NevronScriptManager.axd" type="Nevron.UI.WebForm.Controls.NevronScriptManager"/>
    </handlers>
</system.webServer>

Notice in particular that not only is the structure a little different (each handler entry now has a name and a preCondition), but the declarations are also under the system.webServer node rather than the system.web node.

30 Aug 10 Raw HTTP Interactions from C#

I had a conversation with a friend last week who was messing around with a non-SOAP-based HTTP service and was fighting with the C# necessary for rudimentary interactions. The problem was compounded by the fact that he needed to associate a certificate with the request to authenticate properly to the server.

I had recently been doing this exact thing based on some work with the Azure management API so I promised him some samples. As I was assembling them this morning, I decided to drop them here in case they could be beneficial to others.

The first sample shows a simple call wherein I build some XML and send it along with the request. In this case we are creating a deployment in Windows Azure. We then grab a value from the response header collection for use in the second call, which has a simple request format but returns an XML blob in the body that I then parse to get the results I need.

Both requests are signed with an X.509 client certificate (you’ll notice it referred to as “managementCertificate”). This is a variable passed in that was created with the following code:

var managementCertificate = new X509Certificate2(manifest.CertificateFile);

where manifest.CertificateFile is the path to the PEM file on my local machine.

In the sample below, you’ll see the target URL built and some of the parameters Base64-encoded (included for completeness; this is simply a requirement of the service I was calling). I then use a StringBuilder to build up an XML block and set up the request with the certificate, the XML body, and the other properties. Finally, you’ll see the request submitted and a value pulled from the headers collection to be sent back to the caller.

// Build uri string
// format:https://management.core.windows.net/<subscription-id>/services/
//                hostedservices/<service-name>/deploymentslots/
//                <deployment-slot-name>
var url = string.Format(
    "{0}{1}/services/hostedservices/{2}/deploymentslots/{3}",
    Constants.AzureManagementUrlBase,
    subscriptionId,
    serviceName,
    deploymentSlot);

// Base64 encode configuration label and file
var base64label = EncodeAsciiStringTo64(configurationLabel);
var base64config = GetSettings(
    instanceCount,
    accountName,
    accountKey,
    queueSleepTime,
    maxJobLength,
    container,
    queueName);

// build request body
StringBuilder blob = new StringBuilder();
blob.Append("<?xml version=\"1.0\" encoding=\"utf-8\"?>\n");
blob.Append("<CreateDeployment " +
    "xmlns=\"http://schemas.microsoft.com/windowsazure\">\n");
blob.AppendFormat("\t<Name>{0}</Name>\n", deploymentName);
blob.AppendFormat("\t<PackageUrl>{0}</PackageUrl>\n", packageUrl);
blob.AppendFormat("\t<Label>{0}</Label>\n", base64label);
blob.AppendFormat("\t<Configuration>{0}</Configuration>\n", base64config);
blob.Append("</CreateDeployment>\n");

// encode request body then put it in a byte array
byte[] byteArray = Encoding.UTF8.GetBytes(blob.ToString());

// make request
HttpWebRequest request = (HttpWebRequest)HttpWebRequest.Create(url);

// header info
request.Method = "POST";
request.ClientCertificates.Add(managementCertificate);
request.Headers.Add(Constants.VersionHeader, Constants.VersionTarget);
request.ContentType = Constants.ContentTypeXml;
request.ContentLength = byteArray.Length;

// write the body to the request stream; disposing the stream at the end
// of the using block completes the body before we ask for the response
using (Stream dataStream = request.GetRequestStream())
{
    dataStream.Write(byteArray, 0, byteArray.Length);
}

// Get the response; the using block closes it even if an exception is thrown
using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
{
    // Return the x-ms-request-id header value to the caller
    return response.GetResponseHeader(Constants.RequestIdHeader);
}
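One caveat with the StringBuilder approach: values are inserted into the XML as-is, so a deployment name containing & or < would produce malformed XML. As an alternative (a swapped-in technique, not what the original code uses), LINQ to XML escapes element values automatically. This sketch assumes the same http://schemas.microsoft.com/windowsazure namespace used above, and its EncodeAsciiStringTo64 is a hypothetical stand-in because the original helper isn't shown.

```csharp
using System;
using System.Text;
using System.Xml.Linq;

// Sketch: builds the CreateDeployment body with LINQ to XML instead of a
// StringBuilder, so element values are XML-escaped automatically.
public static class CreateDeploymentBody
{
    static readonly XNamespace Az = "http://schemas.microsoft.com/windowsazure";

    // Hypothetical stand-in for the post's EncodeAsciiStringTo64 helper
    public static string EncodeAsciiStringTo64(string value) =>
        Convert.ToBase64String(Encoding.ASCII.GetBytes(value));

    public static string Build(string deploymentName, string packageUrl,
                               string label, string base64Config)
    {
        // Child elements share the root's default namespace, so they
        // serialize without their own xmlns attributes
        var root = new XElement(Az + "CreateDeployment",
            new XElement(Az + "Name", deploymentName),
            new XElement(Az + "PackageUrl", packageUrl),
            new XElement(Az + "Label", EncodeAsciiStringTo64(label)),
            new XElement(Az + "Configuration", base64Config));

        // XElement.ToString() omits the XML declaration, so prepend it
        var declaration = new XDeclaration("1.0", "utf-8", null);
        return declaration + Environment.NewLine + root;
    }
}
```

The resulting string can be passed to Encoding.UTF8.GetBytes exactly as the StringBuilder output is in the sample.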

In this next sample, the request itself is rather simple, but we do more with the result: we parse the returned XML blob, which is fairly trivial, although it does use a custom namespace that has to be accounted for when you crawl the XML tree.

// Build uri string
// format:https://management.core.windows.net/<subscription-id>/services
//          /hostedservices/<service-name>/deploymentslots
//          /<deployment-slot-name>
var url = string.Format(
    "{0}{1}/services/hostedservices/{2}/deploymentslots/{3}",
    Constants.AzureManagementUrlBase,
    subscriptionId,
    serviceName,
    deploymentSlot.ToString());

// make uri request using created uri string
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);

// make header, method, and certificated requests
request.Method = "GET";
request.ClientCertificates.Add(managementCertificate);
request.Headers.Add(Constants.VersionHeader, Constants.VersionTarget);
request.ContentType = Constants.ContentTypeXml;

// Get the response and read the body into a string; the using blocks
// close the reader and the response when we are done
string text;
using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
using (StreamReader dataStream = new StreamReader(response.GetResponseStream()))
{
    text = dataStream.ReadToEnd();
}

// create an xml document
XmlDocument xml = new XmlDocument();

// load up the response text as xml
xml.LoadXml(text);

// get the NS manager
XmlNamespaceManager ns = new XmlNamespaceManager(xml.NameTable);
ns.AddNamespace("az", Constants.AzureXmlNamespace);

// return the status
DeploymentStatus currentStatus;
var statusText = xml.SelectSingleNode("//az:Status", ns).InnerText;

if (Enum.TryParse<DeploymentStatus>(statusText, true, out currentStatus))
{
    FullDeploymentStatus fullStatus = new FullDeploymentStatus()
        { MainStatus = currentStatus };

    // now try to get the status values for each instance
    XmlNodeList instances = xml.SelectNodes("//az:RoleInstance", ns);

    foreach (XmlNode instance in instances)
    {
        var instanceStatus = new InstanceDetails()
        {
            RoleName =
                instance.SelectSingleNode("az:RoleName", ns).InnerText,
            InstanceName =
                instance.SelectSingleNode("az:InstanceName", ns).InnerText,
            Status = (InstanceStatus)Enum.Parse(typeof(InstanceStatus),
                instance.SelectSingleNode("az:InstanceStatus", ns).InnerText)
        };

        fullStatus.Instances.Add(instanceStatus);
    }

    return fullStatus;
}
else
{
    throw new ArgumentOutOfRangeException("Status",
        "The status returned for the deployment is outside the range of " +
        "acceptable values");
}
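To see why the namespace manager matters, here is a small self-contained sketch. It assumes Constants.AzureXmlNamespace holds the same http://schemas.microsoft.com/windowsazure namespace used in the CreateDeployment body, and the XML string is a hypothetical, heavily trimmed response, not a real one. Because the document declares a default namespace, an unprefixed XPath query such as //Status matches nothing; it only works once the namespace is bound to a prefix.

```csharp
using System;
using System.Xml;

public static class DeploymentXmlDemo
{
    // Assumed value of Constants.AzureXmlNamespace
    public const string AzureXmlNamespace = "http://schemas.microsoft.com/windowsazure";

    // Hypothetical, trimmed response used only for illustration
    public const string SampleResponse =
        "<Deployment xmlns=\"http://schemas.microsoft.com/windowsazure\">" +
        "<Status>Running</Status>" +
        "<RoleInstanceList><RoleInstance>" +
        "<RoleName>WebRole1</RoleName>" +
        "<InstanceName>WebRole1_IN_0</InstanceName>" +
        "<InstanceStatus>Ready</InstanceStatus>" +
        "</RoleInstance></RoleInstanceList>" +
        "</Deployment>";

    public static XmlNode FindStatus(string xmlText, bool useNamespace)
    {
        var xml = new XmlDocument();
        xml.LoadXml(xmlText);

        if (!useNamespace)
        {
            // In XPath 1.0 an unprefixed name means "no namespace",
            // so this query finds nothing in a namespaced document
            return xml.SelectSingleNode("//Status");
        }

        // Bind the namespace to a prefix, as the sample above does with "az"
        var ns = new XmlNamespaceManager(xml.NameTable);
        ns.AddNamespace("az", AzureXmlNamespace);
        return xml.SelectSingleNode("//az:Status", ns);
    }
}
```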

That is about it. Hopefully this is helpful and gives you more comfort interacting with HTTP-based services that aren’t simply a matter of pointing at a WSDL and having magic happen.

26 Apr 10 External File Upload Optimizations for Windows Azure

I’m wrapping up a bit of the work we’ve been doing on data movement optimizations for cloud computing and the latest set of data yielded some interesting points I thought I’d share. The work done here is not really rocket science but may, in some ways, be slightly counter-intuitive and therefore seemed worthy of posting.

Summary: for those who don’t like to read detailed posts or don’t have time, the synopsis is that if you are uploading data to Azure, block your data (even down to 1MB) and upload in parallel. Set your block size based on your source file size, but if you must choose a fixed value, use 1MB. Following the above will result in significant performance gains… upwards of 10x-24x and a reduction in overall file transfer time of upwards of 90% (e.g., uploading a 1GB file averaged 46.37 minutes prior to optimizations and averaged 1.86 minutes afterwards).

Detail: For those of you who want more detail, or think that the claims at the end of the preceding paragraph are over-reaching, what follows is information and code supporting these claims. As the title would indicate, these tests were run from our research facility pointing to the Azure cloud (specifically US North Central, as it is physically closest to us) and do not represent intra-cloud results. We have performed intra-cloud tests, and the overall results are similar in notion, but the data rates are significantly different, as are the tipping points for the various block sizes; this will be detailed separately.

We started by building a very simple console application that would loop through a directory and upload each file to Azure storage. This application used the shipping storage client library from the 1.1 version of the Azure tools. The only real variation from the client library is that we added code to collect and record the duration (in ms) and size (in bytes) for each file transferred. The code is available here.

We then created a directory that had a collection of files for the following sizes: 2KB, 32KB, 64KB, 128KB, 512KB, 1MB, 5MB, 10MB, 25MB, 50MB, 100MB, 250MB, 500MB, 750MB, and 1GB (50 files for each size listed). These files contained randomly-generated binary data and do not benefit from compression (a separate discussion topic). Our file generation tool is available here.

The baseline was established by running the application described above against the directory containing all of the data files. This application uploads the files in a random order to avoid transferring all of the files of a given size sequentially, thereby spreading the effects of periodic Internet delays across the collection of results. We then ran some scripts to split the resulting data and generate some reports. The raw data collected for our non-optimized tests is available via the links in the Related Resources section at the bottom of this post.

For each file size, we calculated the average upload time (and standard deviation) and the average transfer rate (and standard deviation). As you likely are aware, transferring data across the Internet is susceptible to many transient delays which can cause anomalies in the resulting data. It is for this reason that we randomized the order of source file processing as well as executed the tests 50x for each file size. We expect that these steps will yield a sufficiently balanced set of results.

Once the baseline was collected and analyzed, we updated the test harness application with some methods to split the source file into user-defined block sizes and then upload those blocks in parallel (using the PutBlock() method of Azure storage). The parallelization was handled by simply relying on the Parallel Extensions to .NET to provide a Parallel.For loop (see the linked source for specific implementation details in Program.cs, line 173 and following… fewer than 100 lines total). Once all of the blocks were uploaded, we called PutBlockList() to assemble/commit the file in Azure storage. For each block transferred, the MD5 was calculated and sent, ensuring that the bits that arrived matched what was intended. The timer for the blocked/parallelized transfer method wraps the entire process (source file splitting, block transfer, MD5 validation, file committal). A diagram of the process is as follows:

[Diagram: ParallelAzureUploadDirect, the blocked and parallelized upload process]
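The process described above can be sketched in a few dozen lines. This is a minimal illustration, not the linked Program.cs: it assumes the whole file is already in memory as a byte[], and putBlock/putBlockList are hypothetical delegates standing in for the storage client's PutBlock() and PutBlockList() calls (their real signatures are not reproduced here). It keeps the three pieces the post describes: fixed-width block IDs, a Parallel.For over the blocks with a per-block MD5, and a final ordered commit.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;
using System.Threading.Tasks;

public static class ParallelBlockUploader
{
    // putBlock and putBlockList are hypothetical stand-ins for the storage
    // client calls; they are NOT the client library's actual signatures.
    public static List<string> Upload(
        byte[] data,
        int blockSize,
        Action<string, byte[], string> putBlock,  // (blockId, blockBytes, base64Md5)
        Action<IList<string>> putBlockList)       // commits the ordered ID list
    {
        if (blockSize <= 0) throw new ArgumentOutOfRangeException(nameof(blockSize));

        // Ceiling division, done in long arithmetic to avoid int overflow
        int blockCount = (int)(((long)data.Length + blockSize - 1) / blockSize);

        // Fixed-width IDs: unique and the same length before Base64 encoding
        List<string> blockIds = Enumerable.Range(0, blockCount)
            .Select(i => Convert.ToBase64String(
                Encoding.UTF8.GetBytes("Block" + i.ToString("D6"))))
            .ToList();

        // Each iteration copies one block, hashes it, and uploads it
        Parallel.For(0, blockCount, i =>
        {
            int offset = i * blockSize;
            int length = Math.Min(blockSize, data.Length - offset);
            byte[] block = new byte[length];
            Buffer.BlockCopy(data, offset, block, 0, length);

            string md5;
            using (MD5 hasher = MD5.Create())
            {
                md5 = Convert.ToBase64String(hasher.ComputeHash(block));
            }

            putBlock(blockIds[i], block, md5);
        });

        // Commit in the original order so the blob is reassembled correctly
        putBlockList(blockIds);
        return blockIds;
    }
}
```

A real uploader would more likely read each block from a FileStream than hold a 1GB file in memory; the byte[] input just keeps the sketch self-contained and testable without a storage account.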

We then tested the effects of blocking and parallelizing the transfers by running the updated application against the same source set and doing a parameter sweep on the block size, including 256KB, 512KB, 1MB, 2MB, and 4MB (our assumption was that anything lower than 256KB wasn’t worth the trouble, and 4MB is the maximum block size supported by Azure). The raw data for the parallel tests is available via the links in the Related Resources section at the bottom of this post.

This data was processed and then compared against the single-threaded / non-optimized transfer numbers and the results were encouraging. The Excel version of the results is available here.

Two semi-obvious points need to be made prior to reviewing the data. The first is that if the block size is larger than the source file size, you will end up with a “negative optimization” due to the overhead of attempting to block and parallelize. The second is that as the files get smaller, the clock-time cost of blocking and parallelizing (overhead) becomes more apparent and can tend towards negative optimizations. For this reason (as the raw data in the linked worksheet shows), the charts and discussion below ignore source file sizes less than 1MB.

[Chart: RateImprovement, transfer rate improvement by block size and source file size]

The chart above illustrates some interesting points about the results:

  • When the block size is smaller than the source file, performance increases but as the block size approaches and then passes the source file size, you see decreasing benefit to the point of negative gains (see the values for the 1MB file size)
  • For some of the moderately-sized source files, small blocks (256KB) are best
  • As the size of the source file gets larger (see values for 50MB and up), the smallest block size is not the most efficient (presumably due, at least in part, to the increased number of blocks, increased number of individual transfer requests, and reassembly/committal costs).
  • Once you pass the 250MB source file size, the difference in rate for 1MB to 4MB blocks is more-or-less constant
  • The 1MB block size gives the best average improvement (~16x) but the optimal approach would be to vary the block size based on the size of the source file.
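The tradeoff in the third bullet comes down to request count: each block is a separate PutBlock call, so the number of requests is the file size divided by the block size, rounded up. This sketch only does that arithmetic for the block sizes in the sweep; it does not predict transfer rates, which only measurements can show.

```csharp
using System;
using System.Collections.Generic;

// Sketch: PutBlock request counts implied by each block size in the sweep
// (block count = ceiling(fileSize / blockSize)).
public static class BlockSweep
{
    const long KB = 1024, MB = 1024 * KB;

    // Block sizes from the parameter sweep described in the post
    public static readonly long[] BlockSizes = { 256 * KB, 512 * KB, 1 * MB, 2 * MB, 4 * MB };

    public static long BlockCount(long fileSize, long blockSize) =>
        (fileSize + blockSize - 1) / blockSize;

    public static IEnumerable<string> Describe(long fileSize)
    {
        foreach (long blockSize in BlockSizes)
        {
            long count = BlockCount(fileSize, blockSize);
            // One block means all of the blocking overhead and no parallelism
            string note = count == 1 ? " (single block: no parallelism)" : "";
            yield return $"{blockSize / KB}KB blocks: {count} PutBlock call(s){note}";
        }
    }
}
```

For example, a 1GB file needs 4,096 PutBlock calls with 256KB blocks but only 256 with 4MB blocks, while a 1MB file with 1MB or larger blocks fits in a single block and gets no parallelism at all.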

 

[Chart: RateImprovement2, the same data with source file size on the x-axis and one series per block size]

The above is another view of the same data as the prior chart, with the axes swapped (the x-axis represents source file size and each plotted series shows the improvement for one block size). It again suggests that 1MB is probably the best overall block size, while also showing the benefits of some of the other block sizes at particular source file sizes.

[Chart: DurationReduction, change in total upload duration by block size and source file size]

This last chart shows the change in total duration of the file uploads for each block size and source file size. Nothing really new here, other than that this view of the data highlights the negative effects of choosing a poor block size for smaller files.

 

Summary

What we have found so far is that blocking your file uploads and uploading the blocks in parallel results in significant performance improvements. Further, using extension methods and the Task Parallel Library (.NET 4.0) makes short work of altering the shipping client library to provide this functionality while minimizing the amount of change to existing applications that might be using the client library for other interactions.

 

Related Resources

30 Jul 09 Azure Blob Storage Blob IDs and “+”

I’ve been kicking the tires on Azure’s blob storage and am working on uploading a 1.2GB+ NetCDF file. I stumbled across a couple of samples online that were very helpful in avoiding the de facto client library that ships with the SDK; however, I found myself bitten by something (likely due to my own error somehow) that I thought I’d pass along.

When processing a larger file, my upload process would always fail at block #248. At first, I assumed it was a transient network issue and simply re-ran the upload; however, after it failed on the exact same block three times, I decided that it wasn’t the network. Digging into things a bit, I found that the problem had to do with the encoding of the block IDs. The offending piece of code is here:

[Code image not available: the original block ID encoding for block i]

 

where i is an integer representing the index of the current block within the file and blockIds is an array of IDs used to build the block ID list as part of a putBlockList operation.

The Azure SDK documentation would indicate that this code snippet is perfectly valid: block IDs need to be Base64-encoded strings uniquely identifying each block within the blob, and every ID within a blob must be the same length (same number of bytes) prior to encoding. In this scenario, BitConverter.GetBytes returns a 4-byte array for every index in the range (in my case, 0 – 314), so the length requirement is met. The following are the resulting strings for a few indices:

  • 246: 9gAAAA==
  • 247: 9wAAAA==
  • 248: +AAAAA==

Indices 248 through 251 all begin with a ‘+’ sign, and 252 through 255 begin with a ‘/’. Every other index in my collection began with an alphanumeric character. After some poking around, I found indications that others were having similar problems and went down the path of encoding the value differently (e.g., HttpServerUtility.UrlTokenEncode) to no avail. What I ended up with is simply prefixing my values with a standard “safe” string (“BlockId”):

[Code image not available: the revised block ID encoding using the “BlockId” prefix]

This yielded a blockId that was unique, consistent length (notice the formatting of the indexer in the ToString() method), and “safe” in that it always began with a URL-safe character.
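Both schemes are easy to compare in code. Since the original snippets are not available, this sketch assumes the first was equivalent to Convert.ToBase64String(BitConverter.GetBytes(i)) on a little-endian machine, and that the revised one used a "D4" format string (which keeps IDs the same length for indices up to 9,999). It also shows a different fix that the post does not use: percent-encoding the raw ID with Uri.EscapeDataString when it is placed in the request URL.

```csharp
using System;
using System.Text;

// Sketch comparing the two block ID schemes described in this post.
public static class BlockIdEncodings
{
    // Original scheme (assumed): Base64 of the 4-byte representation of the index
    public static string Raw(int index) =>
        Convert.ToBase64String(BitConverter.GetBytes(index));

    // Revised scheme: constant "BlockId" prefix plus a zero-padded index
    // ("D4" is an assumption), so every ID has the same length
    public static string Prefixed(int index) =>
        Convert.ToBase64String(Encoding.UTF8.GetBytes("BlockId" + index.ToString("D4")));

    // '+' and '/' are the Base64 characters that are not URL-safe
    public static bool ContainsUrlUnsafe(string id) =>
        id.IndexOfAny(new[] { '+', '/' }) >= 0;

    // Alternative fix (not the one used in the post): percent-encode the ID
    // when it is placed in the request URL
    public static string EscapeForUrl(string id) => Uri.EscapeDataString(id);
}
```

Checked over indices 0 to 314, exactly eight raw IDs (248 through 255) contain '+' or '/', which lines up with the failure at block #248; none of the prefixed IDs do.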

There is likely a better way to solve this problem, but this did the trick for me and maybe it will be helpful to someone else.

24 Mar 09 2009 BJU Programming Contest


I had the privilege of being one of the alumni judges at the annual Bob Jones University Computer Science Department programming contest. This was the first time I’ve participated in this type of contest, and I found it very interesting. The CS department had a fairly slick harness for running the contest and supporting the judging in multiple languages on multiple platforms. As with anything of this nature, there were a few bumps in the road, but nothing of any consequence.

The contest turned out wonderfully… we had around 35 contestants (I lost count because we overflowed the one room and had to use a different room). There were 10 problems of various difficulties to be solved in a 3-hour time window. The contestants could solve the problems in any order, and could choose both their platform (Windows/Linux) and their language (C#, Java, Python, C++, Ruby). To my surprise, many of the contestants switched between languages rather than using just one as I would have expected. Every contestant solved at least one problem properly and all of the problems were solved by at least one person. The distribution of problems solved was pretty balanced as well.

As a judge, my job was to monitor the queue for submitted answers, run the submissions through the test harness, and report the results back to the contestant. I was a bit amazed (though I shouldn’t have been) at the wide variety of coding styles and levels of verbosity used to solve the same problems. The contestants could also submit questions to the judges, and the favorite for the day was “can I leave and not come back?”.

I’d like to congratulate the winners and all of the contestants for a fine job and look forward to participating in next year’s event.

26 Nov 08 .NET is a Smorgasbord?

Like many other .NET devs, I often find myself expecting to be current in all of the existing and upcoming tools/technologies in the Microsoft/.NET platform. Frankly, I don’t know how that is possible, especially with the pace at which MSFT (not to mention the surrounding ecosystem) is releasing tools and platforms. Over the past few years, my approach has been to know "enough" about the various tools/technologies so that I can be conversant, and also know when a particular toolset applies to my current project, thereby warranting a "deeper" dive into that area. Such has been the case for me with WPF and WCF (much of my work over the past while has been in the SharePoint/web space, meaning WPF – until Silverlight – didn’t have much of a play, and we hadn’t yet seen a need to switch from standard ASMX for our services). They fell into the bucket of tools I had seen while walking along the smorgasbord, but I simply hadn’t decided I needed to consume them yet.

Scott Hanselman describes the .NET Framework and the MSFT tool suite as making it easy to "fall into success" (I’m sure I’m not quoting him correctly, but the idea is the same). Essentially, the tool set, while robust and quite capable, is approachable, and it is relatively easy to simply build something, especially when you compare it to other languages such as C++: in C#/.NET it is relatively easy to build "okay" code, not that hard to build good code, and almost (yes, there are plenty of exceptions) hard to write *bad* code. It is much easier (at least in my opinion) to write bad C/C++ code and much harder to write good C/C++ code. I agree with him 100% – once you have a core competency on the platform, picking up the basics of the "new" stuff becomes almost trivial.

I was recently working on a project (someone else did most of the coding – I did some of the design and proof-of-concept work) and I was able to see this in action. We were building a security-focused app, being deployed to a mixed environment of XP and Vista machines, and we had a 6-7 week window to build it, test it, and have it deployed. We ended up building a Windows Service that hosted a WCF service, a desktop application using WPF, a webpart for SharePoint and an IIS-hosted WCF service. We made heavy use of the cryptography libraries, which, oddly (to me), were one of the areas the other developer had prior experience with; however, neither he nor I had done any real work with WCF or WPF. The technologies offered us quite a bit as far as functionality and form, even for two guys who weren’t "experts" in them – that’s where the "magic" lies – I’m reasonably comfortable with the MSFT dev stack, and I’m handed two completely new-to-me technologies, and with a relatively small amount of effort, I’m able to use them in my application and reap the benefits they bring. Now, certainly there’s quite a bit more functionality that WPF/WCF bring to the table than what we used or "grok’d" during this project, but it did what we needed it to and quickly – making me want to dig further into those technologies and to use them for other projects.

26 May 08 Finally back where I want to be…

It’s frustrating to me to find myself redoing things that I’ve done before or re-solving problems. Over the years at Planet I’ve been involved with different software teams, each with different levels of rigor; however, almost all of them have had, at minimum, an automated build process of some sort (at least for the past 4 years or so). Some of these systems were elaborate msbuild-driven systems, while others were a cobbling together of batch scripts or PowerShell linking msbuild, Vault, FogBugz and Community Server.

The customer I’ve been working with for the past 16 months has "bitten off" the entire TFS tree and I’ve been the prime developer responsible for implementing it and getting it going… all, of course, while doing "real work". Further, (nearly) all of the work we’ve been doing has been SharePoint focused (custom list event handlers, web parts, site definitions, etc.), which means any build must generate properly formed SharePoint Solution (*.wsp) files, and the approaches to doing this and handling the installation/upgrade of such are pretty varied.

This weekend I finally completed a build on a project that meets my "minimum requirements" for being a properly formed build. I’m pleased that I was able to, in relatively short order, apply it to another project verifying repeatability. Here’s what we’re doing:

  1. All build scripts are handled by TFS 2008 (using OOTB functionality)
  2. Solution manifest files and DDF files are maintained both in dev and production build by a customized version of stsdev v1.3 (http://codeplex.com/stsdev)
  3. An "installer" is provided as part of the build (<buildRoot>/Install) that allows the back office team to simply double-click and go. We use the SharePoint Installer (http://codeplex.com/sharepointinstaller) tool/framework to provide this function
  4. All required web.config settings are handled via the feature receiver allowing them to be properly installed/removed on activation/deactivation
  5. Developer-level documentation is provided for the build based on the /// comments in the code. We use Sandcastle (http://codeplex.com/sandcastle) to do the core generation and Sandcastle Help File Builder (http://codeplex.com/shfb) to assist with the build script integration (I tried DocProject – http://codeplex.com/docproject – but it pooched VS 2008 and never worked as described; hopefully it will be more stable when it exits beta).
  6. Passed StyleCop (MS Source Analysis) rules
  7. Passed Code Analysis rules

I still have a ways to go prior to reaching my "nirvana"…

  1. Build should automatically run code analysis (this is certainly possible, I’ve simply not gotten it implemented yet)
  2. Build should automatically run source analysis (this is possible, I’ve simply not gotten it implemented yet)
  3. Full testing (unit and system) on build completion – Ideally it would spin up a VM, deploy the appropriate code, execute the test battery, clean up and report on the results.

Even so, it felt good to get a respectable build out the door and to know that it was process driven and repeatable.