The Danger with using a framework…

26. February 2010

The danger with using a framework is that sometimes it does things you aren’t aware of, and those things can send you in circles for quite some time before you figure them out.

I’ve been testing some “creative” ways to get data in and out of cloud platforms at rates above the norm. For this I wrote a test harness that grabs a bunch of files, one at a time, and records the file size, duration, etc. for each transfer. I’ve been doing this in a single-threaded fashion for quite some time with reasonable success. The problem began when I attempted to run a test that did multi-threaded downloads (multiple threads each grabbing a portion of the file).
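For readers who have not done this kind of download, here is a minimal sketch of the "each thread grabs a portion of the file" idea. It is not the harness from this post: the RangePlanner class and its method names are hypothetical, and it assumes the server honors HTTP Range requests and that the total length is already known (for example from a HEAD request).

```csharp
using System;
using System.Collections.Generic;
using System.Net;

// Hypothetical helper, not part of the original test harness.
static class RangePlanner
{
    // Splits the bytes [0, length) into at most `parts` contiguous ranges.
    // Each pair holds an inclusive (first byte, last byte) range.
    public static List<KeyValuePair<long, long>> Split(long length, int parts)
    {
        if (length <= 0) throw new ArgumentOutOfRangeException("length");
        if (parts <= 0) throw new ArgumentOutOfRangeException("parts");

        var ranges = new List<KeyValuePair<long, long>>();
        long chunk = (length + parts - 1) / parts; // ceiling division
        for (long start = 0; start < length; start += chunk)
        {
            long end = Math.Min(start + chunk, length) - 1;
            ranges.Add(new KeyValuePair<long, long>(start, end));
        }
        return ranges;
    }

    // Builds a request for one range; every worker thread would issue one of these.
    // AddRange(long, long) needs .NET Framework 4 or later.
    public static HttpWebRequest CreateRangeRequest(string url, long first, long last)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.AddRange(first, last); // sends "Range: bytes=first-last"
        return request;
    }
}
```

Every range turns into its own request against the same host, which is why a test like this opens several connections at once where the serial test only ever needed one.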

NOTE/Crazy Quirk: I don’t yet know why this is the case, but the problem I’m about to explain did *not* appear while I had Fiddler running… only when it was *not* running. I’m guessing that this is due to some “magic” that Fiddler does to the HTTP/networking stack.

The behavior I was seeing was that after two threads had executed, all subsequent threads would fail or time out. Obviously, when one is moving a significant amount of data, this is sub-ideal. The culprit turned out to be the ServicePointManager’s DefaultConnectionLimit. By default, this is set to 2, which means you can have at most 2 open connections to the same host (more precisely, the same scheme, host, and port, which share one ServicePoint) at the same time. When I was doing this in serial, there was no problem, as the connections were managed/re-used on the main (only) thread. When doing a number of operations against the same host from multiple threads (especially when you are setting them up and tearing them down quickly), it appears that the ServicePointManager is unable to re-use the connections (not surprising), but neither does it notice that the thread is gone and reduce the connection count accordingly. (Yes, I was behaving and closing my connections.)
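To see how the limit is scoped, a small probe like the following can help. It is only a sketch: the ServicePointProbe class is hypothetical and example.com is a placeholder host. It relies on ServicePointManager.FindServicePoint handing back one ServicePoint per scheme, host, and port, so two URLs on the same host draw from the same connection budget.

```csharp
using System;
using System.Net;

// Hypothetical probe class, for illustration only.
static class ServicePointProbe
{
    // True when both URLs are governed by the same ServicePoint
    // (and therefore by the same connection limit).
    public static bool SharesServicePoint(string firstUrl, string secondUrl)
    {
        ServicePoint first = ServicePointManager.FindServicePoint(new Uri(firstUrl));
        ServicePoint second = ServicePointManager.FindServicePoint(new Uri(secondUrl));
        return ReferenceEquals(first, second);
    }

    // Prints the limit and the live connection count for the URL's host.
    public static void Report(string url)
    {
        ServicePoint sp = ServicePointManager.FindServicePoint(new Uri(url));
        Console.WriteLine("Default limit for new ServicePoints: {0}", ServicePointManager.DefaultConnectionLimit);
        Console.WriteLine("Limit for this host: {0}", sp.ConnectionLimit);
        Console.WriteLine("Current connections: {0}", sp.CurrentConnections);
    }
}
```

Calling ServicePointProbe.Report with the download URL just before starting the worker threads is a cheap way to check whether the limit is what you expect.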

The solution I came up with was, first, to shorten the time to live for idle connections; next, to monitor the number of connections currently “consumed” and raise the limit based on how many I needed for the current operations; and, all the while, to enforce an upper bound and a stand-off mechanism should things get too far out of bounds.

 

// requires: using System; using System.Net; using System.Threading;

// ensure that we don't have lingering connections that will hamper our ability to continue...
// Start by getting the ServicePoint for our current Url
ServicePoint servicePoint = ServicePointManager.FindServicePoint(new Uri(url));

// see how many connections currently exist...
int existingConnections = servicePoint.CurrentConnections;

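// if we are above our upper bound, wait a bit to let things settle down...
while (existingConnections >= 64)
{
    Console.WriteLine("Connection count too high... sleeping for a bit...");
    Thread.Sleep(1000);

    // re-read the count; without this the loop could never exit
    existingConnections = servicePoint.CurrentConnections;
}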

// ensure that we have enough room to do what we need
if ((existingConnections + options.ConcurrentThreads + 1) > servicePoint.ConnectionLimit)
{
    servicePoint.ConnectionLimit = existingConnections + options.ConcurrentThreads + 1;
}

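// only give idle connections a few seconds (5) to time out...
// MaxServicePointIdleTime only applies to ServicePoints created after it is set,
// so also set MaxIdleTime on the ServicePoint we already have
ServicePointManager.MaxServicePointIdleTime = 5000;
servicePoint.MaxIdleTime = 5000;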

Console.WriteLine("Pre-Existing Connections: {0}", existingConnections);
Console.WriteLine("Connection Limit: {0}", servicePoint.ConnectionLimit);

Hopefully, this will be helpful for someone else hitting the same issue.

Miscellaneous

Automated Chart Generation

18. December 2009

It’s late on the Friday afternoon before Christmas week, which means things are pretty quiet around the office. This quiet has the net effect of allowing me to get quite a bit done. The last few days have been very productive with respect to our research project and Azure work (more on that coming soon), which is now in full swing. We are currently collecting performance data from our codes running in Azure (and soon in the Amazon cloud) and are also testing data transfer speeds, both to/from the cloud and between compute and storage within the cloud.

I’ve been working to automate much of this testing so we can do things in a repeatable fashion and have something that others could run (both other users like ourselves and possibly vendors, should we come across something that requires a repro scenario). So far, running tests and generating data in CSV or XML format is pretty simple, but I found myself wanting to automatically generate charts/graphs of the data as part of the test process to allow a quick visualization of how the test performed. I spent a good bit of the day looking at old tools for command-line generation of charts (e.g. RRDtool) and none of them were exactly what I was looking for, not to mention my proclivity for C# and VS.NET tools and my desire to have something that looked refined/polished and not overly raw.

Thankfully, I stumbled upon something I should have remembered existed but simply hadn’t needed before: the System.Windows.Forms.DataVisualization.Charting namespace. If you aren’t familiar with this assembly, it was released at PDC08 and has a companion Web namespace for performing similar operations in ASP.NET applications. In my basic testing I was able to build a console application that ingests the CSV output from my testing harness and generates some fairly nice looking charts from that data. The following shows a chart generated from ~1800 data points, with automatically generated 50% and 90% bands that let the viewer very easily see the averages and how the data points are distributed. This was generated using a combination of the FastPoint and BoxPlot chart types.

[Image: chart generated from the test data]
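For anyone wanting to try something similar, the following is a rough sketch of a console-style chart generator, not the actual tool described above. Several details are assumptions: the TransferChart class name, the CSV layout (one "fileSizeBytes,durationSeconds" pair per line, no header), the choice to draw the box plot in its own chart area, and the percentile settings (a 25th-to-75th percentile box for the 50% band and 5th-to-95th percentile whiskers for the 90% band). It needs references to System.Windows.Forms and System.Windows.Forms.DataVisualization, so it runs on Windows only.

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Windows.Forms.DataVisualization.Charting;

// Hypothetical helper, for illustration only.
static class TransferChart
{
    public static void Render(string csvPath, string pngPath)
    {
        var chart = new Chart { Width = 1024, Height = 600 };
        chart.ChartAreas.Add(new ChartArea("points"));
        chart.ChartAreas.Add(new ChartArea("spread"));

        // Raw samples, drawn with the lightweight FastPoint type.
        var samples = new Series("Transfers")
        {
            ChartType = SeriesChartType.FastPoint,
            ChartArea = "points"
        };
        foreach (string line in File.ReadAllLines(csvPath))
        {
            string[] cols = line.Split(',');
            if (cols.Length < 2) continue; // skip blank or malformed rows
            double size = double.Parse(cols[0], CultureInfo.InvariantCulture);
            double seconds = double.Parse(cols[1], CultureInfo.InvariantCulture);
            samples.Points.AddXY(size, seconds);
        }
        chart.Series.Add(samples);

        // Box plot computed by the chart control from the "Transfers" series.
        var box = new Series("Spread")
        {
            ChartType = SeriesChartType.BoxPlot,
            ChartArea = "spread"
        };
        box["BoxPlotSeries"] = "Transfers";
        box["BoxPlotPercentile"] = "25";       // box covers the middle 50%
        box["BoxPlotWhiskerPercentile"] = "5"; // whiskers cover the middle 90%
        chart.Series.Add(box);

        chart.SaveImage(pngPath, ChartImageFormat.Png);
    }
}
```

A call such as TransferChart.Render("results.csv", "results.png") at the end of a test run would then leave a picture next to the raw data (both file names are placeholders).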

Miscellaneous