You Still Have to Plan and Understand Your Toolset

18. May 2010

I just finished reading an article (http://searchcloudcomputing.techtarget.com/news/article/0,289142,sid201_gci1512394,00.html) discussing some of the power issues and related outages at one of Amazon’s (http://aws.amazon.com) data centers last week. While much of the article was fine and factual, I take a bit of issue with the way the article wraps up:

Users may not like being told they should fend for themselves on disaster preparedness, but that appears to be part of the price for getting everything else AWS offers.

This highlights a sentiment that is unfortunately pervasive within the community of those evaluating or adopting cloud computing – that of believing that cloud computing is a panacea for all scale and datacenter problems.

What the users of these platforms need to understand is that they are toolkits. While the various cloud computing vendors provide important services and features, the consumer of said platforms must do their homework to understand the technical tradeoffs of various decisions so that they can appropriately reap the benefits of the selected platform. Simply uploading your code/application and expecting it to be always available is unrealistic. The consumer must understand what high availability features are offered by their particular cloud vendor and exploit those features to ensure that their app has the appropriate availability. In the case of the Amazon outage(s), if users had followed the high-availability guidelines provided by Amazon, they would not have experienced any outage at all. Cloud providers such as Microsoft, Amazon, and others provide the notion of availability zones, or regions, and – much like you would if you were hosting the app yourself – you need to distribute your application across such to ensure that a failure in one location doesn’t mean a complete outage for your application.
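As a rough illustration of the point about distributing an application across locations, the sketch below places instances round-robin across zones and checks what survives the loss of one zone. The zone names and helper functions are hypothetical and do not correspond to any vendor's API.

```python
from collections import Counter
from itertools import cycle


def spread_across_zones(instance_count, zones):
    """Assign instances to zones round-robin (zone names are illustrative)."""
    zone_cycle = cycle(zones)
    return {f"instance-{i}": next(zone_cycle) for i in range(instance_count)}


def surviving_instances(placement, failed_zone):
    """Return the instances that are still running if one zone goes down."""
    return [name for name, zone in placement.items() if zone != failed_zone]


placement = spread_across_zones(6, ["zone-a", "zone-b", "zone-c"])
print(Counter(placement.values()))
print(len(surviving_instances(placement, "zone-a")), "instances survive a zone-a outage")
```

With everything deployed to a single zone, the same outage would leave zero instances running, which is the scenario the article describes.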

Rather than a magic wand that solves all scaling and availability issues, cloud computing provides a democratized toolset that informed consumers can use to develop a highly available, scalable, and fault-tolerant application. The key word here is “democratized”, meaning that these features are available to anyone, at a fraction of the cost of doing it yourself. I experience similar frustration when reading complaints from folks about the pricing of Windows Azure (e.g. “Why can’t I host my simple website there for $10/month?”). The question illustrates that the inquirer doesn’t understand the fundamental architecture of the platform (both how it works, and what its primary use cases are). Neither Amazon’s EC2 nor Windows Azure is designed to compete with a low-cost web hoster… rather, they are designed to provide tools to a company that needs features not available from a low-cost hoster, but doesn’t have (or wish to spend) the capital to build those features itself.

They are great platforms that provide you the ability to build a very solid offering, but you have to understand how to properly utilize those features. Cloud computing should not be approached with ignorance or any less planning than you would if you were building out the infrastructure yourself (of course the level of detail will differ).

Cloud Computing, Theory

External File Upload Optimizations for Windows Azure

26. April 2010

I’m wrapping up a bit of the work we’ve been doing on data movement optimizations for cloud computing and the latest set of data yielded some interesting points I thought I’d share. The work done here is not really rocket science but may, in some ways, be slightly counter-intuitive and therefore seemed worthy of posting.

Summary: for those who don’t like to read detailed posts or don’t have time, the synopsis is that if you are uploading data to Azure, block your data (even down to 1MB) and upload in parallel. Set your block size based on your source file size, but if you must choose a fixed value, use 1MB. Following the above will result in significant performance gains… upwards of 10x-24x and a reduction in overall file transfer time of upwards of 90% (e.g., uploading a 1GB file averaged 46.37 minutes prior to optimizations and averaged 1.86 minutes afterwards).
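For readers who want to check the 1GB figures quoted above, here is the arithmetic as a small sketch. The function names are mine; the two durations are the averages stated in the summary.

```python
def speedup(before_minutes, after_minutes):
    """How many times faster the optimized transfer is."""
    return before_minutes / after_minutes


def reduction_percent(before_minutes, after_minutes):
    """Percentage of the original transfer time that was eliminated."""
    return (before_minutes - after_minutes) / before_minutes * 100.0


before, after = 46.37, 1.86  # average minutes for the 1GB file, from the text
print(f"speedup: {speedup(before, after):.2f}x")               # speedup: 24.93x
print(f"reduction: {reduction_percent(before, after):.2f}%")  # reduction: 95.99%
```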

Detail: For those of you who want more detail, or think that the claims at the end of the preceding paragraph are over-reaching, what follows is information and code supporting these claims. As the title would indicate, these tests were run from our research facility pointing to the Azure cloud (specifically US North Central as it is physically closest to us) and do not represent intra-cloud results. (We have performed intra-cloud tests and the overall results are similar in notion, but the data rates are significantly different, as are the tipping points for the various block sizes… this will be detailed separately.)

We started by building a very simple console application that would loop through a directory and upload each file to Azure storage. This application used the shipping storage client library from the 1.1 version of the Azure tools. The only real variation from the client library is that we added code to collect and record the duration (in ms) and size (in bytes) for each file transferred. The code is available here.

We then created a directory that had a collection of files for the following sizes: 2KB, 32KB, 64KB, 128KB, 512KB, 1MB, 5MB, 10MB, 25MB, 50MB, 100MB, 250MB, 500MB, 750MB, and 1GB (50 files for each size listed). These files contained randomly-generated binary data and do not benefit from compression (a separate discussion topic). Our file generation tool is available here.

The baseline was established by running the application described above against the directory containing all of the data files. This application uploads the files in a random order so as to avoid transferring all of the files of a given size sequentially, thereby spreading the effects of periodic Internet delays across the collection of results. We then ran some scripts to split the resulting data and generate some reports. The raw data collected for our non-optimized tests is available via the links in the Related Resources section at the bottom of this post.

For each file size, we calculated the average upload time (and standard deviation) and the average transfer rate (and standard deviation). As you likely are aware, transferring data across the Internet is susceptible to many transient delays which can cause anomalies in the resulting data. It is for this reason that we randomized the order of source file processing as well as executed the tests 50x for each file size. We expect that these steps will yield a sufficiently balanced set of results.
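The per-file-size statistics described above can be sketched as follows. This is not the script we used; it assumes each record is a (size in bytes, duration in ms) pair, matching what the test harness records, and the sample values are made up purely for illustration.

```python
from statistics import mean, stdev


def summarize(samples):
    """Summarize (size_bytes, duration_ms) records for one file size."""
    durations_s = [ms / 1000.0 for _, ms in samples]
    rates = [size / (ms / 1000.0) for size, ms in samples]  # bytes per second
    return {
        "mean_duration_s": mean(durations_s),
        "stdev_duration_s": stdev(durations_s),  # sample standard deviation
        "mean_rate_bytes_per_s": mean(rates),
        "stdev_rate_bytes_per_s": stdev(rates),
    }


# Illustrative records only, NOT measured data.
example = [(1_048_576, 900), (1_048_576, 1100), (1_048_576, 1000)]
for key, value in summarize(example).items():
    print(f"{key}: {value:.3f}")
```

Note that the mean rate is computed from the per-file rates rather than from total bytes over total time, so a single very slow transfer shows up in the standard deviation instead of being averaged away.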

Once the baseline was collected and analyzed, we updated the test harness application with some methods to split the source file into user-defined block sizes and then to upload those blocks in parallel (using the PutBlock() method of Azure storage). The parallelization was handled by simply relying on the Parallel Extensions to .NET to provide a Parallel.For loop (see linked source for specific implementation details in Program.cs, line 173 and following… less than 100 lines total). Once all of the blocks were uploaded, we called PutBlockList() to assemble/commit the file in Azure storage. For each block transferred, the MD5 was calculated and sent, ensuring that the bits that arrived matched what was intended. The timer for the blocked/parallelized transfer method wraps the entire process (source file splitting, block transfer, MD5 validation, file committal). A diagram of the process is as follows:

ParallelAzureUploadDirect
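The split / parallel upload / commit flow described above can be sketched in a few lines. To be clear, this is not the linked C# implementation and it does not call Azure: `FakeBlobStore` is a hypothetical in-memory stand-in whose `put_block` and `put_block_list` methods only mimic the roles of PutBlock() and PutBlockList(), and a thread pool stands in for Parallel.For.

```python
import base64
import hashlib
from concurrent.futures import ThreadPoolExecutor


class FakeBlobStore:
    """Hypothetical in-memory stand-in for a block blob service (not the Azure API)."""

    def __init__(self):
        self.staged = {}     # block_id -> bytes, uploaded but not yet committed
        self.committed = {}  # blob_name -> assembled bytes

    def put_block(self, block_id, data, md5_b64):
        # Reject the block if the payload does not match the digest sent with it.
        actual = base64.b64encode(hashlib.md5(data).digest()).decode("ascii")
        if actual != md5_b64:
            raise ValueError(f"MD5 mismatch for block {block_id}")
        self.staged[block_id] = data

    def put_block_list(self, blob_name, block_ids):
        # Assemble the blob from the staged blocks in the order given.
        self.committed[blob_name] = b"".join(self.staged.pop(b) for b in block_ids)


def split_blocks(data, block_size):
    """Return (block_id, chunk) pairs covering the data in order."""
    return [
        (f"{index:08d}", data[offset:offset + block_size])
        for index, offset in enumerate(range(0, len(data), block_size))
    ]


def upload_parallel(store, blob_name, data, block_size, workers=8):
    """Upload blocks concurrently, then commit them as one blob."""
    blocks = split_blocks(data, block_size)

    def send(block):
        block_id, chunk = block
        digest = base64.b64encode(hashlib.md5(chunk).digest()).decode("ascii")
        store.put_block(block_id, chunk, digest)
        return block_id

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in input order, so the commit list is in file order.
        block_ids = list(pool.map(send, blocks))
    store.put_block_list(blob_name, block_ids)
    return len(block_ids)
```

The blocks may arrive in any order; only the final block list determines how the file is assembled, which is what makes the parallel upload safe.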

We then tested the effects of blocking & parallelizing the transfers by running the updated application against the same source set and performed a parameter sweep on the block size including 256KB, 512KB, 1MB, 2MB, and 4MB (our assumption was that anything lower than 256KB wasn’t worth the trouble and 4MB is the maximum size of a block supported by Azure). The raw data for the parallel tests is available via the links in the Related Resources section at the bottom of this post.

This data was processed and then compared against the single-threaded / non-optimized transfer numbers and the results were encouraging. The Excel version of the results is available here.

Two semi-obvious points need to be made prior to reviewing the data. The first is that if the block size is larger than the source file size you will end up with a “negative optimization” due to the overhead of attempting to block and parallelize. The second is that as the files get smaller, the clock-time cost of blocking and parallelizing (overhead) is more apparent and can tend towards negative optimizations. For this reason (which is supported by the raw data provided in the linked worksheet), the charts and discussion below ignore source file sizes less than 1MB.

RateImprovement

The chart above illustrates some interesting points about the results:

  • When the block size is smaller than the source file, performance increases but as the block size approaches and then passes the source file size, you see decreasing benefit to the point of negative gains (see the values for the 1MB file size)
  • For some of the moderately-sized source files, small blocks (256KB) are best
  • As the size of the source file gets larger (see values for 50MB and up), the smallest block size is not the most efficient (presumably due, at least in part, to the increased number of blocks, increased number of individual transfer requests, and reassembly/committal costs).
  • Once you pass the 250MB source file size, the difference in rate for 1MB to 4MB blocks is more-or-less constant
  • The 1MB block size gives the best average improvement (~16x) but the optimal approach would be to vary the block size based on the size of the source file.
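One way to see why the smallest block size stops paying off for large files is simply to count the blocks (and therefore the individual transfer requests) each block size produces. The sketch below does that for the sweep values, assuming binary units (1MB = 1024 × 1024 bytes); the helper names are mine.

```python
KB = 1024
MB = 1024 * KB
GB = 1024 * MB

SWEEP = [256 * KB, 512 * KB, 1 * MB, 2 * MB, 4 * MB]


def block_count(file_size, block_size):
    """Number of blocks needed to cover the file (integer ceiling division)."""
    return (file_size + block_size - 1) // block_size


def worth_blocking(file_size, block_size):
    """Blocking only helps when the file actually splits into more than one block."""
    return block_count(file_size, block_size) > 1


for size_label, size in [("1MB", 1 * MB), ("50MB", 50 * MB), ("1GB", 1 * GB)]:
    counts = {f"{b // KB}KB": block_count(size, b) for b in SWEEP}
    print(size_label, counts)
```

A 1GB file needs 4096 requests at 256KB but only 256 at 4MB, while a 1MB file yields a single block at 1MB and above, leaving nothing to parallelize.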

 

RateImprovement2

The above is another view of the same data as the prior chart, with the axes swapped (the x-axis represents file size and the plotted data shows improvement by block size). It again highlights the fact that the 1MB block size is probably the best overall size but also shows the benefits of some of the other block sizes at different source file sizes.

DurationReduction

This last chart shows the change in total duration of the file uploads based on different block sizes for the source file sizes. Nothing really new here, other than that this view of the data highlights the negative effects of poorly choosing a block size for smaller files.

 

Summary

What we have found so far is that blocking your file uploads and uploading them in parallel results in significant performance improvements. Further, utilizing extension methods and the Task Parallel Library (.NET 4.0) makes short work of altering the shipping client library to provide this functionality while minimizing the amount of change to existing applications that might be using the client library for other interactions.

 

Related Resources

Cloud Computing, Theory, General Development

QoS Aware Clouds

13. April 2010

I’ve been reading a paper this morning published by Microsoft Research on Quality of Service Aware Clouds. If you are engaged in the cloud computing field, I would suggest that it is worth the time to read (14 pages) if for no other reason than to get your mind rolling (as it did mine) on the topic. Further, I’d be keenly interested in follow-on conversations from the community as to the issues/remedies put forth in that paper.

I’m finding myself split on the topic… academically, there are some interesting points being made:

  • VMs from different customers running on the same physical host can negatively impact each other (think, last-level cache contention, memory bandwidth, I/O paths, etc.) (this isn’t a surprise to anyone, just the premise on which the rest of the paper is based)
  • They suggest an intelligent VM placement algorithm based on resource utilization models of the applications running in the VMs
  • They suggest reserving a certain amount of headroom on each physical host to allow for dynamic compensation and CPU throttling adjustments to maintain QoS
  • They suggest addressing the “wasted” resources represented by the headroom in the normative case by means of a bid-based “higher quality” service level available to users for additional fees
  • The key premise is that many apps hosted in the cloud are *not* CPU bound, and putting those that are alongside those that are not will provide everyone with a reasonable level of service (as compared to putting all CPU-bound VMs on the same nodes resulting in resource contention issues for those nodes whereas other nodes with lighter workloads are, effectively, sleeping).
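To make the placement idea in the bullets above concrete, here is a toy sketch. It is not the algorithm from the paper: it is a plain greedy heuristic (heaviest VM first, onto the least-loaded host) with a fixed headroom fraction, and the VM names, demand figures, and parameters are all hypothetical.

```python
def place_vms(vm_cpu_demand, host_count, host_capacity=1.0, headroom=0.2):
    """Greedy placement: heaviest VMs first, each onto the least-loaded host.

    vm_cpu_demand maps a VM name to its modeled CPU demand (fraction of a host).
    A headroom fraction of each host is left unallocated.
    """
    usable = host_capacity * (1.0 - headroom)
    hosts = [{"load": 0.0, "vms": []} for _ in range(host_count)]
    by_demand = sorted(vm_cpu_demand.items(), key=lambda kv: kv[1], reverse=True)
    for name, demand in by_demand:
        target = min(hosts, key=lambda h: h["load"])
        if target["load"] + demand > usable:
            raise ValueError(f"no host has room for {name} within the headroom limit")
        target["load"] += demand
        target["vms"].append(name)
    return hosts


# Two CPU-bound VMs and two light ones (made-up numbers).
demands = {"cpu-heavy-1": 0.6, "cpu-heavy-2": 0.6, "web-1": 0.1, "web-2": 0.1}
for index, host in enumerate(place_vms(demands, host_count=2)):
    print(index, host["vms"], round(host["load"], 2))
```

With these inputs each host ends up with one CPU-bound VM and one light VM, rather than both heavy VMs contending on the same node.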

 

However, I find myself struggling with a few things:

  • The solution they propose is based on the ability to accurately model the workload of a given VM (they suggest doing so in a non-contended staging area) and then to do the initial production VM placement based on a balancing algorithm. I struggle with this as I think the majority of cases are going to be either “quick hits” (the user is just posting a handful of VMs to use for a while, after which they will be torn down) or scenarios in which the cost of eking out 100% QoS consistency is going to be greater than simply “spinning up another VM”. I’m guessing that the latter case is more likely, and that most users will simply accept the balanced performance for what it is, and adjust their total number of VMs accordingly. The exception to this rule will likely be permanent (or nearly so) installations (wherein the cost of running nodes for a long time is higher).
  • They propose that the cloud providers insert “head room” into their resource deployment strategy (as a means of compensating for contention) and then mitigate this “waste” by selling higher levels of Quality to interested customers who are willing to bid, if you will, for increased performance when available but are content to live with a lesser service level in the normative case. I struggle with this as it, in my mind, adds another level of complexity to the pricing model of cloud computing that simply will not survive in the marketplace. From the vantage point of the user, this would further introduce variance into my overall QoS, as the normative case *might* be that I run at the higher service level (normally the host is not experiencing high levels of interference) but I will be dropped to the lower level without notice. This could add significantly to my auto-scaling complexity as I can no longer scale up/down simply based on traffic or load, but also have to monitor the QoS state I’m currently being provided by the cloud provider.
  • They discuss using market-based “bidding” for higher QoS levels which I think is problematic due to the fact that they are, in this case, asserting that Quality of Service 0 (Q0) (the normative state) is something less than a full core (in their examples, something around 50%) and that Q1 is higher (maybe 75%). The problem here is perception vs. reality in that most users of cloud platforms would assume that when they get a single-core VM, they are getting ALL of that core. Therefore, asserting that what you are really getting is 50% of said core, and that if you pay more, we’ll periodically give you more (except for when we don’t due to someone else on the same host over-utilizing their portion) seems a difficult sell.
  • The algorithm and approach targets the “behaving” code rather than punishing the misbehaving code. Rather than determining which VM on the chip is overrunning the rest and curtailing that, they attempt to simply help those who are underperforming.

 

I think that, in the end, I’m more in favor of simply having more intelligent hypervisors that provide better isolation for VMs, but I’m still thinking this all through. There are some interesting points made in this paper, and intelligent allocations could be interesting…

Cloud Computing, Theory

Cloud Futures 2010: Panel on Cloud Applications - New Experiences and Expectations

7. April 2010

I am in Redmond this week and am participating in two workshops being hosted by different groups within Microsoft Research. Along with a handful of others, I was asked to participate in a panel discussion on Friday dealing with new experiences that cloud computing would facilitate, as well as things we felt were road blocks to seeing those experiences realized. We were specifically challenged to think "outside the box" and to look beyond (the now typical) conversations surrounding raw performance and to dream a little. I wrote out the following as a means of working through my thoughts for my 5-7 minute portion of the panel discussion and, as it took me longer than 7 minutes to read, I thought I'd post it here as an expansion of the talk and possibly an anchor on which to hang subsequent conversations. Please forgive the casual nature of the talk as it is intended to be, essentially, a script delivered to a group rather than a formal written version of the same.

---

This topic is certainly interesting to me as I am convinced that cloud computing is here to stay and also presents a platform that can be disruptive to the scientific/technical computing industry (although I would qualify this by saying “disruptive in a constructive sense” – meaning that the disruption leads to the additive good and not the removal of existing work). I have spent a considerable amount of time over the past week contemplating this question (how do we imagine cloud computing facilitating new usage scenarios), and have chosen to present my reply by means of a few examples.

The first example is that of Lego MindStorms. Are you familiar with these? They are kits that provide kids (regardless of how old they are :) ) the ability to build robots using a familiar (although slightly altered) Lego metaphor. These kits come with motors, sensors, and a "brain" that is programmable via a drag-and-drop software tool but also supports more complex tools such as Microsoft's Robotics Studio. Do you know what is so great about these (besides the obvious)? They allow common people, with no prior robotics or electronics experience, to dabble in the field. It is a gateway, if you will, to a much broader field.

The second example is more of an experience that happened to me recently in that I had the privilege of running into my high school science teacher this past weekend - a quiet, rather unassuming fellow named Randy White. Randy's brilliance is that he has a passion for science and did (at least in my case) an excellent job of transference. If I am ever able to accomplish anything interesting in the scientific domain, a large portion of the credit will lie with him. Probably the most important thing he taught us was how to think about, or to tackle, the complex. I can't tell you how many times I heard him say, "Start with what you know". The idea being that, most often, incredibly complex problems are composed of nothing more than a series of far simpler, additive problems. He taught us to focus on solving what we could, rather than attempting to "swallow the entire elephant" if you'll allow me to strain a metaphor.

If you find yourself wondering what these two examples have to do with each other, or more germanely, what do they have to do with my vision for the scenarios that cloud computing will open, let me see if I can explain...

You see, much in the same manner as Lego MindStorms have introduced an otherwise unlikely audience to the world of robotics, I believe that cloud computing (based on its cost model and popular programming paradigms) is a means of introducing normal people (and by this, I mean those not formally trained in scientific or technical computing) to the notion of using computation as a tool for solving complex problems. Possibly to the dismay of some in the field, I think that this will, at least initially, be done in a manner devoid of the topics of MPI or Fortran, much in the same way as a 15 year old "programming" his robot doesn't have to understand the inner workings of concurrency runtimes nor the physics at work when his robot "walks" for the first time. I will be the first to admit that these (MPI, Fortran, concurrency topics, race conditions) are important topics, but I would submit that they should not be gating factors to one's ability to explore the arena and determine if he/she is interested in further study in that field. I think we will see paradigms that are far simpler to adopt, such as master-worker, map/reduce, etc. (or even cloud-backed applications that are hidden behind more accessible tools such as Excel or MATLAB) take hold in significant ways and that we will see the development of novel approaches to solving problems using this new platform. The tried-and-true tools will remain, and will be used when necessary and appropriate, but I think if we force them down the throats of the next generation of researchers as "the only way to accomplish science", we are doing them a great disservice.

As to where Mr. White and high school science come into play - well, this can best be summarized by a comment made by a friend of mine, Wally McClure, when he, almost flippantly, referred to Windows Azure as a "poor man's supercomputer". Being one that had been working with Azure for quite a bit at the time, I took a little offense at the accolade due to its semi-pejorative nature, and prefer the "common man", but the point is the same regardless: Cloud Computing (at least as currently manifested in both Windows Azure and Amazon's AWS platform) has a great potential to democratize high-performance computing. You see, the high school I attended was small... we had 23 in my graduating class. While Randy has moved on, he still teaches in a comparatively small school that certainly has no funds for a cluster on which to run experiments. However, with the advances in cloud computing, Randy could devise a collection of simple experiments and actually execute them as part of a class project. He could have a significant computational cluster for the equivalent of a few dollars. He can present "Scientific" computing as something obtainable to his students, and hopefully foster an interest that will develop into the next generation of computational thinkers - solving one problem at a time, incrementally, on the way to solving massive problems that we have trouble even describing today.

It is, in my opinion, incumbent upon us - the current generation of computational researchers and domain-specific scientists - to look at cloud computing not as a threat to the establishment, but as facilitating a new means of scientific discovery. We should consider ways to make large-scale computation more accessible to "normal" people. We should be opening up the community, sharing wherever possible, reducing the barriers to entry. Challenge yourselves and your students to push boundaries, to consider non-traditional approaches, and to enjoy "playing" with computational resources.

Conferences, Theory, Cloud Computing

Data and Published Results in Scientific Research

12. February 2010

I've been working on data-intensive projects recently and I'm sure that there is a point in every computational researcher's life when they begin to think about the data that they are generating – how are they going to store it, how is it going to be tracked, what code/circumstances were used to create or collect it, how is it going to be associated or linked with the results, how will someone who questions the research reproduce or validate it? Most large projects plan for these sorts of issues early on in their life cycle, but for many smaller projects it seems like an afterthought – if thought about at all.

I recently reviewed a paper that was being submitted for publication and the authors, while on target with their overall thesis, supported it with some broad claims whose veracity was backed only by some pictorial charts (units were not displayed). There was no detail regarding the number of times the tests were run to produce the resultant chart. It wasn't explained whether the values represented an average over many runs, or what level of variance was represented by the result set. There was no pointer to the raw data set, or detailed test archive, etc. The test that they had run had inherent variability in the source (Internet latency/contention) yet no explanation was given as to how this was accounted for in the published results. Essentially, as a reviewer, I was being asked to sign off on a set of assertions for which I had nothing beyond the credibility of the authors as validation. If I were simply a reader of the publication and held a critical view of the position being presented, I would have no means of learning further or accurately countering the authors' claims (assuming that the goal of scientific publication is not only the dissemination of knowledge but the constructive debate of theories leading to a community-refined understanding of reality). Maybe I am naive, but I think we can do better.

I recently attended a workshop and one of the speakers (he was a researcher at Google but I don't remember his name/position) mentioned (almost in passing) during his talk that he and a colleague had been discussing the need for a reality in which every experiment can be reproduced and independently validated at any point in time. He quickly admitted that this was a lofty aspiration and there exist many hurdles that would have to be overcome to facilitate such, but I found myself strongly agreeing with the core sentiment. As a relative newcomer to the scientific community, I've been a bit surprised at the shroud of secrecy that most researchers place around the raw data from their work. There seems to be a prevailing desire for self-aggrandizement over fostering collaborative solutions to hard problems. I'm probably somehow missing the boat, but I find myself hoping for a scenario in which data is published early and often – critiqued and validated by others, pointing the community at large towards solutions rather than individuals towards papers.

While thinking about this problem area, I was reminded of Project Trident – an effort by Microsoft Research to solve a similar problem. As I recall, this platform bundles the variables, originating source, and resultant data together in a repository for subsequent validation and archival. I hope that they are successful in this effort and that similar tools are developed in the community. Ideally, the scientific community will embrace the “cloud” for more than simply large scale compute, but also as a means to build a platform such as one referred to earlier in this post whereby any person with interest could browse through existing experiments, and re-execute them with constraints similar to the originals. Then, as the collective imagination grows, the community can experiment with other permutations or derivative works.
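As a very small sketch of what bundling variables, originating source, and resultant data together might look like in practice, the snippet below builds a manifest that ties a result file to the parameters and code version that produced it. This is my own illustration, not Project Trident's format; the field names and the `rev-1` version label are hypothetical.

```python
import hashlib
import json


def build_manifest(parameters, code_version, result_bytes):
    """Record what produced a result, plus a hash so the data can be verified later."""
    return {
        "parameters": parameters,
        "code_version": code_version,
        "result_sha256": hashlib.sha256(result_bytes).hexdigest(),
    }


def verify(manifest, result_bytes):
    """Check that a data file is the one the manifest describes."""
    return manifest["result_sha256"] == hashlib.sha256(result_bytes).hexdigest()


results = b"size_bytes,duration_ms\n1048576,950\n"  # illustrative data only
manifest = build_manifest({"block_size": 1048576, "runs": 50}, "rev-1", results)
print(json.dumps(manifest, indent=2, sort_keys=True))
print("verified:", verify(manifest, results))
```

Publishing a manifest like this alongside the raw data would let a reader confirm they are looking at the same data set, and see the settings needed to re-run the experiment.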

Cloud Computing, Theory

Speaking at CodeStock

22. June 2009

Join me at CodeStock

I’m privileged to have been given the opportunity to speak at CodeStock (details below) this coming Friday. I’ll be speaking on the topic of Deploying and Packaging SharePoint solutions using TFS. The abstract for my session is:

Have you been using the VS Extensions for SharePoint to create SharePoint packages and found yourself wondering how best to integrate with your source control platform and build system? Consistent packaging of SharePoint solutions can be a challenge and is not for the faint of heart. Come to this session and learn how our team utilizes TFS, Team Build, SandCastle, SharePoint Installer, and STSDev in concert to produce consistent installation packages for our SharePoint/MOSS environment.

CodeStock is about Community. For Developers, by Developers (with love for SysAdmins and DBAs too!). Last year an idea started at CodeStock to mix Open Spaces within a traditional conference. This year we're going to crank things up to 11 and rip off the knob - and you're being drafted to help!

  • Keynote by Microsoft RIA Architect Evangelist Josh Holmes
  • From Developer to Business Owner roundtable with guest Nick Bradbury creator of HomeSite, TopStyle, and FeedDemon
  • 50+ break out sessions + Open Spaces (self-organizing sessions)
  • Grand Prize: VSTS 2008 Team Suite with MSDN Premium
  • Virtual sessions with Jeffrey Richter and John Robbins

Space is limited so register today at CodeStock.org

Theory

Isn’t it Time for 64Bit?

3. September 2008

 

 

I’m getting frustrated with application vendors and their support for 64-bit O/Ses. I’ll admit that a year or two ago the consumer-level device support for 64-bit O/Ses was a bit weak, but considering I can now walk into Best Buy and pick up a consumer-grade laptop that runs Vista 64, the major software vendors really need to get their act together. In the last few weeks I’ve been bitten by a “we don’t support 64-bit” story a number of times and it feels ridiculous… Maybe I’m an edge case, but every Vista machine I own is 64-bit (work laptop, home desktop, my wife’s laptop, etc).

  • I was writing code this AM attempting to integrate with QuickBooks 2008 and was forced to go back and re-compile with the processor-specific x86 switch due to the fact that their SDK doesn’t support 64-bit O/S.
  • I had purchased a 3-computer license to ETrust Anti-Virus (Computer Associates) and, upon recently replacing my home desktop and my wife’s computer, I had to throw away those licenses and replace them with another vendor’s product because CA can’t figure out how to build an AntiVirus app for 64-bit Vista.
  • At work, I am forced to run a 32-bit Vista Virtual Machine because the tool provided by our workflow vendor for designing business processes doesn’t run on 64-bit (they claim they’ve been working on a 64-bit version and it should be out “any time”… but that was February…)
  • At work we’ve been working on an electronic records system and are fighting with the vendor because they don’t support 64-bit OS on the server… seriously? An enterprise-scale server-based product that has been around for a few years doesn’t yet support 64-bit? I’m amazed…

Theory

Thinking the Cloud…

18. July 2008

I’ve been talking quite a bit with a co-worker about “the cloud” and how organizations can and will leverage it over time, and how application development/design may change as a result. Microsoft’s SQL Server Data Services (SSDS) is only one example of a major paradigm shift in the industry away from internal-only systems to treating certain things as commodity-style resources.

I’ve been thinking through a problem for a non-profit that I work with wherein they needed to share approximately 7.5 GB worth of corporate documents amongst a geographically dispersed team. We’ve been facilitating this by using a WSS site hosted on a little box at my house for the last year or so, but have had increasing frustration with normal home-hosted issues (power blinks, server goes off while I’m out of town, etc.) so I’ve been researching how to solve this problem inexpensively but also well enough to “make the problem go away”. Because of the other discussions my co-worker and I have been having recently, I naturally looked for a “cloud-based” solution.

Here’s the list of things I reviewed:

  • Microsoft Office Live Small Business – http://www.officelive.com – this looked to be a very interesting option… you sign up, get some custom domain mail accounts, a little website if you’d like, and some private space which is essentially a highly-tailored/restricted WSS platform. $15/month for 5GB of space. I contacted their helpline, they assured me I could add to that to meet my 7.5 GB requirement so I started uploading… after 4.9 GB (and a LOT of time – my poor cable modem…) I went to add another 5GB only to have the control panel deny me that option. Another call to customer service and the nice-but-feature-ignorant customer service representative told me “I sure thought you could do that but I guess not”. Cancelled the account and threw away the upload time… oh well… (UPDATE: I’ve since been called back by another rep who assured me that it was, in fact, possible and that all would be well, but the ship has sailed…)
  • Microsoft Office Online (http://www.officeonline.com) – this is the full-blown version of Hosted SharePoint… would have been great however it is currently in beta and has no prices listed. Based on the target audience and the pricing for their hosted live meeting service, my gut tells me it is going to be too pricey for the non-profit to swallow so I moved on…
  • SkyDrive – this would be great… it’s exactly what I needed… but I need 7.5 GB… not 5… I couldn’t find any way (even offering to pay) to get more than 5 GB… on to other options…
  • <Insert your favorite file share here>: Found a bunch of services that might work… some of which I had heard of before, others I wasn’t sure of, some looked too good to be true, some I wasn’t convinced would be around long enough for me to get my data uploaded much less 4-5 years from now…

Then, a friend recommended I look at Amazon S3 and I’m pretty glad he did. Amazon offers “Object Storage” in the cloud for very cheap prices… and exposes a series of XML Web Services to interact with the service. It takes almost nothing to get set up, and there are a number of code samples available on CodePlex (http://codeplex.com) to illustrate working with it. I’m currently playing with a shareware tool called BucketExplorer (~$50) that works as a file client for the service and, besides being a resource hog, is working fairly well. The best part about this solution is that it is incredibly cheap ($0.15/GB/Month!) and I can integrate it directly into our existing admin control panel without the staff knowing that the actual data “lives someplace else”. Internet storage has become a commodity – something that I can just assume is available… pretty slick if you ask me.
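To put the pricing in perspective, here is a quick back-of-the-envelope sketch of what the 7.5 GB document set would cost to store at the $0.15/GB/month rate quoted above. It only covers storage; I'm assuming request and data-transfer charges are small enough to ignore for a rough estimate, and the function and constant names are just my own illustration.

```python
# Rough storage-only estimate. Request and transfer fees are ignored
# (an assumption for this sketch, not a statement about the actual bill).
S3_PRICE_PER_GB_MONTH = 0.15   # USD, the rate quoted in this post
DOCUMENT_SET_GB = 7.5          # size of the non-profit's document set


def monthly_storage_cost(size_gb, price_per_gb_month=S3_PRICE_PER_GB_MONTH):
    """Return the monthly storage charge in USD for size_gb of data."""
    return size_gb * price_per_gb_month


if __name__ == "__main__":
    monthly = monthly_storage_cost(DOCUMENT_SET_GB)
    print(f"Monthly storage: ${monthly:.3f}")
    print(f"Yearly storage:  ${monthly * 12:.2f}")
```

That works out to a little over a dollar a month for storage, compared with the $15/month I was looking at for 5 GB on Office Live Small Business.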


Cloud Computing, Theory

Stop (re-)Inventing the Wheel!

22. June 2008

This is more a personal reminder than anything else…

In my “day job”, I’m working with an organization wherein we are coaching a group of about 80 developers to view opening Visual Studio as their last viable option when looking to solve a problem. This doesn’t mean coding is bad (I certainly hope not… if so, I think I’d be out of a job soon), but rather represents a mind-set that recognizes that we have an enormous collection of functionality/tools already available to us (we are building on top of MOSS 2007) and we need to fully vet the OOTB functionality prior to deciding we need to “roll our own” anything. Directly tied to this approach is the theory that using OOTB functionality and/or configuration of such (rather than raw coding) leads to better long-term maintainability and upgrade-ability, not to mention helping to avoid “hit by a bus” syndrome.

However, sometimes the “preacher” needs to look inwardly and I found myself doing that this weekend. I was working on a project for a non-profit organization I work with, and found myself looking at what I had amassed for solving the problem of site-wide search and was displeased. I immediately reverted to my “code first” tendencies (something I think every developer is born with) and began (mentally) listing the discrepancies with the current solution and designing a “right” solution. Thankfully, prior to actually writing any code, I was kicking around some blog posts and something in one of them (honestly don’t remember what/which) got me thinking of the various “existing” search engines and the fact that they often provide site-specific, nearly OOTB search dialogs that you can embed into your site. I kicked a couple of them around, and settled on one (ended up with the live.com search using the XML web services API), and, rather quickly had a fully-functioning search platform on my site…

The “purist” in me immediately thinks of a couple of reasons why this solution “isn’t as good as what I would have built” (e.g. less control over the actual search results/order, less “immediacy” to updating the index, etc.), but then my more realistic side kicks in and I realize that I’m not a search engine expert… not even close… Some might argue as to whether or not those at live.com are either :), but I can guarantee you that they are more so than I, and that the solution “they delivered” is much more accurate and flexible than what I would have built…

I found myself reminding myself to focus on where I can add value, and to leave the rest to others… that’s the only way to consistently deliver adaptable solutions in an environment where the surrounding technology is changing so quickly…

SharePoint, Theory