<?xml version="1.0" encoding="UTF-8"?>
<!--Generated by Squarespace V5 Site Server v5.13.159 (http://www.squarespace.com) on Fri, 24 May 2013 20:41:47 GMT--><feed xmlns="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/"><title>Technology</title><subtitle>Technology</subtitle><id>http://rob.gillenfamily.net/blog/</id><link rel="alternate" type="application/xhtml+xml" href="http://rob.gillenfamily.net/blog/"/><link rel="self" type="application/atom+xml" href="http://rob.gillenfamily.net/blog/atom.xml"/><updated>2013-02-03T12:35:43Z</updated><generator uri="http://five.squarespace.com/" version="Squarespace V5 Site Server v5.13.159 (http://www.squarespace.com)">Squarespace</generator><entry><title>Big Data 101: Handling Millions of Files</title><category term="BigData"/><category term="Theory"/><id>http://rob.gillenfamily.net/blog/2013/2/2/big-data-101-handling-millions-of-files.html</id><link rel="alternate" type="text/html" href="http://rob.gillenfamily.net/blog/2013/2/2/big-data-101-handling-millions-of-files.html"/><author><name>Rob Gillen</name></author><published>2013-02-02T20:39:52Z</published><updated>2013-02-02T20:39:52Z</updated><content type="html" xml:lang="en-US"><![CDATA[<p>I've been talking a bit recently with members of my team about some of the basic tools that need to be in any data scientist's toolbox. Things that, if you want to lay any claim to working with "big data" should be second nature. Many of these things are not terribly complicated, nor does one have to be overly clever to employ them - however the lack of knowledge as to when to properly apply them could cost you dearly (lost time, lost data, needless system maintenance, etc).</p>
<p>The first such topic came up a week or so ago when one of our younger team members mentioned that his machine fell over after he had written around 3,000,000 files to the same directory. This reminded me of a lesson I learned back in late 2000 when I was working with Microsoft on the "Million Mailbox March" and the MCIS mail platform. MCIS contained a mail platform designed by Microsoft for the ISP industry (later replaced by Exchange). This mail platform used an interesting approach to store the potentially millions of mailboxes it housed on the file system. Similar approaches can be (and often have been) applied to modern day storage problems within the Big Data space.</p>
<p>So, I came up with the following exercise/challenge for my students and colleagues - I hope you find it interesting. If you've faced a similar problem in the past, you are likely jumping to solutions and know exactly how you would solve it. It will be interesting to see the solutions presented by our team. I'll post any particularly interesting ones here.</p>
<p>Challenge #1: Handling Millions of Files.<br />Design and implement a solution for storing 100,000,000 files on a "normal" file system (NTFS, ext4, etc.). The solution should be tested/verified and should be reasonably balanced. The system should provide a naming convention that ensures against collisions. The solution should function properly on Windows, Mac and Linux. Finally, you need to be able to explain the reasoning behind each design decision implemented in your solution.&nbsp;</p>
<p>Deliverables:</p>
<p>&nbsp;</p>
<ul>
<li>A short writeup defining your approach and any incremental steps along the way. Remember: details around interim "failed" attempts are as important as the final solution.</li>
<li>All code used in your solution. "All" means everything necessary to recreate your scenario and solution, including any means you used to measure and analyze your results.</li>
<li>Timing of the overall activity is important. For example: how long did it take for you to create the file set and analyze the results? While speed of operation is not the primary goal for this exercise (efficacy is), timing information is always informative.</li>
<li>Extra credit is given for striking a clear balance between robustness and simplicity.</li>
</ul>
<p>&nbsp;</p>
<p>Support Files:</p>
<p>&nbsp;</p>
<ul>
<li>This exercise does not require any initial data sets.</li>
</ul>
<p>&nbsp;</p>
<p>&nbsp;Assumptions:</p>
<p>
<ul>
<li>Disk space should not be an issue during this experiment.</li>
<li>The solution can (and should) assume that the target file system (NTFS/ext4/etc.) is of sufficient size to house the files/data in a contiguous set and single namespace.</li>
</ul>
</p>
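<p>For anyone who wants a starting point, here is a minimal Python sketch of one common approach: hash each file name and use the leading hex digits of the digest as nested directory names. This is an illustration under my own assumptions, not the MCIS design and not a reference answer. The function names, the two-level layout, the SHA-1 hash, and the .dat extension are all hypothetical choices.</p>

```python
import hashlib
import uuid
from pathlib import PurePosixPath


def new_file_name():
    # Hypothetical naming convention: a random UUID makes collisions
    # extremely unlikely without needing a central counter.
    return uuid.uuid4().hex + ".dat"


def sharded_path(name, levels=2, width=2):
    # Hash the name so files spread evenly no matter how names are chosen.
    digest = hashlib.sha1(name.encode("utf-8")).hexdigest()
    # Use the first levels * width hex digits as nested directory names.
    shards = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return PurePosixPath(*shards) / name


if __name__ == "__main__":
    # Prints a relative path such as two-hex-digit/two-hex-digit/name.dat
    print(sharded_path(new_file_name()))
```

<p>With two levels of two hex digits there are 65,536 leaf directories, so 100,000,000 files would average roughly 1,500 per directory. Lowercase hex names also sidestep case-sensitivity differences between Windows, Mac and Linux file systems. Whether that balance is the right one is exactly what the exercise asks you to measure.</p>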
<p>&nbsp;</p>
<p>&nbsp;</p>]]></content></entry><entry><title>Would you like a Cassette for your data?</title><category term="Miscellaneous"/><category term="data"/><id>http://rob.gillenfamily.net/blog/2012/11/1/would-you-like-a-cassette-for-your-data.html</id><link rel="alternate" type="text/html" href="http://rob.gillenfamily.net/blog/2012/11/1/would-you-like-a-cassette-for-your-data.html"/><author><name>Rob Gillen</name></author><published>2012-11-01T12:07:00Z</published><updated>2012-11-01T12:07:00Z</updated><content type="html" xml:lang="en-US"><![CDATA[<p><span class="full-image-float-right ssNonEditable"><span><img src="http://rob.gillenfamily.net/storage/Tdkc60cassette.jpg?__SQUARESPACE_CACHEVERSION=1351772179314" alt="" /></span></span>A <a class="offsite-link-inline" href="http://it.toolbox.com/people/ebegoli/" target="_blank">colleague of mine</a> sent me a <a class="offsite-link-inline" href="http://www.newscientist.com/article/mg21628875.500-cassette-tapes-are-the-future-of-big-data-storage.html" target="_blank">link to this story</a>&nbsp;about using cassettes for storing data and asked for my thoughts. This was clearly a good-natured jab at me in light of a prior conversation we had debating the appropriateness of tape or disk for a research data storage platform. I had been arguing against tape as an out-moded and inappropriate storage mechanism.</p>
<p>The problem is&hellip; I was <em><strong>wrong</strong></em>.</p>
<p>And so was he.</p>
<p>I am using the word &ldquo;wrong&rdquo; not in the moral sense but to mean &ldquo;less than the ideal&rdquo;, &ldquo;unfortunate&rdquo;, &ldquo;sad&rdquo;, &ldquo;depressing&rdquo;, &lt;enter your own term here&gt;.</p>
<p>You see, I was wrong in that as he argued (and is well articulated in the article) there is simply too much data being generated to make disks a tractable solution. Even with <a class="offsite-link-inline" href="http://www.computerworld.com/s/article/9227382/60TB_disk_drives_could_be_a_reality_in_2016" target="_blank">recent and projected growth in hard drives</a>&nbsp;(60TB expected by 2016) our ability to produce data &ndash; particularly in an automated means via sensors and scientific instruments &ndash; already does, and will continue to out-pace these advancements. Even if hard drives were able to keep up with the space demands, the power required to keep those drives running quickly becomes prohibitive. And let&rsquo;s not even talk about disk transfer rate issues.</p>
<p>Whether or not <em>he</em> was &ldquo;wrong&rdquo; is probably a bit more subjective (an admission I&rsquo;m certain he&rsquo;ll enjoy). My frustration with his position is that tape tends to be synonymous with &ldquo;slow&rdquo; or &ldquo;laborious&rdquo;. That said, an unfortunate reality in the digital sciences is that with data sets of any significant size, a researcher often has to plan well in advance of his experiment to stage the data. Data has to be loaded into online storage from some cold storage mechanism (often a tape library). This significantly limits one&rsquo;s curiosity and causes some questions to go unanswered (e.g. &ldquo;I wonder if&hellip; well, it&rsquo;s probably not worth the time/effort to load up the data just to see&hellip;&rdquo;). &nbsp;I suppose that if I&rsquo;m honest, I&rsquo;m simply manifesting the &ldquo;Google effect&rdquo; &ndash; the expectation that I can ask a question of tons of data and get an answer instantly. I&rsquo;d love for this to be possible for all scientific data &ndash; but as any data scientist will tell you, that desire is simply na&iuml;ve. Providing platforms such as Google&rsquo;s is hard work, and not achieved without significant planning and effort. Admitting this still doesn&rsquo;t mean I can&rsquo;t hope for it&hellip;</p>
<p>The real nugget buried in the article and underlying our friendly debate, is that storage technologies are nowhere close to where we need them to be. No matter which option we choose it will be a compromise between a.) discarding data &ndash; unfortunate no matter how you look at it, b.) spending astronomical amounts of money on both hardware and power, or&nbsp; c.) using slow, offline, and deterioration-prone devices such as tapes. Frankly, none of these options are attractive. There is a little part of me that dies when I think of a researcher having to choose to <em>discard</em> data simply because he doesn&rsquo;t have space to store it and doesn&rsquo;t <em>currently think</em> it is important to his work. What if he&rsquo;s wrong? What if that data is (or was) the key to solving part of the problem, he just didn&rsquo;t know it yet? Or maybe it is the key to solving a problem he doesn&rsquo;t yet know he has&hellip;</p>
<p>So here&rsquo;s hoping that the researchers working on storage technologies will be successful. That they will develop means and methods for us to store massive amounts of data, access it in increasingly shorter times, and with a power envelope that is reasonable and sustainable. &nbsp;That&rsquo;s not asking too much, is it?</p>]]></content></entry><entry><title>Debugging and Reversing Basics 0.01</title><category term="Security"/><category term="grayhat"/><category term="security"/><category term="windows"/><id>http://rob.gillenfamily.net/blog/2012/10/15/debugging-and-reversing-basics-001.html</id><link rel="alternate" type="text/html" href="http://rob.gillenfamily.net/blog/2012/10/15/debugging-and-reversing-basics-001.html"/><author><name>Rob Gillen</name></author><published>2012-10-15T17:54:52Z</published><updated>2012-10-15T17:54:52Z</updated><content type="html" xml:lang="en-US"><![CDATA[<p><span style="font-size: 90%;"><em>[Note: the title is what it is because I consider myself a n00b at this and these are likely things anyone else already knows]</em></span></p>
<p>I'm always trying to learn more and recently topics in the infosec world have garnered my attention. To further my understanding of the space, I've been reading a bit and this weekend was reading part of <a class="offsite-link-inline" href="http://www.amazon.com/gp/product/B007V2DNEK/ref=as_li_ss_tl?ie=UTF8&amp;camp=1789&amp;creative=390957&amp;creativeASIN=B007V2DNEK&amp;linkCode=as2&amp;tag=robgillenblog-20" target="_blank">Gray Hat Python: Python Programming for Hackers and Reverse Engineers</a> and found myself purposefully doing that which you shouldn't do when learning a new subject: <strong><em>not following the instructions</em></strong>. You see, the author specifically indicates that the samples were written on/tested on a Windows x86 machine and his assumption is that you will be running on the same. In my case, I haven't run a 32-bit OS in years (since Vista was released) and I made two assumptions: 1.) It probably doesn't matter that much and 2.) even if it does, it's probably a good thing to learn what the differences between 32 and 64 bit debugging/reversing are. Well, after a few hours of playing around, I can tell you the first assumption was flat wrong and the second is probably accurate.</p>
<p>The fun begins in chapter 3 where you build a simple debugger. I got stuck on the very first step which was a simple demonstration of attaching to an existing process (calc.exe). I would run the script and simply get an error: "[*] Unable to attach to the process." I figured I must have done something wrong, so I poked around a bit and even diff'd my code against the reference and still didn't see any important differences. <em>As a side note: If you've not yet <a class="offsite-link-inline" href="http://nostarch.com/index.php?q=ghpython.htm#updates" target="_blank">looked at the errata for the book</a>, <strong>you need to do so</strong>. There are a number of code/bug fixes that are required to get things working.</em></p>
<p>The key came from a <a class="offsite-link-inline" href="http://wordgems.wordpress.com/2010/12/18/gray-hat-python/" target="_blank">blog post I came across written by A. H.</a> where he hints that the problem may have to do with the architecture of the application I am attempting to attach to. He suggests adjusting the error line in the script as follows:
<pre>print "[*] Unable to attach to the process. %s" % FormatError(kernel32.GetLastError())</pre>
and if the error ends with "The request is not supported" you can rest assured that your problem is a 32/64 bit issue.&nbsp;Unfortunately, A.H.'s solution was to simply use a 32-bit box for the rest of the testing.</p>
<h4>What Version of Python am I running?</h4>
<p>The next issue that occurred to me was to determine which version of Python (bit-ness) I was running. A simple search brought up <a class="offsite-link-inline" href="http://stackoverflow.com/questions/1405913/how-do-i-determine-if-my-python-shell-is-executing-in-32bit-or-64bit-mode" target="_blank">Ned Deily's answer on Stack Overflow</a> which indicated that one simple way to check would be to run the following:</p>
<pre>python -c "import struct;print( 8 * struct.calcsize('P'))"</pre>
<p>You will get either 32 or 64 as a result - in my case, 32. Great. So I know that my debugging thread is a 32-bit application. What, then, is the image type of calc.exe?</p>
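<p>If you'd rather keep that check in a script than retype the one-liner, the same idea can be sketched as a small function. The name python_bitness is my own (hypothetical), and the call to platform.machine() is an extra I've added for context; it was not part of the Stack Overflow answer.</p>

```python
import platform
import struct


def python_bitness():
    # A pointer ("P") is 4 bytes in a 32-bit interpreter and 8 bytes in a
    # 64-bit one, so multiplying its size by 8 gives the bit width.
    return 8 * struct.calcsize("P")


if __name__ == "__main__":
    # platform.machine() describes the hardware/OS, which can differ from
    # the interpreter (e.g. 32-bit Python on 64-bit Windows).
    print(python_bitness(), platform.machine())
```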
<h4>DumpBin</h4>
<p>Some poking around led me to an article by <a class="offsite-link-inline" href="http://blogs.technet.com/b/windowshpc/archive/2009/03/27/how-to-tell-if-a-exe-file-is-a-32-bit-or-64-bit-application-using-dumpbin.aspx" target="_blank">Frank Chism on the&nbsp;Windows HPC&nbsp;blog</a> that pointed to being able to run a tool called dumpbin to see if an exe was 32 or 64 bit. I followed the instructions on his post, opened a VS 2010-enabled command shell and typed the following:</p>
<pre>dumpbin /headers c:\Windows\system32\calc.exe|findstr "magic machine"</pre>
<p>Which resulted in:<br />
<pre>14C machine (x86)<br />&nbsp;&nbsp;&nbsp; 32 bit word machine<br />10B magic # (PE32)</pre>
</p>
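<p>The two values dumpbin reports here (machine and magic) are plain fields in the PE header, so the same check can be sketched in a few lines of Python if you don't have Visual Studio handy. This is a minimal illustration that only knows about the x86 and x64 cases; the function name describe_pe and the lookup tables are my own, hypothetical, and it does far less validation than dumpbin.</p>

```python
# Values from the PE/COFF format: machine type and optional-header magic.
MACHINE_NAMES = {0x14C: "x86", 0x8664: "x64"}
MAGIC_NAMES = {0x10B: "PE32", 0x20B: "PE32+"}


def describe_pe(data):
    # Every PE file starts with a DOS stub whose first two bytes are "MZ".
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # Offset 0x3C of the DOS header holds the file offset of the PE header.
    pe_offset = int.from_bytes(data[0x3C:0x40], "little")
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    # The machine field is the first 2 bytes of the COFF header.
    machine = int.from_bytes(data[pe_offset + 4:pe_offset + 6], "little")
    # The optional header follows the 4-byte signature and 20-byte COFF header.
    magic = int.from_bytes(data[pe_offset + 24:pe_offset + 26], "little")
    return (MACHINE_NAMES.get(machine, hex(machine)),
            MAGIC_NAMES.get(magic, hex(magic)))
```

<p>To use it, read the first kilobyte or so of the executable in binary mode and pass the bytes in.</p>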
<p>Ok... so my debugging thread is 32-bit, and the executable that I'm running is 32-bit, so why am I unable to attach to the thread?</p>
<h4>Process Explorer</h4>
<p>At this point, I pulled up the trusty Sysinternals Process Explorer to see if it would shed any light on the issue. From within Process Explorer, if you select the View menu and then click on "Select Columns" you can tick the box for "Image Type" which will allow you to see the image type of each running process/executable. And, after quickly checking, I see that calc.exe is running as a 64-bit image. How in the world is this happening?</p>
<h4>WOW64</h4>
<p>64 Bit Windows has a feature called the <a class="offsite-link-inline" href="http://msdn.microsoft.com/en-us/library/aa384187%28VS.85%29.aspx" target="_blank">File System Redirector</a> which seems to be the root of my issues. If I understand how this works (dubious), this is a&nbsp;layer within the OS that "magically" redirects you to the proper version of the application based on the calling process. For example, if a 64-bit process attempts to&nbsp;open the 64-bit image of calc.exe (located in&nbsp;C:\Windows\System32), it will work just fine. However, if a 32-bit process attempts to do the same thing, it will get magically re-directed to the 32-bit version of the application&nbsp;which is located in C:\Windows\SysWOW64 (don't even ask&nbsp;why the folders are named the way they are based on the versions of the applications that they house). What this means, is that if you simply hit Windows+R and type calc, you are calling it from a 64-bit process (the shell) and therefore you get the 64-bit version of the application. If, however, you reference calc.exe from a 32-bit process (i.e. dumpbin), you get&nbsp;redirected to the 32-bit version.</p>
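<p>To make that concrete, here is a toy Python model of the redirection as I've described it. It is only a sketch of my understanding: the function name effective_path is hypothetical, it hard-codes C:\Windows, and it ignores the exceptions and other details of the real redirector.</p>

```python
from pathlib import PureWindowsPath


def effective_path(requested, caller_bits):
    # Toy model: a 32-bit caller asking for anything under
    # C:\Windows\System32 is sent to C:\Windows\SysWOW64 instead.
    path = PureWindowsPath(requested)
    lowered = [part.lower() for part in path.parts]
    if caller_bits == 32 and lowered[:3] == ["c:\\", "windows", "system32"]:
        return PureWindowsPath("C:\\Windows\\SysWOW64", *path.parts[3:])
    # 64-bit callers (and paths outside System32) are left untouched.
    return path


if __name__ == "__main__":
    for bits in (64, 32):
        print(bits, effective_path(r"c:\Windows\system32\calc.exe", bits))
```

<p>In this model the same requested path resolves to two different files depending on the bitness of the caller, which is why dumpbin and Process Explorer appeared to disagree.</p>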
<p>If you specifically need the 32-bit version (as I did to complete my testing), you can open a command prompt, navigate to c:\Windows\SysWOW64 and then type calc.exe or, you can have it launched from any 32-bit process. To see this second option in action, open a command prompt, navigate to c:\windows\SysWOW64 and then type cmd.exe. Via Process Explorer you can confirm that you are running a 32-bit version of cmd.exe. Then navigate wherever you'd like (i.e. c:\) and then type calc.exe. You will now get the 32-bit version of the application (and can confirm it in Process Explorer).</p>
<p>From here, I can attach to the process (calc.exe as 32-bit) from my python code. This moves me forward a bit but doesn't solve the "<em>how do I bind to the 64-bit image</em>" question. That will be a problem for another day.</p>
<p>&nbsp;</p>
<p>&nbsp;</p>]]></content></entry><entry><title>DevLink: Wireless Network Security</title><category term="Conferences"/><category term="devlink"/><category term="security"/><category term="wireless"/><id>http://rob.gillenfamily.net/blog/2012/8/31/devlink-wireless-network-security.html</id><link rel="alternate" type="text/html" href="http://rob.gillenfamily.net/blog/2012/8/31/devlink-wireless-network-security.html"/><author><name>Rob Gillen</name></author><published>2012-08-31T13:12:29Z</published><updated>2012-08-31T13:12:29Z</updated><content type="html" xml:lang="en-US"><![CDATA[<p>On Wednesday I had the privilege of speaking at DevLink on the topic of wireless network security. I had a great time giving the talk and had great audience participation (including some who were unknowing victims of my man-in-the-middle attack). The slides from the talk are posted below.</p>
<p>&nbsp;</p>
<iframe src="http://www.slideshare.net/slideshow/embed_code/14129277?rel=0" width="597" height="486" frameborder="0" marginwidth="0" marginheight="0" scrolling="no" style="border:1px solid #CCC;border-width:1px 1px 0;margin-bottom:5px" allowfullscreen> </iframe> <div style="margin-bottom:5px"> <strong> <a href="http://www.slideshare.net/rgillen/devlink-wifu-you-think-your-wireless-is-secure" title="DevLink - WiFu: You think your wireless is secure?" target="_blank">DevLink - WiFu: You think your wireless is secure?</a> </strong> from <strong><a href="http://www.slideshare.net/rgillen" target="_blank">Rob Gillen</a></strong> </div>]]></content></entry><entry><title>Windows 8 Release Preview on Samsung Slate</title><category term="Miscellaneous"/><category term="windows8"/><id>http://rob.gillenfamily.net/blog/2012/7/11/windows-8-release-preview-on-samsung-slate.html</id><link rel="alternate" type="text/html" href="http://rob.gillenfamily.net/blog/2012/7/11/windows-8-release-preview-on-samsung-slate.html"/><author><name>Rob Gillen</name></author><published>2012-07-11T00:31:39Z</published><updated>2012-07-11T00:31:39Z</updated><content type="html" xml:lang="en-US"><![CDATA[<p>I've been playing with Windows 8 on a Samsung 700T1A slate for a number of months, so I was quite excited when the Release Preview was announced and attempted to install it straight away. Unfortunately, I was unable to get it installed and set it aside for a while, trying occasionally, failing, and setting it aside again.</p>
<p>The problem I was having, was that the slate wouldn't boot to the Windows 8 media - DVD, USB, no matter what I burned it to, it wouldn't work. I even verified that the media was valid by using it to install on other machines. </p>
<p>Tonight, I finally got it working and the problem was both so odd, and simple, that I thought I'd post it here to maybe help someone else who comes along searching for the same problem. </p>
<p>It seems that the slate, when shipped, has a bios setting that has "Support for Legacy USB" devices enabled. However, as soon as the system is updated with a purchase date, it automatically flips this switch (presumably for faster boots). Unfortunately, this also causes it to not check for bootable USB devices during POST (cf. <a href="http://skp.samsungcsportal.com/integrated/popup/FaqDetailPopup3.jsp?cdsite=hk_en&amp;seq=431318">http://skp.samsungcsportal.com/integrated/popup/FaqDetailPopup3.jsp?cdsite=hk_en&amp;seq=431318</a>)</p>
<p>The post that tipped me off was this: <a href="http://skp.samsungcsportal.com/integrated/popup/FaqDetailPopup3.jsp?cdsite=hk_en&amp;seq=431320">http://skp.samsungcsportal.com/integrated/popup/FaqDetailPopup3.jsp?cdsite=hk_en&amp;seq=431320</a></p>
<p><img title="bios.jpeg" src="http://rob.gillenfamily.net/resource/bios.jpeg?fileId=19212792" alt="Bios" width="415" height="181" border="0" /></p>
<p>(image courtesy of Samsung)</p>
<p>However, the bios on my slate didn't look like this - there is no Fast BIOS Mode menu item. However, there is a menu item that says "Support Legacy USB Devices". Based on <a href="http://skp.samsungcsportal.com/integrated/popup/FaqDetailPopup3.jsp?cdsite=hk_en&amp;seq=431320">this article</a> and the <a href="http://skp.samsungcsportal.com/integrated/popup/FaqDetailPopup3.jsp?cdsite=hk_en&amp;seq=431318">previous one</a>, I took a guess that changing this would fix it and, magically, everything worked just as you would have expected.</p>]]></content></entry><entry><title>CodeStock 2012: Buffer Overflow Attack</title><category term="Conferences"/><category term="Security"/><id>http://rob.gillenfamily.net/blog/2012/6/18/codestock-2012-buffer-overflow-attack.html</id><link rel="alternate" type="text/html" href="http://rob.gillenfamily.net/blog/2012/6/18/codestock-2012-buffer-overflow-attack.html"/><author><name>Rob Gillen</name></author><published>2012-06-18T01:29:03Z</published><updated>2012-06-18T01:29:03Z</updated><content type="html" xml:lang="en-US"><![CDATA[<p>We had a great time at CodeStock a few days ago discussing buffer overflow attacks, showing developers how they are discovered and exploited and a bit about how to avoid creating software that is vulnerable to these types of attacks. 
Below are the slides and video from the session:</p>  <br /><script async class="speakerdeck-embed" data-id="4fde1234dbe56c002200abc8" data-ratio="1.3333333333333333" src="http://rob.gillenfamily.net//speakerdeck.com/assets/embed.js"></script>  <br />  <br />  <br /><iframe height="281" src="http://player.vimeo.com/video/44201782" frameborder="0" width="500" mozallowfullscreen="mozallowfullscreen" webkitallowfullscreen="webkitallowfullscreen" allowfullscreen="allowfullscreen"></iframe>]]></content></entry><entry><title>CodeStock 2012: You Think Your WiFi is Safe?</title><category term="Conferences"/><category term="Security"/><id>http://rob.gillenfamily.net/blog/2012/6/18/codestock-2012-you-think-your-wifi-is-safe.html</id><link rel="alternate" type="text/html" href="http://rob.gillenfamily.net/blog/2012/6/18/codestock-2012-you-think-your-wifi-is-safe.html"/><author><name>Rob Gillen</name></author><published>2012-06-18T01:09:39Z</published><updated>2012-06-18T01:09:39Z</updated><content type="html" xml:lang="en-US"><![CDATA[<p>This past Friday I had the privilege of speaking on WiFi security at CodeStock 2012. 
I had a blast both preparing for the talk and delivering it and I hope it was beneficial to some of those who attended.</p>  <p>As promised (although a bit late), the following are the slides and video from the session:</p> <script async class="speakerdeck-embed" data-id="4fde14acb0559d01500043f1" data-ratio="1.3333333333333333" src="http://rob.gillenfamily.net//speakerdeck.com/assets/embed.js"></script>  <br />  <br /><iframe height="281" src="http://player.vimeo.com/video/44177323" frameborder="0" width="500" mozallowfullscreen="mozallowfullscreen" webkitallowfullscreen="webkitallowfullscreen" allowfullscreen="allowfullscreen"></iframe>]]></content></entry><entry><title>Manually Interacting with the MSF database</title><category term="Security"/><id>http://rob.gillenfamily.net/blog/2012/4/16/manually-interacting-with-the-msf-database.html</id><link rel="alternate" type="text/html" href="http://rob.gillenfamily.net/blog/2012/4/16/manually-interacting-with-the-msf-database.html"/><author><name>Rob Gillen</name></author><published>2012-04-16T20:39:20Z</published><updated>2012-04-16T20:39:20Z</updated><summary type="html" xml:lang="en-US"><![CDATA[<p></p>]]></summary></entry><entry><title>Building My Personal Cloud</title><category term="Cloud Computing"/><category term="cloud"/><id>http://rob.gillenfamily.net/blog/2012/4/12/building-my-personal-cloud.html</id><link rel="alternate" type="text/html" href="http://rob.gillenfamily.net/blog/2012/4/12/building-my-personal-cloud.html"/><author><name>Rob Gillen</name></author><published>2012-04-12T09:01:48Z</published><updated>2012-04-12T09:01:48Z</updated><summary type="html" xml:lang="en-US"><![CDATA[<p></p>]]></summary></entry><entry><title>Speaking at CodeStock 2012</title><category term="Conferences"/><category term="codestock"/><category term="security"/><id>http://rob.gillenfamily.net/blog/2012/4/9/speaking-at-codestock-2012.html</id><link rel="alternate" type="text/html" 
href="http://rob.gillenfamily.net/blog/2012/4/9/speaking-at-codestock-2012.html"/><author><name>Rob Gillen</name></author><published>2012-04-09T21:09:29Z</published><updated>2012-04-09T21:09:29Z</updated><summary type="html" xml:lang="en-US"><![CDATA[<p></p>]]></summary></entry></feed>