Monday, June 10, 2013

A Note on PRISM and IOPS

I have several posts half-written on the subject of the NSA's data-gathering efforts, but let me start with a few words on which things in the computing world are and are not subject to "Moore's Law".

Let's first note that the real Moore's Law is a narrow statement about the rate of increase in the number of transistors per integrated circuit. In colloquial usage, though, it means something like "computing power per unit cost increases exponentially, doubling about once every 18-24 months." That is, if I buy $1 million worth of computing power today and then, 2 years later, spend another (inflation-adjusted) $1 million on computing power, the newer computer will be roughly twice as "powerful" as the older one, for some definition of "powerful".
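The colloquial version is just compounding arithmetic. A minimal sketch, where `growth_factor` is a hypothetical helper name, shows how an 18- or 24-month doubling period compounds over time:

```python
def growth_factor(years: float, doubling_months: float) -> float:
    """Multiplier on computing power per dollar after `years`,
    assuming it doubles every `doubling_months` months."""
    return 2 ** (years * 12 / doubling_months)

for months in (18, 24):
    print(f"doubling every {months} months: "
          f"2 years -> {growth_factor(2, months):.2f}x, "
          f"10 years -> {growth_factor(10, months):.1f}x")
```

With a 24-month doubling period, two years gives exactly 2x, which is the $1 million example above; an 18-month period gives about 2.5x over the same span.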

[Image caption: Moore's law cannot make this platter spin any faster.]

Now let's talk about the fact that certain things happening inside your computer are physical processes that researchers have had a hard time getting onto the Moore's Law growth curve. Most importantly, large hard drives consist of platters that must be spun at higher and higher speeds in order to increase disk read & write performance.
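Since the title mentions IOPS, here is the standard back-of-the-envelope model of why spin speed matters for random I/O: each random read waits for a seek plus, on average, half a revolution of the platter. This is an illustrative sketch only; the function names are hypothetical and the seek times are assumed figures, not measurements of any particular drive.

```python
def avg_rotational_latency_ms(rpm: float) -> float:
    """On average the head waits half a revolution for the target sector."""
    ms_per_revolution = 60_000 / rpm
    return ms_per_revolution / 2

def estimated_random_iops(rpm: float, avg_seek_ms: float) -> float:
    """Rough random IOPS for a single spinning disk.
    Ignores transfer time, caching, and command queueing."""
    return 1000 / (avg_seek_ms + avg_rotational_latency_ms(rpm))

# Hypothetical seek times, chosen only for illustration.
drives = [("7200 RPM", 7200, 8.5), ("15000 RPM", 15000, 3.5)]
for name, rpm, seek_ms in drives:
    print(f"{name}: ~{estimated_random_iops(rpm, seek_ms):.0f} random IOPS")
```

Doubling the spindle speed only halves the rotational part of the wait, and the seek is a separate mechanical cost, so with these assumed numbers both drives land in the low hundreds of random IOPS.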

Network bandwidth appears to be on a Moore's Law-style curve, but its rate of growth is somewhat slower than, say, the growth in the amount of RAM you get for $100 or the amount of CPU power you get for $1000.

These facts turn out to be important because the amount of data supposedly being collected is large enough that it has to be kept on hard drives, as opposed to in RAM or even on solid-state drives. This has important consequences for our ability to do meaningful research on the raw data. But we'll cover that later.
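One rough way to see why disk-resident data constrains analysis is to compare a sequential full scan with scattered single-record lookups. Every number in this sketch (1 PB of data, 1,000 disks, 100 MB/s and 100 IOPS per disk) is a hypothetical assumption chosen for round arithmetic, not an estimate of any real system, and the function names are illustrative:

```python
def full_scan_hours(dataset_bytes: float, num_disks: int, mb_per_sec_per_disk: float) -> float:
    """Hours to read every byte once, assuming the data is spread evenly
    and all disks stream sequentially in parallel at the same rate."""
    aggregate_bytes_per_sec = num_disks * mb_per_sec_per_disk * 1_000_000
    return dataset_bytes / aggregate_bytes_per_sec / 3600

def random_lookup_hours(num_lookups: int, num_disks: int, iops_per_disk: float) -> float:
    """Hours for scattered single-record reads, each costing one random I/O,
    spread evenly across all disks."""
    return num_lookups / (num_disks * iops_per_disk) / 3600

# Hypothetical cluster, chosen only to make the arithmetic easy to follow.
PETABYTE = 10**15
print(f"full scan: {full_scan_hours(PETABYTE, 1000, 100):.1f} h")
print(f"10 billion random lookups: {random_lookup_hours(10**10, 1000, 100):.1f} h")
```

Under those assumptions, one sequential pass takes about 2.8 hours and 10 billion random lookups take about 28 hours; the scan time grows in proportion to the data volume unless disks are added at the same rate.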

4 comments:

Greg Hao said...

Nick, you work for AMZN, right? What do you think about projects on a larger scale? I know Moore's Law talks about transistors, but we're extrapolating it to the "component" level, and with the rise of non-relational DBs and tools like Hadoop, surely we're getting to the point where individual hard drive sizes aren't that much of an issue anymore in terms of growth and capacity. Hell, look at what AMZN is doing with AWS.

Nick Beaudrot said...

I used to work there, not anymore.

There are still certain operations that are time-consuming on extremely large data sets no matter what you use (big-iron databases, Hadoop/MapReduce-type infrastructure, etc.). I'll try to go into this in some more detail in the coming days.

Greg Hao said...

Obviously I don't want to extrapolate too much and get away from the actual Moore's Law, but when you have guys at Google who're talking about treating entire datacenters as a single unit, what was impossible a mere decade ago is now simply time-consuming.

Greg Hao said...

Not directly related to what we're talking about, but I found this brief blurb about the NSA and its role in nurturing massively scaled open source projects interesting: http://tomslee.net/2013/06/free-software-and-surveillance.html