[credit: The Economist]

A lot is the answer, as noted above in The Economist. They had a great special report a couple of weeks back on the enormous amounts of data we’re now creating in the digital age. For someone who remembers 5 1/4 inch floppy disks and loading computer games off an audio tape, the table above is a sobering read.

Of the articles in the special report, I enjoyed Data, data everywhere and Clicking for gold. The former gives a real sense of how big the data challenge will become, whilst the latter is about how the so-called exhaust of information gleaned by websites is becoming incredibly powerful.

A few startling stats I gleaned:

  • By 2013 the amount of traffic flowing over the internet annually will reach 667 exabytes, according to Cisco.
  • Wal-Mart handles more than 1m customer transactions every hour, feeding into databases estimated at more than 2.5 petabytes.
  • The Large Hadron Collider at CERN generates 40 terabytes per second. It’s more than can be captured so scientists capture what they can and throw away the rest.
  • UCSD calculated that an individual is bombarded with 34 gigabytes per day (they should come to my house)
  • They also found that US households as a whole got hit with 3.6 zettabytes in 2008 (that is the national total for the year, not a per-household figure)
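
A quick back-of-the-envelope check shows how the two UCSD numbers relate. This is only a sketch: the population figure of roughly 300 million is my own assumption and not something taken from the report, and `yearly_total_zb` is just a hypothetical helper name.

```python
# Rough consistency check between the per-person and zettabyte figures.
# ASSUMPTION: a US population of about 300 million (not from the report).
GB = 10**9   # bytes in a gigabyte (decimal units)
ZB = 10**21  # bytes in a zettabyte (decimal units)


def yearly_total_zb(gb_per_person_per_day, population, days=365):
    """Total data per year across a population, in zettabytes."""
    total_bytes = gb_per_person_per_day * GB * population * days
    return total_bytes / ZB


us_total = yearly_total_zb(34, 300_000_000)
print(f"{us_total:.1f} ZB per year")
```

With those assumptions the result lands close to 3.6 zettabytes, which suggests that number is a yearly total across the whole country rather than what a single household sees.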

There is so much good stuff in this report. If you step back and think how much digital data you create yourself, it’s just a huge, huge volume. Websites you visit, documents you read/write, photos, videos, blog posts…then think of all your friends doing that and every business doing that.

Tim O’Reilly is insightful as ever with his comment that for Internet companies, “data are the coin of the realm”. Amazon, Google, eBay, Microsoft and Facebook are all big players in this new era that will demand massive skills and massive computing to do real-time analysis and serve up just what you didn’t know you wanted, right when you want it :)

Cloud computing has a part to play here for sure, both in terms of number-crunching capability (I see big scope for HPC-type workloads on Windows Azure…stuff like Monte Carlo simulations) and storage. Getting that data into the cloud may be the bigger problem and, as the late Jim Gray was fond of reminding us, a bunch of disks and FedEx are a pretty good solution to that problem!
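
To get a feel for why shipping disks can beat the wire, here is a minimal sketch of the arithmetic. The 10 TB data set, the 100 Mbit/s link and the one-day courier time are all made-up illustrative numbers, and the function names are hypothetical; it also ignores protocol overhead and the time spent copying data on and off the disks.

```python
# Disks-in-a-box versus the network, as rough arithmetic.
# ASSUMPTIONS: data size, link speed and shipping time are illustrative only.
TB = 10**12              # bytes in a terabyte (decimal units)
SECONDS_PER_DAY = 86_400


def network_days(size_bytes, link_bits_per_second):
    """Days needed to push size_bytes over a fully used link."""
    seconds = size_bytes * 8 / link_bits_per_second
    return seconds / SECONDS_PER_DAY


def courier_wins(size_bytes, link_bits_per_second, shipping_days=1.0):
    """True if shipping the disks is faster than sending over the link."""
    return network_days(size_bytes, link_bits_per_second) > shipping_days


days = network_days(10 * TB, 100 * 10**6)
print(f"10 TB over 100 Mbit/s: about {days:.1f} days")
print("Courier wins:", courier_wins(10 * TB, 100 * 10**6))
```

On those numbers the upload takes over a week while the box arrives the next day, and the gap only widens as the data set grows.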

Anyway…stop reading me and go read the report. I’m just generating exhaust here and most of it’s noxious :)