The Bulletin

Algorithmic illusions? 7 myths about 'big data'

Posting in Science

Is 'big data' just the latest example of big hype?

Photo credit: Joe McKendrick

Data analytics is important, and will deliver a competitive edge to organizations that employ it. But having massive, multi-petabyte stores of data is no more useful than owning 10 cars: you can only drive one at a time. A rising number of voices -- particularly data experts themselves -- are cautioning organizations against rushing into the big data scene, warning that they will only end up disappointed or worse.

Quartz's Christopher Mims, for one, recently made the case that big data -- far from being the revolutionary font of business insights many claim it to be -- is over-hyped, oversold, and essentially useless to many enterprises.

Here are four of the myths about big data Mims uncovered, plus three more:

1) Web giants such as Facebook and Yahoo always deal with "big" data for day-to-day analysis: Analysts with these web companies generally can run data analysis on laptops or single servers, and usually don't have to run computational problems across gigantic clusters, Mims points out. He references a recent Microsoft paper that also provides a reality check on the amount of capacity most analytic compute jobs really require: "the majority of real-world analytic jobs process less than 100 GB of input," its authors state.

2) Big data is the gateway to 'data analysis': For effective data analysis, many organizations should be incorporating "small data" -- targeted data sets that can be very effectively handled within a laptop, Mims states.

3) The more data, the more information: Having too much data to sift through may actually diminish the quality of the information, and even result in fishing expeditions. Mims quotes data scientist Vincent Granville, who cautions that the more data there is, the greater the likelihood of false positives when looking for correlations. As Granville put it: "It's not hard, even with a data set that includes just 1,000 items, to get into a situation in which 'we are dealing with many, many millions of correlations... out of all these correlations, a few will be extremely high just by chance: if you use such a correlation for predictive modeling, you will lose.'"
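
Granville's warning is easy to reproduce on a laptop. The following is a minimal Python sketch, not anything taken from Mims or Granville: it generates series of pure random noise, which by construction have no real relationship to one another, and then scans every pair for the strongest correlation. The function names, the 200 series, the 20 observations per series and the fixed seed are all illustrative assumptions.

```python
import itertools
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def strongest_chance_correlation(num_series=200, num_points=20, seed=0):
    """Generate independent random series and scan every pair.

    Returns (number of pairs checked, largest absolute correlation found).
    The series are unrelated noise, so any strong correlation is chance.
    """
    rng = random.Random(seed)  # fixed seed (an assumption) for repeatability
    series = [[rng.gauss(0, 1) for _ in range(num_points)]
              for _ in range(num_series)]
    pairs = 0
    best = 0.0
    for a, b in itertools.combinations(series, 2):
        pairs += 1
        best = max(best, abs(pearson(a, b)))
    return pairs, best


if __name__ == "__main__":
    pairs, best = strongest_chance_correlation()
    print(f"pairs checked: {pairs}, strongest |r| found: {best:.2f}")
```

With only 200 series there are already 19,900 pairs to compare, and the strongest of them will typically look like a solid relationship even though every series is noise. Raising `num_series` makes the number of pairs, and with it the supply of chance "findings," grow roughly with the square of the series count.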

4) Big data provides precise, indisputable answers: Data science is a science, requiring rigor, review and repeatable research. And scientific assumptions are always open to challenge. Mims points to the risk that executives not trained in statistical or quantitative methods may be relying on "algorithmic illusions," as expressed by MIT Media Lab visiting scholar Kate Crawford. Data is often flawed and biased.

5) Big data provides information you can bet your business on: If anything, growing reliance on big data analytics is creating a corporate bubble of overconfidence. As Brian Bergstein of MIT Technology Review puts it: "A future in which such 'intuitive knowledge' about how to deploy resources is overruled by algorithms that can work only with hard data and can't, of course, account for the data they don't have ... While it might seem obvious that data, no matter how 'big,' cannot perfectly represent life in all its complexity, information technology produces so much information that it is easy to forget just how much is missing."

6) Big data will turn an organization into a profitable analytics-driven machine: Technology and data alone will not fix a moribund, clueless corporate culture -- if anything, they will exacerbate it. High-quality film production and editing software is now available to anyone who wants it for a few hundred dollars, but don't expect thousands of Steven Spielbergs to suddenly emerge -- it takes creativity, verve and keen business sense to pull together a masterful production. Organizations embracing data analytics need to be open to new approaches and ideas and, above all, have a single-minded dedication to what their customers want. Having the right data on them is only the beginning.

7) Big data is about having massive quantities -- petabytes' worth -- of data: The definition of "Big Data" is far broader than merely massive data stores. I remember some years ago, around the year 2000, hearing about the world's most massive database -- a telecom's 1 terabyte data store on customers and transactions. 1 TB -- that was huge! With this narrow definition, it could be argued that we've always had big data, and always will. What's different now is that the data organizations are ingesting is nontraditional and unstructured -- from machines, from social media, from videos and documents. This kind of data isn't easily stored or managed in traditional databases.

By Joe McKendrick on May 10, 2013, 2:22 AM PST

Joe McKendrick

Contributing Editor

Joe McKendrick is an independent analyst who tracks the impact of information technology on management and markets. He is a co-author of the SOA Manifesto and has written for Forbes, ZDNet and Database Trends & Applications. He holds a degree from Temple University. He is based in Pennsylvania.