3 reasons why ‘big data’ can often be meaningless or misleading

By Joe McKendrick | February 26, 2013, 9:06 AM PST

Big data analytics promises to bring great advances to the way we do business. However, observers caution that there are risks in making data-based decisions without proper context, in over-relying on algorithms, and in cherry-picking data.

A few years back, I was doing some research on service-oriented architecture, or SOA, for clients. SOA, in which essential components of applications are broken down into reusable services, is the foundation for and forerunner of today’s cloud computing. Querying Google Trends, I found that interest in SOA was most prevalent in the Netherlands, and that Dutch cities scored higher than any others in the world, far surpassing places such as San Francisco and San Jose, the heart of Silicon Valley.

Hmm, I pondered: why is interest in this new computing model so high in the Netherlands? They must be really far ahead of the technology and innovation curve. (The Dutch are extremely industrious, after all.) Perhaps there are some companies and individuals really pushing the technology envelope there? Is it geographic, perhaps because the Netherlands is at a crossroads in Europe?

I soon found out that in the Netherlands, ‘SOA’ is the abbreviation for “seksueel overdraagbare aandoening,” or “sexually transmissible disease.”

The lesson from my simple search exercise is that big data is essentially meaningless, or potentially misleading, without proper context. A global search of a term or concept without translation and cultural context can go seriously wrong. In a recent post, Nick Bilton of The New York Times also cautioned against reading too much into Google Flu Trends data, which attempted to track the spread of the virus via an algorithm that counted flu-related searches. Bilton quotes Nature’s Declan Butler:

“‘Several researchers suggest that the problems may be due to widespread media coverage of this year’s severe U.S. flu season,’ Declan Butler wrote in Nature. Then add social media, which helped news of the flu spread quicker than the virus itself. In other words, Google’s algorithm was looking only at the numbers, not at the context of the search results.”

Context is one important element of big data that needs to be better understood. Over-reliance on big data analytics is a second peril facing businesses and society. In another New York Times post, Steve Lohr points to big data as a means to better allocate government resources and understand patterns within society. But, he cautions, relying on algorithms carries its own risk, since they are “created by people and they contain inferences and assumptions coded in. Those coded-in values shape the output — computer-generated predictions, recommendations and simulations.”

Over-reliance on algorithms could lead decision makers down the wrong path. Brian Bergstein of MIT Technology Review suggests that growing reliance on big data analytics is even creating a corporate bubble of overconfidence.

[He] fears a future in which such “intuitive knowledge” about how to deploy resources is overruled by algorithms that can work only with hard data and can’t, of course, account for the data they don’t have … While it might seem obvious that data, no matter how “big,” cannot perfectly represent life in all its complexity, information technology produces so much information that it is easy to forget just how much is missing.

History is full of examples in which data gave an incomplete picture compared with human observations on the ground. The U.S. overreliance on data during the 1959-1975 Vietnam War is a classic example, Bergstein pointed out.

Cherry-picking data is a third area of risk that comes with big data analytics. With abundant information flowing in from so many sources, there is also a potential issue in relying on incomplete or misdirected results. Nassim Taleb cautions in an article in Wired that researchers and analysts working with big data run the risk of cherry-picking information:

“Big data means anyone can find fake statistical relationships, since the spurious rises to the surface. This is because in large data sets, large deviations are vastly more attributable to variance (or noise) than to information (or signal).”

In other words, big data analytics can surface the results you want to see, rather than results that reflect real-life situations.
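Taleb's point about noise can be illustrated with a small simulation. The following is only a sketch and is not taken from the Wired article: it fills a table with independent random numbers, so no real relationship exists, and then counts how many pairs of columns still look "strongly" correlated. The function names, the 30 observations per variable and the 0.4 cutoff are arbitrary assumptions chosen for illustration.

```python
import itertools
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def count_spurious(num_variables, num_observations=30, threshold=0.4, seed=0):
    """Count pairs of pure-noise variables whose |r| exceeds the threshold."""
    rng = random.Random(seed)
    # Every variable is independent Gaussian noise: there is no signal to find.
    columns = [
        [rng.gauss(0.0, 1.0) for _ in range(num_observations)]
        for _ in range(num_variables)
    ]
    return sum(
        1
        for a, b in itertools.combinations(columns, 2)
        if abs(pearson(a, b)) > threshold
    )


if __name__ == "__main__":
    for k in (5, 50, 200):
        pairs = k * (k - 1) // 2
        hits = count_spurious(k)
        print(f"{k:>3} variables, {pairs:>5} pairs: {hits} 'strong' correlations")
```

The number of pairs grows roughly with the square of the number of variables, so the count of chance "findings" grows with it even though nothing in the data is real. Anyone free to pick which pairs to report will always have something to show.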

Big data offers a lot of insights and opportunities that could never have been dreamed of before. But its users must carefully weigh what it tells them, and still keep human intelligence in charge of the effort.

(Photo credit: Joe McKendrick.)

About Joe McKendrick

Joe McKendrick is a contributing editor for SmartPlanet.

Joe McKendrick is an independent analyst who tracks the impact of information technology on management and markets. He is the author of the SOA Manifesto and has written for Forbes, ZDNet and Database Trends & Applications. He holds a degree from Temple University. He is based in Pennsylvania.

Joe McKendrick is an independent consultant and editor. Joe has performed project work for the following companies in the IT marketspace: IBM, Systinet/HP, Teradata. He has performed project work for the following organizations in partnership with Unisphere Research (Unisphere Media): IBM, Oracle Corp., International Oracle Users Group, Oracle Applications Users Group, Professional Association for SQL Server, International DB2 Users Group, International Sybase Users Group.

He writes for SmartPlanet and is not an employee of CBS.

4 Comments

Don't Worry About Overdependence on Big Data
Fear of big data is overblown because organizations are only slowly learning how to better take advantage of data. According to ESG's recent survey on 2013 IT spending intentions, "improving analytics" only tied for 5th among the business initiatives driving IT spending. The big data "revolution" thus, for the majority of organizations, is more about taking the next step than a giant leap. Why so slow? Few companies possess the internal IT, analytics and data governance expertise, and 3rd-party experts are priced at a premium - and with IT growth rates of only 2% projected for 2013 in the USA and Europe, who has the budget to bet big? Also, the supply side, despite hype and promises, is still struggling with how to make big data easier. The other issue, besides lack of expertise, is culture, process and politics - who owns curating, distributing and applying the "insights"? Finally, not many executives will get away with making a poor tactical or strategic decision and blaming it on "bad big data"! Big data isn't just about IT or analytics, and it isn't a plug-in solution like an accounting package or an Android phone - it actually impacts organizational thinking, which usually takes many years to make over.
Posted by evanquinn
Updated - 27th Feb
Marketers Can Be
Hey Joe - Good insight. As a marketer, I agree that you can find the results you want if you slice the data in the "right" way. Ever read the book "How to Lie With Statistics"? However, perhaps I'm a romantic to believe that strong ethics still apply and that big data analytics will deliver the real-life picture required to make truly informed decisions. Crazy talk! My bigger worry is how businesses protect the massive data that is being pulled together. The breaches keep coming, and I care more about whether companies value my personal data than about whether it is misinterpreted. Am I crazy? @socialtis
Posted by SocialTIS
27th Feb
lying or misinterpretation?
I too have read those books and others. But I suspect most mistakes are cases of misinterpretation or mere wishful thinking. For any research activity, if you don't know what you are looking for, it is hard to decide when you've found it.

Correlations occur 'naturally' (coincidence happens)
Correlations may indicate a third hidden variable
Correlations may work backwards from the way they appear. (The social example is that folks say bad kings led to bad economic times but it could be the other way around, bad economic times forcing the king to 'be' a bad monarch.)

Teasing out cause vs effect from mere correlation is not easy.
Posted by minstrelmike@...
27th Feb
flu or norovirus
I agree that data is hard to understand. And applications that make it easier to use will provide far more confusion than insight (as usual).

For example, I would take issue with Google over-reporting flu. Perhaps that happened. My take on the issue is that both flu AND norovirus were spreading together. The CDC and various hospitals will classify them as separate diseases. People who actually have one or the other disease will google 'flu' instead of norovirus.

I dunno if that scenario is true or not but I suspect a bit more analysis could prove or disprove it easily enough.

Interpretation of statistics is not easy ;)
Posted by minstrelmike@...
27th Feb