Unstructured data ‘out of control’: survey

By Joe McKendrick | June 6, 2011, 8:01 AM PDT

Many organizations are becoming overwhelmed by the volume of unstructured information (audio, video, graphics, social media messages) that falls outside the purview of their “traditional” databases. Organizations that do get their arms around this data will gain a significant competitive edge.

As part of my work with Unisphere Research, a division of Information Today, Inc., I helped conduct a new survey that finds unstructured data is growing at a faster clip than relational data — driving the “Big Data” explosion. Thirty-five percent of respondents say unstructured information has already surpassed or will surpass the volume of traditional relational data in the next 36 months. Sixty-two percent say this is inevitable within the next decade. The survey gathered input from 446 data managers and professionals who are readers of Database Trends and Applications magazine, and was underwritten by MarkLogic.

A majority of survey respondents acknowledge that unstructured information is growing out of control and is driving the big data explosion – 91% say unstructured information already lives in their organizations, but many aren’t sure what to do about it.

A segment of companies, 16%, have made unstructured data part of their actual business offerings.

There is growing concern across the business technology landscape about organizations’ inability to effectively tap these new resources. Estimates released last month show that the Digital Universe (meaning every electronically stored piece of data or file out there) will reach 1.2 million petabytes, or 1.2 zettabytes, this year. That’s up from a measly 800,000 petabytes in 2009. In a recent interview with MIT’s Sloan Management Review, K. Ananth Krishnan of Tata Consultancy Services described what’s at stake for businesses that fail to leverage their growing unstructured data stores:

“We are only looking at what we have in our data warehouses, it’s not going to be enough for us to get the insights that we need. If you’re a retailer and you were not using all the information you could to judge your customers’ buying patterns, then the retailer across the street probably will, and they’ll steal your customers. That’s the realization, I think, that drove a lot of people to think that they should be capturing much, much more.”

In terms of technologies and governance, organizations don’t feel they’re ready for all this data. Many companies don’t understand how to handle unstructured data and throw old technologies at the problem. With relational databases, companies are attempting to use 30-year-old technology to tackle today’s information challenges.

The Unisphere/MarkLogic survey also found that 86% of respondents admit that unstructured data is important to their organization, yet only 11% have clear procedures and policies for managing unstructured data in place. In addition, 80% of respondents know the amount of unstructured data will rise in the next three years, but only 24% of respondents believe their current infrastructure will be able to adequately manage it.

We still have a long way to go, Krishnan says in the Sloan Management Review interview. Technology that can grasp and pull insights out of this variety of data is still on the cutting edge:

“There are still loads of things that we can’t do. There is a whole aspect of computing which PhD students are working on, which is basically trying to understand text. Understand sentiment. A five-year-old child can say in 30 seconds whether Mom or Dad is angry, or happy, or whatever. Sense the mood in the room. A computer program still has a hard time figuring that out…. The analysis of text, the analysis of video, the analysis of audio — it works a lot better in James Bond movies. In real life, it is extremely hard from a fundamental computer science perspective to understand all that information.”

Management awareness of the existence of this data, let alone how it can benefit the business, is where we need to start. The Unisphere/MarkLogic survey finds 40% of managers are unaware of the extent to which unstructured data exists in their organization, and only 45% of organizations are moderately or strongly committed to leveraging this resource, creating a competitive advantage for those companies that do.

Even organizations with higher concentrations of unstructured information face issues with corporate awareness regarding the existence of this data. They devote most of their resources to managing the smaller portion of their data, while moving unstructured information into special-purpose databases or content management systems.

Joe McKendrick

About Joe McKendrick

Joe McKendrick is a contributing editor for SmartPlanet.

Contributing Editor, Business

Joe McKendrick is an independent analyst who tracks the impact of information technology on management and markets. He is the author of the SOA Manifesto and has written for Forbes, ZDNet and Database Trends & Applications. He holds a degree from Temple University. He is based in Pennsylvania.

Joe McKendrick is an independent consultant and editor. Joe has performed project work for the following companies in the IT marketspace: IBM, Systinet/HP, Teradata. He has performed project work for the following organizations in partnership with Unisphere Research (Unisphere Media): IBM, Oracle Corp., International Oracle Users Group, Oracle Applications Users Group, Professional Association for SQL Server, International DB2 Users Group, International Sybase Users Group.

He writes for SmartPlanet and is not an employee of CBS.

7 Comments

Nothing is Un-structured - just mis-categorized!
All this means is that the people "in charge" fail to actually be in charge and use the same existing techniques that they had used in the past to organize the different FILE EXTENSIONS that are now in use.

The databases are all there - both general SQL / Oracle - to just store the data (which actually means you have to write a COMPLETE application to both store and retrieve the data) or use existing applications like Extensis Portfolio (graphics, video etc) or SharePoint (ugh!) which have built-in search features to find the data after storing.

What is really needed by any org is a standard lexicon for classifying all the incoming data types so they can be stored and found again.

And most of that is already there in Dublin Core, EXIF data, etc - it is just picking out all those and then ensuring everything is just properly cataloged as it comes in.

Getting the 15 different groups that control each different part of the whole corporate IT system to work together is the REAL PROBLEM!
Posted by TAPhilo
6th Jun 2011
An option
Found this company a few months ago. Startup, but has promising tech to help with issues like this. Our organization is not immune to rapid data growth and we had to find some solutions.

diginomeinc.com

Before something like Diginome, the first thing that must be done, as suggested by TAPhilo, is the need to approach the data landscape as a Master Data Management problem. Key to this is setting up a real, separate body within the organization focused solely on governance. This is critical. Yeah, this is a pain to institute, but it beats the heck out of dealing with mountains of data, the origins and veracity of which are unknown.

This problem will get much worse, and quickly.
Posted by Lucky2BHere
6th Jun 2011
Unstructured Amount, Management, Discovery and Integration
Good report.

Managing unstructured data is a huge problem, but as you point out, it can lead to huge rewards when done properly and with purpose.

Many people are still trying to manage their structured data, let alone their unstructured data.

The typical growth has always been structured first, then unstructured, but because of social media there is a huge attempt to leapfrog the structured data analysis -- this has pros and cons.

The positive
This technique allows people to discover information and leapfrog their competitors, but it must be done in conjunction with the continued management improvement of your structured data. The unstructured data can really help with this; consider validating your customer records or extracting address information from the unstructured data.

A company can better integrate its unstructured data with its structured data because there are no legacy structures that need to be integrated with; a customer can build the integration from the ground up.

The Con
Many times people do analysis only because it is trendy, with no real objective in mind. This methodology has led to the demise of many projects, not only text but almost any activity.

Concluding
As a vendor (I work for SAS), it is our job to make text analysis easier, allowing companies to add unstructured data to their process or to leapfrog the standard progression and use structured and unstructured data together even before they have their structured data in place.

Analysis of Text is not an easy thing to do, but it is critical in the information age to be an effective customer driven organization.
Posted by rafoley@...
8th Jun 2011
I do not think they are talking databases, guys.
The story seems to relate to those gigabytes of annoying home pictures and iPod tunes that people seem to think they are entitled to store in their network share at work. (Why do so many problems come back to fighting an entitlement mentality?)

Simple policies limiting storage space per user and file types to be stored will block most of the junk from getting onto your servers.

The problem is many companies never set up their network security properly and then spend needless hours doing cleanup when disk space runs out. Disk space is cheap, but it is still more costly than proper enforcement.

Our per-user space limits are 50 MB for average users and 300 MB for administrative staff, which is plenty of space for business-related documents. Our marketing department is allowed gigs of space because of the graphic files they must work with.

My staff has free rein to delete any non-work-related materials they find that somehow get around our security measures and onto our servers. We then adjust security to prevent a repeat of the situation.

Our human resources and legal departments love us because all of the controls limit the possibility of someone bringing offensive digital materials into the workplace.

It may seem heavy handed, but locking down your network access is also a key item to any PCI compliance effort.
Posted by Hates Idiots
8th Jun 2011
Need to Socialize Data
Thanks for sharing your research on this, Joe. Pretty interesting study. As the volume of unstructured data continues to explode from social networking sites, sensors, RFIDs and geolocation devices, it's more important than ever for companies to make sense of it all. As you point out, those who can use this data will be at a significant competitive advantage.

To do this, though, companies must integrate this flood of non-traditional data with the traditional data they already have about customers: things like transactional data, emails, call center records, financials, etc. This socialization of data provides a new 360-degree view of customers and lets businesses make more informed, smarter decisions. To learn more about this concept, check out my website (www.socializationofdata.com) and let me know what you think.

- Darryl McDonald, Teradata
Posted by lwmg
Updated - 8th Jun 2011
Clueless invasion of privacy.
Having worked on CRM systems for over 10 years I can tell you 90% of the data collected is never used by the collecting company. Most company management are still clueless on what to do with it. They collect it because someone told them they should and sold them a system. They hardly think about it until a hacker hits the jackpot and gets into one of these information treasure stores. Then it is a panic.

The sick joke up here is Massachusetts has one of the stricter data privacy laws, but they included an exception for all government agencies. The state unemployment office was recently hacked and the people who had their identity stolen have no recourse. But under the same law a private sector company would face million dollar fines.
Posted by Hates Idiots
9th Jun 2011
Unused data collected
Nearly every company in the US has a form somewhere for new hires/applicants to list their previous job history in chronological order.

This practice was begun during WWII in order to attempt to prevent enemy saboteurs from being hired. Since that time, it's been part of most hiring practices, regardless of anyone's need to know what your previous work was.

(This is part of the "How can you expect us to pay you $XX for this job when your last (totally different) job only paid you $YY?" argument.)

This and the practice of 'confidential' salary/pay exist primarily to keep wages down...since if your salary isn't confidential (i.e. YOU have the right to reveal/conceal it), you will rapidly find out what your salary is relative to others and it will become more difficult to pay new hires more than tenured employees.

All arguments supporting the confidentiality (you may NOT reveal!) are spurious. Public employees' pay has always been public information and so far as I'm aware, it has never caused a major problem.

Labor is not a 'free market' if pay is "confidential."

Of course, surveys and individuals routinely reveal this information despite any signed agreements....
Posted by wizoddg
20th Jun 2011