Forget Big Data

Although big data is no longer a buzzword and has become a legitimate part of how business is done, that doesn't mean it's well understood by most business folks or that it's delivering value consistently in the organizations attempting it. And while some firms are clearly getting big value from big data (primarily firms in retail and consumer products), most of the organizations I encounter are struggling to realize the value they'd hoped for when they decided to tackle big data.

There are a number of reasons for this: poor technology, difficulty executing, having the wrong goals, or even just dumb luck. But by far the most significant reason so many firms haven't gotten the value they hoped for is that you can't do big data well until you can do small data well. No amount of technology or futuristic vision can overcome issues of fundamental data management, data quality, and data integrity. Too many firms have been seduced by the vision of the big data promised land and have struck out to reach it without attending to their very real small data problems.

Opaque Data

In a recent survey of data center leaders, Doculabs found that only 50 percent of the firms surveyed had a data map, i.e., a detailed inventory of what data lives on what systems and who owns it. For these organizations, their data is opaque: they know very little about what business data they have and how they're managing it.

Where this is the case, an organization struggles to manage data properly even at the application level, so how is it going to manage data across all its applications in order to tackle big data? Best case, it won't be able to do so at all; worst case, it will, but the results will be so error-prone that it will derive little to no value from its efforts.

Master Data Management

Master data management (MDM) is concerned with the quality and integrity of corporate data across all systems and processes. For instance, MDM ensures that you have a single source of accurate contact data for every customer, product data for every product, and employment information for every employee.
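To make the "single source of truth" idea concrete, here is a minimal sketch of one small facet of MDM: merging duplicate customer contact records from different source systems into a single "golden record." The field names, matching rule (normalized email), and survivorship rule (most recently updated wins) are illustrative assumptions, not a reference implementation.

```python
def normalize(record):
    """Canonicalize fields so the same customer matches across systems."""
    return {
        "name": record["name"].strip().lower(),
        "email": record["email"].strip().lower(),
    }

def build_golden_records(records):
    """Merge duplicate customer rows from multiple source systems,
    keeping the most recently updated record per normalized email."""
    golden = {}
    for rec in records:
        key = normalize(rec)["email"]  # match on normalized email
        if key not in golden or rec["updated"] > golden[key]["updated"]:
            golden[key] = rec
    return golden

# Two rows for the same customer, differing only in casing and whitespace
crm_rows = [
    {"name": "Ann Lee", "email": "Ann.Lee@example.com", "updated": 2},
    {"name": "ann lee", "email": " ann.lee@example.com ", "updated": 5},
]
masters = build_golden_records(crm_rows)  # collapses to one golden record
```

Real MDM tools use far richer matching (fuzzy names, addresses, survivorship policies per field), but the core problem, reconciling conflicting copies of the same entity, is the same.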

In my experience, most organizations struggle with MDM. The majority of firms have tackled it in a limited way, to make sure their enterprise data warehouse (EDW) has clean data for reporting; but when it comes to the source systems that feed the EDW, aside from a few leading- and bleeding-edge examples, nearly every firm struggles.

When you can't consistently rely on source data to be accurate or consistent, big data becomes a real problem. As with opaque data, best case, firms won't be able to do big data at all; worst case, they'll manage to do it, but it will be founded on inaccurate or conflicting data. Either way, the value big data can provide the organization is ultimately limited.

Overall System Complexity

In addition to the fundamental data problems we’ve already looked at, there’s the additional issue of system complexity. At most organizations, you can figure there are 1.5 to 3 (or sometimes more) business applications per employee. These applications have all grown up organically rather than as part of a deliberate roadmap—in many instances on some tortured combination of distinct (sometimes conflicting) commercial products and homegrown solutions, with ad hoc, seat-of-the-pants integrations (or manual, “swivel-chair” integrations), frequently no integration at all, and little to no documentation on requirements or system specifications. Then add to this the breakneck pace of mergers and acquisitions and divestitures, and you have a system environment that makes a Rube Goldberg device look practically streamlined in comparison.

All of this makes it incredibly challenging and massively resource-intensive to manage applications individually, let alone at the portfolio level. In this world, small data is close to impossible, let alone big data. No matter how industry-leading your big data solution or service provider is, none of them can overcome the system spaghetti that’s in place at most organizations. Best case, you spend millions for a system that never gets off the ground; worst case, you spend millions on the system, then millions more for integrations that never work.

How Information Management Can Help

Lots of doom and gloom so far. Let’s turn to some productive things you can do at your organization to make things better for small data and begin to pave the way for successful big data down the line.

  • Create a systems inventory, sometimes referred to as a configuration management database (CMDB). It lists what systems are in place, what they do, their level of mission criticality, what technology they run on (applications and databases), their disaster recovery/business continuity support, and their IT and business owners.
  • Create a data map. It details what data lives on what systems, its level of sensitivity (PHI, PII, PCI, intellectual property), its security classification (public, confidential, top secret, etc.), and its legal risk (high or low risk of being discoverable).
  • Assess master data management (MDM) maturity. For key business data entities (customer, product, employee, location, project), is there a single source of truth accessible to the applications and employees that need it?
  • Assess information management policy infrastructure. Are there policies in place that would allow the organization to quarantine or secure sensitive data and purge junk, stale, orphaned, and abandoned data?
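The first two items above are really just structured records. As a hedged sketch of what a systems inventory entry and a data map entry might capture, here are two illustrative Python dataclasses; the field names and example values are my assumptions based on the attributes listed, not a standard CMDB schema.

```python
from dataclasses import dataclass

@dataclass
class SystemRecord:
    """One entry in a systems inventory (CMDB-style)."""
    name: str
    purpose: str
    criticality: str        # e.g. "mission-critical" or "supporting"
    technology: list        # application and database platforms
    dr_bc_supported: bool   # disaster recovery / business continuity
    it_owner: str
    business_owner: str

@dataclass
class DataMapEntry:
    """One entry in a data map: what data lives where, and how risky it is."""
    system: str             # which system the data lives on
    description: str
    sensitivity: list       # e.g. ["PII", "PCI"]
    classification: str     # "public", "confidential", "top secret", ...
    legal_risk: str         # "high" or "low" risk of being discoverable

inventory = [SystemRecord(
    name="OrderDB", purpose="Order processing",
    criticality="mission-critical",
    technology=["PostgreSQL", "Java services"],
    dr_bc_supported=True, it_owner="IT Ops", business_owner="Sales")]

data_map = [DataMapEntry(
    system="OrderDB", description="Customer orders and payment tokens",
    sensitivity=["PII", "PCI"], classification="confidential",
    legal_risk="high")]

# The payoff: a data map lets you answer "where does PCI data live?"
pci_systems = {e.system for e in data_map if "PCI" in e.sensitivity}
```

Even a spreadsheet with these columns, kept current, puts an organization ahead of the 50 percent that have no data map at all.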

Although big data is a big topic, and intersecting it with information management makes it bigger still, hopefully this post got you thinking about how big data requires small data excellence to succeed and how better information management can contribute to those efforts.

Joe Shepley
I’m VP and Practice Lead, focusing on developing Doculabs’ InfoSec practice and its applications in a wide range of industries.