When it comes to managing big data, there are two competing schools of thought. One says that you should put all your information in a data lake so that you can, almost magically, find patterns that help you better serve customers, pitch products, and listen to market demand.
Privacy constraints mean that the size of your data lake should not be limitless.
The other school is more cautious. It points out that there are, and should be, tough privacy rules, and that if you’re going to keep lots of transactional data, you need to mask and de-identify it. This is the school where we at Doculabs fall. You have to do a good job of screening, scrubbing, and masking data; the privacy issue is paramount.
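To make "screening, scrubbing, and masking" concrete, here is a minimal sketch of scrubbing a single customer record before it is stored for analysis. It assumes a hypothetical schema (`customer_id`, `name`, `email`, `ssn`, `premium`) and a hypothetical secret key; none of these names come from Doculabs. Direct identifiers are dropped, the Social Security number is masked to its last four digits, and the customer ID is replaced with a keyed HMAC-SHA256 hash so records can still be joined. On its own, this is generally considered pseudonymization rather than full de-identification, so it is a starting point, not a compliance guarantee.

```python
import hashlib
import hmac

# Hypothetical secret key for pseudonymization; in practice it would live in a
# key-management system, never alongside the data itself.
PSEUDONYM_KEY = b"replace-with-a-managed-secret"

# Hypothetical direct identifiers to drop entirely; adapt to your own schema.
DROP_FIELDS = {"name", "email"}


def pseudonymize(value: str) -> str:
    """Replace an identifier with a keyed hash so records can still be joined."""
    return hmac.new(PSEUDONYM_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()


def mask_ssn(ssn: str) -> str:
    """Keep only the last four digits of a US Social Security number."""
    digits = "".join(ch for ch in ssn if ch.isdigit())
    return "***-**-" + digits[-4:]


def scrub_record(record: dict) -> dict:
    """Return a copy of the record with direct identifiers dropped or masked."""
    clean = {k: v for k, v in record.items() if k not in DROP_FIELDS}
    if "customer_id" in clean:
        clean["customer_id"] = pseudonymize(clean["customer_id"])
    if "ssn" in clean:
        clean["ssn"] = mask_ssn(clean["ssn"])
    return clean


if __name__ == "__main__":
    raw = {
        "customer_id": "C-1001",
        "name": "Jane Doe",
        "email": "jane@example.com",
        "ssn": "123-45-6789",
        "premium": 420.0,
    }
    print(scrub_record(raw))
```

Using a keyed hash rather than a plain hash is a deliberate choice: without the key, someone holding a list of customer IDs cannot simply recompute the hashes to re-identify records.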
Yes, it’s true that there’s lots of value to be had from historical data. But what if much of that information is poorly structured or, for lack of a better word, garbage?
If you’re using artificial intelligence to look for patterns in customer data, consider building a new data warehouse from scratch and using that aggregated, cleaned, and scrubbed information to look for patterns.
Information governance and information security mean that you should build current, manageable data warehouses.
From our point of view, keeping too much historical information is an information governance liability. Don’t suck it all in. Pay attention to your own policies for how long you should keep information and when you should dispose of it (90 days, 180 days, or however long you need to reduce your risk surface). You’ll still have enough data in your data warehouse to look for patterns.
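A retention policy like the one described above can be applied mechanically before data reaches the warehouse. The sketch below assumes each record carries a timezone-aware `created_at` timestamp and uses a hypothetical 180-day window; the window, field names, and the `apply_retention` helper are illustrative assumptions, not a prescribed Doculabs procedure. Records older than the cutoff are set aside for disposal instead of being loaded.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical retention window; use whatever your own policy specifies
# (for example 90 or 180 days).
RETENTION_DAYS = 180


def apply_retention(records, now=None, retention_days=RETENTION_DAYS):
    """Split records into those to keep and those due for disposal.

    Each record is assumed to carry a timezone-aware 'created_at' datetime.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=retention_days)
    keep = [r for r in records if r["created_at"] >= cutoff]
    dispose = [r for r in records if r["created_at"] < cutoff]
    return keep, dispose


if __name__ == "__main__":
    now = datetime(2024, 1, 1, tzinfo=timezone.utc)
    records = [
        {"id": 1, "created_at": now - timedelta(days=30)},
        {"id": 2, "created_at": now - timedelta(days=200)},
    ]
    keep, dispose = apply_retention(records, now=now)
    print([r["id"] for r in keep], [r["id"] for r in dispose])  # [1] [2]
```

Passing `now` explicitly keeps the function deterministic and easy to test; in production it would default to the current UTC time.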
Recently we were working with a large life insurance company that wanted to examine its content, specifically historical life insurance applications, for actuarial purposes: could anything in those applications have been indicative of suicide? A very noble goal, indeed. Was there something that had been overlooked, the company asked?
So the company sucked in 10 to 20 years of data extracted from its applications. The result: it had a heck of a hard time with all that noise.
After this initial false start, the company decided to take a different approach. In essence, it followed our philosophy. It started from scratch, looking for suicide risk among new applicants in a much more structured way. And the data it came up with was much more useful.
Aggregating “big data” has its advantages and its liabilities. The benefits of analytics and insights are often noted, but a more cautionary view is seldom considered. In light of consumer protection and privacy regulations, firms should be sure to balance the sometimes competing requirements for collecting and disposing of data.