I was working with a financial services client recently, and had an interesting discussion about how and when to prioritize sensitive data in its information lifecycle governance (ILG) program. How to prioritize sensitive data is a business decision.
For some companies, managing proprietary information is more important than managing PII and PCI data.
As the client, with Doculabs’ assistance, was defining “sensitive” data, we saw that the company put greater emphasis on proprietary information, such as trading history and merger and acquisition information, than it did on what is customarily viewed as sensitive data, such as PII and PCI information.
Many clients instead take what I call an evolutionary approach to identifying data types in their unstructured data repositories. They start with lower-value content, then tackle more sensitive data.
Most organizations begin their ILG efforts by identifying junk and low-value content.
Many organizations (especially those that are not highly regulated) begin their ILG initiatives by identifying (and removing) junk and low-value content. This can be done based on a defined set of file extensions that can easily be removed from the environment.
This accomplishes two things at the same time. Removing junk and low-value content speeds up future scans. It also reduces the amount of data that may need to be reviewed more manually within a business department.
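As a rough illustration of extension-based cleanup, here is a minimal Python sketch. The extension list and the function names (`JUNK_EXTENSIONS`, `is_junk`, `split_junk`) are hypothetical, made up for this example; a real program would agree on its own list with the business and would normally rely on a file analytics tool rather than a script.

```python
from pathlib import PurePath

# Hypothetical list of "junk" extensions, for illustration only.
JUNK_EXTENSIONS = {".tmp", ".bak", ".log", ".dmp"}


def is_junk(path, junk_extensions=JUNK_EXTENSIONS):
    """Return True if the file's extension is on the junk list (case-insensitive)."""
    return PurePath(path).suffix.lower() in junk_extensions


def split_junk(paths):
    """Split paths into (junk, keep) lists, preserving the original order."""
    junk = [p for p in paths if is_junk(p)]
    keep = [p for p in paths if not is_junk(p)]
    return junk, keep


junk, keep = split_junk(["a/report.docx", "a/cache.TMP", "b/old.bak", "notes"])
print("candidates for removal:", junk)
print("left for later scans:", keep)
```

Only the `keep` list would go forward to later scans or manual review, which is where the time savings come from.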
Enterprises next will tackle abandoned information connected with departed employees.
Next, organizations usually tackle abandoned information: content that’s associated with employees who have left the company. In this step, there can be a legal benefit: there’s less potentially discoverable information in the event of litigation. (Having less discoverable information is also a side benefit of eliminating junk or low-value content.)
Typically, it is only after an organization tackles the previous two types of information that it moves on to “sensitive” information. Even then, the focus is almost entirely on things like PII (with a particular emphasis on Social Security numbers), PHI and PCI, depending on the industry.
Record identification can be difficult to do with precision and non-records can end up in repositories.
If a file analytics tool opens and evaluates a file containing Social Security numbers, there’s a well-defined numerical pattern to match. But for other types of sensitive information, like the M&A research that was so important in our client’s case, you need to use keywords.
The problem is that, unlike the pattern matching used for SSNs, a keyword search may return false positives, so non-records may end up in the repository. The lesson here: record identification is very difficult to do with precision.
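To make the contrast concrete, here is a small Python sketch. It assumes SSNs appear in the dashed `NNN-NN-NNNN` form, and the keyword list is hypothetical; real tools handle more formats and apply extra validation. It is an illustration of the two matching styles, not a description of how any particular product works.

```python
import re

# Assumes the dashed NNN-NN-NNNN form; undashed SSNs would need another rule.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

# Hypothetical keywords for M&A material, for illustration only.
MA_KEYWORDS = ("merger", "acquisition", "due diligence")


def find_ssn_like(text):
    """Return every substring that matches the dashed SSN pattern."""
    return SSN_PATTERN.findall(text)


def keyword_hits(text):
    """Return the keywords that appear anywhere in the text (case-insensitive)."""
    lowered = text.lower()
    return [kw for kw in MA_KEYWORDS if kw in lowered]


# The pattern finds the SSN and ignores a phone number with a different shape.
print(find_ssn_like("Employee SSN: 123-45-6789 on file"))
print(find_ssn_like("Call 555-123-4567"))

# The keyword rule flags an HR potluck notice: a false positive.
print(keyword_hits("Lunch menu: talent acquisition team potluck"))
```

The keyword match on the potluck notice is the kind of false positive that puts non-records into the repository.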
The client referenced above jumped into the proverbial deep end by trying to identify more than 10 categories of sensitive information. Many of these categories required a custom “rule” definition. Identifying this much information isn’t much of a problem, but trying to remediate or develop controls for all of it at once may prove to be nearly impossible.
But where the conversation with the client got interesting was around how it planned to go after the different types of sensitive data. Almost all our clients are focused on the “big” categories – PHI, PII and PCI. This client, however, doesn’t necessarily believe those data types present the most risk.
When proprietary information poses more risk than other sensitive data.
Based on its analysis, the company determined that, were it to be breached, some of its proprietary information posed more of a financial risk than information such as Social Security numbers and the like. The company believes that the release of proprietary information would hurt its revenue more than the release of PHI, PII or PCI information.
I haven’t seen the analysis of how the company came to this conclusion, but it raises a broader point (and one that I’ve made in previous posts). How your organization decides to address data discovery, cleanup and compliance with regulations like the European Union’s GDPR and the California Consumer Privacy Act often comes down to a pure economic decision.
How your organization defines its ILG strategy follows a similar decision-making process. Good information governance is, at times, a business decision. And it can affect what content you tackle first with your ILG initiative.
Some related blogs:
- Retention and Sensitive Data Identification
- How to Make it Easier to Comply with GDPR
- U.S. Privacy Regulations: The Other Reason to Get Ready for GDPR