Minimizing the Risk Surface of Unstructured Content for Information Security: Content Cleanup and Disposition

As I mentioned in my previous post on Security and Access Remediation, many of my clients have recently started projects to address the risk presented by their unsecured unstructured content: things like Microsoft Word documents, PowerPoint presentations, Excel reports, and PDFs, all of them living out on network drives or on unmanaged SharePoint sites.

In many instances, a small but significant amount of this content either contains sensitive information (such as personal health information [PHI], personally identifiable information [PII], or payment card industry [PCI] information), or it’s content that hasn’t been accessed in more than 2 years. Organizations I’ve worked with each take different approaches to addressing this problem. I’d like to share with you Doculabs’ recommended approaches to this type of content, focusing this time on content cleanup and disposition.

Again, the problems organizations are trying to address center around three things: cost, effort, and risk.

  • Cost: Just to recap, content growth can be as much as 20 percent year over year, leading to increased data storage costs (including backups) over time, despite the general decrease in storage costs. Additionally, in the event of litigation, the cost of outside counsel to review terabytes of irrelevant data during e-discovery can be significant.
  • Effort: As content volume grows, it becomes more difficult for employees to sift through content to find what they need. Also, as in the litigation scenario above, legal and e-discovery efforts increase significantly when content is retained past its regulatory or business useful life. Finally, Information Security has a harder time protecting the most important content when it’s mixed in with high volumes of junk and stale content.
  • Risk: Over-retained sensitive data, records, or other information can lead to increased fines in the case of a breach. Additionally, non-compliance with corporate policies can make it more difficult to defend the organization’s conduct with regulators and the public.

So how do you go about addressing this?

Content cleanup and disposition involves identifying, deleting, and securing high-risk content (sensitive data such as PHI, PII, and PCI, as well as IP), and identifying and appropriately managing (or deleting) orphaned/abandoned content, stale content, junk content, and duplicates. In most organizations, the focus is primarily on Microsoft environments such as shared drives and SharePoint, which tend to have the lowest control over content creation, sharing, and storage, in many cases because these responsibilities have been pushed to the business. But content cleanup and disposition can also be applied to legacy content systems (ECM, ERP, etc.) and other repositories.
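To make two of those categories concrete, here is a minimal Python sketch of how stale content and duplicates might be flagged on a file share. It is an illustration only, not Doculabs’ tooling: the function names `scan` and `sha256_of` are hypothetical, the 2-year threshold comes from the definition of stale content earlier in this post, and it assumes last-access timestamps are trustworthy, which is often not the case on real file systems.

```python
import hashlib
import os
import time
from collections import defaultdict

# Assumption: "stale" means not accessed in more than 2 years, as described above.
STALE_AFTER_DAYS = 2 * 365


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def scan(root, now=None):
    """Return (stale_paths, duplicate_groups) for every file under root."""
    now = time.time() if now is None else now
    cutoff = now - STALE_AFTER_DAYS * 86400
    stale = []
    by_hash = defaultdict(list)
    for folder, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            # Check the access time before hashing, because reading the file
            # may itself update that timestamp.
            if os.stat(path).st_atime < cutoff:
                stale.append(path)
            by_hash[sha256_of(path)].append(path)
    # Files with identical content hash to the same value.
    duplicates = [sorted(paths) for paths in by_hash.values() if len(paths) > 1]
    return sorted(stale), duplicates
```

Calling `scan` on a folder returns a list of stale paths and a list of duplicate groups. A sketch like this only produces candidates for review; it says nothing about whether a file is a record, is under legal hold, or contains sensitive data.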

At Doculabs, we take a deliberate and methodical approach to solving this challenge for our clients, using a three-phase process of Discovery, Planning, and Remediating.

First, we work with organizations to identify the scope of the issue (including the review or development of a data map and application inventory) and ensure that authority alignment exists for quick decision-making regarding the disposition of content. We then work with stakeholders to update policies and procedures, perform an in-depth content scan and analysis, develop a data disposition playbook, and secure agreement on a workplan and approach. Finally, we ensure departmental alignment with our approach, identify records and legal holds from within the in-depth content scan, perform a final content scan, mark redundant, obsolete, and/or trivial (ROT) content and duplicates for deletion, and then delete content.
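The ordering in that last phase matters: records and legal holds are identified before anything is marked for deletion. A minimal Python sketch of that decision rule might look like the following. The `ScannedItem` fields and the `disposition` function are hypothetical names for illustration, and a real disposition playbook would carry far more nuance than four flags.

```python
from dataclasses import dataclass


@dataclass
class ScannedItem:
    """One item from a content scan (all fields are illustrative assumptions)."""
    path: str
    is_record: bool = False      # declared record under a retention schedule
    on_legal_hold: bool = False  # subject to a legal hold
    is_rot: bool = False         # redundant, obsolete, or trivial
    is_duplicate: bool = False


def disposition(item):
    """Return 'retain' or 'delete' for one scanned item."""
    # Records and legal holds are checked first and always win,
    # even if the item also looks like ROT or a duplicate.
    if item.on_legal_hold or item.is_record:
        return "retain"
    if item.is_rot or item.is_duplicate:
        return "delete"
    return "retain"
```

For example, a duplicate that is also under legal hold comes back as `retain`, while an ordinary duplicate comes back as `delete`.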

The following figure shows the three steps of a Doculabs Content Cleanup and Disposition consulting project.

Doculabs’ Approach to Content Cleanup and Disposition

So what does your organization get out of an effort like this?

First, you avoid some pretty significant (potential) costs. For example, an average organization with 100 terabytes of unstructured data, a 20 percent annual growth rate in content volume, and approximately 3 percent sensitive data can see cost avoidance in excess of $14 million over 5 years by reducing unstructured data (both sensitive and non-sensitive). Additionally, the cost of outside counsel review of irrelevant data during e-discovery can go down significantly. You also reduce the effort required for employees to find the content they need (leading to increased efficiency), and Information Security can more easily protect the organization’s valuable/high-risk content. Finally, from a risk standpoint, by removing orphaned data (and thus reducing over-retention of records, sensitive data, or other information), you are likely to reduce the fines assessed in the case of a data breach, because well-managed, well-defined cleanup initiatives demonstrate a “good-faith” effort when defending your organization’s conduct with regulators and the public.
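To get a feel for the volumes behind that example, here is a small Python sketch that compounds the 20 percent annual growth rate over 5 years. It only projects volume. It does not reproduce the $14 million figure, because the per-terabyte storage, review, and breach cost inputs are not given in this post. The function name `projected_volume_tb` is hypothetical.

```python
def projected_volume_tb(start_tb, annual_growth, years):
    """Compound a starting volume by a fixed annual growth rate."""
    return start_tb * (1 + annual_growth) ** years


START_TB = 100          # unstructured data today, from the example above
ANNUAL_GROWTH = 0.20    # 20 percent year over year
SENSITIVE_SHARE = 0.03  # approximately 3 percent sensitive data

for year in range(6):
    total = projected_volume_tb(START_TB, ANNUAL_GROWTH, year)
    sensitive = total * SENSITIVE_SHARE
    print(f"Year {year}: {total:.1f} TB total, {sensitive:.1f} TB sensitive")
```

Left alone, the 100 terabytes grows to roughly 249 terabytes by year 5, assuming the sensitive share stays constant at about 3 percent. Every terabyte removed early therefore avoids more than a terabyte of future storage, backup, and review scope.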

In addition to Content Cleanup and Disposition and Security and Access Remediation, we recommend undertaking an effort to eventually migrate that content to a more secure repository. Check back, because I plan to share Doculabs’ approach to that topic in an upcoming blog post. In the meantime, check out our InfoSec services here.

Rich Medina
Jim Polka
I’m a Principal Consultant. My expertise is in security-based information management and strategic deployment of ECM technologies.