Minimizing the Risk Surface of Unstructured Content for InfoSec: Content Migration

As I mentioned in my previous posts on Security and Access Remediation and Content Cleanup and Disposition, I’ve had a lot of clients recently start to undertake various projects to address the risk presented by their unsecured unstructured content.

When I say “unsecured unstructured content,” I’m talking about all those Microsoft Word documents, PowerPoint presentations, Excel reports, and PDFs that now live out on network drives or on unmanaged SharePoint sites. In many instances, a small but significant amount of this content either contains sensitive information (such as personal health information [PHI], personally identifiable information [PII], or payment card industry [PCI] information), or it’s content that hasn’t been accessed in more than 2 years.

Organizations I’ve worked with each take different approaches to addressing this problem. I’d like to share with you Doculabs’ recommended approaches to this type of content, focusing this time on content migration (which often has a content cleanup and disposition component).

Again, the problems organizations are trying to address center on three things: access, effort, and risk.

  • Access: This can include overly permissive access to sensitive data, but also overly restrictive access to data that needs to be more widely available. These access issues run rampant in poorly managed repositories.
  • Effort: Managing content properly requires a significant amount of manual effort in most legacy content repositories. Additionally, keeping content past its legal and operational life leads to increased e-discovery cost and effort. For example, one recent study showed the cost of outside counsel review can range from $5,000 to $30,000 per gigabyte of data. Finally, it’s much harder to protect critical content in most legacy repositories, especially when it’s mixed in with high volumes of junk and stale content.
  • Risk: When sensitive data, records, or other information is over-retained, having these documents in a corporate repository can lead to increased fines in the event of a breach. Additionally, non-compliance with corporate policies can make it more difficult to defend the organization’s conduct with regulators and the public.

So how do you go about using content migration to address this?

Content migration involves identifying, securing, and migrating high-risk content (such as the aforementioned PHI, PII, and PCI, along with data that’s considered intellectual property [IP]), as well as identifying and appropriately managing (or deleting) orphaned/abandoned content, stale content, junk content, and duplicates. In most organizations, the focus is primarily on Microsoft environments (such as shared drives and legacy SharePoint) as source repositories, while the target repositories can include Office 365 (SharePoint and OneDrive for Business), ECM or records management systems, or line-of-business applications.
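To make the "stale" and "duplicate" categories concrete, here is a minimal Python sketch of a first-pass scan over a shared drive. It is an illustration under stated assumptions, not a description of any particular content analytics tool: the function names `scan` and `sha256_of` are hypothetical, "stale" uses the 2-year threshold mentioned earlier, and the later of the last-access and last-modified timestamps stands in for "last touched," because many file systems do not record access times reliably. Duplicates are detected by identical file contents only.

```python
import hashlib
import os
import time
from collections import defaultdict

# Assumption: "stale" means untouched for more than 2 years.
STALE_AFTER_SECONDS = 2 * 365 * 24 * 60 * 60


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file's contents in chunks so large files are not loaded whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def scan(root, now=None):
    """Walk `root` and return (stale_paths, duplicate_groups)."""
    now = time.time() if now is None else now
    stale = []
    by_hash = defaultdict(list)
    for folder, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            # Read timestamps before hashing, since reading can update atime.
            info = os.stat(path)
            last_touched = max(info.st_atime, info.st_mtime)
            if now - last_touched > STALE_AFTER_SECONDS:
                stale.append(path)
            by_hash[sha256_of(path)].append(path)
    # Only hashes shared by two or more files count as duplicate groups.
    duplicates = [sorted(paths) for paths in by_hash.values() if len(paths) > 1]
    return sorted(stale), duplicates
```

Calling `scan` on a share root returns candidates for review rather than a deletion list; finding sensitive content (PHI, PII, PCI) and orphaned content requires content inspection and ownership data that this sketch does not attempt.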

It’s important to note that Doculabs strongly recommends content cleanup and disposition prior to migration. After all, you wouldn’t move all your old stuff to a new house, and then have a garage sale!

One other note: Why not just address the sensitive or high-value content in a migration?

Doculabs takes a holistic approach to content cleanup and migration (including addressing security and access issues), and we believe there are two critical steps that help to reduce the amount of content you need to migrate and, ultimately, the effort required to identify and secure high-risk/high-value content. The first is to assign ownership to orphaned/abandoned content, and the second is to identify and life-cycle (by purging or dark archiving) stale content, junk content, and duplicates. While identifying and securing high-risk/high-value content is the goal, these first steps help to ensure you aren’t migrating content you don’t need and that you’re securing the right content.

Doculabs’ approach to solving this challenge for our clients is deliberate and methodical, consisting of a three-phase process of Discovery, Planning, and Remediating.

  • Discovery: First, we work with organizations to identify the scope of the issue (including the review or development of a data map and application inventory) and ensure that stakeholder and authority alignment exists for quick decision-making regarding the migration (and potential disposition) of content.
  • Planning: Then we work with stakeholders to update policies and procedures, perform an in-depth content scan and analysis, develop a data disposition playbook (if needed), determine source-to-target criteria (e.g., “Where do Sensitive Data, Records, and Transient documents go?”), and finally ensure agreement on a workplan and approach.
  • Remediating: Finally, we ensure departmental alignment with our approach, identify records and legal holds from within the in-depth content scan, assign ownership to orphaned and abandoned data, life-cycle out-of-scope content (redundant, obsolete, and trivial [ROT] content and duplicates), map content to a target structure, and then migrate and test for target accuracy. Any content left behind can then be deleted!
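The source-to-target question from the Planning phase and the ordering of the Remediating steps can be sketched as a small routing function in Python. Everything named here is hypothetical and for illustration only: the `Item` fields, the category labels, and the `TARGET_BY_CATEGORY` mapping are assumptions, and the real targets for each category would come out of the Planning phase with your stakeholders.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical mapping; actual targets are decided during Planning.
TARGET_BY_CATEGORY = {
    "sensitive": "ecm",
    "record": "records-management",
    "transient": "onedrive",
}


@dataclass(frozen=True)
class Item:
    path: str
    category: str  # e.g. "sensitive", "record", "transient", or "rot"
    on_legal_hold: bool = False
    owner: Optional[str] = None  # None means orphaned/abandoned


def route(item):
    """Return the next action for one scanned item."""
    if item.on_legal_hold:
        return "hold"  # legal holds are checked first and block disposition
    if item.owner is None:
        return "assign-owner"  # orphaned content needs an owner before any decision
    if item.category == "rot":
        return "dispose"  # purge or dark-archive instead of migrating
    target = TARGET_BY_CATEGORY.get(item.category)
    if target is None:
        return "review"  # unknown category: send to a person, do not guess
    return "migrate:" + target
```

For example, `route(Item("plan.docx", "record", owner="legal"))` returns `"migrate:records-management"`, while the same item with no owner returns `"assign-owner"`. Checking legal holds before anything else is a deliberate choice in this sketch so that held content can never fall through to disposition.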

The following figure is a graphical presentation of these three phases of a Doculabs Content Migration consulting project.

Doculabs’ Approach to Content Migration

So what does your organization get out of an effort like this?

First, you avoid some pretty significant (potential) costs. In addition to the e-discovery figures cited earlier in this blog post, an average organization with 100 terabytes of unstructured data, a 20 percent annual growth rate in content volume, and approximately 3 percent sensitive data can see cost avoidance in excess of $14 million over 5 years by reducing unstructured data (both sensitive and non-sensitive). Second, you reduce the effort required to manage content properly in managed repositories such as Office 365 (SharePoint and OneDrive for Business), ECM or records management systems, or LOB applications, and Information Security can more easily protect the organization’s valuable/high-risk content. Finally, from a risk standpoint, by removing orphaned data (and thus reducing over-retention of records, sensitive data, or other information), you are likely to reduce the dollar amount of fines assessed in the case of a data breach, because well-managed and well-defined cleanup initiatives demonstrate a “good-faith” effort when defending your organization’s conduct with regulators and the public.
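To get a feel for the volumes behind that example, here is a short Python sketch that projects the content volume forward. It only compounds the 100-terabyte starting point at the 20 percent annual growth rate and applies the 3 percent sensitive share; it assumes that share stays constant over time, and it does not attempt to reproduce the $14 million cost-avoidance figure, which depends on cost inputs not given in this post. The function name `projected_volume_tb` is hypothetical.

```python
def projected_volume_tb(start_tb, annual_growth, years):
    """Compound a starting volume at a fixed annual growth rate."""
    return start_tb * (1 + annual_growth) ** years


# Figures from the example: 100 TB today, 20 percent growth, 5 years.
volume = projected_volume_tb(100, 0.20, 5)
# Assumption: the 3 percent sensitive share stays constant as volume grows.
sensitive = volume * 0.03

print(f"{volume:.1f} TB total, {sensitive:.1f} TB sensitive")
# prints "248.8 TB total, 7.5 TB sensitive"
```

In other words, left alone, the 100 terabytes in the example grows to roughly two and a half times its size in 5 years, which is why cleanup before migration pays off.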

This post concludes my series on minimizing the risk surface posed by your unstructured content. For almost all organizations, this is a timely undertaking; consider that some of the repositories housing that unstructured content (shared drives, as a big for-instance) have probably been proliferating for decades. The various hard- and soft-dollar costs of maintaining all that content, as well as the associated risks, have been mounting for almost as long. But the content analytics tools are now available for addressing those significant volumes of unstructured content: for cleanup and disposition, for assigning appropriate access and security, and for migrating content to appropriately secured repositories. At this point, those tools are tried and tested, and it’s no surprise that so many organizations are now looking into applying them for exactly the objectives I’ve discussed in this series.

If you’ve been considering a similar initiative, and would like some expert assistance, Doculabs would be happy to help! Read more about our services, and then contact us here.




Rich Medina
Jim Polka
I’m a Principal Consultant. My expertise is in security-based information management and strategic deployment of ECM technologies.