6 Challenges When Preparing for Content Cleanup

Every organization needs to clean up its content, for example to comply with laws and regulations concerning record keeping and privacy, or to keep relevant content findable for end users. Yet despite the importance of content cleanup, very few organizations succeed at it, whether for the new content they create going forward or for the legacy content they’ve amassed over the last decade or more.

There are a number of reasons why organizations struggle with content cleanup:

  1. Vision – why are you cleaning up content?
  2. Tactics – do you know how you’re going to clean up content?
  3. Alignment – do you have agreement from all stakeholders on why and how you’re cleaning up content?
  4. Scope – what content will you clean up?
  5. Resources – do you have the people to clean up the in-scope content?
  6. Technology – do you have technology to assist in content cleanup?

Let’s dive into each of these content cleanup challenges to find ways to address them and make progress.

Challenge #1: Vision

Before you can make progress cleaning up content, it’s absolutely critical to define the vision for content cleanup: Why are you cleaning up content? The answer can’t simply be that content cleanup is part of good information management; no one who matters at your organization will get behind your efforts in any meaningful way if that is your sole justification.

Instead, focus on what matters to your key stakeholders. The specifics differ between organizations, but their goals will likely be some variation of the following:

  • Legal – reducing litigation risks and costs, reducing the level of effort for e-discovery, improving early case assessment accuracy
  • Privacy – ability to comply (and demonstrate compliance) with consumer privacy laws and regulations (e.g., GDPR, CCPA)
  • Information Security – reduced risk surface due to lower volumes of unmanaged sensitive data
  • IT – lower level of effort to manage systems, improved ability to meet service level agreements (SLAs) around business continuity and disaster recovery
  • Records Management – improved ability to comply with records retention laws and regulations
  • “The Business” – enablement for operational projects that are information intensive

Despite the diversity of these goals, at most organizations it’s not difficult to link content cleanup to at least one of them (if not more). Making that link is critical to getting meaningful buy-in for your cleanup efforts.

Challenge #2: Tactics

Once you define a vision for your cleanup efforts, you need to determine how you’re going to clean up content to achieve that vision. As with your content cleanup vision, tactics can span a range of options, but the following are typical for most organizations:

  • By application – address the highest risk/value systems that manage information
  • By content type – address the content types that are most important to your organization (e.g., contracts, design documents, M&A documents, price sheets)
  • By business process/function – address the content used by the business-critical functions
  • By risk level – address the content types that pose the greatest risk to the organization (e.g., PHI, PII, PCI, IP)
  • By obligation – address the content governed by the highest-priority laws and regulations your organization must comply with (e.g., GDPR, CCPA)

Whichever of these options you choose, begin with the easiest content (ROT: redundant, obsolete, and trivial content) before moving to more difficult content (e.g., PHI, PII, PCI, IP). Doing so lets you remove the lion’s share of content (40% on the low end, 80% on the high end) with low-effort methods before you commit resource-intensive methods to what remains.
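Part of why ROT is the easy place to start is that much of it can be flagged with simple, mechanical rules. The following is a minimal Python sketch under stated assumptions, not a description of any particular product or of the authors’ method: it walks a directory tree and flags files as redundant (byte-identical duplicates, detected by SHA-256 hash), obsolete (not modified in a configurable number of years), or trivial (very small, or with throwaway extensions). The thresholds, the extension list, and the function name `classify_rot` are hypothetical; the sketch only lists candidates for review and deletes nothing.

```python
import hashlib
import time
from pathlib import Path
from typing import Dict, List, Optional

# Hypothetical thresholds (assumptions): tune to your own retention rules.
OBSOLETE_AFTER_DAYS = 7 * 365          # not modified in roughly 7 years
TRIVIAL_MAX_BYTES = 1024               # files smaller than 1 KB
TRIVIAL_SUFFIXES = {".tmp", ".bak", ".log"}


def sha256_of(path: Path) -> str:
    """Hash file contents in chunks so large files are not read into memory at once."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def classify_rot(root: str, now: Optional[float] = None) -> Dict[str, List[str]]:
    """Flag ROT candidates under `root`. Categories may overlap; nothing is deleted."""
    now = time.time() if now is None else now
    result: Dict[str, List[str]] = {"redundant": [], "obsolete": [], "trivial": []}
    first_seen: Dict[str, Path] = {}  # content hash -> first file seen with it

    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        stat = path.stat()

        # Redundant: same bytes as a file already seen (the first copy is kept).
        digest = sha256_of(path)
        if digest in first_seen:
            result["redundant"].append(str(path))
        else:
            first_seen[digest] = path

        # Obsolete: last modification older than the threshold.
        age_days = (now - stat.st_mtime) / 86400
        if age_days > OBSOLETE_AFTER_DAYS:
            result["obsolete"].append(str(path))

        # Trivial: tiny files or throwaway file types.
        if stat.st_size < TRIVIAL_MAX_BYTES or path.suffix.lower() in TRIVIAL_SUFFIXES:
            result["trivial"].append(str(path))

    return result
```

Running `classify_rot("/path/to/share")` returns three lists of file paths. An old duplicate appears as both redundant and obsolete, and anything flagged should still be checked against records retention obligations and legal holds before disposal; the rules above simply make the low-effort first pass concrete.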

Challenge #3: Alignment

Once your vision and tactics are settled, you need to gain alignment from key stakeholders so that you can execute without organizational pushback. Typically, you’ll need buy-in from the following functions:

  • Legal
  • Privacy
  • Information Security
  • IT
  • Records Management
  • Relevant Lines of Business

Regardless of your mandate (and where your authority and funding ultimately come from), each of these stakeholders typically has a say, formally or informally, on whether your content cleanup efforts can move forward and with what level of support (dollars and resources).

Challenge #4: Scope

At most organizations, the content cleanup problem is massive: terabytes (if not petabytes) of content, on multiple systems (e.g., shared drives, e-mail, SharePoint, Office 365, niche systems), spanning a decade or more, with ownership that is difficult or impossible to determine, and a lack of technology to assist in interrogating it. Meanwhile, end users create new, unmanaged content every day, adding to the backlog.

So once you have agreement on the vision for content cleanup and the tactics you’ll use, you need to decide what content is in scope for your efforts. The best way to address the scope problem is to split your efforts into legacy and go-forward content.

Each of these requires a different approach. Legacy content is static (it is what it is), so you can address it with a longer-term view: since you’ve been creating and retaining this content for years, taking 18–36 months to clean it up is reasonable. Go-forward content, by contrast, is dynamic and constantly growing, and it will become tomorrow’s legacy problem if you don’t deal with it now. Address it immediately to stop adding to the cleanup backlog.

Challenge #5: Resources

The best-laid content cleanup plans can fail if your organization doesn’t have the resources to do the work required. In general, you should think about your resource needs from two angles: technical (do we have people with the technical skills to interrogate and clean up our content) and “the business” (do we have people who understand how content is used and managed in the course of day-to-day operations).

Whether or not you already have employees with expertise in these two areas, you need to assess whether you have the organizational support to either reallocate people to these tasks or hire new staff to do them. Ultimately, your ability to secure resources for content cleanup will depend on whether leadership supports your efforts.

Challenge #6: Technology

Finally, without technology to assist in determining what content you have and what can be cleaned up, you’ll struggle to make meaningful progress on content cleanup.

To this end, a range of technology products across domains is available to assist in cleanup efforts:
  • Data Loss Prevention (DLP) – interrogates content moving outside the firewall or between systems within the firewall to determine whether it should be allowed to move or be prevented from moving
  • File Analytics – interrogates content within the firewall to determine what type of content it is (e.g., ROT, PHI, PII, PCI, IP)
  • Identity and Access Management (IAM) – provides the ability to control access to content based on rule sets (role, org chart location, etc.)
  • Information Rights Management (IRM) – provides the ability to control sharing, saving, and printing at the document level
  • E-discovery – provides the ability to search documents and data by keyword and custodian (i.e., content owner) and trace dependencies between documents (e.g., email chains)

Although the right mix of these technologies will differ from firm to firm, the sheer volume of content at most organizations means that no cleanup effort is practical without some technology in place.

Baby Steps

This post has covered a lot of ground, and its complexity might feel paralyzing. So where do you start? However complex the overall problem of content cleanup is, making progress starts with a single step: defining your vision. Ask yourself why you’re embarking on content cleanup, and the rest will follow, either rapidly (given enough resources and executive support) or slowly (given low levels of support and funding). In either case, sorting out why you’re cleaning up your content in the first place is how you’ll start making progress.

Best Practices for Content Cleanup

Rich Medina
Joe Shepley
I’m VP and Practice Lead, focusing on developing Doculabs’ InfoSec practice and its applications in a wide range of industries.