Evaluating Tools for Content Cleanup

Content cleanup is a hot topic these days, partly because of recent privacy regulations like CCPA and GDPR, and partly because the tools available to assist with content cleanup have matured rapidly over the last few years. This is good news for organizations looking to get rid of stale and junk content, to identify and protect sensitive data (like PHI, PII, and PCI), or to get a handle on their intellectual capital.

To leverage these tools, however, you first need to select one (or more) to use, which is often easier said than done. The typical RFP process used to select software is not the best way to find the right solution. In addition, there is a wide range of tools that have grown out of different domains (such as e-discovery, migration, file analytics, and security and access) into the content cleanup space. Given this, getting an apples-to-apples comparison of vendors and products can be a challenge.

But don’t despair—Doculabs has helped many clients successfully evaluate and select content cleanup solutions. In this post, we’ll take a look at some good practices for doing so at your organization.

Cleanup Tools – An Overview

This post isn’t meant to be a guide to all the available content cleanup tools – rather, we provide guidance on how to effectively evaluate tools before making a final selection. But before we dive into how to select content cleanup tools, it’s useful to understand the domains out of which the leading tools have developed.

Overall, there are six domains (other than pure play content cleanup) from which tools have developed:

  • File analytics – tools engineered to scan both file metadata (e.g., file path, file name, file extension, date created, date last accessed, date last modified) and file contents, using regular expressions, natural language processing, artificial intelligence, or some combination thereof.
  • Security and access – tools engineered to scan directories to determine user access (authorized users, level of access, etc.), user behavior (who accessed what directories and files, to do what, etc.) or both.
  • Migration – tools engineered to assist in migrating content to a target repository, from lift-and-shift migrations with no change to the files, to remediated migrations with changes to file organization, metadata, etc.
  • E-discovery – tools engineered to enable the identification and interrogation of files for responding to litigation or regulatory events; they typically scan both file metadata and file contents.
  • Records management – tools engineered to support retaining documents for the period defined by laws, regulations, etc., and then systematically disposing of them when that period is over.
  • Governance, Risk, and Compliance (GRC) – tools engineered to support GRC activities, from managing policies and procedures to labeling content and life cycling repositories and content to comply with policies and procedures.
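
To make the file analytics domain a little more concrete, here is a minimal sketch of the kind of scan such tools perform: collecting file metadata and checking file contents against a regular expression. It is an illustration only, not how any particular product works. The SSN-like pattern, the function names, and the plain-text-only reading are all assumptions made for the example; commercial tools parse many file formats and use far more robust detection.

```python
import re
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical pattern for illustration: US SSN-like strings such as 123-45-6789.
SSN_LIKE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def scan_file(path: Path) -> dict:
    """Collect basic metadata and count SSN-like matches in one file."""
    stat = path.stat()
    # Assumption: treat every file as text and skip undecodable bytes.
    text = path.read_text(encoding="utf-8", errors="ignore")
    return {
        "path": str(path),
        "extension": path.suffix.lower(),
        "size_bytes": stat.st_size,
        "modified": datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc),
        "ssn_like_hits": len(SSN_LIKE.findall(text)),
    }


def scan_tree(root: Path) -> list[dict]:
    """Scan every regular file under root and return one record per file."""
    return [scan_file(p) for p in sorted(root.rglob("*")) if p.is_file()]
```

Calling scan_tree(Path("some/share")) on a small test directory returns one record per file, which you could then sort by ssn_like_hits to see where potentially sensitive content is concentrated.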

Let’s turn now to how you can effectively select a cleanup tool, regardless of the domain it comes from.

Tool Selection – Do It Right With a Pilot

Ideally, to select software effectively, you’d ditch the typical RFP process entirely and take a different approach: one focused on collaboration with potential suppliers and on radical transparency about goals, needs, requirements, scope, resources, and long-term vision, all in the service of finding the best fit between buyer and supplier, to the benefit of both. That approach breaks down into the following steps:

  • Goals definition – a one-slide summary that should encompass business, technology, and compliance goals.
  • Marketplace survey – leverage Gartner, Forrester, and industry peers to get to a short list.
  • Initial vetting – initial contact with each short-listed vendor to suss out whether they’re at all for real (and whether you’d like to do business with them). Ideally you’ll end up with no more than three vendors to go further with.
  • Solution workshop – an in-person whiteboarding session to share the problem and goals with potential suppliers, hear their on-the-fly response, and collaborate on potential solution angles of approach.
  • Targeted demo – have each supplier present demos tailored to the solution angles defined in the solution workshop.
  • Pilot – contract with a supplier to deliver a limited proof-of-value engagement to evaluate their ability to address core goals using one (or more) solution angles identified in the solution workshop.
  • Go/No go – based on the results of the pilot, decide whether to move forward with the solution or to terminate and move on to the next supplier.

Although getting each of these steps right is critical to the success of your software selection, let’s dig deeper into one of the most important (and often overlooked) steps in the selection process: the pilot.

A Pilot Leads to Success

A pilot is important when selecting content cleanup software for a few reasons. First, it allows you to evaluate whether the software will work (and how well) in your environment with real documents. Second, it allows you to gain experience with the vendor to see how they work, how well they work with you and your team, and how they handle the inevitable challenges that you’ll encounter. Finally, it allows you to show value to the organization early through a small cleanup effort, which goes a long way toward building support for the longer implementation and rollout.

The length of the pilot will depend primarily on what you’re looking for, the volume and complexity of your repositories, and how arduous it will be to jump through the necessary hoops for IT logistics, security, access, etc. We recently coordinated a pilot that had four weeks of preparation, one week of actual scanning, and one week of results analysis. At six weeks in total, this is on the low end: in our experience, 80 percent of pilots take between 6 and 12 weeks to complete.

Key Elements of a Successful Pilot

While pilots will look quite different depending on your goals and organizational specifics, successful pilots generally address the following:

  • Goals – what does your organization seek to achieve with the software you’re evaluating? How does the pilot contribute to achieving this?
  • Requirements – what are the functional and non-functional requirements for the pilot? Be sure to address line-of-business, IT, Legal, InfoSec, Privacy, Records Management, and other compliance requirements.
  • Scope – what business units, what repositories, what information types, what document types, and what use cases will the pilot address?
  • Success metrics – how will you measure the success of the pilot? Choose metrics that balance classification accuracy rates (Doculabs recommends 80 percent as a starting point), volume of content cleaned up, resource requirements, and end user impact.
  • Go/no go criteria – what will the decision to move forward (or not) be based on?
  • Next steps – if the pilot succeeds, what are the next steps? If the pilot fails, what are the next steps?
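
One way to keep the success metrics and go/no go criteria honest is to write them down as an explicit check before the pilot starts. The sketch below is a simplified illustration under stated assumptions: it covers only two of the metrics mentioned above (classification accuracy on a hand-reviewed sample, and share of scanned content flagged for cleanup). The 80 percent accuracy threshold is the starting point recommended above; the 10 percent cleanup threshold, the PilotResult fields, and the evaluate function are hypothetical and should be replaced with criteria your stakeholders agree on.

```python
from dataclasses import dataclass


@dataclass
class PilotResult:
    correct: int        # sampled files the tool classified correctly
    sampled: int        # files in the hand-reviewed sample
    files_flagged: int  # files the tool flagged for cleanup
    files_scanned: int  # files in the pilot scope


def evaluate(
    result: PilotResult,
    min_accuracy: float = 0.80,       # starting point suggested in the text
    min_cleanup_share: float = 0.10,  # hypothetical threshold for illustration
) -> dict:
    """Compute two pilot metrics and a simple go/no go flag."""
    if result.sampled == 0 or result.files_scanned == 0:
        raise ValueError("pilot needs a non-empty sample and scope")
    accuracy = result.correct / result.sampled
    cleanup_share = result.files_flagged / result.files_scanned
    return {
        "accuracy": accuracy,
        "cleanup_share": cleanup_share,
        "go": accuracy >= min_accuracy and cleanup_share >= min_cleanup_share,
    }
```

Resource requirements and end user impact are harder to reduce to a single number, so in practice a check like this would sit alongside a qualitative review rather than replace it.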

Paying for a Pilot

Set up the pilot like a real third-party IT project, with milestones, specific expectations, and a payment schedule that reflects those. Note that with content cleanup, a significant share of pilots produce disappointing results. This is not necessarily anyone’s fault – it’s often because you have a lot of difficult data to clean up. Even a disappointing pilot is a valuable learning experience.

When You Don’t Need a Pilot

Is there ever a situation where you should NOT use a pilot to evaluate a tool? Sometimes you have a relatively simple, small-scale cleanup opportunity and you can do a high-level scan with a tool that is likely an adequate solution for the job. For example, a high-level scan might just address obvious ROT (redundant, out-of-date, trivial) files. It might just look at file size, type, dates, and similar easy attributes. You can get a tool like this for under $100 and it’s sometimes worth doing the scan without a pilot.
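
To give a sense of how little a high-level scan involves, here is a rough sketch that flags candidate ROT files using only easy attributes: file name, size, type, and last-modified date. It is not modeled on any specific product. The extension list, the three-year staleness cutoff, and the "same name and same size" duplicate heuristic are all assumptions chosen for illustration, and the duplicate heuristic in particular will produce false positives.

```python
import time
from collections import defaultdict
from pathlib import Path
from typing import Optional

TRIVIAL_EXTENSIONS = {".tmp", ".bak", ".log"}  # hypothetical list
STALE_AFTER_DAYS = 365 * 3                     # hypothetical cutoff


def rot_scan(root: Path, now: Optional[float] = None) -> dict[str, list[str]]:
    """Flag candidate redundant, out-of-date, and trivial files under root."""
    now = time.time() if now is None else now
    seen = defaultdict(list)  # (file name, size) -> paths with that signature
    report = {"redundant": [], "out_of_date": [], "trivial": []}
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        stat = path.stat()
        # Trivial: empty files or extensions that are usually throwaway.
        if stat.st_size == 0 or path.suffix.lower() in TRIVIAL_EXTENSIONS:
            report["trivial"].append(str(path))
        # Out of date: not modified within the cutoff.
        if (now - stat.st_mtime) / 86400 > STALE_AFTER_DAYS:
            report["out_of_date"].append(str(path))
        seen[(path.name, stat.st_size)].append(str(path))
    for paths in seen.values():
        # Cheap duplicate heuristic: keep the first path, flag the rest.
        report["redundant"].extend(paths[1:])
    return report
```

The output is a list of candidates to review, not a list of files that are safe to delete; anything beyond this level of analysis is where the tools and pilots discussed above come in.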

When the Stakes Are High, Ask for Help

Selecting a content cleanup solution can be complex and risky with high-volume, high-complexity content, and doing so effectively requires far more detail than a blog post can provide. But hopefully this post has given you food for thought, not only on the content cleanup domain, but on the software selection process generally. Doculabs has been helping large, heavily regulated organizations make effective software decisions for over two decades. We’d love to meet with you to discuss your software selection needs in more detail and share our industry perspective on what you’re facing to help you succeed.

Rich Medina
Joe Shepley
I’m VP and Practice Lead, focusing on developing Doculabs’ InfoSec practice and its applications in a wide range of industries.