How To Clean Up Network Shares

A version of this post originally appeared on the CMSWire blog.

It’s true that organizations for quite some time have been trying to implement better information management practices. And IT technologies continue to improve. But the most frequent question I get from clients is, "How do we clean up network shares?"

It seems like something that should be easy to do. But it’s anything but that. Like diet and exercise, which are simple in concept but arduous in practice, cleaning up network shares requires a simple approach—and a lot of elbow grease. Here’s an approach for cleaning up network shares that we’ve found has been successful at organizations of varying sizes across a range of industries.

You’ll need file analytics software to help evaluate the documents on your network shares.

The first step is to acquire file analytics software to help you evaluate the documents on your network shares. Even a small organization will have a few terabytes of content on shares. That translates into millions of documents—far too many to evaluate manually.

The good news is that a range of file analytics tools is out there, from the dead simple (like TreeSize Professional from Jam Software, which costs under $100) to the extremely sophisticated (e.g., Active Navigation, Nuix, ZL Technologies, IBM’s StoredIQ and Concept Searching). So, depending on your budget and maturity level, you should be able to find technology to assist you in your network share cleanup efforts.

See my colleague Rich Medina’s post, Which File Analytics Products Should You Use?

Regardless of this diversity in file analytics tools, one cleanup approach is successful more often than not. There are really three steps to consider.

Follow these three steps to clean your network shares: address ROT, address sensitive data and address records.

The required capabilities of the analytics software increase as you progress through each step of the process.

  • Step 1: Address ROT: The most straightforward step is to find and address redundant, obsolete, and transitory documents. This requires the least horsepower from a file analytics tool.
  • Step 2: Address Sensitive Data: File analytics tools need more sophistication to find and address documents containing sensitive data such as PHI, PII, PCI, IP, legal hold data, etc.
  • Step 3: Address Records: Finding and addressing records requires the most sophistication from file analytics tools.

Let’s walk through each of these to see how they might be valuable for your organization’s network share cleanup efforts.

Step 1: Address File ROT

File analytics tools work quickly when addressing ROT by interrogating file properties, known as wrapper metadata.

To address ROT, a file analytics tool only needs to interrogate file properties (so-called wrapper metadata), such as file path, file name, file type, file size, date created, date last accessed and date modified. It does not need to crack open the file and interrogate the contents.
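To make the idea of a properties-only scan concrete, here is a minimal Python sketch. It is not how any particular file analytics product works; the FileInfo record, the seven-year cutoff, the list of "transitory" extensions and the name-plus-size duplicate heuristic are all assumptions chosen for illustration. The point is simply that every check uses file properties and never opens a file.

```python
import os
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta
from pathlib import Path
from typing import Iterable, Iterator, List


@dataclass
class FileInfo:
    """Wrapper metadata for one file (hypothetical record for this sketch)."""
    path: str
    size_bytes: int
    modified: datetime
    accessed: datetime


# Assumed thresholds; a real cleanup would take these from policy.
TRANSITORY_EXTENSIONS = {".tmp", ".bak", ".log"}
OBSOLETE_AFTER = timedelta(days=7 * 365)


def scan(root: str) -> Iterator[FileInfo]:
    """Yield metadata for every file under root without reading file contents."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                st = os.stat(full)
            except OSError:
                continue  # unreadable entry: skip it rather than abort the scan
            yield FileInfo(
                path=full,
                size_bytes=st.st_size,
                modified=datetime.fromtimestamp(st.st_mtime),
                accessed=datetime.fromtimestamp(st.st_atime),
            )


def rot_reasons(info: FileInfo, now: datetime) -> List[str]:
    """Return the ROT labels suggested by the file's properties alone."""
    reasons = []
    if Path(info.path).suffix.lower() in TRANSITORY_EXTENSIONS:
        reasons.append("transitory")
    if now - max(info.modified, info.accessed) > OBSOLETE_AFTER:
        reasons.append("obsolete")
    return reasons


def redundant_candidates(files: Iterable[FileInfo]) -> List[List[str]]:
    """Group paths sharing a file name and size; these are candidates only."""
    groups = defaultdict(list)
    for f in files:
        key = (os.path.basename(f.path).lower(), f.size_bytes)
        groups[key].append(f.path)
    return [paths for paths in groups.values() if len(paths) > 1]
```

A call such as rot_reasons(info, datetime.now()) for each record produced by scan(...) would give a first-pass ROT list. Matching on name and size only hints at redundancy; confirming a true duplicate would mean comparing contents, which is beyond a properties-only scan.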

For this reason, scans are faster than those that inspect file contents (with speeds approaching 1 million documents an hour). In addition, anywhere from 30 to 80 percent of the content on an organization’s network shares will be ROT, so performing this analysis first will take a significant number of documents off the table for analysis in later, more complex steps.
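As a rough worked example of what those figures imply, the sketch below estimates scan time and the number of documents left after ROT is removed. The 5 million document count is a hypothetical input, and the scan rate is the upper-end figure quoted above, so treat the results as best-case estimates.

```python
def scan_hours(doc_count: int, docs_per_hour: int = 1_000_000) -> float:
    """Estimated hours for a properties-only scan at the given rate."""
    return doc_count / docs_per_hour


def remaining_after_rot(doc_count: int, rot_percent: int) -> int:
    """Documents left for later steps once rot_percent of them are set aside."""
    if not 0 <= rot_percent <= 100:
        raise ValueError("rot_percent must be between 0 and 100")
    return doc_count * (100 - rot_percent) // 100


docs = 5_000_000  # hypothetical share size
print(scan_hours(docs))               # 5.0 hours at the quoted rate
print(remaining_after_rot(docs, 30))  # 3500000 left if 30 percent is ROT
print(remaining_after_rot(docs, 80))  # 1000000 left if 80 percent is ROT
```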

Once you find the ROT, you need to do something with it. That can be as simple as quarantining it (whether by moving it to a dedicated repository or by leaving it in place and removing access) or as final as deleting it.

What an organization does in any given case depends on organizational culture, the policy and procedure landscape, etc., but finding ROT first gives you the ammunition to take the first step — and paves the way for addressing more valuable and risky network share content.

For dealing with ROT, see Linda Andrew’s Content Migration Case Study: “Let’s Get Rid of the ROT.”

Step 2: Address Sensitive Data

Sensitive data includes PHI, PII, PCI, board materials, IP, financial analysis and M&A information.

Once you’ve addressed ROT, you’ve likely reduced the volume of your network shares by 30 to 80 percent, which will make finding and addressing sensitive data much less resource- and time-intensive.

Typically, sensitive data at any organization can include:

  • PHI: Protected health information;
  • PII: Personally identifiable information; and
  • PCI: Payment card information.

Depending on the organization, sensitive data also could include:

  • Board materials
  • Intellectual property (IP)
  • Financial analysis
  • Mergers and acquisitions (M&A) information and due diligence

Although the approach is straightforward (i.e., find sensitive data and secure it), the devil is in the details — and there are far too many details to address exhaustively here.

Suffice it to say that you’ll use a combination of out-of-the-box and customized rules for identifying sensitive data using pattern matching. (For example, the pattern ###-##-#### suggests that a Social Security number may be present.)
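The ###-##-#### example translates naturally into a regular expression. The Python sketch below is one possible rendering, not a rule taken from any specific product; the function name is made up for illustration. Unlike the ROT step, this kind of rule has to run against the text inside a document.

```python
import re
from typing import List

# Three digits, two digits, four digits, separated by hyphens, and not
# embedded in a longer run of digits (so 1123-45-67890 is not a match).
SSN_PATTERN = re.compile(r"(?<!\d)\d{3}-\d{2}-\d{4}(?!\d)")


def find_ssn_candidates(text: str) -> List[str]:
    """Return every substring shaped like ###-##-####."""
    return SSN_PATTERN.findall(text)


print(find_ssn_candidates("Employee SSN: 123-45-6789, phone 555-123-4567"))
# ['123-45-6789']
```

A match only suggests that a Social Security number may be present; other identifiers can share the same shape, which is why rules like this usually need tuning.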

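This step is quite time-intensive, even with out-of-the-box rules, because the goal is to find the sweet spot between rules that are too broad (flagging documents that aren’t sensitive) and rules that are too narrow (missing documents that are), and that can take many iterations.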
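One way to make each iteration measurable, offered here as a suggestion rather than a method from the original approach, is to score every candidate rule against a small hand-labeled sample. The sample texts, the two patterns and the function name below are all invented for illustration. Precision drops when a rule is too broad; recall drops when it is too narrow.

```python
import re
from typing import List, Pattern, Tuple

# Hypothetical hand-labeled sample: (text, actually contains an SSN?)
SAMPLES = [
    ("SSN 123-45-6789 on file", True),
    ("SSN: 123 45 6789", True),
    ("Part no. 123456789 shipped", False),
    ("Invoice total 42.00", False),
]

# Two candidate rules: one loose about separators, one strict.
BROAD = re.compile(r"\d{3}[- ]?\d{2}[- ]?\d{4}")
NARROW = re.compile(r"(?<!\d)\d{3}-\d{2}-\d{4}(?!\d)")


def precision_recall(
    pattern: Pattern[str], samples: List[Tuple[str, bool]]
) -> Tuple[float, float]:
    """Score a rule: precision = share of hits that are right,
    recall = share of sensitive samples that were found."""
    tp = fp = fn = 0
    for text, is_sensitive in samples:
        hit = pattern.search(text) is not None
        if hit and is_sensitive:
            tp += 1
        elif hit:
            fp += 1
        elif is_sensitive:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

On this sample the broad rule finds both sensitive texts but also flags the part number (precision 2/3, recall 1.0), while the narrow rule raises no false alarms but misses the space-separated number (precision 1.0, recall 0.5). Each tuning pass is an attempt to move both numbers up together.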

For a good analysis of the link between sensitive data and information management, see Jim Polka’s recent Retention and Sensitive Data Identification.

Step 3: Address Records

Rationalize your retention plan with as few categories as possible.

With ROT and sensitive data addressed, you can move on to addressing records … or not. Once you reach this point, you may find that only 5 to 20 percent of your network shares remain, so it might not be worth the effort to go further. But if you do go further, know that there’s no easy way to make progress: it’ll be a long, hard slog, one department (sometimes one record series) at a time.

Some thoughts on how to make this slog a little less difficult:

Start by rationalizing your records retention plan: you want to get to as few categories as possible. For example, if a department has three-, five- and seven-year retention periods for documents, make them all seven years; you can then classify the department’s documents all as one thing and perform disposition on them much more easily.
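The three-, five- and seven-year example amounts to keeping the longest period per department. A minimal Python sketch of that rule follows; the schedule layout, the series names and the function name are hypothetical, and it assumes the organization is comfortable holding shorter-lived series for the longer period.

```python
from typing import Dict


def rationalize(schedule: Dict[str, Dict[str, int]]) -> Dict[str, int]:
    """Collapse each department's record series into a single category
    that uses the department's longest retention period (in years)."""
    return {
        dept: max(series.values())
        for dept, series in schedule.items()
        if series  # skip departments with no record series listed
    }


# Hypothetical schedule: department -> record series -> retention in years
schedule = {
    "Finance": {"Invoices": 3, "Budgets": 5, "Tax filings": 7},
    "HR": {"Job postings": 3, "Personnel files": 7},
}
print(rationalize(schedule))  # {'Finance': 7, 'HR': 7}
```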

Look for departments that organize records by record types. 

Find departments that organize their network shares by record type (e.g., finance, HR, compliance, M&A). If you can find folders that contain a single record type, you can classify and disposition them more easily.

Use pattern matching when you can.

Find record types that can be identified easily using pattern matching (e.g., contracts, POs, invoices). Then you can create rules, à la sensitive data, to find those records.
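As an illustration of what such rules might look like, the Python sketch below classifies files by name alone. The naming conventions it assumes (for example, "PO-004512" for purchase orders) are invented; real rules would have to reflect how each department actually names its files, and they could also be applied to document text.

```python
import re
from typing import Optional

# (record type, pattern) pairs; the first matching rule wins.
# Lookarounds are used instead of \b because "_" counts as a word character.
RECORD_RULES = [
    ("contract", re.compile(r"(?<![a-z])(contract|agreement)(?![a-z])", re.I)),
    ("purchase order", re.compile(r"(?<![a-z])po[-_ ]?\d{4,}", re.I)),
    ("invoice", re.compile(r"(?<![a-z])inv(oice)?[-_ ]?\d+", re.I)),
]


def classify(filename: str) -> Optional[str]:
    """Return the first record type whose rule matches, or None."""
    for record_type, pattern in RECORD_RULES:
        if pattern.search(filename):
            return record_type
    return None


print(classify("Acme_Master_Agreement_2021.pdf"))  # contract
print(classify("PO-004512.pdf"))                   # purchase order
print(classify("team_photo.jpg"))                  # None
```

Files that match no rule stay unclassified, which is a reasonable default: they can be reviewed by folder or by department instead.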

Hopefully, you now have the clarity on how to proceed with cleaning up network shares.

As I said at the opening, cleaning up network shares is conceptually straightforward but difficult in practice. Hopefully the approach I’ve laid out gives you clarity on how to proceed and convinces you that, given the right technology, vision and elbow grease, you can make meaningful progress on cleaning up your network shares.


Rich Medina
Joe Shepley
I’m VP and Practice Lead, focusing on developing Doculabs’ InfoSec practice and its applications in a wide range of industries.