Bleeding Edge Information Governance: Using AI for Compliance and Classification

Vendors and organizations have been trying to apply artificial intelligence (AI) to records management and related information governance (IG) activities since the 1990s, until recently with disappointing results. But the technologies have matured and are finally justifying the cost, effort and risk of deploying them for IG. If you are considering implementing such technologies for IG compliance and classification, it helps to understand the specific benefits they can provide, so you can assess whether the investment is worthwhile, decide where to focus and lay out your roadmap. This strategic, “governance by design” approach is key to success.

Where Can IG Classification Tools Be Used?

We’re concerned with tools that can be pointed at entire information repositories, or at subsets of the files in them, and can determine the attributes of those files that are relevant to administration, records management, security, usage and other requirements. These tools can be applied to:

  • Existing information systems as part of an initial portfolio inventory or discovery
  • A tollgate or stage-gate process for new systems
  • Existing systems as part of enhancement, remediation, optimization or regular audit
  • Subsets of files within an information system
  • Files upon creation or ingestion into a system on a go-forward basis

When you evaluate how you might use AI classification for IG, it’s useful to frame the opportunities and benefits in terms of an information lifecycle model.

An Information Lifecycle Model


There are five stages of the lifecycle that information typically migrates through at all organizations. Information may not progress through every stage, and stages may occur multiple times in various orders. The five information lifecycle stages are:

  1. Create/Capture: Author and/or ingest information (such as documents or files) into the information repository.
  2. Store: Securely maintain information for the length of time required by business purposes and compliance obligations.
  3. Use: Access information, share information and perform tasks with other people.
  4. Hold: Preserve information relevant to discovery, and then release it when preservation is no longer necessary for discovery.
  5. Destroy: Permanently destroy or preserve information at the end of its retention period.

Let’s look at how AI can be effectively applied at each of these stages, or how it benefits each stage.


Create/Capture

The best place to classify your information is where it is created or captured in your systems. You can roll out AI-based classification technologies to automate data capture and reduce manual indexing. Automated processing reduces the time business unit employees spend on lower-value document handling, profiling and indexing. It also makes documents more readily available to authorized users.

In all stages, you should implement an IG tollgate process to ensure that any new systems have the necessary IG capabilities to fulfill the requirements that are identified as files are ingested.
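As a rough illustration of tagging files on ingestion, here is a minimal Python sketch. The keyword rules are only a stand-in for a trained classifier, and the tag names, keywords and the `classify_on_ingest` function are our own assumptions for illustration, not features of any particular product.

```python
from dataclasses import dataclass, field

# Hypothetical keyword rules standing in for a trained AI classifier.
# Tag names and keywords are illustrative assumptions only.
RULES = {
    "privacy": ("date of birth", "passport"),
    "legal": ("contract", "agreement", "litigation"),
    "finance": ("invoice", "purchase order"),
}


@dataclass
class Document:
    name: str
    text: str
    ig_tags: set = field(default_factory=set)


def classify_on_ingest(doc: Document) -> Document:
    """Tag a document with IG attributes as it enters the repository."""
    lowered = doc.text.lower()
    for tag, keywords in RULES.items():
        if any(keyword in lowered for keyword in keywords):
            doc.ig_tags.add(tag)
    return doc


doc = classify_on_ingest(
    Document("msa.docx", "Master services agreement; invoice terms attached.")
)
print(sorted(doc.ig_tags))  # ['finance', 'legal']
```

The point of the sketch is the placement, not the rules: tags are attached once, at capture, so that later stages can rely on them instead of asking employees to index by hand.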


Store

The automatic classification process identifies and tags files with their IG attributes. The corresponding requirement is for the information system to have the appropriate repository management controls in place as needed (e.g., restricted access, intact metadata, retention requirements and export functionality). Your AI IG tools give you an efficient way to confirm that your repositories can address the IG requirements of your information. They also allow you to address a variety of business requirements, including mandatory storage restrictions; fast, reliable access; multiple versions and version controls; and complex retention configurations, such as event-based triggers.
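One way to picture the match between tagged content and repository controls is a simple gap check. The sketch below is an assumption-laden example: the mapping in `REQUIRED_CONTROLS`, the control names and the `control_gaps` function are hypothetical, and a real mapping would come from your own policies.

```python
# Hypothetical mapping from IG tags to the repository controls they require.
# Both the tags and the control names are illustrative assumptions.
REQUIRED_CONTROLS = {
    "privacy": {"restricted_access"},
    "record": {"retention", "intact_metadata"},
    "legal": {"export", "intact_metadata"},
}


def control_gaps(ig_tags, repository_controls):
    """Return the controls the tagged content needs but the repository lacks."""
    needed = set()
    for tag in ig_tags:
        needed |= REQUIRED_CONTROLS.get(tag, set())
    return needed - set(repository_controls)


gaps = control_gaps({"privacy", "record"}, {"retention", "export"})
print(sorted(gaps))  # ['intact_metadata', 'restricted_access']
```

An empty result would mean the repository covers everything its content requires; a non-empty one is a list of items for remediation or for a tollgate review.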


Use

Your IG focus should include the usability and availability of information for all authorized users, for both defensive and offensive use of information. Therefore, the metadata for information assets should address not only defensive requirements (e.g., Privacy, RIM, Legal and InfoSec), but also usability and offensive requirements (e.g., searching and accessing, drafting and editing, sharing and collaborating, version control and tracking changes).

These use requirements should drive how you provision and configure systems, both to enable compliance and to ensure that users can effectively work with the information they need.


Hold

Hold applies to all discovery processes, whether they are legal, regulatory, audit or other mandatory discovery activities. All your information is potentially subject to hold, but you should focus most on information with higher discovery probability and risk.

By using AI classification to help map IG requirements to repository controls and usability, you can manage information assets and systems in a way that makes discovery more efficient, including suspending the normal lifecycle upon hold notice and resuming it upon official release.
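The suspend-and-resume behavior can be sketched in a few lines of Python. This is a minimal model under our own assumptions: the `Asset` class, the matter identifiers and the helper functions are hypothetical, and real systems track far more detail about each hold.

```python
from dataclasses import dataclass, field


@dataclass
class Asset:
    name: str
    # Identifiers of the matters currently holding this asset (hypothetical).
    active_holds: set = field(default_factory=set)

    @property
    def lifecycle_suspended(self) -> bool:
        """Normal lifecycle is suspended while any hold is active."""
        return bool(self.active_holds)


def apply_hold(asset: Asset, matter_id: str) -> None:
    asset.active_holds.add(matter_id)


def release_hold(asset: Asset, matter_id: str) -> None:
    asset.active_holds.discard(matter_id)


asset = Asset("q3-audit-workpapers")
apply_hold(asset, "audit-2024")
apply_hold(asset, "litigation-17")
release_hold(asset, "audit-2024")
print(asset.lifecycle_suspended)  # True: one hold is still active
release_hold(asset, "litigation-17")
print(asset.lifecycle_suspended)  # False: normal lifecycle resumes
```

Tracking holds as a set rather than a single flag matters because the same information can be subject to several discovery matters at once, and releasing one must not release the others.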


Destroy

You should be systematically and consistently destroying information that has reached the end of its required retention period, has no other preservation obligation and has no other business value. AI IG tools help you design and maintain your information and information systems to support at least four general destruction cases: normal destruction “according to daily business processes,” legacy cleanup, day-forward cleanup projects and day-forward scheduled destruction processes.
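The three conditions for destruction translate directly into a small check. The sketch below assumes that each item already carries a retention end date, a hold flag and a business-value flag; the function name and parameters are hypothetical.

```python
from datetime import date


def eligible_for_destruction(
    retention_end: date,
    on_hold: bool,
    has_business_value: bool,
    today: date,
) -> bool:
    """True only if retention has expired and nothing else requires keeping the item."""
    return retention_end <= today and not on_hold and not has_business_value


today = date(2024, 6, 30)
print(eligible_for_destruction(date(2023, 12, 31), False, False, today))  # True
print(eligible_for_destruction(date(2023, 12, 31), True, False, today))   # False: on hold
print(eligible_for_destruction(date(2025, 12, 31), False, False, today))  # False: still in retention
```

The same check can serve a one-time legacy cleanup and a scheduled day-forward process; what differs is which items you feed into it and how often.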

Caveats and Next Steps

This post suggests that AI can be effectively used for IG at the enterprise level. It uses the information lifecycle to outline AI’s specific benefits in this context. We’ve found such an approach to be very useful for building a business case and for planning the roadmap for implementing AI for IG.

But now for the caveat: the organizations that have been successful in applying AI to IG have, in our experience, all been at the “affluent” end of the spectrum — in industries such as financial services and pharmaceuticals. They have spent years and millions of dollars on enterprise big data analytics. Now, they are beginning to apply that investment to IG. These organizations must use something like the lifecycle approach to plan and manage IG. The rest of you should also use this approach to safely introduce AI into IG and ensure that each step builds the foundation for later steps.


Rich Medina
I’m a Principal Consultant and co-founder of Doculabs, and the resident expert in using ECM for information lifecycle management.