Classify and locate important documents

In a previous article, I showed you how to use a PowerShell script to respond automatically to an alert triggered by the Varonis DatAlert module.


But as you may know, Varonis offers other modules such as Directory Services (for extended Active Directory monitoring), Automation Engine, Data Transport Engine and Data Classification Engine (DCE). Today, we will work with that last module. 🙂

One of the best arguments for it is the GDPR. With Data Classification Engine (or Data Classification Framework, the same component, recently renamed), you can scan terabytes of data with precise rules to locate the documents that could contain personal or regulated information.

Indeed, since the GDPR and all the articles published about it, a lot of companies have understood that they need to know how their file servers are used and what type of information people are saving on them. In fact, there are several tools on the market that can help you label and classify your documents.

But if you’re using Office 365, there is also an integrated tool (depending on your O365 licence). It is called Azure Information Protection and it adds a new bar to your Microsoft Office applications.

With this bar, your people will be able to classify every document they are working on and, most importantly, you can find these documents without opening each file. If needed, you can adapt the way they are stored, saved or protected in your infrastructure depending on the document type.

Using Data Classification Engine (DCE)

Data Classification Engine can scan all your documents and find specific keywords or patterns. It can detect medical or financial information and personal data, in line with the GDPR articles that apply in European countries.

It would be very easy to ask DCE to scan for a specific word or pattern (such as a RegEx), but let’s imagine a more concrete scenario: your company has started to classify and label its documents with several sensitivity levels, for example:

  • SEC-01: this keyword identifies documents classified as “Secret”.
  • CONF-02: this keyword identifies a “Confidential” document. I also put a number at the end to suggest that each category could have several levels.

These keywords could be stored in the document’s body, header or footer. If you can require your users to classify their documents, you can then adapt the way you manage all these documents.
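To make the idea of a keyword pattern more concrete, here is a minimal Python sketch. It is not something Varonis provides: the regular expression, the `CATEGORIES` mapping and the `find_labels` helper are my own assumptions about how labels shaped like SEC-01 or CONF-02 could be recognized in a piece of text.

```python
import re

# Assumed label format: a category prefix (SEC or CONF), a dash, two digits.
LABEL_PATTERN = re.compile(r"\b(SEC|CONF)-(\d{2})\b")

# Assumed mapping from prefix to category, based on the two example keywords.
CATEGORIES = {"SEC": "Secret", "CONF": "Confidential"}


def find_labels(text):
    """Return a (label, category, level) tuple for each label found in text."""
    results = []
    for match in LABEL_PATTERN.finditer(text):
        prefix, level = match.group(1), match.group(2)
        results.append((match.group(0), CATEGORIES[prefix], int(level)))
    return results


if __name__ == "__main__":
    # Hypothetical document header used only for illustration.
    header = "Classification: SEC-01 | Finance department"
    print(find_labels(header))
```

The word boundaries (`\b`) keep the pattern from matching inside longer tokens, which matters when a short code like SEC-01 could appear as part of an unrelated reference number.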

For this example, I will create 2 documents, and we will assume the classification is added in the header of each document. Then, store the documents in one of the folders on your Varonis monitored resource. 🙂

Creating your DCE rule

Remember: DCE and DCF are the same thing!

Open your DatAdvantage application, go to the Tools menu and then to DCF and DW configuration. Create a new DCF rule for our first document type.

Enter a rule name and a rule description, then add a new filter.

  • Rule name: Custom – Secret Documents
  • Rule description: All my documents identified with SEC-01 keyword
  • Conditions: String equals SEC-01 

Repeat the same process for our second type of document.

  • Rule name: Customer – Confidential documents
  • Rule description: Document tagged with: CONF-02
  • Conditions: String equals CONF-02
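As a way to reason about these two rules outside the product, they can be written down as plain data. The sketch below is only an illustration: the `Rule` class and the `matching_rules` helper are hypothetical names, and I am assuming that the “String equals” condition behaves like a simple substring test on the document text, which may not be how DCF evaluates it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Illustrative stand-in for a DCF rule; not a Varonis data structure."""
    name: str
    description: str
    keyword: str


# The two rules from this article, copied as they were entered.
RULES = [
    Rule("Custom – Secret Documents",
         "All my documents identified with SEC-01 keyword", "SEC-01"),
    Rule("Customer – Confidential documents",
         "Document tagged with: CONF-02", "CONF-02"),
]


def matching_rules(text, rules=RULES):
    """Return the names of the rules whose keyword appears in text."""
    # Assumption: "String equals" is treated here as a plain substring test.
    return [rule.name for rule in rules if rule.keyword in text]
```

Calling `matching_rules("Header: SEC-01")` returns only the name of the first rule, which is a quick way to check a sample document header before waiting for a full scan.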

In the Scope section, you can restrict the search to a specific folder if you want to (otherwise, DCE will scan all your monitored servers).

Depending on the number of files and servers you’re monitoring, the scan can take a while. Wait at least one night (or run the jobs manually).

Check the results of the scan

First, you can check the status of your scan with the DCF and DW monitor option (in the Tools menu). According to our pie chart, all my documents have been scanned.

We can now see the results of our DCF scan in the Work Area tab in DatAdvantage.

In the final result, we can see that all my sensitive documents are stored in the “Finance” folder, and I know how many files are concerned (with their names). My example only involves 2 documents, but imagine the results with multiple rules and many more documents.
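The kind of result described here, hits grouped by folder with file names, can be approximated on a small test folder with a few lines of Python. This is a rough sketch and not a replacement for DCE: `scan_folder` is a hypothetical helper, it only reads files as plain text (real Word documents would need a proper parser), and it uses a simple substring test for the two keywords.

```python
import os
from collections import defaultdict

# The two classification keywords used in this article.
KEYWORDS = ("SEC-01", "CONF-02")


def scan_folder(root, keywords=KEYWORDS):
    """Map each folder under root to a sorted list of (file name, keyword) hits."""
    hits = defaultdict(list)
    for folder, _subfolders, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            try:
                # Plain-text read only; undecodable bytes are ignored.
                with open(path, encoding="utf-8", errors="ignore") as handle:
                    content = handle.read()
            except OSError:
                continue  # unreadable file: skip it
            for keyword in keywords:
                if keyword in content:
                    hits[folder].append((name, keyword))
    # Folders without any hit never appear in the result.
    return {folder: sorted(found) for folder, found in hits.items()}
```

Passing the path of a test folder returns a dictionary keyed by folder path, so a folder like “Finance” would show up with the names of the tagged files it contains.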

Here is another example of the same interface, this time with many more rules. As mentioned before, DCE can analyze every type of data thanks to patterns and RegEx: financial information, medical data, and anything else that could be specific to your company.

As you can see, we can scan and analyze a lot of information (driver’s licence numbers, identity card numbers and many more kinds of personal or GDPR data). 🙂