Content Examination: How do Entities / Entity Groups Work?

Document created by user.oxriBaJeN4 Employee on Jan 11, 2018. Last modified by user.oxriBaJeN4 Employee on Jan 26, 2018.
Version 9

This guide describes how entities and entity groups work. It is background for the Configuring Content Examination Definitions and Policies page.

 

What are Entities and Entity Groups?

 

Before going into detail, you need to know the difference between entities and entity groups. Entities allow administrators to search for sensitive information in messages and attachments without creating complicated word lists or regular expressions (regex). An entity group is a collection of entities aligned by category (e.g. PII, PHI or Financial). This allows administrators to search by subject area, rather than listing individual entities to achieve the same goal.
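
The relationship between a group and its entities can be pictured as a simple lookup. This is a minimal sketch only: the function name `expand` and the group memberships shown are invented for illustration and do not reflect the product's actual groups or internals.

```python
# Hypothetical group table: membership is an assumption made for this sketch.
ENTITY_GROUPS = {
    "financial": ["americanexpress"],
    "phi": ["ICD10Cm"],
}


def expand(targets):
    """Expand group names into member entities; pass plain entity names through."""
    expanded = []
    for target in targets:
        # A name that is not a known group is treated as a single entity.
        for name in ENTITY_GROUPS.get(target, [target]):
            if name not in expanded:
                expanded.append(name)
    return expanded
```

Searching on the group name alone then covers every entity the group contains, which is the convenience described above.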

 

How do Entities Work?

 

An entity consists of the elements listed below. A match is made only when all of the criteria below are met.

  • A validator. This confirms that the structure of the content meets the defined standards for the item you are looking for. For example, if looking for credit cards, the content must contain four blocks of four numbers, and a check digit within the specified range.
  • A regular expression. This is applied to the target content, if the validator check passes. Should the validator check fail, the content checks stop.
  • A word list. This is used to limit the number of false positives by matching keywords for the subject area. For example, credit card keywords are used with the credit card entities. This helps determine the context of the match, and allows us to exclude a string of numbers that meets the credit card checks but isn't a credit card number.

Some entities or entity groups don't contain validators or regular expressions, as these don't apply to the subject area. For example, the ICD10Cm entity is just a list of medical conditions.
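
The three elements can be modeled as optional parts of one record, where every part that is present must pass. This is a minimal sketch under assumptions: the `Entity` class, its field names, and the demo entities are hypothetical and are not the product's implementation.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional, Pattern, Tuple


@dataclass
class Entity:
    """Hypothetical model of an entity: each element is optional."""
    name: str
    validator: Optional[Callable[[str], bool]] = None
    pattern: Optional[Pattern[str]] = None
    keywords: Tuple[str, ...] = ()

    def matches(self, content: str) -> bool:
        # If the validator check fails, the content checks stop.
        if self.validator is not None and not self.validator(content):
            return False
        # The regular expression must find something in the content.
        if self.pattern is not None and self.pattern.search(content) is None:
            return False
        # The word list supplies context: at least one keyword must appear.
        lowered = content.lower()
        if self.keywords and not any(k.lower() in lowered for k in self.keywords):
            return False
        return True


# A word-list-only entity, similar in spirit to a list of medical conditions.
demo_terms = Entity(name="demo_terms", keywords=("asthma", "diabetes"))

# An entity that uses all three elements (toy validator: at least 15 digits).
demo_card = Entity(
    name="demo_card",
    validator=lambda s: sum(c.isdigit() for c in s) >= 15,
    pattern=re.compile(r"(?<!\d)\d{15}(?!\d)"),
    keywords=("amex", "credit card"),
)
```

For instance, `demo_card.matches("Amex 378282246310005")` succeeds, while the same number without a nearby keyword does not.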

Example Content Examination Definition

 

Detailed instructions on configuring a Content Examination definition are covered in the Configuring Content Examination Definitions and Policies page. However, before configuring a definition, it is vital that you understand the information you are looking for, and whether there are any conflicts with other data that could cause false positives. Take the following example:

 

You wish to hold all messages containing references to American Express credit card numbers. The "americanexpress" entity finds these credit card numbers wherever they appear in the specified areas of an email (header, body, or attachment).

detect americanexpress

However, if a 15-digit American Express number is present in a message, that alone isn't enough for a match to be made. Instead, the "americanexpress" entity performs the following:

  1. Possible matches are located using the entity's corresponding regular expressions.
  2. The possible matches are passed through a Luhn check to reduce the number of inaccurate matches.
  3. Keywords that provide credit card context are searched for within proximity of the matches found.
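
The Luhn check in step 2 is a standard checksum used for payment card numbers. The following is a minimal sketch of the general algorithm, not the product's code. The function name `luhn_valid` is an assumption, and `378282246310005` is a widely published American Express test number.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    # Ignore spaces, dashes, and any other non-digit separators.
    digits = [int(ch) for ch in number if ch.isdigit()]
    if not digits:
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second digit.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as adding the two digits of the product
        total += digit
    return total % 10 == 0
```

For `378282246310005` the weighted sum is 60, so the check passes. Changing the last digit to 6 gives 61, and the number is rejected.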

 

To summarize, a content examination hit for an American Express credit card only occurs if:

  • There's a 15-digit number that passes the appropriate Luhn check.
  • A term such as "Credit Card" or "Amex" is found within 300 characters of the credit card number.
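
The proximity condition can be sketched as a window search around each candidate number. This sketch covers only the keyword condition. It assumes the 300 characters are counted on either side of the number and that keyword matching is case-insensitive, neither of which is stated above. The regex, keyword list, and function names are illustrative.

```python
import re

# Assumption: a candidate is any standalone run of exactly 15 digits.
CANDIDATE = re.compile(r"(?<!\d)\d{15}(?!\d)")
KEYWORDS = ("credit card", "amex")
WINDOW = 300  # characters searched on each side of the number


def keyword_near(text, start, end, keywords=KEYWORDS, window=WINDOW):
    """Return True if a keyword lies fully within `window` characters of text[start:end]."""
    before = text[max(0, start - window):start].lower()
    after = text[end:end + window].lower()
    return any(k in before or k in after for k in keywords)


def candidates_with_context(text):
    """Return the 15-digit candidates that have a keyword nearby."""
    return [
        m.group()
        for m in CANDIDATE.finditer(text)
        if keyword_near(text, m.start(), m.end())
    ]
```

A number that survives this step would still need to pass the Luhn check before it counts as a hit.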

 
