All other PDF documents, including hybrid files containing both searchable text and scanned text, are sent to the default Data Security extractor, not the OCR server. Should the system fail to extract text from a PDF, it is forwarded to the OCR server.
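The routing rule above can be sketched as a small function. This is a minimal sketch, not the product's implementation. The rule that sends certain PDFs straight to OCR is described before this excerpt, so it is modeled here as a hypothetical boolean `sent_to_ocr_by_earlier_rule`. The extractor and OCR server are stand-in callables, also hypothetical. The sketch shows the two behaviors the text states: "other" PDFs go to the default extractor first, and a failed extraction falls back to OCR.

```python
from typing import Callable, Optional


def extract_pdf_text(
    pdf_bytes: bytes,
    sent_to_ocr_by_earlier_rule: bool,  # hypothetical: the earlier routing rule
    default_extractor: Callable[[bytes], Optional[str]],  # hypothetical stand-in
    ocr_server: Callable[[bytes], str],  # hypothetical stand-in
) -> tuple[str, str]:
    """Return (text, route), where route names the component that produced the text."""
    if sent_to_ocr_by_earlier_rule:
        return ocr_server(pdf_bytes), "ocr"
    # All other PDFs, including hybrid searchable/scanned files, try the default extractor.
    text = default_extractor(pdf_bytes)
    if text:
        return text, "default"
    # Extraction failed (None or empty result): forward the file to the OCR server.
    return ocr_server(pdf_bytes), "ocr-fallback"


# Toy stand-ins for illustration only.
def fake_default(data: bytes) -> Optional[str]:
    return data.decode("utf-8") if data.startswith(b"TEXT:") else None


def fake_ocr(data: bytes) -> str:
    return "ocr text"


print(extract_pdf_text(b"TEXT:hello", False, fake_default, fake_ocr))  # ('TEXT:hello', 'default')
print(extract_pdf_text(b"\x00scan", False, fake_default, fake_ocr))  # ('ocr text', 'ocr-fallback')
```

Passing the components in as callables keeps the routing decision separate from how extraction or OCR is actually performed.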
Core Detection & Analysis Algorithms
Methods for describing sensitive content are abundant. They can be divided into two categories: precise methods and imprecise methods.
Precise methods are, by definition, those based on Content Registration; they trigger almost no false-positive incidents.
All other methods are imprecise. They include keywords, lexicons, regular expressions, extended regular expressions, metadata tags, Bayesian analysis, statistical analysis (such as machine learning), and so on.
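To illustrate the difference, the sketch below contrasts an imprecise regular expression with a registration-style lookup. It is an assumption-laden toy, not GTB's Content Registration. The SSN-shaped pattern is only an example, and storing registered values as SHA-256 digests is an illustrative choice. The regex flags anything with the right shape, while the registry matches only values that were explicitly registered.

```python
import hashlib
import re

# Imprecise: the pattern describes the *shape* of sensitive data,
# so any string with that shape matches, including harmless ones.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def regex_hits(text: str) -> list[str]:
    return SSN_PATTERN.findall(text)


def _digest(value: str) -> str:
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


# Precise (registration-style): only values registered in advance match.
# Registered values are kept as digests (an illustrative assumption).
def register(values: list[str]) -> set[str]:
    return {_digest(v) for v in values}


def registered_hits(text: str, registry: set[str]) -> list[str]:
    # Tokenize on runs of word characters and hyphens, then look each token up.
    tokens = re.findall(r"[\w-]+", text)
    return [t for t in tokens if _digest(t) in registry]


registry = register(["123-45-6789"])
text = "Employee SSN 123-45-6789; ticket ref 555-12-0000"
print(regex_hits(text))  # ['123-45-6789', '555-12-0000']
print(registered_hits(text, registry))  # ['123-45-6789']
```

The ticket reference is a false positive for the pattern but is ignored by the registry, because it was never registered.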
Combined with GTB's proprietary algorithms, the AccuMatch™ detection algorithms produce virtually zero false positives and are highly resilient to data modifications, including:
excerpting, inserting, file type conversion, formatting, ASCII-to-Unicode conversion, UNIX-to-Windows conversion, partial data matches, and so on.
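AccuMatch™ itself is proprietary, so the sketch below swaps in a generic, well-known technique: text normalization followed by hashed word shingles, scored by containment. It shows how this kind of fingerprinting can tolerate several of the listed modifications. Unicode NFKC normalization and case folding absorb some character-form differences. Tokenizing on word characters ignores formatting, punctuation and CRLF vs LF line endings. Overlapping shingles let an excerpt match its source. The shingle size `K = 5` and the use of BLAKE2b are hypothetical choices. The input is assumed to be already-decoded text, so byte-encoding differences such as ASCII vs UTF-16 are resolved before this step. File type conversion likewise assumes text has already been extracted.

```python
import hashlib
import re
import unicodedata

K = 5  # words per shingle (hypothetical choice)


def normalize(text: str) -> list[str]:
    # NFKC folds compatibility forms (e.g. full-width letters) into plain ones;
    # casefold removes case differences.
    text = unicodedata.normalize("NFKC", text).casefold()
    # Keeping only word tokens discards punctuation, spacing and line-ending style.
    return re.findall(r"\w+", text)


def fingerprints(text: str, k: int = K) -> set[int]:
    words = normalize(text)
    shingles = (" ".join(words[i:i + k]) for i in range(len(words) - k + 1))
    # Hash each shingle to a 64-bit integer.
    return {
        int.from_bytes(hashlib.blake2b(s.encode("utf-8"), digest_size=8).digest(), "big")
        for s in shingles
    }


def containment(registered: set[int], candidate: set[int]) -> float:
    """Fraction of the candidate's fingerprints found in the registered document."""
    if not candidate:
        return 0.0
    return len(registered & candidate) / len(candidate)


original = "Q3 revenue forecast: EMEA sales are projected to grow 12 percent,\nwhile APAC remains flat.\n"
# An excerpt with full-width letters, changed case, extra spaces and Windows line endings.
excerpt = "ＥＭＥＡ   SALES are projected to grow 12 percent,\r\nwhile APAC"
unrelated = "Lunch menu for Friday includes soup salad and bread"

reg = fingerprints(original)
print(containment(reg, fingerprints(excerpt)))  # 1.0
print(containment(reg, fingerprints(unrelated)))  # 0.0
```

Containment (rather than symmetric similarity) is used so that a short excerpt of a registered document still scores high.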