All other PDF documents, including hybrid files containing both searchable text and scanned text, are sent to the default Data Security extractor, not the OCR server. Should the system fail to extract text from a PDF, it is forwarded to the OCR server.
Core Detection & Analysis Algorithms
Methods for describing sensitive content are abundant. They can be divided into two categories: precise methods and imprecise methods.
Precise methods are, by definition, those that involve Content Registration and trigger almost zero false positive incidents.
All other methods are imprecise. They include: keywords, lexicons, regular expressions, extended regular expressions, meta data tags, Bayesian analysis, statistical analysis such as Machine Learning, etc.
Combined with the proprietary algorithms, GTB's AccuMatchTM detection algorithms have virtually zero false positives and a very high resilience to data modifications including:
Excerpting, inserting, file type conversion, formatting, ASCII ->UNICODE conversion, UNIX–Windows conversion, partial data match, and so on.
IT issues often require a personalized solution. With Ask the Experts™, submit your questions to our certified professionals and receive unlimited, customized solutions that work for you.