Be seen. Boost your question’s priority for more expert views and faster solutions
d.print(/* potentially some parameters */);
Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.
Have a better answer? Share it in a comment.
Text extraction - or more exactly how successful you will be to extract text reliably - depends highly on the "inner" quality of your PDF files. This does not mean that a file does not look good on screen or when printed, it means that a file may not have all the information that Acrobat needs to extract text. At the end, the characters you see "drawn" on a PDF page are just that, drawings. Acrobat needs a table that allows it to convert these "drawings" back to Unicode characters (usually called the ToUnicode table). If a PDF generator did not include that table in the document for every font used, then you cannot extract text.
Now let's take a look at what support Acrobat's JavaScript has for text extraction. The only way you can get access to the text in a document is by using the "word finder". This is a few API routines in the Doc object that allow you to get the number of words on a page, and then iterate over all words on that page (and potentially get the location for each word).
Take a look at these two API documentation pages for more information about this functionality:
Doc.getPageNumWords(): http://livedocs.adobe.com/acrobat_sdk/11/Acrobat11_HTMLHelp/JS_API_AcroJS.89.494.html
Doc.getPageNthWord(): http://livedocs.adobe.com/acrobat_sdk/11/Acrobat11_HTMLHelp/JS_API_AcroJS.89.492.html
There are some sample scripts in the documentation.
Now to the reasons why this may not work for you: A "word" as far as this word finder is concerned is a collection of letters. As soon as you throw in punctuation marks or numbers, Acrobat will have a problem. So the term "test-abc" will result in two words: test and abc.
The number 100.12 will be reported as 100 and 12 - without any way of knowing if there was anything in between these two words that would connect them to a decimal number.
So, in theory you just have to iterate over all words on a page and see if one of them is "Total" - if you find the first total, just add the current page number to an array of pages that contain the term "Total", and return that array once you've processed all pages.