Find text in PDF and print the pages it is found on

I want to create an Action in Adobe Acrobat X Pro that uses JavaScript to find all instances of the word "Total" in a PDF document and print each page on which it is found.

I am working on Windows 7.

Thank you.
Asked by mak345
 
Karl Heinz Kremer commented:
How much experience do you have with JavaScript and Acrobat?

Text extraction, or more exactly how reliably you can extract text, depends heavily on the "inner" quality of your PDF files. A file can look perfectly fine on screen or when printed and still lack the information Acrobat needs to extract text. In the end, the characters you see "drawn" on a PDF page are just that: drawings. Acrobat needs a table that maps these "drawings" back to Unicode characters (usually called the ToUnicode table). If the PDF generator did not include that table for a font used in the document, you cannot reliably extract text set in that font.

Now let's look at the support Acrobat's JavaScript has for text extraction. The only way to access the text in a document is the "word finder": a few methods of the Doc object that let you get the number of words on a page, then iterate over all words on that page (and optionally get the location of each word).
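To make the word finder concrete, here is a minimal sketch that collects every word on one page. Only `getPageNumWords()` and `getPageNthWord()` come from the Acrobat API; the helper name `wordsOnPage` is hypothetical, and inside Acrobat you would pass the current document (`this`) as `doc`. Page numbers are 0-based. So the logic can be tried outside Acrobat, the sketch also builds a small stand-in object (an assumption, not part of Acrobat) whose page mimics the "100.12 becomes 100 and 12" behavior described in this thread.

```javascript
// Collect every word on one page with Acrobat's word finder.
// "doc" should be an Acrobat Doc object ("this" in a console or
// Action script). Page numbers are 0-based.
function wordsOnPage(doc, pageNum) {
  var words = [];
  var count = doc.getPageNumWords(pageNum);
  for (var i = 0; i < count; i++) {
    words.push(doc.getPageNthWord(pageNum, i));
  }
  return words;
}

// Hypothetical stand-in for a Doc object, only so the function can be
// tried outside Acrobat. Each inner array holds one page's words.
var fakeDoc = {
  pages: [["Invoice", "Total", "100", "12"]],
  getPageNumWords: function (p) { return this.pages[p].length; },
  getPageNthWord: function (p, n) { return this.pages[p][n]; }
};

// In Acrobat's JavaScript console, use console.println instead.
console.log(wordsOnPage(fakeDoc, 0).join(" "));
```

The `var`-based ES5 style is deliberate, since it should also run in the older JavaScript engine embedded in Acrobat X.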

Take a look at these two API documentation pages for more information about this functionality:

Doc.getPageNumWords(): http://livedocs.adobe.com/acrobat_sdk/11/Acrobat11_HTMLHelp/JS_API_AcroJS.89.494.html

Doc.getPageNthWord(): http://livedocs.adobe.com/acrobat_sdk/11/Acrobat11_HTMLHelp/JS_API_AcroJS.89.492.html

There are some sample scripts in the documentation.

Now for the reasons why this may not work for you: as far as the word finder is concerned, a "word" is a run of letters. As soon as punctuation marks or numbers are involved, Acrobat may split the text in unexpected places. For example, the term "test-abc" results in two words: test and abc.

The number 100.12 is reported as two words, 100 and 12, with no way of knowing whether anything between them connected them into a decimal number.

So, in theory, you just have to iterate over all words on each page and check whether one of them is "Total". As soon as you find the first match on a page, add the current page number to an array of pages that contain "Total", move on to the next page, and return that array once you have processed all pages.
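The loop just described can be sketched as follows. This is a minimal illustration, not the code from the linked article: `findPagesWithWord` is a hypothetical name, and the comparison is made case-insensitive (an assumption) because the thread mentions both "Total" and "TOTAL". It relies on the documented default of `getPageNthWord()`'s third parameter, `bStrip`, which removes punctuation and white space from the returned word. Pages are 0-based; add 1 if you want the page numbers shown in Acrobat's UI.

```javascript
// Return the 0-based numbers of all pages that contain searchWord.
// In Acrobat, call it with doc = this. Matching is case-insensitive
// here (an assumption) so "Total" and "TOTAL" both count.
function findPagesWithWord(doc, searchWord) {
  var target = searchWord.toUpperCase();
  var pages = [];
  for (var p = 0; p < doc.numPages; p++) {
    var count = doc.getPageNumWords(p);
    for (var i = 0; i < count; i++) {
      // bStrip defaults to true, so surrounding punctuation is removed.
      if (String(doc.getPageNthWord(p, i)).toUpperCase() === target) {
        pages.push(p);
        break; // the first hit is enough; continue with the next page
      }
    }
  }
  return pages;
}

// Hypothetical stand-in for an Acrobat Doc object, for testing only.
var fakeDoc = {
  numPages: 4,
  pages: [["Summary"], ["Line", "items"], ["TOTAL", "100", "12"], ["Grand", "Total"]],
  getPageNumWords: function (p) { return this.pages[p].length; },
  getPageNthWord: function (p, n) { return this.pages[p][n]; }
};

console.log(findPagesWithWord(fakeDoc, "Total")); // pages 2 and 3 (0-based)
```

Breaking out of the inner loop after the first hit keeps each page in the result at most once and avoids scanning the rest of a page that already matched.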
 
mak345 (author) commented:
Thanks for your response! I will not run into the dash and decimal-point scenarios you describe. I am basically searching for one word, "TOTAL", which only shows up in specific places on the reports I will be running this against.

However, I have very little experience with JavaScript and Acrobat; most of my experience is with VBA and SQL. Could you help me with the code I need to add the page numbers to an array and then print those pages?

I would really appreciate your help; it would save my department a ton of time performing this tedious task.  

Thanks again!
 
Karl Heinz Kremer commented:
Unfortunately, that would take a bit more time than I have right now. If you want to dive into JavaScript yourself, it's not too complicated to pick up if you are already familiar with programming in general.
 
Karl Heinz Kremer commented:
Actually, I did find some time. Take a look here:

http://khkonsulting.com/2014/04/extract-pdf-pages-based-content/
 
mak345 (author) commented:
That is a lot of help. Thank you! My last step is to print the newly created document. If I add a print command, it prints the original document, not the new one.

Does this need to be two separate actions, or can it be combined into one?

Thanks
 
Karl Heinz Kremer commented:
As far as the action is concerned, the second document does not exist. You would need to print it via JavaScript by adding a line like this to the script (d being the variable that holds the newly created document):

    d.print(/* potentially some parameters */);

You can learn more about the print command in Acrobat's JavaScript API documentation:

http://livedocs.adobe.com/acrobat_sdk/11/Acrobat11_HTMLHelp/JS_API_AcroJS.89.517.html
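If you would rather skip creating a new document, a possible variation (not the approach from the linked article) is to print the matching pages straight from the original file. The sketch below assumes a sorted list of 0-based page numbers, merges consecutive pages into ranges, and issues one `Doc.print()` call per range using the documented `bUI`, `nStart`, and `nEnd` parameters. The helper names `pagesToRanges` and `printPages` are hypothetical, and each range becomes a separate print job.

```javascript
// Merge a sorted list of 0-based page numbers into inclusive
// [start, end] ranges, e.g. [0, 1, 2, 5] becomes [[0, 2], [5, 5]].
function pagesToRanges(pages) {
  var ranges = [];
  for (var i = 0; i < pages.length; i++) {
    var last = ranges[ranges.length - 1];
    if (last && pages[i] === last[1] + 1) {
      last[1] = pages[i]; // page continues the current run
    } else {
      ranges.push([pages[i], pages[i]]); // start a new run
    }
  }
  return ranges;
}

// Print only the given pages of "doc" (an Acrobat Doc object, e.g. this).
// One print job is sent per range; bUI: false prints without showing
// the print dialog. nStart and nEnd are 0-based and inclusive.
function printPages(doc, pages) {
  var ranges = pagesToRanges(pages);
  for (var i = 0; i < ranges.length; i++) {
    doc.print({ bUI: false, nStart: ranges[i][0], nEnd: ranges[i][1] });
  }
}
```

In an Action's JavaScript step this might be called as `printPages(this, [2, 3]);`. Acrobat may restrict printing without the dialog to trusted contexts such as the console, batch sequences, or Actions, so test it in your environment first.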
 
mak345 (author) commented:
Perfect!  Thanks for all your help.  I really appreciate it!
Question has a verified solution.
