How to detect text in a pdf?

Posted on 2005-05-10
Last Modified: 2012-06-27
(Follow-on to

Hello fellow experts

I have an Access report that prints out PDFs. In one section, there is a table-like listing with a column header and anywhere from one to ten rows. I'd like to be able to detect the final row in this section, and then create form fields immediately below it.

Is this possible?

Question by:Jim Horn

    Accepted Solution

    Is there anything below the last row? If not, you can use the "wordfinder" to enumerate all "words" in your document while keeping track of the box around each "word" (in your case, these are probably numerical data rather than real words). Once you are done with the whole page, you can take the lowest box coordinates and place your fields just below them. Look into the JavaScript reference for the methods doc.getPageNthWord, doc.getPageNthWordQuads and doc.getPageNumWords.

    You would first get the number of words on the page, and then in a loop get the "Quads", which are the coordinates that define the bounding box around the word. And as usual, you can access these methods via the JSObject from your VBA program.
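
A minimal sketch of that loop, for illustration only. It assumes an unrotated page where y increases upward (so the smallest y is the lowest point on the page) and that each quad is a list of eight numbers, four x,y pairs. The function names findLowestWordY and fieldRectBelow, and the left/right/height/gap parameters, are made up for this example and are not part of the Acrobat API.

```javascript
// Sketch only: "doc" is an Acrobat Doc object (or anything exposing the same
// two methods). Assumes an unrotated page where y grows upward.
function findLowestWordY(doc, pageNum) {
  var numWords = doc.getPageNumWords(pageNum);
  var lowestY = null;
  for (var i = 0; i < numWords; i++) {
    // Each word yields a list of quads; each quad holds 8 numbers (4 x,y pairs).
    var quads = doc.getPageNthWordQuads(pageNum, i);
    for (var q = 0; q < quads.length; q++) {
      for (var k = 1; k < quads[q].length; k += 2) { // odd indices are y values
        if (lowestY === null || quads[q][k] < lowestY) {
          lowestY = quads[q][k];
        }
      }
    }
  }
  return lowestY; // null when the page has no words
}

// Hypothetical helper: builds a rectangle [left, top, right, bottom] that
// starts "gap" points below the lowest word and is "height" points tall.
function fieldRectBelow(lowestY, left, right, height, gap) {
  var top = lowestY - gap;
  return [left, top, right, top - height];
}
```

Inside Acrobat, the resulting rectangle could then be passed to doc.addField (for example with a placeholder field name such as "Notes" and type "text"). If the same one-page report repeats throughout the file, the same two calls could be run for each page number from 0 to doc.numPages - 1. On rotated pages the coordinates would likely need converting first, so treat this as a starting point rather than a finished routine.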

    Author Comment

    by:Jim Horn
    Okay.  I'll give this a shot...

    Author Comment

    by:Jim Horn
    I'm going to pass on this solution, as my report is a single page that is repeated many times throughout the .pdf. Thanks for explaining this for me though.

