Can I extract specific pages that have specific text

Posted on 2014-09-24
Last Modified: 2014-09-26
I have a large PDF file (3,000 pages), and I would like to build a new file containing a subset of the data, selected based on the OCR text in Adobe.
Can I do something like this?
Question by: Scott Johnston
5 Comments
 

Expert Comment

by: Joe Winograd, EE MVE
ID: 40342593
I have begun work on a program that does this. It is based on a discussion following this other EE question:
http://www.experts-exchange.com/Software/Misc/Q_28510119.html

As stated in that question, the new program is a follow-up to the one based on yet another EE question:
http://www.experts-exchange.com/Software/Server_Software/Document_Management/Q_28084148.html

As also stated earlier, it is a violation of EE's Terms of Use/Code of Conduct to offer to sell any goods or services for any commercial purpose. However, it is permissible to contact members at the email address in their profiles, and more recently EE has brought back the Hire Me button in profiles, as well as created a new Messages system (click the envelope icon in the upper right). So if you're interested in pursuing the matter, please contact me via one of the permitted EE mechanisms.

In the meantime, I'd like to understand your requirements better. You say that you want to build a new file with a subset of the data in the 3,000-page file. Some questions about that:

(1) How are you going to identify the subset?

(2) If the answer to (1) is a text search string, then I have several more questions, starting with: What kind of search? Single word? Entire phrase?

(3) Case sensitive or case insensitive or an option for either?

(4) Partial word or whole word or an option for either?

(5) Do you want Boolean search (AND, OR, NOT)?

(6) Do you want Regular Expression (RegEx) search?

(7) Anything else that will help to specify your requirements for creating the subset?

Regards, Joe
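For readers who want to script this themselves, the search options in questions (2) through (6) map fairly directly onto Python's standard `re` module. The following is a minimal sketch, not the program mentioned in this thread. It assumes the text of each page is already available as a string (for example, from the PDF's OCR text layer), and `term_matches` and `page_matches` are hypothetical names. A multi-word term such as "Adobe Acrobat" is matched as an entire phrase.

```python
import re
from typing import Iterable


def term_matches(text: str, term: str, *, case_sensitive: bool = False,
                 whole_word: bool = False, use_regex: bool = False) -> bool:
    """Return True if `term` occurs in `text` under the chosen search options."""
    # (6) Treat the term as a literal string unless RegEx search is requested.
    pattern = term if use_regex else re.escape(term)
    # (4) Whole-word search: \b requires a word boundary on each side, so
    # "Micro" will not match inside "Microsoft". Works best for terms that
    # start and end with letters or digits.
    if whole_word:
        pattern = rf"\b(?:{pattern})\b"
    # (3) Case-insensitive unless the caller asks otherwise.
    flags = 0 if case_sensitive else re.IGNORECASE
    return re.search(pattern, text, flags) is not None


def page_matches(text: str, *, all_of: Iterable[str] = (), any_of: Iterable[str] = (),
                 none_of: Iterable[str] = (), **options: bool) -> bool:
    """(5) Boolean search: every all_of term (AND), at least one any_of term (OR),
    and no none_of term (NOT). With no terms at all, every page matches."""
    if not all(term_matches(text, t, **options) for t in all_of):
        return False
    any_of = list(any_of)
    if any_of and not any(term_matches(text, t, **options) for t in any_of):
        return False
    return not any(term_matches(text, t, **options) for t in none_of)


if __name__ == "__main__":
    # Hypothetical per-page text standing in for a real PDF's text layer.
    pages = ["Microsoft Windows license", "MICROSOFTWARE catalog", "Apple invoice"]
    print([i for i, t in enumerate(pages) if page_matches(t, any_of=["microsoft"])])  # [0, 1]
    print([i for i, t in enumerate(pages)
           if page_matches(t, any_of=["microsoft"], whole_word=True)])  # [0]
```

The two calls show why question (4) matters: a partial-word search also hits "MICROSOFTWARE", while a whole-word search does not.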

Author Comment

by: Scott Johnston
ID: 40345136
I've requested that this question be deleted for the following reason:

Thank you, but the answer is that you cannot do that. I was not looking for a solution; I just wanted someone to confirm whether Adobe has a feature to retrieve a subset of pages from a large PDF file.

Accepted Solution

by: Joe Winograd, EE MVE (earned 500 total points)
ID: 40344760
> confirm...adobe has a feature to retrieve a subset of pages from a large PDF file

The answer is dependent on how you want to identify the subset. For example, if you're willing to select each page of the 3,000-page PDF via the standard Windows techniques (Ctrl-left-click and Shift-left-click), then the answer is yes. Select the thumbnails of the pages you want, then right-click on any selected thumbnail, and then left-click Extract Pages from the context menu. However, I was going on the assumption that you don't want to go through the entire 3,000-page file and manually select each page. My assumption is that you wanted to search for an identifying string on each page, such as "Microsoft", and then automatically extract each page with a hit into a new PDF file with that subset of pages. I don't know of a way for Acrobat to do that.

That's why I asked the questions I did: before answering a question, it is important to understand it well. You'll find that here at Experts Exchange, experts often ask questions about your question in order to assist you better.

Btw, you wrote just "adobe" in both your initial question and your Delete Request, so to be clear: Adobe Reader cannot Extract Pages; Adobe Acrobat can. Regards, Joe
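The automatic approach described above (search each page for a string such as "Microsoft", then copy every page with a hit into a new PDF) is not an Acrobat X feature, but it can be sketched in Python. This is an illustration under stated assumptions, not the program mentioned in this thread: it assumes the third-party `pypdf` package is installed and that the PDF already has a searchable text layer (pypdf reads existing text; it does not perform OCR). The function names and file names are hypothetical.

```python
from typing import Iterable, List, Optional


def matching_indices(page_texts: Iterable[Optional[str]], needle: str) -> List[int]:
    """Return zero-based indices of pages whose text contains `needle` (case-insensitive)."""
    needle = needle.casefold()
    # Pages with no extractable text (None or "") simply never match.
    return [i for i, text in enumerate(page_texts) if needle in (text or "").casefold()]


def extract_matching_pages(src_path: str, dst_path: str, needle: str) -> List[int]:
    """Copy every page of `src_path` whose text layer contains `needle` into `dst_path`.

    Assumes pypdf is installed and the PDF already has a text layer (e.g. from OCR).
    """
    # Imported here so matching_indices can be used without pypdf installed.
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(src_path)
    texts = [page.extract_text() for page in reader.pages]
    hits = matching_indices(texts, needle)
    # Only write an output file when at least one page matched.
    if hits:
        writer = PdfWriter()
        for i in hits:
            writer.add_page(reader.pages[i])
        with open(dst_path, "wb") as out:
            writer.write(out)
    return hits
```

For example, `extract_matching_pages("big.pdf", "microsoft_pages.pdf", "Microsoft")` would return the zero-based indices of the pages it copied. Text extraction quality varies from one PDF to another, so it is worth spot-checking a few hits before relying on the result.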

Author Closing Comment

by: Scott Johnston
ID: 40345137
I will award you the points because in your last post you confirmed that Adobe does not have this function. That is all I wanted to find out. I appreciate that you may have a solution, but that is not what I was asking for.
That is why I wanted to delete the question.
I am aware of many different solutions to my question, but as for Adobe Acrobat X, it does not have this type of search extraction capabilities.

Expert Comment

by: Joe Winograd, EE MVE
ID: 40346435
Thank you for the points — I really appreciate it! But I want to be absolutely certain that we leave this thread with a correct understanding. So I want to make sure we're on the same wavelength with respect to your comment:
> ...but as for Adobe Acrobat X, it does not have this type of search extraction capabilities.
To be clear:

  • Adobe Acrobat X and XI Standard do have the Extract Pages feature via selection of pages
  • Adobe Acrobat X and XI Professional do have the Extract Pages feature via selection of pages
  • Adobe Reader X and XI do not have the Extract Pages feature

But I'm still not sure of exactly what you mean by "this type of search extraction capabilities." If you mean to search for a phrase and then automatically extract all pages where the phrase is found into a new PDF, then I'm not aware of any way to do that in the off-the-shelf Adobe Acrobat X or XI, Standard or Professional.

One other thing — you said:
> I am aware of many different solutions to my question...
I am not. Please tell me some of the many different solutions to your question. I would like to research them. Thanks very much. Regards, Joe

