Solved

Can I extract specific pages that have specific text?

Posted on 2014-09-24
Last Modified: 2014-09-26
I have a large PDF file (3000 pages) and I would like to build a new file with a subset of the data, based upon the OCR in adobe.
Can I do something like this?
Question by:Scott Johnston
 
LVL 52

Expert Comment

by:Joe Winograd, EE MVE
ID: 40342593
I have begun work on a program that does this. It is based on a discussion following this other EE question:
http://www.experts-exchange.com/Software/Misc/Q_28510119.html

As stated in that question, the new program is a follow-up to the one based on yet another EE question:
http://www.experts-exchange.com/Software/Server_Software/Document_Management/Q_28084148.html

As also stated earlier, it is a violation of EE's Terms of Use/Code of Conduct to offer to sell any goods or services for any commercial purpose. However, it is permissible to contact members at the email address in their profiles, and more recently EE has brought back the Hire Me button in profiles, as well as created a new Messages system (click the envelope icon in the upper right). So if you're interested in pursuing the matter, please contact me via one of the permitted EE mechanisms.

In the meantime, I'd like to understand your requirements better. You say that you want to build a new file with a subset of the data in the 3,000-page file. Some questions about that:

(1) How are you going to identify the subset?

(2) If the answer to (1) is via a text search string, then several more questions, starting with - What kind of search? Single word? Entire phrase?

(3) Case sensitive or case insensitive or an option for either?

(4) Partial word or whole word or an option for either?

(5) Do you want Boolean search (AND, OR, NOT)?

(6) Do you want Regular Expression (RegEx) search?

(7) Anything else that will help to specify your requirements for creating the subset?
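The options raised in questions (2) through (6) can be made concrete with a small sketch. It assumes the text of each page has already been pulled out of the PDF into a list of strings (how that is done is outside this sketch). The function name `find_pages` and its parameters are hypothetical; they are not part of any Adobe product or of the program mentioned above.

```python
import re

def find_pages(page_texts, query, case_sensitive=False,
               whole_word=False, use_regex=False):
    """Return the 1-based numbers of the pages whose text matches `query`."""
    # Treat the query as literal text unless RegEx search was requested.
    pattern = query if use_regex else re.escape(query)
    if whole_word:
        # \b anchors the match at word boundaries on both sides.
        pattern = r"\b(?:" + pattern + r")\b"
    flags = 0 if case_sensitive else re.IGNORECASE
    compiled = re.compile(pattern, flags)
    return [number for number, text in enumerate(page_texts, start=1)
            if compiled.search(text)]

# Made-up page texts, for illustration only.
pages = [
    "Invoice for Microsoft Office licences",
    "Notes on microsoftware (a typo)",
    "Nothing relevant here",
]

hits_any = find_pages(pages, "microsoft")                         # partial word, any case
hits_word = find_pages(pages, "microsoft", whole_word=True)       # whole word only
hits_case = find_pages(pages, "Microsoft", case_sensitive=True)   # exact case

# A Boolean AND can be expressed as a set intersection of two searches.
both = sorted(set(find_pages(pages, "microsoft")) &
              set(find_pages(pages, "invoice")))
```

OR and NOT could be built the same way with set union and set difference.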

Regards, Joe
 

Author Comment

by:Scott Johnston
ID: 40345136
I've requested that this question be deleted for the following reason:

Thank you, but the answer is that you cannot do that. I was not looking for a solution. I just wanted someone to confirm whether adobe has a feature to retrieve a subset of pages from a large PDF file.
 
LVL 52

Accepted Solution

by:Joe Winograd, EE MVE (earned 500 total points)
ID: 40344760
> confirm...adobe has a feature to retrieve a subset of pages from a large PDF file

The answer is dependent on how you want to identify the subset. For example, if you're willing to select each page of the 3,000-page PDF via the standard Windows techniques (Ctrl-left-click and Shift-left-click), then the answer is yes. Select the thumbnails of the pages you want, then right-click on any selected thumbnail, and then left-click Extract Pages from the context menu. However, I was going on the assumption that you don't want to go through the entire 3,000-page file and manually select each page. My assumption is that you wanted to search for an identifying string on each page, such as "Microsoft", and then automatically extract each page with a hit into a new PDF file with that subset of pages. I don't know of a way for Acrobat to do that.
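The search-and-extract workflow described above can be sketched outside Acrobat. This is only an illustration under stated assumptions: the function name `extract_hits` is hypothetical, the search is a plain case-insensitive substring test, and `FakePage` is a stand-in so the sketch runs without any PDF library.

```python
def extract_hits(pages, needle, add_page):
    """Pass every page whose text contains `needle` to `add_page`.

    `pages` is any iterable of objects with an `extract_text()` method;
    `add_page` is a callable that appends a page to the output document.
    Returns the 1-based numbers of the pages that were kept.
    """
    kept = []
    for number, page in enumerate(pages, start=1):
        # Image-only pages may yield no text at all.
        text = page.extract_text() or ""
        if needle.lower() in text.lower():
            add_page(page)
            kept.append(number)
    return kept


class FakePage:
    """Stand-in for a PDF page object (an assumption for this sketch)."""

    def __init__(self, text):
        self._text = text

    def extract_text(self):
        return self._text


source = [
    FakePage("Microsoft licence"),
    FakePage("Other vendor"),
    FakePage("MICROSOFT renewal"),
    FakePage(None),  # a page with no text layer
]
output = []
kept = extract_hits(source, "Microsoft", output.append)
```

With a third-party library such as pypdf (an assumption; it is not mentioned in this thread), `pages` could be `PdfReader("big.pdf").pages` and `add_page` could be the `add_page` method of a `PdfWriter`, followed by `writer.write("subset.pdf")`; both file names are placeholders. This relies on the PDF having a text layer, which the OCR step mentioned in the question would provide.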

But that's why I asked the questions I did — before providing an answer to a question, it is important to understand the question well, so you'll find that here at Experts Exchange, experts will often ask you questions about your question in order to assist you better.

Btw, you said "adobe" in your initial question and "adobe" in your Delete Request, so to be clear, Adobe Reader cannot Extract Pages — Adobe Acrobat can. Regards, Joe
 

Author Closing Comment

by:Scott Johnston
ID: 40345137
I will award you the points because in your very last post you identified that adobe does not have this function. That is all I wanted to find out. I appreciate that you may have a solution, but that is not what I was asking for.
That is why I wanted to delete the question.
I am aware of many different solutions to my question but as for Adobe acrobat X it does not have this type of search extraction capabilities.
 
LVL 52

Expert Comment

by:Joe Winograd, EE MVE
ID: 40346435
Thank you for the points — I really appreciate it! But I want to be absolutely certain that we leave this thread with a correct understanding. So I want to make sure we're on the same wavelength with respect to your comment:
> ...but as for Adobe acrobat X it does not have this type of search extraction capabilities.
To be clear:

Adobe Acrobat X and XI Standard do have the Extract Pages feature via selection of pages
Adobe Acrobat X and XI Professional do have the Extract Pages feature via selection of pages
Adobe Reader X and XI do not have the Extract Pages feature

But I'm still not sure of exactly what you mean by "this type of search extraction capabilities." If you mean to search for a phrase and then automatically extract all pages where the phrase is found into a new PDF, then I'm not aware of any way to do that in the off-the-shelf Adobe Acrobat X or XI, Standard or Professional.

One other thing — you said:
> I am aware of many different solutions to my question...
I am not. Please tell me some of the many different solutions to your question. I would like to research them. Thanks very much. Regards, Joe