Can I extract specific pages that have specific text

I have a large PDF file (3000 pages) and I would like to build a new file with a subset of the data based upon the OCR in adobe.
Can I do something like this?
Scott Johnston
 
Joe Winograd (EE MVE 2015 & 2016, Developer) commented:
I have begun work on a program that does this. It is based on a discussion following this other EE question:
http://www.experts-exchange.com/Software/Misc/Q_28510119.html

As stated in that question, the new program is a follow-up to the one based on yet another EE question:
http://www.experts-exchange.com/Software/Server_Software/Document_Management/Q_28084148.html

As also stated earlier, it is a violation of EE's Terms of Use/Code of Conduct to offer to sell any goods or services for any commercial purpose. However, it is permissible to contact members at the email address in their profiles, and more recently EE has brought back the Hire Me button in profiles, as well as created a new Messages system (click the envelope icon in the upper right). So if you're interested in pursuing the matter, please contact me via one of the permitted EE mechanisms.

In the meantime, I'd like to understand your requirements better. You say that you want to build a new file with a subset of the data in the 3,000-page file. Some questions about that:

(1) How are you going to identify the subset?

(2) If the answer to (1) is via a text search string, then several more questions, starting with - What kind of search? Single word? Entire phrase?

(3) Case sensitive or case insensitive or an option for either?

(4) Partial word or whole word or an option for either?

(5) Do you want Boolean search (AND, OR, NOT)?

(6) Do you want Regular Expression (RegEx) search?

(7) Anything else that will help to specify your requirements for creating the subset?
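
To make questions (2) through (6) concrete, here is a minimal sketch of how those options could map onto a page-level text test. It is an illustration only, not the program mentioned above: the names `build_pattern` and `page_matches` are hypothetical, and it assumes the text of each page is already available as a plain string.

```python
import re


def build_pattern(term, *, case_sensitive=False, whole_word=False, is_regex=False):
    """Compile one search term (hypothetical helper, not part of any Adobe product)."""
    # Treat the term literally unless the caller asks for RegEx search.
    body = term if is_regex else re.escape(term)
    if whole_word:
        # Require word boundaries on both sides of the term.
        body = r"\b(?:" + body + r")\b"
    flags = 0 if case_sensitive else re.IGNORECASE
    return re.compile(body, flags)


def page_matches(text, all_of=(), any_of=(), none_of=()):
    """Boolean search: AND over all_of, OR over any_of, NOT over none_of."""
    if not all(p.search(text) for p in all_of):
        return False
    if any_of and not any(p.search(text) for p in any_of):
        return False
    return not any(p.search(text) for p in none_of)


if __name__ == "__main__":
    sample = "Invoice from Microsoft Corporation"
    wanted = [build_pattern("invoice")]
    unwanted = [build_pattern("draft")]
    print(page_matches(sample, all_of=wanted, none_of=unwanted))
```

Each option becomes a keyword argument, so the same test covers single words, whole phrases, and regular expressions without separate code paths.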

Regards, Joe
 
Scott Johnston (Systems Consultant, Author) commented:
I've requested that this question be deleted for the following reason:

Thank you, but the answer is that you cannot do that. I was not looking for a solution; I just wanted someone to confirm whether adobe has a feature to retrieve a subset of pages from a large PDF file.
 
Joe Winograd (EE MVE 2015 & 2016, Developer) commented:
> confirm...adobe has a feature to retrieve a subset of pages from a large PDF file

The answer is dependent on how you want to identify the subset. For example, if you're willing to select each page of the 3,000-page PDF via the standard Windows techniques (Ctrl-left-click and Shift-left-click), then the answer is yes. Select the thumbnails of the pages you want, then right-click on any selected thumbnail, and then left-click Extract Pages from the context menu. However, I was going on the assumption that you don't want to go through the entire 3,000-page file and manually select each page. My assumption is that you wanted to search for an identifying string on each page, such as "Microsoft", and then automatically extract each page with a hit into a new PDF file with that subset of pages. I don't know of a way for Acrobat to do that.
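
Outside Acrobat, the search-then-extract idea described above can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not a feature of any Adobe product: it assumes the third-party pypdf library is installed, that the PDF already has a text layer (for example from OCR), and the function names and file paths are hypothetical.

```python
import re


def pages_with_hits(page_texts, pattern):
    """Return the zero-based indexes of pages whose text matches the pattern."""
    return [i for i, text in enumerate(page_texts) if pattern.search(text or "")]


def extract_matching_pages(src_path, dst_path, pattern):
    """Copy every page with a hit into a new PDF (assumes pypdf is installed)."""
    from pypdf import PdfReader, PdfWriter  # third-party; imported lazily

    reader = PdfReader(src_path)
    # extract_text() reads the page's text layer; a scanned page without OCR
    # text yields an empty string and is therefore never selected.
    texts = [page.extract_text() for page in reader.pages]
    hits = pages_with_hits(texts, pattern)

    writer = PdfWriter()
    for index in hits:
        writer.add_page(reader.pages[index])
    with open(dst_path, "wb") as handle:
        writer.write(handle)
    return hits


if __name__ == "__main__":
    # Pure-Python demonstration that needs no PDF file.
    demo = ["Cover page", "Licensed from Microsoft", "Appendix"]
    print(pages_with_hits(demo, re.compile("microsoft", re.IGNORECASE)))
```

A call such as `extract_matching_pages("big.pdf", "subset.pdf", re.compile("Microsoft"))` would write the matching pages to a new file; both file names here are placeholders.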

But that's why I asked the questions I did — before providing an answer to a question, it is important to understand the question well, so you'll find that here at Experts Exchange, experts will often ask you questions about your question in order to assist you better.

Btw, you said "adobe" in your initial question and "adobe" in your Delete Request, so to be clear, Adobe Reader cannot Extract Pages — Adobe Acrobat can. Regards, Joe
 
Scott Johnston (Systems Consultant, Author) commented:
I will award you the points because in your very last post you identified that adobe does not have this function. That is all I wanted to find out. I appreciate that you may have a solution, but that is not what I was asking for, which is why I wanted to delete the question.
I am aware of many different solutions to my question but as for Adobe acrobat X it does not have this type of search extraction capabilities.
 
Joe Winograd (EE MVE 2015 & 2016, Developer) commented:
Thank you for the points — I really appreciate it! But I want to be absolutely certain that we leave this thread with a correct understanding. So I want to make sure we're on the same wavelength with respect to your comment:
> ...but as for Adobe acrobat X it does not have this type of search extraction capabilities.
To be clear:

  • Adobe Acrobat X and XI Standard do have the Extract Pages feature via selection of pages
  • Adobe Acrobat X and XI Professional do have the Extract Pages feature via selection of pages
  • Adobe Reader X and XI do not have the Extract Pages feature

But I'm still not sure of exactly what you mean by "this type of search extraction capabilities." If you mean to search for a phrase and then automatically extract all pages where the phrase is found into a new PDF, then I'm not aware of any way to do that in the off-the-shelf Adobe Acrobat X or XI, Standard or Professional.

One other thing — you said:
> I am aware of many different solutions to my question...
I am not. Please tell me some of the many different solutions to your question. I would like to research them. Thanks very much. Regards, Joe