Asked by Vincent D

PDF selective txt export from single PDF file

Hi,

I routinely receive PDF files for review and analysis. What is the easiest way to search a very large PDF report for text that starts with "Path:" and "MD5:", so that I can extract ALL instances of these entries from the PDF and put them into a table in Excel for analysis?
ASKER CERTIFIED SOLUTION
Joe Winograd

This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Vincent D (ASKER)

I do have Adobe Acrobat XI Pro (11.0.23). How do I use it to export the text?
I am looking to export all instances of "Path:" and "MD5:" to CSV or Excel so that I can compare duplicates and review each unique pair of values. The report shows the path of each file and its MD5 hash value, for every file that made it into the report.
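For readers who do script, the "compare duplicates" step described above could look roughly like the following. This is only a sketch under assumptions: the pairs are already available as (path, MD5) tuples, the function names `unique_pairs` and `paths_by_md5` are made up for illustration, the sample values are invented, and paths are compared case-insensitively because the example path in this thread is a Windows path.

```python
from collections import defaultdict


def unique_pairs(pairs):
    """Return the (path, md5) pairs with exact repeats removed, keeping first-seen order."""
    seen = set()
    result = []
    for path, md5 in pairs:
        key = (path.lower(), md5.lower())  # assumption: compare case-insensitively
        if key not in seen:
            seen.add(key)
            result.append((path, md5))
    return result


def paths_by_md5(pairs):
    """Group the distinct paths under each MD5 value."""
    groups = defaultdict(list)
    for path, md5 in unique_pairs(pairs):
        groups[md5.lower()].append(path)
    return dict(groups)


# Invented test data, not taken from any real report.
sample_pairs = [
    (r"c:\yadayada\log3poc.dll", "0123456789abcdef0123456789abcdef"),
    (r"c:\yadayada\log3poc.dll", "0123456789ABCDEF0123456789ABCDEF"),  # repeat of the first pair
    (r"c:\other\copy.dll", "0123456789abcdef0123456789abcdef"),        # same hash, different path
]

deduped = unique_pairs(sample_pairs)
grouped = paths_by_md5(sample_pairs)
```

An MD5 value that maps to more than one path in `grouped` points at files with identical content stored in different locations.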
I was able to export the text from the PDF as plain text, accessible text, or rich text format. I now need to search the text file to find each pair of "Path:" and "MD5:" values. What is the easiest way to scan co considering I don't program or script?

I need to extract each pair of values and export them to CSV/Excel for analysis.
Post a sample of the text file with a few of the "Path:" and "MD5:" pairs, being careful to replace any private/sensitive text with test data.

Btw, what method/product/technique did you use to export the text from the PDF file?
What is the easiest way to scan co considering I don't program or script?
What do you mean by "scan co"?
"co" was a typo; please ignore it.
I used Adobe Acrobat to export to TXT and RTF.
Here is an example of what I am trying to get out of the text file:

299.
log3poc.dll
PID(s): 12, 104, 107, 212
Path: c:\yadayada\log3poc.dll
MD5: 123abc4hr8ri4jrjf8fj4jdidjrn (placeholder; the real data is a hex value, i.e., 0-9 or A-F)
I want to pull out ALL paired instances of "Path:" and "MD5:" and export them to CSV/Excel so that they pair up correctly, as shown below. Each path ends with the file name, and each MD5 is the hash of the file named in the path directly above it.

Example

Column 1             Column 2
Path of file 1       MD5 hash of file 1
Path of file 2       MD5 hash of file 2
Path of file 3       MD5 hash of file 3
Path of file 4       MD5 hash of file 4
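As an illustration of the two-column layout above, here is a minimal Python sketch that writes (path, MD5) pairs to CSV, which Excel can open directly. The function name `pairs_to_csv`, the header names, and the sample values are assumptions made for this example; only the standard-library `csv` module is used.

```python
import csv
import io


def pairs_to_csv(pairs, out):
    """Write (path, md5) pairs as a two-column CSV to an open text stream."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Path", "MD5"])  # header row; the column names are an assumption
    writer.writerows(pairs)


# Invented test data, not taken from any real report.
sample_pairs = [
    (r"c:\yadayada\log3poc.dll", "0123456789abcdef0123456789abcdef"),
    (r"c:\yadayada\other.dll", "fedcba9876543210fedcba9876543210"),
]

# Write to an in-memory buffer here so the result can be inspected.
buffer = io.StringIO()
pairs_to_csv(sample_pairs, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To produce a real file instead of an in-memory string, the same function can be given a file opened with `open("pairs.csv", "w", newline="")` (the file name is hypothetical).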
Does the Adobe Acrobat export to TXT always produce the sequence that you show above, i.e., two lines in a row like this:

Path: c:\foldername\filename.filetype
MD5: 32 hex characters
The "299." is not unique, and there are likely MANY duplicates, so I just want a full export of every pair of "Path:" and "MD5:".
Yes, to my understanding.
The path is sometimes one line long and sometimes two lines long, but it always follows "Path:".
I found a number of instances where "MD5:" is not on the line directly below "Path:" but a few lines below it. My assumption is that searching the text for "Path:" first and then for the next "MD5:" that follows should work, if that clarifies it.
Please post sample data.
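The approach described above (find "Path:", then take the next "MD5:" that follows) can be sketched in Python as shown here. It is a sketch under several assumptions, not a tested solution for the real report: the function name `extract_pairs` and the sample text are invented, a wrapped path is assumed to continue only on the line immediately after "Path:" and is joined without a space, a line that looks like another labeled field (such as "PID(s): ...") is not treated as part of the path, and an MD5 value is accepted only if it is exactly 32 hex characters.

```python
import re

# "MD5:" followed by exactly 32 hex characters.
MD5_RE = re.compile(r"^MD5:\s*([0-9A-Fa-f]{32})\b")
# A line that looks like another labeled field, e.g. "PID(s): 12, 104".
LABEL_RE = re.compile(r"^[A-Za-z][\w() ]*:\s")


def extract_pairs(text):
    """Return a list of (path, md5) tuples found in the exported text."""
    pairs = []
    path = None           # path waiting for its MD5 line, if any
    may_continue = False  # True only for the line right after "Path:"
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Path:"):
            path = line[len("Path:"):].strip()
            may_continue = True
            continue
        if line.startswith("MD5:"):
            match = MD5_RE.match(line)
            if path is not None and match:
                pairs.append((path, match.group(1).lower()))
            path = None
            may_continue = False
            continue
        if may_continue and line and not LABEL_RE.match(line):
            path += line  # assumption: the path wrapped onto this line
        may_continue = False
    return pairs


# Invented test data in the shape of the example posted in this thread.
SAMPLE = r"""299.
log3poc.dll
PID(s): 12, 104, 107, 212
Path: c:\yadayada\log3poc.dll
MD5: 0123456789ABCDEF0123456789ABCDEF
300.
longname.dll
Path: c:\yadayada\a_very_long_folder_name\
longname.dll
PID(s): 7
MD5: fedcba9876543210fedcba9876543210
"""

pairs = extract_pairs(SAMPLE)
for path, md5 in pairs:
    print(path, md5)
```

If the real export wraps paths differently, for example by breaking at a space inside a folder name, the continuation rule would need adjusting, which is why real sample data matters here.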
I would like to learn how to do it myself.
For the current project I am open to a turnkey solution, as I have other things that require my attention in the short term. Long term, I would like to learn.
I selected all of the comments that are helpful in solving the problem of searching for specific text in PDF files and then exporting the associated values into CSV/Excel format for analysis. I can't say that any post is the "Best" answer, so I selected the first post as the Accepted Solution and all the others as Assisted Solutions, and I split the points evenly between the two participating experts. Regards, Joe