PDF selective txt export from single PDF file

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

SOLUTION

aikimark

ASKER

I do have Adobe Acrobat XI Pro (11.0.23). How do I use it to export and consume the text?

ASKER

I am looking to export all instances of "Path:" and "MD5:" to CSV or Excel so I can compare duplicates and review each unique pair of values. The report shows the path of each file and its MD5 hash value for every file that made it into the report.

SOLUTION

aikimark

SOLUTION

ASKER

I was able to export the text from the PDF as basic text, accessible text, or rich text format. I now need to search the text file to find each pair of Path: and MD5: values. What is the easiest way to scan co considering I don't program or script?

I need to extract each pair of values and export to csv/excel for analysis etc.

Post a sample of the text file with a few of the "Path:" and "MD5:" pairs, being careful to replace any private/sensitive text with test data.

Btw, what method/product/technique did you use to export the text from the PDF file?

"What is the easiest way to scan co considering I don't program or script?"

What do you mean by "scan co"?

ASKER

"co" was a typo; please ignore it.

ASKER

I used Adobe Acrobat to export to TXT and RTF.

ASKER

Here is an example of what I am trying to get out of the text file:

299.
log3poc.dll
PID(s): 12, 104, 107, 212
Path: c:\yadayada\log3poc.dll
MD5: 123abc4hr8ri4jrjf8fj4jdidjrn (real data is a hex value, i.e., 0-9 or A-F)

ASKER

I want to pull out ALL paired instances of Path: and MD5: and export them to CSV/Excel so they pair up correctly, as shown below. Each path points to the file at the end of the path, and the MD5 is the hash of the file in the path directly above it.

Example

Column 1          Column 2
Path of file 1    MD5 hash of file 1
Path of file 2    MD5 hash of file 2
Path of file 3    MD5 hash of file 3
Path of file 4    MD5 hash of file 4
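
For readers following along, the two-column layout above could be produced with Python's standard csv module. This is only a minimal sketch, not a solution posted in the thread: the function name pairs_to_csv and the sample values are hypothetical, and it assumes the Path/MD5 pairs have already been collected as (path, md5) tuples.

```python
import csv
import io


def pairs_to_csv(pairs, out_file):
    """Write (path, md5) tuples as two CSV columns with a header row."""
    writer = csv.writer(out_file)
    writer.writerow(["Path", "MD5"])
    writer.writerows(pairs)


# Hypothetical sample data, not taken from the real report.
sample_pairs = [
    (r"c:\yadayada\log3poc.dll", "0123456789abcdef0123456789abcdef"),
    (r"c:\yadayada\other.dll", "fedcba9876543210fedcba9876543210"),
]

# An in-memory buffer stands in for a real file here.
buffer = io.StringIO()
pairs_to_csv(sample_pairs, buffer)
print(buffer.getvalue())
```

To write a file that Excel can open, the buffer could be replaced with something like open("pairs.csv", "w", newline=""), where the file name is again just a placeholder.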

Does the Adobe Acrobat export to TXT always produce the sequence that you show above, i.e., two lines in a row like this:

Path: c:\foldername\filename.filetype
MD5: 32 hex characters

ASKER

The "299." is not unique, and there are likely MANY duplicates, so I just want a full export of every pair of "Path:" and "MD5:".
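
Once every pair has been exported, the duplicates mentioned here could be reviewed with a few lines of Python. The sketch below is an illustration under assumptions, not something from the thread: the function names unique_pairs and group_by_md5 are hypothetical, the pairs are assumed to be (path, md5) tuples, and paths are compared exactly as written (no case folding, even though Windows paths are case-insensitive).

```python
from collections import defaultdict


def unique_pairs(pairs):
    """Drop exact repeats of a (path, md5) pair, keeping first-seen order."""
    return list(dict.fromkeys(pairs))


def group_by_md5(pairs):
    """Map each MD5 (lowercased) to the sorted unique paths that share it."""
    groups = defaultdict(set)
    for path, md5 in pairs:
        groups[md5.lower()].add(path)
    return {md5: sorted(paths) for md5, paths in groups.items()}


# Hypothetical sample data with one repeated pair and one shared hash.
sample = [
    (r"c:\a\one.dll", "0123456789abcdef0123456789abcdef"),
    (r"c:\a\one.dll", "0123456789abcdef0123456789abcdef"),
    (r"c:\b\copy.dll", "0123456789ABCDEF0123456789ABCDEF"),
    (r"c:\c\two.dll", "fedcba9876543210fedcba9876543210"),
]

print(unique_pairs(sample))
print(group_by_md5(sample))
```

Any hash that maps to more than one path in the second result is a file that appears in several locations.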

ASKER

Yes, to my understanding.

ASKER

The path is sometimes one line long and sometimes two lines long, but it always follows "Path:".

ASKER

I found a bunch of instances where "MD5:" is not on the line directly below "Path:" but a few lines below. I am assuming that searching the text for "Path:" first and then for the next "MD5:" that follows should work, if that clarifies it.
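
The approach described here (find "Path:", then take the next "MD5:" that follows) could be sketched in Python roughly as follows. This is a hedged illustration, not the solution given in the thread. The function name extract_pairs is hypothetical, and the handling of two-line paths rests on assumptions that would need checking against the real export: only the line immediately after "Path:" is treated as a continuation, it is joined without a space, and it is skipped if it is blank or looks like another labeled field such as "PID(s):". The "Size:" line in the sample is invented to stand in for whatever may sit between the two fields.

```python
import re

PATH_RE = re.compile(r"^Path:\s*(.*)$")
MD5_RE = re.compile(r"^MD5:\s*(\S+)")
# A line such as "PID(s): 12" or "MD5: ..." that starts a new labeled field.
LABEL_RE = re.compile(r"^[A-Za-z0-9()]+:(\s|$)")


def extract_pairs(lines):
    """Return (path, md5) tuples, pairing each Path with the next MD5."""
    pairs = []
    path = None           # most recent path without an MD5 yet
    may_continue = False  # True only for the line right after "Path:"
    for raw in lines:
        line = raw.strip()
        path_match = PATH_RE.match(line)
        md5_match = MD5_RE.match(line)
        if path_match:
            path = path_match.group(1).strip()
            may_continue = True
        elif md5_match:
            if path is not None:
                pairs.append((path, md5_match.group(1)))
            path = None
            may_continue = False
        elif may_continue:
            # ASSUMPTION: a wrapped path continues on exactly one line and
            # is joined without a space; labeled or blank lines are skipped.
            if line and not LABEL_RE.match(line):
                path += line
            may_continue = False
    return pairs


# Hypothetical sample modeled on the layout posted in the thread.
sample_lines = [
    "299.",
    "log3poc.dll",
    "PID(s): 12, 104, 107, 212",
    r"Path: c:\yadayada\log3poc.dll",
    "MD5: 0123456789abcdef0123456789abcdef",
    "300.",
    "other.dll",
    r"Path: c:\yadayada\very\long",
    r"\folder\other.dll",
    "Size: 10",
    "",
    "MD5: fedcba9876543210fedcba9876543210",
]

for path, md5 in extract_pairs(sample_lines):
    print(path, md5)
```

For a real export, the lines could come from open("export.txt", encoding="utf-8") (the file name and encoding are placeholders), and the MD5 pattern could be tightened to exactly 32 hex characters once the real data is confirmed to look that way.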

aikimark

Please post sample data.

SOLUTION

ASKER

I would like to learn how to do it myself.

ASKER

For the current project I am open to getting a turnkey solution, as I have things that require my attention in the short term. Long term, I would like to learn.

SOLUTION
