Vincent D
asked on
PDF selective txt export from single PDF file
Hi,
I routinely get PDF files for review and analysis. What is the easiest way to search a giant PDF report for text that starts with "Path:" and "MD5:", extract ALL instances of those entries, and put them into an Excel table for analysis?
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
I am looking to export all instances of "Path:" and "MD5:" to CSV or Excel so I can compare duplicates and review each unique pair of values. The report shows the path of each file and its MD5 hash value for every file that made it into the report...
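Once the pairs exist as rows, comparing duplicates is mostly a counting exercise. Here is a minimal sketch in Python, assuming the pairs are already available as a list of `(path, md5)` tuples; the function name `summarize` and the sample values in the usage note are made up for illustration and do not come from the report.

```python
from collections import Counter, defaultdict


def summarize(pairs):
    """Count repeated (path, md5) pairs and group distinct paths by hash."""
    counts = Counter(pairs)            # how many times each exact pair occurs
    paths_by_hash = defaultdict(set)   # md5 -> set of distinct paths with that hash
    for path, md5 in counts:           # iterating a Counter yields each unique pair once
        paths_by_hash[md5].add(path)
    return counts, dict(paths_by_hash)
```

For example, `summarize([("c:\\a\\x.dll", "aa"), ("c:\\a\\x.dll", "aa"), ("c:\\b\\x.dll", "aa")])` reports the first pair twice and shows that hash `aa` appears under two different paths. The same comparison can be done in Excel with a pivot table if scripting is not an option.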
ASKER
I was able to export txt from PDF to txt as either basic txt, accessible txt or rich txt format. I now need to search txt file to find each pair of Path: and MD5: value. What is the easiest way to scan co considering I don't program or script?
I need to extract each pair of values and export to csv/excel for analysis etc.
Post a sample of the text file with a few of the "Path:" and "MD5:" pairs, being careful to replace any private/sensitive text with test data.
Btw, what method/product/technique did you use to export the text from the PDF file?
"What is the easiest way to scan co considering I don't program or script?"
What do you mean by "scan co"?
ASKER
"co" was a typo ignore it plz
ASKER
I used Adobe Acrobat to export to txt and rtf
ASKER
Example of what I am trying to get out of txt file....
299.
log3poc.dll
PID(s): 12, 104, 107, 212
Path: c:\yadayada\log3poc.dll
MD5: 123abc4hr8ri4jrjf8fj4jdidj rn (the real data is a hex value, i.e., 0-9 or A-F)
299.
log3poc.dll
PID(s): 12, 104, 107, 212
Path: c:\yadayada\log3poc.dll
MD5: 123abc4hr8ri4jrjf8fj4jdidj
ASKER
I want to pull out ALL paired instances of "Path:" and "MD5:" and export them to CSV/Excel so they pair up correctly, as shown below. Each path points to the file at the end of that path, and the MD5 is the hash of the file named in the path directly above it.
Example
Column 1           Column 2
Path of file 1     MD5 hash of file 1
Path of file 2     MD5 hash of file 2
Path of file 3     MD5 hash of file 3
Path of file 4     MD5 hash of file 4
Does the Adobe Acrobat export to TXT always produce the sequence that you show above, i.e., two lines in a row like this:
Path: c:\foldername\filename.filetype
MD5: 32 hex characters
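If the exported text does follow that shape, the two line types can be recognized with regular expressions. Below is a minimal sketch in Python, assuming each label starts its line (optionally after whitespace) and that a valid MD5 value is exactly 32 hex characters. The names `PATH_RE`, `MD5_RE`, and `classify` are hypothetical and chosen only for this illustration.

```python
import re

# Assumption: the label is the first thing on the line, after optional whitespace.
PATH_RE = re.compile(r"^\s*Path:\s*(.+?)\s*$")
# An MD5 digest is 32 hexadecimal characters; \b rejects longer hex runs.
MD5_RE = re.compile(r"^\s*MD5:\s*([0-9A-Fa-f]{32})\b")


def classify(line):
    """Return ("path", value), ("md5", value), or None for any other line."""
    m = PATH_RE.match(line)
    if m:
        return ("path", m.group(1))
    m = MD5_RE.match(line)
    if m:
        return ("md5", m.group(1).lower())  # normalize case so duplicates compare equal
    return None
```

Lines such as `PID(s): 12, 104` return `None`, and so does an "MD5:" line whose value is not 32 hex characters, which is a cheap way to spot damaged lines in the export.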
ASKER
"299." is not unique, and there are likely MANY duplicates, so I just want a full export of every pair of "Path:" and "MD5:".
ASKER
Yes, to my understanding.
ASKER
The path is sometimes one line long and sometimes two lines long, but it always comes after "Path:".
ASKER
I found a bunch of instances where "MD5:" is not on the line directly below "Path:" but a few lines below it. I am working on the assumption that searching the text for "Path:" first and then for the next "MD5:" after it should work, if that clarifies things.
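That "find Path:, then take the next MD5:" idea can be sketched as a short Python script. This is only an illustration under stated assumptions, not a tested solution for the real report: a wrapped path is assumed to continue on the single line directly after "Path:", the two pieces are assumed to join with no space, and any line that looks like another label (for example "PID(s):") is assumed not to be part of the path. The names `extract_pairs`, `write_csv`, and `LABEL_RE` are hypothetical.

```python
import csv
import re

MD5_RE = re.compile(r"^MD5:\s*([0-9A-Fa-f]+)")  # hex digest after the label
# Heuristic for "some other labelled line", e.g. "PID(s): 12, 104".
LABEL_RE = re.compile(r"^[A-Za-z][A-Za-z0-9()]*:(\s|$)")


def extract_pairs(lines):
    """Pair each "Path:" line with the next "MD5:" line that follows it."""
    pairs = []
    path = None        # path of the record being read, or None
    may_wrap = False   # True only for the line directly after "Path:"
    for raw in lines:
        line = raw.strip()
        if line.startswith("Path:"):
            # A new "Path:" replaces any earlier path that never got an MD5.
            path = line[len("Path:"):].strip()
            may_wrap = True
            continue
        if path is None:
            continue   # ignore everything until a "Path:" has been seen
        m = MD5_RE.match(line)
        if m:
            pairs.append((path, m.group(1).lower()))
            path = None
        elif may_wrap and line and not LABEL_RE.match(line):
            path += line   # assumed continuation of a wrapped path
        may_wrap = False
    return pairs


def write_csv(pairs, out_path):
    """Write the pairs as a two-column CSV that Excel can open."""
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["Path", "MD5"])
        writer.writerows(pairs)
```

To use it, open the text file exported from Acrobat, pass the open file to `extract_pairs`, and pass the result to `write_csv` along with an output file name. The encoding of the Acrobat export is not known here, so the `encoding` argument used when opening the input may need adjusting.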
please post sample data
ASKER
I would like to learn how to do it myself.
ASKER
For the current project I am open to getting a turnkey solution, as I have things that require my attention in the short term. Long term, I would like to learn...
I selected all of the comments that are helpful in solving the problem of searching for specific text in PDF files and then exporting the associated values into CSV/Excel format for analysis. I can't say that any post is the "Best" answer, so I selected the first post as the Accepted Solution and all the others as Assisted Solutions, and I split the points evenly between the two participating experts. Regards, Joe