CTmountainbiker
asked on
Extracting information from a PDF
Here’s my dilemma…I have a .pdf file with 8 columns of info [name, phone, email address, etc]. I want to extract all the email addresses. I’m using Nitro to convert to Excel but every row ends up in one cell. I’ve tried saving as .txt and launching the Import text wizard thinking that I could insert column breaks but nothing is aligned properly. When it’s in .xls format the data is aligned pretty well…is there a formula I can use to segregate the info I need? Or another trick?
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
There's only one file to download — <xpdfbin-win-3.04.zip>. Unzip it and you'll see a folder called <bin32> (there's also a <bin64> folder, but you don't need it, not even on 64-bit systems). Inside the <bin32> folder you'll find <pdftotext.exe>, which is not an installer — it is simply a stand-alone, command line executable. Open up a command prompt, navigate to wherever <pdftotext.exe> is, and run the command:
pdftotext -layout c:\folder\pdfinput.pdf c:\folder\textoutput.txt
If you don't specify the output file name, it will default to the same name (and path) as the input PDF, but with a file type of TXT. Regards, Joe
pdftotext -layout c:\folder\pdfinput.pdf c:\folder\textoutput.txt
If you don't specify the output file name, it will default to the same name (and path) as the input PDF, but with a file type of TXT. Regards, Joe
ASKER
Thanks very much! It was my syntax that was messing it up.
You're very welcome! I'm glad it worked for you. If you already upvoted my video, thanks! If not, I'd really appreciate it if you click on the upvote arrow under Helpful Votes at the video. Thanks much, Joe
ASKER