Solved

Extracting information from a PDF

Posted on 2014-12-16
Last Modified: 2014-12-16
Here’s my dilemma: I have a .pdf file with 8 columns of info [name, phone, email address, etc.]. I want to extract all the email addresses. I’m using Nitro to convert to Excel, but every row ends up in one cell. I’ve tried saving as .txt and launching the Import Text Wizard, thinking that I could insert column breaks, but nothing is aligned properly. When it’s in .xls format the data is aligned pretty well. Is there a formula I can use to separate out the info I need? Or another trick?
Question by:CTmountainbiker

Accepted Solution

by:
Joe Winograd, EE MVE earned 500 total points
ID: 40502653
I suggest trying the Xpdf utility called pdftotext. If you use the -layout parameter, it should keep the column alignment and then any decent text editor will allow you to copy/paste the email column. Here's an EE 5-minute video Micro Tutorial explaining how to download the Xpdf tools:

http://www.experts-exchange.com/VP_213.html

And another 5-minute one explaining pdftotext specifically:

http://www.experts-exchange.com/VP_217.html

If you have any problems, I'll be happy to help. Regards, Joe
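As an alternative to copying the email column by hand in a text editor, the addresses can be pulled out of the pdftotext output with a short script. This is only a minimal sketch in Python, which is not mentioned in the thread: the function name `extract_emails` and the regular expression are illustrative assumptions, and the pattern is a simple approximation that covers common addresses rather than every valid one.

```python
import re

# Simplified email pattern (an assumption for illustration, not a full
# RFC 5322 matcher): local part, "@", domain, and a dot plus 2+ letters.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(text):
    """Return the email addresses found in text, in order of first
    appearance, skipping case-insensitive duplicates."""
    seen = set()
    emails = []
    for match in EMAIL_RE.findall(text):
        key = match.lower()
        if key not in seen:
            seen.add(key)
            emails.append(match)
    return emails
```

To use it, read the text file that pdftotext produced into a string, pass the string to `extract_emails`, and write the returned list out one address per line. Because the pattern scans the whole text, it does not depend on the columns lining up perfectly.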
 

Author Comment

by:CTmountainbiker
ID: 40502894
Downloaded the files no problem; however, I'm trying to get 'pdftotext.exe'. I extract the files but can't find the executable; something flashes quickly on screen, but I'm getting an I/O error when running it at the DOS prompt.

Expert Comment

by:Joe Winograd, EE MVE
ID: 40502959
There's only one file to download — <xpdfbin-win-3.04.zip>. Unzip it and you'll see a folder called <bin32> (there's also a <bin64> folder, but you don't need it, not even on 64-bit systems). Inside the <bin32> folder you'll find <pdftotext.exe>, which is not an installer — it is simply a stand-alone, command line executable. Open up a command prompt, navigate to wherever <pdftotext.exe> is, and run the command:

pdftotext -layout c:\folder\pdfinput.pdf c:\folder\textoutput.txt

If you don't specify the output file name, it will default to the same name (and path) as the input PDF, but with a file type of TXT. Regards, Joe
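If the conversion needs to be repeated for several PDFs, the same command can be driven from a script. The following is a rough sketch in Python, assuming `pdftotext` is on the PATH (or that its full path is passed as `exe`); the helper names `build_pdftotext_command` and `convert` are made up for illustration. It spells out the output file name explicitly, using the same convention as the default described above (same name and path, with a .txt extension).

```python
import subprocess
from pathlib import Path


def build_pdftotext_command(pdf_path, txt_path=None, exe="pdftotext"):
    """Build the argument list for: pdftotext -layout <input> <output>.

    When txt_path is omitted, the output name is the input name with a
    .txt extension, matching the default described in the comment above.
    """
    pdf = Path(pdf_path)
    out = Path(txt_path) if txt_path else pdf.with_suffix(".txt")
    return [exe, "-layout", str(pdf), str(out)]


def convert(pdf_path, txt_path=None, exe="pdftotext"):
    """Run pdftotext and return the path of the text file it wrote.

    Raises subprocess.CalledProcessError if pdftotext exits with an error,
    and FileNotFoundError if the executable cannot be found.
    """
    command = build_pdftotext_command(pdf_path, txt_path, exe)
    subprocess.run(command, check=True)
    return Path(command[-1])
```

Passing the arguments as a list rather than one string avoids quoting problems when a folder name contains spaces, which is a common source of syntax errors at the command prompt.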
 

Author Comment

by:CTmountainbiker
ID: 40503110
Thanks very much!  It was my syntax that was messing it up.

Expert Comment

by:Joe Winograd, EE MVE
ID: 40503136
You're very welcome! I'm glad it worked for you. If you already upvoted my video, thanks! If not, I'd really appreciate it if you click on the upvote arrow under Helpful Votes at the video. Thanks much, Joe