Solved

Extracting information from a PDF

Posted on 2014-12-16
Last Modified: 2014-12-16
Here’s my dilemma: I have a .pdf file with 8 columns of info (name, phone, email address, etc.). I want to extract all the email addresses. I’m using Nitro to convert to Excel, but every row ends up in one cell. I’ve tried saving as .txt and launching the Import Text Wizard, thinking that I could insert column breaks, but nothing is aligned properly. When it’s in .xls format the data is aligned pretty well. Is there a formula I can use to separate out the info I need? Or another trick?
Question by:CTmountainbiker

Accepted Solution

by:
Joe Winograd, EE MVE earned 500 total points
ID: 40502653
I suggest trying the Xpdf utility called pdftotext. If you use the -layout parameter, it should keep the column alignment and then any decent text editor will allow you to copy/paste the email column. Here's an EE 5-minute video Micro Tutorial explaining how to download the Xpdf tools:

http://www.experts-exchange.com/VP_213.html

And another 5-minute one explaining pdftotext specifically:

http://www.experts-exchange.com/VP_217.html

If you have any problems, I'll be happy to help. Regards, Joe
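As an alternative to copying the email column by hand in a text editor, the addresses can be pulled out of the pdftotext output with a short script. The following is a minimal sketch, not part of the original answer: the function name `extract_emails`, the sample rows, and the regular expression are all illustrative assumptions, and the pattern only covers typical addresses rather than every form the email standards allow.

```python
import re

# Simple pattern for typical addresses; an assumption for illustration,
# not a full validator for every legal email address.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(text):
    """Return unique email addresses in order of first appearance."""
    seen = set()
    result = []
    for match in EMAIL_RE.findall(text):
        key = match.lower()  # treat addresses as case-insensitive for de-duplication
        if key not in seen:
            seen.add(key)
            result.append(match)
    return result


# Hypothetical rows resembling column-aligned output from pdftotext -layout.
sample = (
    "Jane Doe      555-0100   jane.doe@example.com     Hartford\n"
    "John Smith    555-0101   jsmith@example.org       New Haven\n"
)
print(extract_emails(sample))
```

To use it on a real file, read the text file produced by pdftotext and pass its contents to `extract_emails`.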

Author Comment

by:CTmountainbiker
ID: 40502894
Downloaded the files no problem; however, I'm trying to get 'pdftotext.exe'. I extract the files but can't find the executable; something flashes quickly on screen, but I'm getting an I/O error when running it at the DOS prompt.

Expert Comment

by:Joe Winograd, EE MVE
ID: 40502959
There's only one file to download: <xpdfbin-win-3.04.zip>. Unzip it and you'll see a folder called <bin32> (there's also a <bin64> folder, but you don't need it, not even on 64-bit systems). Inside the <bin32> folder you'll find <pdftotext.exe>, which is not an installer; it is simply a stand-alone, command-line executable. Open up a command prompt, navigate to wherever <pdftotext.exe> is, and run the command:

pdftotext -layout c:\folder\pdfinput.pdf c:\folder\textoutput.txt

If you don't specify the output file name, it will default to the same name (and path) as the input PDF, but with a file type of TXT. Regards, Joe
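For anyone who prefers to script the conversion, the same command can be launched from Python. This is a rough sketch under stated assumptions: `build_command`, `default_output`, and `convert` are hypothetical helper names, and `pdftotext` is assumed to be on the PATH or in the current folder (otherwise pass the full path to the executable as `exe`). Only the `-layout` option described above is used.

```python
import subprocess
from pathlib import Path


def build_command(pdf_path, txt_path=None, exe="pdftotext"):
    """Build the argument list for: pdftotext -layout input.pdf [output.txt]."""
    cmd = [exe, "-layout", str(pdf_path)]
    if txt_path is not None:
        cmd.append(str(txt_path))
    return cmd


def default_output(pdf_path):
    """Name pdftotext uses when no output file is given: same path, .txt extension."""
    return Path(pdf_path).with_suffix(".txt")


def convert(pdf_path, txt_path=None, exe="pdftotext"):
    """Run pdftotext and return the path of the text file it should have written."""
    # check=True raises CalledProcessError if pdftotext exits with an error.
    subprocess.run(build_command(pdf_path, txt_path, exe), check=True)
    return Path(txt_path) if txt_path is not None else default_output(pdf_path)
```

Passing the arguments as a list avoids quoting problems with folder names that contain spaces, which is a common source of syntax trouble at the command prompt.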

Author Comment

by:CTmountainbiker
ID: 40503110
Thanks very much!  It was my syntax that was messing it up.

Expert Comment

by:Joe Winograd, EE MVE
ID: 40503136
You're very welcome! I'm glad it worked for you. If you already upvoted my video, thanks! If not, I'd really appreciate it if you click on the upvote arrow under Helpful Votes at the video. Thanks much, Joe