beer9

How to read and parse a PDF file using Perl?

Hello, I have a PDF file that contains some tabular data. I would like to read it with Perl, parse it, extract the appropriate fields, and write them to a text file. Could someone suggest how I can achieve this and show me a working example? Thanks!
Bryan Butler

The issue here is that converting PDFs to text is complicated, especially with tabular data. Things may have changed in two years, but I doubt it. I did this two years ago and found TEXTfromPDF to be the best tool, and it has a command-line version, which I needed. I tried Perl, other languages, and COTS tools. This was the only one that got close to interpreting the tables I was using. BUT, this was my situation, and the tables were generated by a report writer tool (Crystal). I remember testing with other PDFs that didn't have tables, and the Perl module and other software did a good job.

So, if you can't get it good enough, you may want to buy TEXTfromPDF ($150).
And I don't work for them ;)  That's the "developer" license, which includes the command line.  The GUI version is $45.  Also, what code exactly do you want?  The "reading", "parsing", or "outputting"?  All of the above would bring up questions such as what you need to parse for, what format you need the output in (i.e. layout), etc.
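
For the "parsing" and "outputting" steps, here is a minimal Perl sketch that works on text a converter has already produced. It assumes the columns in the converted text are separated by runs of two or more spaces, which may not hold for every PDF. The script name, the file names, and the helper split_row are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split one line of converted text into fields.
# Assumption: columns are separated by two or more whitespace characters.
sub split_row {
    my ($line) = @_;
    $line =~ s/^\s+|\s+$//g;    # trim leading and trailing whitespace
    return split /\s{2,}/, $line;
}

# Usage (hypothetical file names): perl parse_table.pl converted.txt > fields.txt
if (@ARGV) {
    open my $in, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!\n";
    while (my $line = <$in>) {
        my @fields = split_row($line);
        next unless @fields;    # skip blank lines
        print join("\t", @fields), "\n";    # one tab-delimited row per line
    }
    close $in;
}
```

Cells that contain only single spaces (such as "Widget A") stay intact, but a cell with a double space inside it would be split in two, so the separator pattern may need adjusting for your layout.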
ASKER CERTIFIED SOLUTION
Nem Schlecht

This solution is only available to members.
Adam314

I've used the CAM::PDF module before.  Depending on how the data is laid out, extracting it with this module may or may not be straightforward.
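
A minimal sketch of the CAM::PDF approach, assuming the module is installed from CPAN and the PDF has a text layer (not scanned images). It uses CAM::PDF's new, numPages, and getPageText methods. The script name, the file names, and the helper pdf_to_text are hypothetical, and how well table columns survive depends on how the PDF was generated.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the text of every page of a PDF, joined with newlines.
# CAM::PDF is loaded on first use, inside the helper.
sub pdf_to_text {
    my ($file) = @_;
    require CAM::PDF;
    my $pdf = CAM::PDF->new($file)
        or die "Cannot open $file: $CAM::PDF::errstr\n";
    my @pages;
    for my $n (1 .. $pdf->numPages()) {    # pages are numbered from 1
        my $text = $pdf->getPageText($n);
        push @pages, $text if defined $text;
    }
    return join "\n", @pages;
}

# Usage (hypothetical file names): perl pdf2txt.pl report.pdf > report.txt
if (@ARGV) {
    print pdf_to_text($ARGV[0]), "\n";
}
```

The resulting text still has to be split into fields afterwards, and that is where the layout of the table matters.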
There are also free online converters:
http://www.freepdfconvert.com/
http://www.pdfonline.com/convert-pdf/

I'd try a few different ones and see whether any of them get close to what you need.  If not, then try some of the COTS ones.  There's also a good free tool for comparing text tables after you convert.  I can try to find it if you want.
beer9

ASKER

Thank you :)