Link to home
Start Free TrialLog in
Avatar of Michael Machie
Michael MachieFlag for United States of America

asked on

Using Fillable PDFs to populate/ create a CSV or tab-Delimited file -or- submit to DB

Hi,

I have need to create a fillable PDF (which is easy enough to do) and have the data collected in that PDF submitted as a CSV or tab-delimited format.

- This is for sales person Surveys to use at their customer locations monthly.
- The IT department is looking to create a process, with static info and questions, using drop-down lists to choose the answers to each question.
- Once data is collected on this form, staff would need to submit the data either, to be exported to one of the above file types for importing into a separate database, or imported directly into MySQL/ SQL 2012 for report generation.
- Each month the staff will reopen the previous month's form (which will be saved locally before submitting) and make any changes to the answers if they differ from the previous month. This could be over 50 questions with only a single change from previous surveys.
- When all data is collected for that customer for that month, the data will be submitted.
- Each customer would have their name listed in a drop-down to be selected at the time of survey.
- At this time, only current data is required but thinking ahead I am sure we can parse data by date range and would like to know what options are available if a Month was selected via a drop-down

Concept:
- use MySQL and a PDF web-form which will submit that data to the database
- export data from the DB
*or*
- Export the PDF form data to a file

My challenges:
- How to submit the data to MySQL/ SQL2012 from this form
- How to export the info from either the form or the database to a file.

Assumptions (please correct me if I am wrong):
- Need a Web server with the database installed
- Fillable PDF web-forms that IT wants to create rather than purchase a service like Web-Forms

Thanks and please share your thoughts.
Avatar of Joe Winograd
Joe Winograd
Flag of United States of America image

> How to submit the data to MySQL/ SQL2012 from this form

Can't help you with this one.

> How to export the info from either the form or the database to a file.

I'm a big fan of the PDF Toolkit (PDFtk) from PDF Labs. PDFtk comes in both command line and GUI versions. The command line version is called PDFtk Server and may be downloaded here:
http://www.pdflabs.com/tools/pdftk-server/

Don't be misled by "Server" in the name. I don't know why they called it that, but it's just an executable (pdftk.exe, with a supporting DLL, libiconv2.dll) that runs on "standard" Windows — it does not have to run on a "server" Windows.

Here's the one line solution using the PDFtk Server command line for exporting the data from the form into a file:

pdftk InFile.pdf generate_fdf output OutFile.fdf

The generate_fdf command reads an input PDF file and generates an FDF file (which, btw, can then be imported to a PDF form using the fill_form command). If you'd like to see the full syntax for the PDFtk Server command line and some usage examples, here are the links:
http://www.pdflabs.com/docs/pdftk-man-page/
http://www.pdflabs.com/docs/pdftk-cli-examples/

As a disclaimer, I want to emphasize that I have no affiliation with this company and no financial interest in it whatsoever. I am simply a happy user/customer. Regards, Joe
Avatar of Michael Machie

ASKER

Thanks Joe, always a pleasure when you get involved.

I will look at that tool again - you had referred me to it a while back. After making this post I found instructions to save form data as a txt file, which can be opened properly in Excel and saved as a CSV. This is a manual method so not ideal and I will look into PDF Toolkit as suggested.

Update to follow.
ASKER CERTIFIED SOLUTION
Avatar of Joe Winograd
Joe Winograd
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Joe, you gave me some good stuff to look at and I'll play with it. E-E is 'encouraging' me to close this so I'll send you the points and will provide a complete update after messing with it for a bit.
best suggestion offered.
Hi Michael,
I'm looking forward to hearing how you do with it. Regards, Joe
I've taken on the task to try this via two methods and I'll choose whichever one works, or works the most efficiently, using a combination of PowerShell and PDFTK.

Method #1:
- Use PDFTK to extract data from fillable PDFs as XML data
- Use an Excel module for PowerShell created by a Microsoft MVP to import all of the separate XML files data into an Excel row for each XML file.
- I have managed to import the data, not cleanly yet, but it is imported, but have not found a way to auto-extract the data from the PDF yet.

Method #2:
- I have built a survey system that will allow manual export of results of the survey as a single XML file.
- Use the Excel PowerShell module to import the XML data into Excel.
- I can manage the XML import to Excel via a script that can be double-clicked to run.

Neither are automated but at least Method #2 is manageable by the average User if I provide them access to export Survey data and give them the script to D-click.

Joe, question: Can I create automated extraction processes with PDFTK to do what I need? Watch a folder and extract XML data from whatever PDF lands in the folder?
> Use PDFTK to extract data from fillable PDFs as XML data

PDFtk does not extract to XML. It extracts to FDF.

> a way to auto-extract the data from the PDF

This is what PDFtk can do, but the extraction is to an FDF file. Since that's a plain text file, it is easy to parse/process. Another approach is to use the Xpdf library to extract from PDF to plain text — specifically, the PDFtoText utility. These two EE 5-minute video Micro Tutorials explain that method.
Xpdf - Command Line Utility for PDF Files - Part 1
Xpdf - Convert PDF Files to Plain Text Files - Part 3

> Can I create automated extraction processes with PDFTK to do what I need?

Again, the extraction with PDFtk is to FDF, not XML.

> Watch a folder and extract XML data from whatever PDF lands in the folder?

PDFtk doesn't have a built-in watched-folder feature, but I have written some custom programs that implement a watched-folder capability and then call PDFtk when files arrive in the watched folder. For example, I wrote a custom program called CreatePDFfromXML™ to do this for another Experts Exchange member, back in the days when the "Hire Me" button existed (he hired me to do it). Attached is the CreatePDFfromXML™ USER GUIDE. The app is tailored to his particular XML elements and PDF fields, as you can see in the table on Page 4. Generalizing the program for any XML elements and any PDF fields is on my to-do list, but I'm not going to have it any time soon, so I've been creating custom versions of it for other clients. However, it's the opposite of what you want — it converts XML to PDF; you want PDF to XML. But I mention it to show that a watched folder approach is certainly doable — read section VI. WATCH MODE. Also, note that a web search for "pdf to xml" turns up plenty of hits, including this one:
http://www.thepdf.com/convert-pdf-to-xml.html

Regards, Joe
CreatePDFfromXML_v1.1_User_Guide.pdf
Rockin' Joe, thanks, you never disappoint. I'll leave an update at some point in the future.
Sounds good — and thanks for the kind words.