Link to home
Start Free TrialLog in
Avatar of Bytech India
Bytech India

asked on

convert invoice pdf to XML using artificial intelligence to process different kind of template

Hi

We need to convert the text on an  invoice which can be in the form of image/PDF into XML.

Any third party service which uses artificial intelligence to process different kind of invoice template, is needed.

Please let me know about any tool/service(free/paid) is available for same.
In net I can see many companies providing same but when we actually contact them, they deny.
Please help.

Thanks.
Avatar of David Favor
David Favor
Flag of United States of America image

This will be a multi step process, as follows, if there is a text component in your .pdf file.

1) Convert the .pdf into pure text (no PDF markup).

pdftotext -enc ASCII7 -nopgbrk -layout '$file'

Open in new window


2) Hand edit the file, there tend to be many formatting errors in this process.

3) Convert the pure text to XML. This will require a custom script to reformat this text as XML.

If you have an image only .pdf file (no text component), then locate several previous EE threads about how to run OCR across the image to create a text component, then run steps 1-3 above on the result.

Most .pdf files for invoices will have a text component.
Avatar of Bytech India
Bytech India

ASKER

Thanks David for the quick response. Actually I need the complete process to be automated.
If you have any idea about any third party tool for same, Kindly let me know.

Thnaks.
No.

This is so straight forward + also unique to every specific dataset, unlikely there's any off the shelf tool to do this.

You could easily hire someone to string together the steps above into some sort of tool.

And, you'll still require the manual edit step, to ensure your initial transform from .pdf to .txt actually works correctly.
Ok. I was thinking if we will keep the invoice template in our dataset and iff the input invoice matches same then the complete process will be automatic else human role will come into picture. May be some artificial intelligence come into picture for this for self learning of any new template.
If you have access to the raw data, then you can simplify this entire process.

Just create an XML file from your raw data. This will save you a massive amount of time.

Also, more importantly, you can actually generate an XML file mechanically (no human intervention) if you use raw data.
This question needs an answer!
Become an EE member today
7 DAY FREE TRIAL
Members can start a 7-Day Free trial then enjoy unlimited access to the platform.
View membership options
or
Learn why we charge membership fees
We get it - no one likes a content blocker. Take one extra minute and find out why we block content.