
Document Template Matching Using Artificial Intelligence, Machine Learning or Something Else

Hi All,
I need to do template matching of a document image against the original document images. Is this possible using Artificial Intelligence or Machine Learning?

Basically, we will get a filled-in document image as input. We need to match it against the existing document images using template matching and return the identifier of the matched document image.

Please help.

Thanks in advance.

Regards,
Kavita Choudhary.

CERTIFIED EXPERT

Commented:
If the document is coming to you as an image, I think you'd want to use some form of OCR to convert it into text (as best you can) and then make the comparison based on the text content of the document.

Once you have a text document, there are a myriad of tools that can do that comparison: some paid, some open source.
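As a minimal sketch of the text-comparison step, assuming the OCR output has already been obtained (for example with an engine such as Tesseract; that step is not shown), Python's standard-library `difflib.SequenceMatcher` can score how similar two extracted texts are. The function name `text_similarity` is hypothetical, chosen for illustration.

```python
import difflib


def text_similarity(a: str, b: str) -> float:
    """Return a similarity ratio in [0, 1] between two OCR'd texts.

    Whitespace is collapsed and case is ignored so that OCR line breaks
    and capitalization noise do not dominate the score.
    """
    norm_a = " ".join(a.lower().split())
    norm_b = " ".join(b.lower().split())
    return difflib.SequenceMatcher(None, norm_a, norm_b).ratio()
```

A score near 1.0 means the two texts are nearly identical after normalization; the cut-off for calling two documents "the same" would have to be tuned on real OCR output.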

For example, you could look at the tools used to detect plagiarism, which of course work by comparing how similar two documents are and whether they share common content:
https://elearningindustry.com/top-10-free-plagiarism-detection-tools-for-teachers
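Plagiarism detectors commonly compare documents by the overlap of short word sequences ("shingles"). A minimal sketch of that idea, assuming whitespace tokenization and 3-word shingles (both arbitrary choices), computes the Jaccard similarity of the two shingle sets. The helper names `shingles` and `jaccard` are hypothetical.

```python
from __future__ import annotations


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Return the set of k-word shingles in a lowercased text."""
    words = text.lower().split()
    if len(words) < k:
        # Too short for a full shingle: treat the whole text as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: str, b: str, k: int = 3) -> float:
    """Jaccard similarity |A & B| / |A | B| of the two shingle sets."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0  # two empty texts are trivially identical
    return len(sa & sb) / len(sa | sb)
```

Unlike a character-level diff, shingle overlap is insensitive to paragraphs being reordered, which is why it suits "do these documents share content" questions.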

Hope that helps,

Doug

Author

Commented:
Hi @Doug,

Thanks for the reply.
Well, I need to compare the templates based on format, not text, so I don't think the above will help.

Please advise something with which I can compare the template format of the two docs.

Thanks,

CERTIFIED EXPERT

Commented:
So are you looking to compare the two images and see how similar they are, without regard for the text of the documents?
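Comparing the images "without regard for the text" could be approximated, as one assumed approach rather than a recommended one, by reducing each page to a coarse grid of ink density: each cell records the fraction of dark pixels in that region, so the positions of tables, boxes and text blocks survive while the individual characters blur away. This sketch assumes the pages are already aligned grayscale NumPy arrays of the same size; `layout_fingerprint`, `layout_distance`, the 16 x 16 grid and the threshold of 128 are all hypothetical choices.

```python
from __future__ import annotations

import numpy as np


def layout_fingerprint(page: np.ndarray, grid: tuple[int, int] = (16, 16),
                       threshold: float = 128.0) -> np.ndarray:
    """Return a flattened grid of ink densities for a grayscale page.

    page: 2-D array of pixel intensities (0 = black, 255 = white).
    Each output value is the fraction of "ink" pixels (darker than
    `threshold`) inside one grid cell.
    """
    rows, cols = grid
    h, w = page.shape
    if h < rows or w < cols:
        raise ValueError("page is smaller than the fingerprint grid")
    ink = (page < threshold).astype(np.float64)
    # Cell boundaries; linspace copes with sizes not divisible by the grid.
    ys = np.linspace(0, h, rows + 1).astype(int)
    xs = np.linspace(0, w, cols + 1).astype(int)
    cells = np.empty((rows, cols))
    for i in range(rows):
        for j in range(cols):
            cells[i, j] = ink[ys[i]:ys[i + 1], xs[j]:xs[j + 1]].mean()
    return cells.ravel()


def layout_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Euclidean distance between two layout fingerprints (0 = same layout)."""
    return float(np.linalg.norm(layout_fingerprint(a) - layout_fingerprint(b)))
```

Two filled-in copies of the same invoice form should produce similar fingerprints even though their field values differ, while a form with a different arrangement of boxes should not; how well that holds depends on scan alignment, which this sketch does not handle.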

E.g. if you had two images from an old, yellowed book and two images from a newly printed book, would you want the two yellowed pages to be seen as similar to each other, even if the newly printed book contained the same text as the old yellowed one?
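If a scanned page should match its template regardless of paper tint or scan brightness, one standard option (an illustrative choice, not something the thread prescribes) is zero-mean normalized cross-correlation: both images are shifted to zero mean and scaled by their spread before comparison, so a uniformly darker or lighter copy of the same page still scores 1.0. This sketch assumes two aligned grayscale NumPy arrays of equal shape, and treats yellowing as a uniform brightness/contrast change in grayscale; `ncc` is a hypothetical helper name.

```python
import numpy as np


def ncc(a: np.ndarray, b: np.ndarray) -> float:
    """Zero-mean normalized cross-correlation of two equal-shape images.

    Returns a value in [-1, 1]; 1 means identical up to a positive
    brightness/contrast change (e.g. yellowed paper vs. a fresh print).
    """
    if a.shape != b.shape:
        raise ValueError("images must have the same shape")
    a = a.astype(np.float64) - a.mean()
    b = b.astype(np.float64) - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    if denom == 0:
        # A perfectly flat image has no structure to correlate.
        return 0.0
    return float((a * b).sum() / denom)
```

Subtracting the means and dividing by the norms is what removes the tint difference; real scans would usually also need deskewing and cropping before this comparison is meaningful.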

Author

Commented:
Suppose I have 10 sets of invoice images, each set containing one specific invoice format/template.
I want to use Machine Learning so that it learns the invoice format of each set
and, based on that, performs template matching and classifies the input image.

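Before reaching for a deep model, a simple baseline for "learn the format of each set, then classify a new image" is a nearest-centroid classifier: average the feature vectors of each set's training images, then assign a new image to the closest average. This is a swapped-in technique for illustration, not the CNN approach mentioned at the end of the thread. The sketch assumes each image has already been turned into a fixed-length feature vector (for example a downsampled, binarized page); `TemplateCentroidClassifier` is a hypothetical class written from scratch, similar in spirit to scikit-learn's `NearestCentroid`.

```python
from __future__ import annotations

import numpy as np


class TemplateCentroidClassifier:
    """Nearest-centroid classifier over fixed-length page feature vectors."""

    def fit(self, features, labels: list[str]) -> "TemplateCentroidClassifier":
        # features: shape (n_samples, n_features); labels: template id per row.
        features = np.asarray(features, dtype=np.float64)
        label_arr = np.asarray(labels)
        self.classes_ = sorted(set(labels))
        # One centroid (mean feature vector) per template/format.
        self.centroids_ = np.stack(
            [features[label_arr == c].mean(axis=0) for c in self.classes_]
        )
        return self

    def predict(self, features) -> list[str]:
        features = np.atleast_2d(np.asarray(features, dtype=np.float64))
        # Euclidean distance from every sample to every centroid.
        dists = np.linalg.norm(
            features[:, None, :] - self.centroids_[None, :, :], axis=2
        )
        return [self.classes_[i] for i in dists.argmin(axis=1)]
```

Adding a threshold on the smallest distance would let such a classifier report "no matching template" for unseen formats; a CNN trained on the same labeled sets would instead learn its own features from the pixels.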
CERTIFIED EXPERT

Commented:
OK, if you want to compare documents without doing OCR explicitly, perhaps check out Google Cloud's document understanding APIs:
https://cloud.google.com/solutions/document-understanding/

That sounds like what you may actually want. They also offer OCR, so you can try that too and see whether it simplifies the problem.

Using a CNN multi-class model, I was able to achieve this.