Convert PDF to free text for full-text search
Posted on 2011-09-10
I have a bunch of PDF files stored as large blobs in a database, which is working well for managing and displaying them. I also want to search the text of the documents from PHP and MySQL for keywords or phrases, using either the MySQL LIKE operator or a FULLTEXT search. It seems I would be best off converting the PDF blob data into a simple text version, with the extracted content in free-text form. Is there a PHP class or function that can do this, and can you suggest how I would then go about searching the free-text data for a keyword or phrase?
I already have the PDFs in a MySQL database table called pdf_files, with the large blob field simply called pdf.
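
To make the idea concrete, here is a rough sketch of the approach described above: extract the text once, store it beside the blob, and search the stored text. It is not a definitive implementation and rests on several assumptions that are not part of the setup above: the pdftotext command-line tool (from Poppler or Xpdf) is installed on the server, pdf_files has an integer primary key called id, and a hypothetical pdf_text column with a FULLTEXT index has been added. The function names are made up for illustration. On MySQL versions before 5.6, FULLTEXT indexes are only available on MyISAM tables.

```php
<?php
// Sketch only. Assumptions: pdftotext is on the PATH; pdf_files has an integer
// primary key `id` and an added `pdf_text` column with a FULLTEXT index, e.g.
//   ALTER TABLE pdf_files
//     ADD COLUMN pdf_text LONGTEXT,
//     ADD FULLTEXT INDEX ft_pdf_text (pdf_text);

/** Extract plain text from raw PDF bytes by shelling out to pdftotext. */
function pdfBlobToText($blob)
{
    $tmp = tempnam(sys_get_temp_dir(), 'pdf');
    file_put_contents($tmp, $blob);
    // The trailing "-" makes pdftotext write the extracted text to stdout.
    $text = shell_exec('pdftotext -enc UTF-8 ' . escapeshellarg($tmp) . ' -');
    unlink($tmp);
    return is_string($text) ? $text : '';
}

/** Fill pdf_text for every row, loading one PDF at a time to limit memory use. */
function indexAllPdfs(PDO $db)
{
    $ids = $db->query('SELECT id FROM pdf_files')->fetchAll(PDO::FETCH_COLUMN);
    $select = $db->prepare('SELECT pdf FROM pdf_files WHERE id = ?');
    $update = $db->prepare('UPDATE pdf_files SET pdf_text = ? WHERE id = ?');
    foreach ($ids as $id) {
        $select->execute(array($id));
        $blob = $select->fetchColumn();
        $select->closeCursor();
        $update->execute(array(pdfBlobToText($blob), $id));
    }
}

/** Wrap a phrase in double quotes so BOOLEAN MODE treats it as an exact phrase. */
function phraseQuery($phrase)
{
    return '"' . str_replace('"', ' ', trim($phrase)) . '"';
}

/** Return the ids of rows whose extracted text contains the phrase. */
function searchPdfs(PDO $db, $phrase)
{
    $stmt = $db->prepare(
        'SELECT id FROM pdf_files WHERE MATCH(pdf_text) AGAINST (? IN BOOLEAN MODE)'
    );
    $stmt->execute(array(phraseQuery($phrase)));
    return $stmt->fetchAll(PDO::FETCH_COLUMN);
}
```

With a PDO connection in $db, calling indexAllPdfs($db) once and then searchPdfs($db, 'annual report') would return the matching ids. A LIKE '%phrase%' query on pdf_text would also work, but it scans every row, whereas MATCH ... AGAINST uses the index. Note that scanned PDFs containing only page images yield no text this way, and that FULLTEXT ignores very short words by default.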