Solved

How to extract the text from a PDF using PHP?

Posted on 2009-07-10
Last Modified: 2013-12-13
I have converted the PDF to images, but now I need to extract the content from the PDF as text or HTML. How can I do it?

Question by:Rajmd
2 Comments

Accepted Solution

by:
Ray Paseur earned 250 total points
ID: 24823642
This may be either a big undertaking or an impossible dream, depending on what you have in the PDF file. You are probably better off going back to the original data BEFORE it became a PDF. If you cannot get that information in clear text, here is the path to follow...

You can read the PDF file into a PHP string with file_get_contents().

You can use var_dump() to print out the data you read from the PDF.

You can visually scan the data string for extraction points and perhaps create a regular expression or a set of explode() calls to pull out the information you want.
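A minimal sketch of the steps above, under some loud assumptions: the function name extractPdfText() is hypothetical (it is not part of PHP or of any library), the text is stored as plain literal strings shown with the PDF `Tj` operator, and any compressed streams use zlib/Flate compression. PDFs that use other filters, hex strings, `TJ` arrays, or custom font encodings will not yield readable text this way.

```php
<?php
// Hypothetical helper, not a library function: pulls "(literal string) Tj"
// text out of raw PDF bytes. Only handles the simple cases described above.
function extractPdfText(string $raw): string
{
    $chunks = [];

    // Collect the body of every "stream ... endstream" section.
    if (preg_match_all('/stream\r?\n(.*?)\r?\nendstream/s', $raw, $m)) {
        foreach ($m[1] as $body) {
            // Try to inflate the stream; fall back to the raw bytes if that
            // fails or if the zlib extension is not available.
            $inflated = function_exists('gzuncompress') ? @gzuncompress($body) : false;
            $chunks[] = ($inflated === false) ? $body : $inflated;
        }
    }

    $lines = [];
    foreach ($chunks as $chunk) {
        // Match a parenthesised string followed by Tj, allowing
        // backslash-escaped characters inside the parentheses.
        if (preg_match_all('/\(((?:\\\\.|[^\\\\()])*)\)\s*Tj/s', $chunk, $t)) {
            foreach ($t[1] as $s) {
                // Undo the \( \) and \\ escapes; other escapes are left as-is.
                $lines[] = preg_replace('/\\\\([()\\\\])/', '$1', $s);
            }
        }
    }

    return implode("\n", $lines);
}
```

You would call it as extractPdfText(file_get_contents('report.pdf')), where report.pdf is a placeholder file name. If it returns an empty string, var_dump() the raw data and look at what the file really contains before writing a different pattern.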

Do not become too dependent on this technique: different versions of PDF files will use different encodings, and you may not be able to control what you find in there.

Best of luck with your project, ~Ray

Expert Comment

by:Pedro Chagas
ID: 24825600
What is the goal?
Where do you get the PDFs? Do you create them yourself? If so, I think you can use file_get_contents() (as @ray suggests), because the encoding is always the same!

Regards, JC