What is the goal here?
Where do you get the PDFs? Do you create them yourself? If so, I think you can use file_get_contents() (as @ray suggests), because the encoding will always be the same.
Regards, JC
I have converted the PDF to images, but now I need to extract the content from the PDF as text or HTML. How can I do it?
This question has been solved and verified by the asker.
by: Ray_Paseur, posted on 2009-07-10 at 07:30:35, ID: 24823642
This may be either a big undertaking or an impossible dream, depending on what you have in the PDF file. You are probably better off going back to the original data BEFORE it became a PDF. If you cannot get that information in clear text, here is the path to follow:
You can read the PDF files into PHP with file_get_contents();
You can use var_dump() to print out the data you read from the PDF.
You can visually scan the data string for extraction points and perhaps create a REGEX or a set of explode() statements to pull the information you want.
Do not become too dependent on this technique. Different versions of the PDF format use different encodings, and you may not be able to control what you will find in there.
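As a rough illustration of the steps above, here is a minimal sketch. It assumes the PDF stores its text in uncompressed content streams as literal strings followed by the Tj operator, which many real PDFs do not (their streams are usually compressed). The function name extract_tj_strings(), the sample string, and the file name example.pdf are hypothetical and only there for illustration.

```php
<?php
// Hypothetical helper: pull literal strings shown with the Tj operator
// out of raw PDF bytes. Assumes uncompressed content streams.
function extract_tj_strings(string $raw): array
{
    // A PDF file starts with a "%PDF-" header; bail out if it is missing.
    if (strncmp($raw, '%PDF-', 5) !== 0) {
        return [];
    }

    // Match "(text) Tj", allowing backslash-escaped characters
    // inside the parentheses.
    preg_match_all('/\(((?:\\\\.|[^\\\\()])*)\)\s*Tj/s', $raw, $matches);

    // Undo the escapes \( \) and \\ in each captured string.
    return array_map(
        function ($s) {
            return preg_replace('/\\\\([()\\\\])/', '$1', $s);
        },
        $matches[1]
    );
}

// Tiny hand-made sample standing in for real PDF bytes (not a valid PDF).
$sample = "%PDF-1.4\nBT /F1 12 Tf (Hello \\(PDF\\) world) Tj ET\n";
var_dump(extract_tj_strings($sample));

// With a real file, read the bytes first. The path is hypothetical.
$path = 'example.pdf';
if (is_file($path)) {
    var_dump(extract_tj_strings(file_get_contents($path)));
}
```

If var_dump() shows an empty array for a real file, the text is most likely compressed or encoded differently, and a regex like this one will not be enough.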
Best of luck with your project, ~Ray