Solved

How do I extract the text from a PDF using PHP?

Posted on 2009-07-10
Last Modified: 2013-12-13
I have converted the PDF to images, but now I need to extract the content of the PDF as text or HTML. How can I do it?

Question by:Rajmd
2 Comments
 

Accepted Solution

by:Ray Paseur (earned 750 total points)
ID: 24823642
This may be either a big undertaking or an impossible dream, depending on what is in the PDF file.  You are probably better off going back to the original data BEFORE it became a PDF.  If you cannot get that information in clear text, here is the path to follow:

You can read the PDF files into PHP with file_get_contents();

You can use var_dump() to print out the data you read from the PDF.

You can visually scan the data string for extraction points and perhaps create a regular expression or a set of explode() statements to pull out the information you want.

Do not become too dependent on this technique. Different PDF versions and different PDF-producing tools encode and compress their content differently, and you may not be able to control what you will find in there.
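
The steps above could be sketched roughly as follows. This is only a minimal illustration, not a general PDF parser: the function names extractPdfText() and unescapePdfString() are made up for this example, and the code assumes that each content stream is either uncompressed or zlib (FlateDecode) compressed and that the text is written as simple literal strings such as (Hello) inside BT ... ET blocks. PDFs that use hex strings, custom font encodings, octal escapes, nested parentheses, or scanned page images will not come out correctly.

```php
<?php
// Hypothetical helper: undo the common single-character escapes
// inside a PDF literal string. Octal \ddd escapes are left as-is.
function unescapePdfString(string $s): string
{
    return strtr($s, [
        '\\\\' => '\\',
        '\\('  => '(',
        '\\)'  => ')',
        '\\n'  => "\n",
        '\\r'  => "\r",
        '\\t'  => "\t",
    ]);
}

// Hypothetical helper: pull literal text strings out of the raw bytes
// of a PDF (for example, the result of file_get_contents()).
function extractPdfText(string $pdf): string
{
    $lines = [];

    // Content streams sit between the "stream" and "endstream" keywords.
    $pattern = '/(?<!end)stream\r?\n(.*?)\r?\n?endstream/s';
    if (!preg_match_all($pattern, $pdf, $streams)) {
        return '';
    }

    foreach ($streams[1] as $raw) {
        // Assumption: a stream is either zlib-compressed or plain text.
        // If inflation fails, fall back to the raw bytes.
        $data = @gzuncompress($raw);
        if ($data === false) {
            $data = $raw;
        }

        // Text objects are delimited by the BT and ET operators.
        if (!preg_match_all('/\bBT\b(.*?)\bET\b/s', $data, $blocks)) {
            continue;
        }

        foreach ($blocks[1] as $block) {
            // Literal strings look like (Hello); a backslash escapes
            // the character that follows it.
            preg_match_all('/\((?:\\\\.|[^\\\\()])*\)/s', $block, $strings);

            $line = '';
            foreach ($strings[0] as $literal) {
                // Drop the surrounding parentheses, then unescape.
                $line .= unescapePdfString(substr($literal, 1, -1));
            }
            if ($line !== '') {
                $lines[] = $line;
            }
        }
    }

    // One output line per BT ... ET block.
    return implode("\n", $lines);
}
```

You would call it as extractPdfText(file_get_contents('document.pdf')), where document.pdf is a placeholder path. Joining the strings of one BT ... ET block without spaces is a simplification; real files often position words or even single characters separately, so the output may need further cleanup.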

Best of luck with your project, ~Ray
 

Expert Comment

by:Pedro Chagas
ID: 24825600
What is the goal or objective?
Where do you get the PDFs from? Do you create them yourself? If so, I think you can use file_get_contents() (as @Ray suggests), because the encoding will always be the same!

Regards, JC
