Solved

Extracting clean text from a PDF

Posted on 2010-08-18
Last Modified: 2013-12-17
I have Adobe Acrobat 9 Pro installed. Can I use the PDF library in its DLLs to access objects in a PDF?
I want to programmatically (not manually) produce output like the attached 2927oc NTL_rulesOnly.txt from 2927oc NTL.pdf.

The PDF has all sorts of special characters that I’d like to skip.

Is there another tool that I can integrate into a Visual Studio project to do that?

Attachment: 0181749251.zip
Question by:mihaisz
2 Comments
 
LVL 8

Accepted Solution

by: SylvainDrapeau (earned 250 total points)
ID: 33471410
Hello!

Take a look at iText: http://itextpdf.com/

It should do what you need.

For an example, see: http://www.codeproject.com/KB/cs/PDFToText.aspx
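
Whichever extractor is used, the special characters mentioned in the question can be dropped in a separate pass over the extracted text. Here is a minimal sketch in Python (the same logic ports directly to C#). The function name `strip_special` is hypothetical, and treating "special" as "anything outside printable ASCII" is an assumption; neither is part of iText.

```python
import string

# Assumption: "special characters" means anything outside printable ASCII.
# Form feeds, vertical tabs and carriage returns are dropped as well,
# since PDF extractors often emit them at page and line boundaries.
_ALLOWED = set(string.printable) - set("\x0b\x0c\r")


def strip_special(text: str) -> str:
    """Return text with every character outside the allowed set removed."""
    return "".join(ch for ch in text if ch in _ALLOWED)


if __name__ == "__main__":
    # Hypothetical sample standing in for text extracted from the PDF.
    sample = "Rule 1\u2022 keep this\x0c\nRule 2\u00ae done"
    print(strip_special(sample))
```

A whitelist is used here rather than a blacklist because the set of odd glyphs in a PDF is open-ended, while the set of characters wanted in the output is small and known.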

Syldra
 
LVL 7

Assisted Solution

by: DanSo1 (earned 250 total points)
ID: 33499547
You don't need external libraries; just use the clipboard.
Programmatically open your document in any version of Acrobat Reader, then select all, copy, and paste into a .txt file.
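
The pasted file will still contain the PDF's special characters, so a clean-up pass is probably needed afterwards. Here is a rough sketch in Python, assuming the pasted text was saved as UTF-8. The names `clean_text` and `clean_pasted_file` and the file paths are hypothetical. It first folds compatibility characters (ligatures such as "ﬁ", non-breaking spaces) into plain equivalents, then drops whatever is still not ASCII.

```python
import unicodedata


def clean_text(text: str) -> str:
    """Fold compatibility characters to plain forms, then drop non-ASCII."""
    # NFKC turns e.g. the "fi" ligature (U+FB01) into "fi" and a
    # non-breaking space (U+00A0) into an ordinary space.
    folded = unicodedata.normalize("NFKC", text)
    # Anything that still has no ASCII form (bullets, symbols) is discarded.
    return folded.encode("ascii", errors="ignore").decode("ascii")


def clean_pasted_file(src_path: str, dst_path: str) -> None:
    """Read the pasted text file, clean it, and write the result."""
    # Assumption: the file is UTF-8; undecodable bytes become U+FFFD,
    # which the ASCII step in clean_text then removes.
    with open(src_path, "r", encoding="utf-8", errors="replace") as src:
        text = src.read()
    with open(dst_path, "w", encoding="ascii") as dst:
        dst.write(clean_text(text))
```

Usage would be something like `clean_pasted_file("pasted.txt", "clean.txt")`, where both file names are placeholders.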
