Solved

Extracting clean text from a PDF

Posted on 2010-08-18
Medium Priority
Last Modified: 2013-12-17
I have Adobe Acrobat 9 Pro installed. Can I use the PDF library in its DLLs to access objects in a PDF?
What I want to accomplish is to programmatically (not manually) produce the output shown in the attached 2927oc NTL_rulesOnly.txt from 2927oc NTL.pdf.

The PDF contains all sorts of special characters that I'd like to skip.

Is there another tool that I can integrate into a Visual Studio project to do that?

Attachment: 0181749251.zip
Question by:mihaisz
2 Comments
 
LVL 8

Accepted Solution

by:SylvainDrapeau
SylvainDrapeau earned 1000 total points
ID: 33471410
Hello!

Take a look at iText: http://itextpdf.com/

It should do what you need.

For an example, see: http://www.codeproject.com/KB/cs/PDFToText.aspx
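
Whichever tool produces the raw text, the special characters mentioned in the question will probably still need to be filtered out afterwards. Here is a minimal sketch of that clean-up step, in Python for brevity (the same logic should port to C# without much trouble). The function name `clean_extracted_text` is made up for illustration, and "special" is assumed to mean anything outside printable ASCII; adjust the filter if your rules text legitimately contains other characters.

```python
import unicodedata


def clean_extracted_text(raw: str) -> str:
    """Keep only printable ASCII, one cleaned line per non-empty input line.

    Hypothetical helper: assumes every character outside the printable
    ASCII range (space through tilde) is unwanted.
    """
    # NFKD splits ligatures and accented letters into base characters plus
    # combining marks, so "ﬁ" becomes "fi" and "é" becomes "e" + an accent.
    normalized = unicodedata.normalize("NFKD", raw)

    cleaned_lines = []
    for line in normalized.splitlines():
        # Drop everything that is not printable ASCII (bullets, control
        # characters, leftover combining marks, and so on).
        kept = "".join(ch for ch in line if " " <= ch <= "~")
        # Collapse the whitespace runs left behind by removed characters.
        kept = " ".join(kept.split())
        if kept:
            cleaned_lines.append(kept)
    return "\n".join(cleaned_lines)
```

You would call it on the text returned by the extraction step, then write the result to the .txt file. Note that it also drops blank lines, so paragraph spacing in the original is not preserved.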

Syldra
 
LVL 7

Assisted Solution

by:DanSo1
DanSo1 earned 1000 total points
ID: 33499547
You don't need external libraries.
Just use the clipboard: programmatically open your document in any version of Acrobat Reader, select all, copy, and paste into a .txt file.