
Extracting clean text from a PDF

I have Adobe 9 Pro installed. Can I use the PDF library in its DLLs to access the objects in a PDF?
What I want is to produce, programmatically rather than by hand, output like the attached 2927oc NTL_rulesOnly.txt from 2927oc NTL.pdf.

The PDF has all sorts of special characters that I’d like to skip.

Is there another tool that can do this and that I can integrate into a Visual Studio project?

Attachment: 0181749251.zip
Asked by: mihaisz
 
SylvainDrapeau commented:
Hello!

Take a look at iText: http://itextpdf.com/

It should do what you need.

There is an example here: http://www.codeproject.com/KB/cs/PDFToText.aspx

Syldra
 
DanSo1 commented:
You don't need an external library.
Just use the clipboard: programmatically open your document in any version of Acrobat Reader, select all, copy, and paste the text into a .txt file.
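Whichever route produces the raw text (iText or the clipboard), the special characters mentioned in the question still have to be filtered out. Below is a minimal sketch in Python, assuming the text has already been extracted into a string. The function name `clean_pdf_text` and the `REPLACEMENTS` table are hypothetical and should be adjusted to the characters that actually appear in your PDF. It applies Unicode NFKC normalization (which splits ligatures such as "ﬁ" into "fi"), maps typographic quotes and dashes to ASCII, drops control, format, and private-use characters, and collapses whitespace.

```python
import unicodedata

# Hypothetical mapping of typographic characters to plain ASCII;
# extend it to cover whatever shows up in your own PDF output.
REPLACEMENTS = {
    "\u2018": "'", "\u2019": "'",  # curly single quotes
    "\u201c": '"', "\u201d": '"',  # curly double quotes
    "\u2013": "-", "\u2014": "-",  # en dash, em dash
    "\u2022": "",                  # bullet
}


def clean_pdf_text(raw: str) -> str:
    """Return a plain-text version of text extracted from a PDF."""
    # NFKC folds compatibility characters: ligatures (U+FB01 -> "fi"),
    # non-breaking and em spaces (-> " "), ellipsis (-> "...").
    text = unicodedata.normalize("NFKC", raw)
    # Normalize line endings and turn tabs into spaces.
    text = text.replace("\r\n", "\n").replace("\r", "\n").replace("\t", " ")
    # Map typographic punctuation to ASCII equivalents.
    text = "".join(REPLACEMENTS.get(ch, ch) for ch in text)
    # Drop control (Cc), format (Cf, e.g. zero-width space, soft hyphen),
    # private-use (Co) and other "C*" category characters, keeping newlines.
    text = "".join(
        ch for ch in text
        if ch == "\n" or not unicodedata.category(ch).startswith("C")
    )
    # Collapse runs of spaces within each line and drop blank lines.
    lines = (" ".join(line.split()) for line in text.split("\n"))
    return "\n".join(line for line in lines if line)


if __name__ == "__main__":
    sample = "\u201cRule\u00a01\u201d \u2013 the \ufb01rst\u200b rule\r\n\r\n\u2022 item\ue000"
    print(clean_pdf_text(sample))
```

Blank lines are dropped and runs of spaces are collapsed, so leading indentation is lost; skip that last step if the layout matters. Picking out only the rule lines, as in the attached .txt file, depends on the document's structure and is not attempted here.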