
pdf->ascii convert??

Posted on 2000-05-08
Last Modified: 2008-03-10
How do I convert an *entire* Adobe PDF file into ASCII?

Is there any software or any technique for this, in C or any other language?

I know there are some Perl modules that parse the informational headers (which are already in ASCII).
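The informational entries are easy to reach because, in an uncompressed file, the document information dictionary typically stores them as literal strings such as `/Title (Annual Report)`. A minimal Python sketch of that kind of scan is below. The function name `info_entries` and the key list are illustrative assumptions, and the regex ignores hex strings, UTF-16 text, nested unescaped parentheses and compressed object streams, so it is not a substitute for a real PDF parser.

```python
import re

# Keys commonly found in a PDF document information dictionary (illustrative subset).
INFO_KEYS = ("Title", "Author", "Subject", "Creator", "Producer", "CreationDate")


def info_entries(pdf: bytes) -> dict:
    """Return the first literal-string value found for each key, undecoded.

    Sketch only: escapes such as \\( are left as-is, and hex strings (<...>),
    nested unescaped parentheses and compressed object streams are not handled.
    """
    found = {}
    for key in INFO_KEYS:
        # /Key, optional whitespace, then (...) allowing backslash escapes inside
        pattern = rb"/" + key.encode("ascii") + rb"\s*\(((?:\\.|[^\\)])*)\)"
        match = re.search(pattern, pdf, re.DOTALL)
        if match:
            # latin-1 maps every byte to a character; PDFDocEncoding differs in a few slots
            found[key] = match.group(1).decode("latin-1")
    return found
```

The body of the page (the actual text) is a different matter: it normally lives in compressed content streams rather than in plain strings like these.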
Question by:eng40490

Accepted Solution

by:maneshr (earned 50 total points)
ID: 2789669
You might want to take a look at PDF2TXT. It is a simple tool (a Perl script) to extract text from PDF files.

Key Features:

.. Extract Japanese text - JIS, SJIS, EUC and UCS2 encoded strings from
  Japanese PDF files by use of CMap files.
.. Extract document information and bookmark.
.. Support decoding methods:
    ASCIIHexDecode, ASCII85Decode, FlateDecode, LZWDecode
.. Crypto (pdf2txt_X.XX) and non-crypto (pdf2txt_X.XX_no_crypto) versions.
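Three of the four decoding filters listed above map directly onto Python's standard library. The sketch below illustrates the idea; it is not PDF2TXT's Perl code. `decode_stream` and the helper names are hypothetical, `/DecodeParms` predictors are not applied, and LZWDecode is deliberately left unimplemented.

```python
import base64
import zlib

PDF_WHITESPACE = b"\x00\t\n\x0c\r "  # the six PDF white-space bytes


def ascii_hex_decode(data: bytes) -> bytes:
    body = data.split(b">", 1)[0].translate(None, PDF_WHITESPACE)  # '>' ends the data
    if len(body) % 2:  # an odd final digit is treated as if followed by 0
        body += b"0"
    return bytes.fromhex(body.decode("ascii"))


def ascii85_decode(data: bytes) -> bytes:
    body = data.split(b"~>", 1)[0]  # '~>' ends the data; PDF data normally has no '<~'
    return base64.a85decode(body, ignorechars=PDF_WHITESPACE)


def flate_decode(data: bytes) -> bytes:
    return zlib.decompress(data)  # zlib/deflate; /DecodeParms predictors not applied


DECODERS = {
    "ASCIIHexDecode": ascii_hex_decode,
    "ASCII85Decode": ascii85_decode,
    "FlateDecode": flate_decode,
}


def decode_stream(data: bytes, filters) -> bytes:
    """Apply a stream's /Filter names in the order they are listed."""
    for name in filters:
        if name not in DECODERS:  # e.g. LZWDecode, not covered by this sketch
            raise NotImplementedError(f"{name} is not implemented here")
        data = DECODERS[name](data)
    return data
```

For a stream whose dictionary says `/Filter [/ASCII85Decode /FlateDecode]`, the call would be `decode_stream(raw, ["ASCII85Decode", "FlateDecode"])`, because filters are undone in the order they are listed.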

Requirements:           UNIX

[common]
.. Perl (>= 5.005_03)    CPAN
.. zlib                  http://www.cdrom.com/pub/infozip/zlib/
.. Compress::Zlib        CPAN
.. uncompress            UNIX uncompress command
.. Jcode                 http://openlab.ring.gr.jp/Jcode/
.. Base85.pl             ftp://www.isl.intec.co.jp/pub/person/ishida/freeware/pdf2txt/
.. PDFLZW.pl             ftp://www.isl.intec.co.jp/pub/person/ishida/freeware/pdf2txt/
.. PDFEncoding.pl        ftp://www.isl.intec.co.jp/pub/person/ishida/freeware/pdf2txt/
.. makeCMap.pl*          ftp://www.isl.intec.co.jp/pub/person/ishida/freeware/pdf2txt/
.. aj12.tar.Z            ftp://ftp.oreilly.com/pub/examples/nutshell/cjkv/adobe/
.. aj20.tar.Z            ftp://ftp.oreilly.com/pub/examples/nutshell/cjkv/adobe/

[+crypto version]
.. MD5                   CPAN
.. RC4.pl                ftp://www.isl.intec.co.jp/pub/person/ishida/freeware/pdf2txt/

To get pdf2txt:

  ftp://www.isl.intec.co.jp/pub/person/ishida/freeware/pdf2txt/
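In general, once a page's content stream has been decompressed, extracting its text means pulling out the strings drawn by text-showing operators such as `(Offer) Tj` or `[(Glo) -20 (bal)] TJ`. Below is a rough Python sketch of that step. It is not PDF2TXT's code, `show_strings` and `_read_literal` are hypothetical names, and it ignores fonts, encodings, CMaps, hex strings, inline images and text positioning, so the returned bytes are only meaningful when the font uses a simple, roughly ASCII-compatible encoding.

```python
import re

# Single-character escapes allowed inside PDF literal strings.
_ESCAPES = {b"n": b"\n", b"r": b"\r", b"t": b"\t", b"b": b"\b", b"f": b"\f",
            b"(": b"(", b")": b")", b"\\": b"\\"}
# Whitespace and delimiter bytes that end a token.
_DELIMS = b"()<>[]{}/% \t\r\n\x0c\x00"
_SHOW_OPS = (b"Tj", b"TJ", b"'", b'"')


def _read_literal(buf: bytes, i: int):
    """Parse the literal string starting at buf[i] == '('; return (bytes, next index)."""
    depth, i, out = 1, i + 1, bytearray()
    while i < len(buf) and depth:
        c = buf[i:i + 1]
        if c == b"\\":
            nxt = buf[i + 1:i + 2]
            octal = re.match(rb"[0-7]{1,3}", buf[i + 1:i + 4])
            if nxt in _ESCAPES:
                out += _ESCAPES[nxt]
                i += 2
            elif octal:  # \ddd octal character code
                out.append(int(octal.group(), 8) & 0xFF)
                i += 1 + len(octal.group())
            elif nxt in (b"\r", b"\n"):  # backslash-newline: line continuation
                i += 2
                if nxt == b"\r" and buf[i:i + 1] == b"\n":
                    i += 1
            else:  # unknown escape: the backslash is dropped
                i += 1
        elif c == b"(":
            depth += 1
            out += c
            i += 1
        elif c == b")":
            depth -= 1
            if depth:  # a balanced inner ')' is part of the string
                out += c
            i += 1
        else:
            out += c
            i += 1
    return bytes(out), i


def show_strings(content: bytes) -> list:
    """Collect literal-string operands of Tj, TJ, ' and " from a decoded content stream."""
    found, pending, i, n = [], [], 0, len(content)
    while i < n:
        c = content[i:i + 1]
        if c == b"(":
            s, i = _read_literal(content, i)
            pending.append(s)
        elif c == b"%":  # comment runs to end of line
            while i < n and content[i:i + 1] not in (b"\r", b"\n"):
                i += 1
        elif c == b"<":  # hex string or dictionary: skipped in this sketch
            end = content.find(b">", i)
            i = n if end < 0 else end + 1
        elif c == b"/":  # name operand such as /F1
            i += 1
            while i < n and content[i:i + 1] not in _DELIMS:
                i += 1
        elif c.isalpha() or c in (b"'", b'"'):  # operator token
            j = i + 1
            while j < n and content[j:j + 1] not in _DELIMS:
                j += 1
            if content[i:j] in _SHOW_OPS:
                found.extend(pending)
            pending = []  # any other operator consumes its operands
            i = j
        else:  # numbers, whitespace, array brackets
            i += 1
    return found
```

For example, `show_strings(b"BT (Offer) Tj ET")` returns `[b"Offer"]`.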

====================================================
Here is another way to do the same.

PDF Conversion by E-mail

There are three e-mail options you can use to convert PDF
documents to a format that is more accessible to screen reading
software. The e-mail address you use depends on the conversion
format you want, plain (ASCII) text or HTML, and whether the
PDF is on the Internet or local media.

Option 1
If the PDF is on the Internet, you can mail the URL (web address)
of the PDF in the body of an email message to
pdf2txt@adobe.com (for plain text) or to pdf2html@adobe.com
(for HTML). The convertor will mail back the translation of the
PDF file. You can submit multiple URLs in a single e-mail.
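Option 1 amounts to an ordinary plain-text message whose body lists the URLs. A minimal sketch using Python's standard `email` package is below; it only builds the message. `url_request` is a hypothetical helper, the subject line is an assumption (the instructions above do not mention one), a `From` header is left for the caller, and sending would need an SMTP server, for example `smtplib.SMTP(host).send_message(msg)`.

```python
from email.message import EmailMessage


def url_request(urls, fmt="txt"):
    """Build (but do not send) a conversion request listing PDF URLs in the body."""
    if fmt not in ("txt", "html"):
        raise ValueError("fmt must be 'txt' (plain text) or 'html'")
    msg = EmailMessage()
    msg["To"] = f"pdf2{fmt}@adobe.com"  # pdf2txt@ or pdf2html@, per Option 1
    msg["Subject"] = "PDF conversion request"  # assumed; the page does not specify one
    msg.set_content("\n".join(urls) + "\n")  # one URL per line
    return msg
```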

Tip: Some URLs are very long and cumbersome to type. Cutting
and pasting the URL into the mail message will save you some
keystrokes.

Option 2
If the PDF is on local media, such as a hard drive, CD-ROM, or
internal server, it can be submitted as a MIME attachment to an
e-mail message. All converted PDF documents will be sent back
to the sender as MIME attachments. For plain text, mail the
attached PDF to pdf2txt@adobe.com. For HTML, mail the
attached PDF to pdf2html@adobe.com.
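Option 2 is a multipart MIME message with the PDF attached. The Python sketch below builds one with the standard `email` package and does not send it. `attachment_request` is a hypothetical helper, the subject and body text are assumptions, and a `From` header is left for the caller.

```python
from email.message import EmailMessage


def attachment_request(pdf_bytes: bytes, filename: str, fmt: str = "txt") -> EmailMessage:
    """Build (but do not send) a request carrying the PDF as a MIME attachment."""
    if fmt not in ("txt", "html"):
        raise ValueError("fmt must be 'txt' (plain text) or 'html'")
    msg = EmailMessage()
    msg["To"] = f"pdf2{fmt}@adobe.com"
    msg["Subject"] = f"Convert {filename}"  # assumed; the page does not specify one
    msg.set_content("PDF attached for conversion.")
    # Turns the message into multipart/mixed with an application/pdf part.
    msg.add_attachment(pdf_bytes, maintype="application", subtype="pdf",
                       filename=filename)
    return msg
```

A local file would be passed as `attachment_request(Path("report.pdf").read_bytes(), "report.pdf")`, with `Path` from `pathlib`.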

Option 3

A service hosted by Trace Research Center also allows you to
convert PDF documents.

You can either mail the URL of the PDF or attach the PDF
document itself to your email message and send it to
pdf2txt@sun.trace.wisc.edu (for plain text) or to
pdf2html@sun.trace.wisc.edu (for HTML). The convertor will
mail back the translation of the PDF file.

Adobe would like to thank Dr. Gregg Vanderheiden and the
Trace Research Center (http://trace.wisc.edu) for helping us host
this service.

For further information, see:
http://access.adobe.com/access_email.html

Expert Comment

by:maneshr
ID: 2797917
Were you able to find a solution to your question?

If so, please let me know which solution you used.

Thanks

Author Comment

by:eng40490
ID: 2813196
I tried all the e-mail solutions. They all failed in the same place: they converted 'Offer' to 'oxxxxer', where xxxx is some funny character.

Seems like that's the state of the art in PDF-to-text conversion.

Expert Comment

by:maneshr
ID: 2813973
Hmm!

Even the Adobe service gave you this problem?
Did it do this for all characters or only some?

Let me know.

Author Comment

by:eng40490
ID: 2817470
Global Oering
the O,cial List of the SES

I found 5 instances of the first line and 1 instance of the second. All other occurrences of 'F' are converted correctly.

I used pdf2txt@adobe.com, pdf2html@adobe, and pdf2txt@sun.trace.wisc.edu.
I used 2 PDF documents.

All suffered from the same problem.
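The thread never pins down the cause, but the two garbled lines above fit a pattern that is common with typeset PDFs: in "Oering" the letters *ff* are missing, and in "O,cial" the letters *ffi* became a comma. That is what you would expect if the PDF draws those letter groups as single ligature glyphs that the converter could not map back to letters. This is an assumption, not a confirmed diagnosis. If a converter does emit the Unicode ligature code points (U+FB00 to U+FB04), a small post-processing step like the Python sketch below can identify and expand them; `expand_ligatures` and `non_ascii_report` are hypothetical names. It cannot help when the converter has already replaced the glyph with an unrelated character such as ','.

```python
import unicodedata

# Latin ligatures from the Unicode "Alphabetic Presentation Forms" block.
LIGATURES = str.maketrans({
    "\ufb00": "ff",
    "\ufb01": "fi",
    "\ufb02": "fl",
    "\ufb03": "ffi",
    "\ufb04": "ffl",
})


def expand_ligatures(text: str) -> str:
    """Replace ligature code points with the plain letters they stand for."""
    return text.translate(LIGATURES)


def non_ascii_report(text: str) -> dict:
    """Map each non-ASCII character in text to its Unicode name, to identify 'funny' characters."""
    return {ch: unicodedata.name(ch, "<unnamed>") for ch in set(text) if ord(ch) > 127}
```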

Expert Comment

by:maneshr
ID: 2818123
I would suggest that you report this problem to Adobe and bring it to their notice. Who knows, this might be a known bug, a version compatibility problem, or something else!

Rgds

Author Comment

by:eng40490
ID: 2820031
Actually, only 1 of the PDF documents suffered from the problem. I just sent another one for conversion and it came out OK, so it looks like the first PDF document has something unusual in it.

Expert Comment

by:maneshr
ID: 2821125
Was the problematic PDF file created using a different version of Acrobat than the one that worked fine?

Does the problematic PDF file have special/international characters in it?

Please let me know.
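One quick, partial check is the file header: a PDF begins with a line such as `%PDF-1.3`, which records the PDF specification version the file claims to follow. That is not the same as the Acrobat version or the program that produced the file, but each version number was introduced alongside an Acrobat release (for example 1.3 with Acrobat 4). A minimal Python sketch follows; `header_version` and the `INTRODUCED_WITH` table are illustrative, and searching the first 1024 bytes rather than only the first line is an assumption meant to tolerate leading junk.

```python
import re

# PDF version -> Acrobat release that introduced it (older versions only).
INTRODUCED_WITH = {"1.0": "Acrobat 1", "1.1": "Acrobat 2", "1.2": "Acrobat 3",
                   "1.3": "Acrobat 4", "1.4": "Acrobat 5"}


def header_version(pdf: bytes):
    """Return the version from a '%PDF-x.y' header, or None if no header is found."""
    match = re.search(rb"%PDF-(\d+\.\d+)", pdf[:1024])
    return match.group(1).decode("ascii") if match else None
```

Running it on both the working and the problematic file gives one data point for the version question. PDF 1.4 and later files can override the header with a `/Version` entry in the document catalog, which this sketch does not read.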

Author Comment

by:eng40490
ID: 2822305
I can't ask the author/creator of the document. There are no special or international characters in the problematic *words*. I did not read every word of the document, so I can't say for the entire document.

Expert Comment

by:maneshr
ID: 2822391
Sorry, I can't think of anything else that might cause the problem. :-(