Solved

pdf->ascii convert??

Posted on 2000-05-08
Last Modified: 2008-03-10
convert an *entire* adobe pdf file into ascii

any software? any technique? in C or any other language.

i know there are some perl modules that parse the informational headers (which are already in ascii).
Question by:eng40490
LVL 16

Accepted Solution

by:
maneshr earned 50 total points
ID: 2789669
you might want to take a look at PDF2TXT. It is a simple tool (perl script) to extract text from PDF files.

Key Features:

.. Extract Japanese text - JIS, SJIS, EUC and UCS2 encoded strings from
  Japanese PDF files by use of CMap files.
.. Extract document information and bookmark.
.. Support decoding methods:
    ASCIIHexDecode, ASCII85Decode, FlateDecode, LZWDecode
.. Crypto (pdf2txt_X.XX) and non-crypto (pdf2txt_X.XX_no_crypto) version.
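
For anyone wondering what the decoding methods listed above involve, here is a minimal Python sketch of three of them (ASCIIHexDecode, ASCII85Decode, FlateDecode) using only the standard library. It is not how pdf2txt itself is written; the function names are made up for illustration, and LZWDecode is left out because the standard library has no ready-made decoder for it.

```python
import base64
import zlib


def ascii_hex_decode(data: bytes) -> bytes:
    """Decode an ASCIIHexDecode stream: hex digits, optional whitespace, '>' at the end."""
    # Drop whitespace, which the filter ignores.
    digits = bytes(c for c in data if c not in b" \t\r\n\f\x00")
    # Everything after the '>' end-of-data marker is not part of the stream.
    end = digits.find(b">")
    if end != -1:
        digits = digits[:end]
    # An odd number of digits is treated as if a trailing 0 followed.
    if len(digits) % 2:
        digits += b"0"
    return bytes.fromhex(digits.decode("ascii"))


def ascii85_decode(data: bytes) -> bytes:
    """Decode an ASCII85Decode stream, which ends with the '~>' marker."""
    data = data.strip()
    if data.startswith(b"<~"):  # some producers also emit a leading marker
        data = data[2:]
    end = data.find(b"~>")
    if end != -1:
        data = data[:end]
    return base64.a85decode(data)


def flate_decode(data: bytes) -> bytes:
    """Decode a FlateDecode stream (zlib/deflate)."""
    return zlib.decompress(data)


if __name__ == "__main__":
    # Filters can be chained; here the stream was deflated, then ASCII85-encoded.
    packed = base64.a85encode(zlib.compress(b"Global Offering")) + b"~>"
    print(flate_decode(ascii85_decode(packed)))
```

Decoding the stream only gives you the page's drawing operators. Turning those into readable text still depends on each font's encoding, which is the part that tends to go wrong.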

Requirements:           UNIX

[common]
.. Perl (>= 5.005_03)    CPAN
.. zlib                  http://www.cdrom.com/pub/infozip/zlib/
.. Compress::Zlib        CPAN
.. uncompress            UNIX uncompress command
.. Jcode                 http://openlab.ring.gr.jp/Jcode/
.. Base85.pl             ftp://www.isl.intec.co.jp/pub/person/ishida/freeware/pdf2txt/
.. PDFLZW.pl             ftp://www.isl.intec.co.jp/pub/person/ishida/freeware/pdf2txt/
.. PDFEncoding.pl        ftp://www.isl.intec.co.jp/pub/person/ishida/freeware/pdf2txt/
.. makeCMap.pl*          ftp://www.isl.intec.co.jp/pub/person/ishida/freeware/pdf2txt/
.. aj12.tar.Z            ftp://ftp.oreilly.com/pub/examples/nutshell/cjkv/adobe/
.. aj20.tar.Z            ftp://ftp.oreilly.com/pub/examples/nutshell/cjkv/adobe/

[+crypto version]
.. MD5                   CPAN
.. RC4.pl                ftp://www.isl.intec.co.jp/pub/person/ishida/freeware/pdf2txt/

To get pdf2txt:

  ftp://www.isl.intec.co.jp/pub/person/ishida/freeware/pdf2txt/

====================================================
Here is another way to do the same.

PDF Conversion by E-mail

There are three e-mail options you can use to convert PDF
documents to a format that is more accessible to screen reading
software. The e-mail address you use depends on the conversion
format you want, plain (ASCII) text or HTML, and whether the
PDF is on the Internet or local media.

Option 1
If the PDF is on the Internet, you can mail the URL (web address)
of the PDF in the body of an email message to
pdf2txt@adobe.com (for plain text) or to pdf2html@adobe.com
(for HTML). The convertor will mail back the translation of the
PDF file. You can submit multiple URLs in a single e-mail.

Tip: Some URLs are very long and cumbersome to type. Cutting
and pasting the URL into the mail message will save you some
keystrokes.

Option 2
If the PDF is on local media, such as a hard drive, CD-ROM, or
internal server, it can be submitted as a MIME attachment to an
e-mail message. All converted pdf-documents will be sent back
to the sender as MIME attachments. For plain text, mail the
attached PDF to pdf2txt@adobe.com. For HTML, mail the
attached PDF to pdf2html@adobe.com.
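
As a rough illustration of Options 1 and 2, the Python sketch below builds the two kinds of request message with the standard `email` package. It only constructs the messages and does not send anything. The subject line, the body text for the attachment case, and the function names are assumptions, since the instructions above do not specify them.

```python
from email.message import EmailMessage


def build_url_request(sender, service_addr, urls):
    """Option 1: put one or more PDF URLs in the body of the message."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = service_addr
    msg["Subject"] = "PDF conversion request"  # assumed; not specified above
    msg.set_content("\n".join(urls) + "\n")
    return msg


def build_attachment_request(sender, service_addr, pdf_bytes, filename):
    """Option 2: send a local PDF as a MIME attachment."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = service_addr
    msg["Subject"] = "PDF conversion request"  # assumed; not specified above
    msg.set_content("PDF attached for conversion.\n")  # assumed body text
    msg.add_attachment(
        pdf_bytes, maintype="application", subtype="pdf", filename=filename
    )
    return msg


if __name__ == "__main__":
    req = build_url_request(
        "me@example.com", "pdf2txt@adobe.com", ["http://example.com/report.pdf"]
    )
    print(req)
```

A built message could then be handed to `smtplib.SMTP.send_message` on whatever mail server you normally use.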

Option 3

A service hosted by Trace Research Center also allows you to
convert PDF documents.

You can either mail the URL of the PDF or attach the PDF
document itself to your email message and send it to
pdf2txt@sun.trace.wisc.edu (for plain text) or to
pdf2html@sun.trace.wisc.edu (for HTML). The convertor will
mail back the translation of the PDF file.

Adobe would like to thank Dr. Gregg Vanderheiden and the
Trace Research Center (http://trace.wisc.edu) for helping us host
this service.

For further info ....
http://access.adobe.com/access_email.html
 
LVL 16

Expert Comment

by:maneshr
ID: 2797917
were you able to find a solution for your question??

if so, please let me know which solution you used.

Thanks
 

Author Comment

by:eng40490
ID: 2813196
i tried all the email solutions. they all failed in the same place -- they converted 'Offer' to 'oxxxxer' where xxxx is some funny character.

seems like that's the state of the art in pdf->txt conversion.
 
LVL 16

Expert Comment

by:maneshr
ID: 2813973
Hmm!!

Even the adobe site gave you this problem??
did it do this for all chars or some??

let me know
 

Author Comment

by:eng40490
ID: 2817470
Global Oering
the O,cial List of the SES

i found 5 instances of the first line and 1 instance of the 2nd. all other occurrences of 'F' are converted correctly.

i used pdf2txt@adobe.com, pdf2html@adobe.com, pdf2txt@sun.trace.wisc.edu.
i used 2 pdf documents.

all suffered from the same problem.
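
The two sample lines read like "Global Offering" and "the Official List of the SES" with the "ff" and "ffi" letter groups lost. That would be consistent with the PDF storing those groups as single ligature glyphs, though this is a guess from the output and not a confirmed diagnosis. If a converter does emit the Unicode ligature code points (U+FB00 to U+FB06), a post-processing pass along these lines expands them. The `extra_map` parameter is a hypothetical hook for the case where the converter emits some other character instead.

```python
import unicodedata


def expand_ligatures(text, extra_map=None):
    """Expand ligature characters in extracted text back into plain letters.

    extra_map is an optional, document-specific dict of replacements for
    converters that emit a non-Unicode stand-in for the ligature glyph.
    """
    if extra_map:
        for bad, good in extra_map.items():
            text = text.replace(bad, good)
    # NFKC maps compatibility characters such as U+FB00 (ff) and U+FB03 (ffi)
    # to their plain-letter equivalents. Note that it also normalizes other
    # compatibility characters, not just ligatures.
    return unicodedata.normalize("NFKC", text)


if __name__ == "__main__":
    print(expand_ligatures("Global O\ufb00ering"))
    print(expand_ligatures("the O\ufb03cial List of the SES"))
```

When the ligature character has been dropped outright, as in "Oering", there is nothing left in the text to expand, so the fix has to happen in the converter's font handling.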
 
LVL 16

Expert Comment

by:maneshr
ID: 2818123
i would suggest that you report this problem to adobe and bring it to their notice. Who knows, this might be a known bug, due to a version compatibility problem, or something!!

Rgds
 

Author Comment

by:eng40490
ID: 2820031
actually only 1 of the pdf docs suffered from the problem. i just sent another for conversion and it's ok. so it looks like the first pdf document has something unusual.
 
LVL 16

Expert Comment

by:maneshr
ID: 2821125
was the problematic PDF file created using a different ver. of acrobat than the one which worked fine?

does the problematic PDF file have special/international characters in it??

pl. let me know.
 

Author Comment

by:eng40490
ID: 2822305
can't ask the author/creator of the doc. no special or international characters in the problematic *words*. i did not read every word of the doc so can't say about the entire doc.
 
LVL 16

Expert Comment

by:maneshr
ID: 2822391
sorry, can't think of anything else that might cause the problem. :-(