Solved

Cannot convert this pdf to text

Posted on 2014-11-17
Last Modified: 2014-12-25
Hello
 
I want to convert this PDF to text, but I don't understand why the characters come out wrong in text mode.

Any help appreciated.
 
Thanks in advance
 
LLaurent
Acronyms.pdf
0
Question by:LLaurent-59
22 Comments
 
LVL 62

Accepted Solution

by:
☠ MASQ ☠ earned 250 total points
ID: 40449113
You need to understand that the PDF document isn't displaying text; it's displaying a graphical representation of text. This looks like a PostScript file that's been converted to a .pdf using Ghostscript. The "characters" you see on the screen are in an Adobe PostScript Type3 font, and this font does not map to Unicode, so if you copy and paste it you just end up with random characters. You would need a PostScript engine to interpret them.

Your quickest conversion would be to take a screenshot of this section and feed it to a free OCR package as an image for conversion to text.

What you see isn't always what you get!
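
To illustrate the explanation above, here is a minimal Python sketch (not something posted in this thread) that scans a PDF's raw bytes for font subtypes and flags the case of a Type3 font with no Unicode map. The function names `font_subtypes` and `likely_needs_ocr` are hypothetical, and the check is only a heuristic: it assumes the font dictionaries are stored uncompressed, so a PDF that packs them into compressed object streams would slip past it.

```python
import re

# Font subtype names that can follow "/Subtype" in a PDF font dictionary.
# The trailing \b keeps "/Type1C" (an embedded font *file* subtype) from
# being counted as "/Type1".
_FONT_SUBTYPE = re.compile(
    rb"/Subtype\s*/(Type0|Type1|Type3|TrueType|MMType1|CIDFontType0|CIDFontType2)\b"
)


def font_subtypes(pdf_bytes):
    """Return the set of font subtype names visible in the raw PDF bytes."""
    return {m.group(1).decode("ascii") for m in _FONT_SUBTYPE.finditer(pdf_bytes)}


def likely_needs_ocr(pdf_bytes):
    """Heuristic: a Type3 font is present and no /ToUnicode map appears anywhere.

    Without a /ToUnicode map, a text extractor has nothing to translate the
    glyph codes into readable characters, so OCR is probably the way to go.
    """
    return "Type3" in font_subtypes(pdf_bytes) and b"/ToUnicode" not in pdf_bytes
```

To try it on a real file, read the file in binary mode (for example `open("Acronyms.pdf", "rb").read()`) and pass the bytes to `likely_needs_ocr`. A `True` result only suggests that plain PDF-to-text tools will struggle; it does not prove it.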
0
 
LVL 91

Expert Comment

by:nobus
ID: 40449147
You can also try this online converter: http://convertonlinefree.com/PDFToTXTEN.aspx
0
 

Author Comment

by:LLaurent-59
ID: 40449250
Thanks MASQ,

That's what I imagined, but do you know any free OCR package?

Thanks in advance
0
 

Author Comment

by:LLaurent-59
ID: 40449253
Thanks NOBUS

but when I try this online converter, I get: "Text cannot be extracted..."
0
 
LVL 4

Expert Comment

by:Nice-Ghaza
ID: 40449297
Dear Sir, kindly use this software; it should solve your issue: Nitro Pro 8. Thanks
0
 
LVL 91

Expert Comment

by:nobus
ID: 40449546
0
 

Author Comment

by:LLaurent-59
ID: 40449709
Thanks again NOBUS

but again, this gives me an unbelievable result.
If you can read it or translate it, that would be amazing!
Acronyms-zamzar.txt
0
 

Author Comment

by:LLaurent-59
ID: 40449715
Thanks NICE-GHAZA

Nitro Pro is a very sophisticated PDF tool, but it is expensive too, and
I don't want to download such a big application.

I would like to find an easy OCR tool, as suggested by MASQ.
I found a solution using PDFCreator and then Foxit PhantomPDF, which has an OCR tool.
0
 
LVL 4

Expert Comment

by:Nice-Ghaza
ID: 40449844
Dear Sir, use either of these small tools:
(1) ABBYY FineReader 8

(2) Able2Extract Professional 6



Thanks
0
 
LVL 51

Expert Comment

by:Joe Winograd, EE MVE
ID: 40449941
PDF-XChange Editor comes in Free and Pro versions:
http://www.tracker-software.com/product/pdf-xchange-editor

When you install it, select Free. Even the Free version has OCR and it handles your document perfectly. I just OCRed it with Accuracy set to High, and Output Type set to Create New Searchable PDF:

[Screenshot: PDF-XChange Editor Free OCR]
Attached is the PDF that it created. The text copy/pastes perfectly! Regards, Joe
Acronyms-via-PDF-XChange-Editor-OCR.pdf
0
 
LVL 51

Expert Comment

by:Joe Winograd, EE MVE
ID: 40449966
It just occurred to me that your doc has French, so I OCRed it again and selected French as the language:

[Screenshot: OCR language choice]
Also, I should have mentioned that the OCR tool is on the Document menu:

[Screenshot: OCR menu]
Attached is the PDF that it created with French as the OCR language choice. Once again, the text copy/pastes perfectly!

Also, be sure to pick the Free Version when you run the installer:

[Screenshot: PDF-XChange Editor - Free or Pro]
Regards, Joe
Acronyms-OCR-French.pdf
0
 

Author Comment

by:LLaurent-59
ID: 40450100
Thanks JOE

I am glad to see it works with PDF-XChange

Here is part of a select/copy/paste from the beginning of the resulting file:
CNRS
CNUCED
CRS
CSFI
DEA
Centre national de la recherche scientifique
Conference des Nations unies sur le commerce et le developpement
Catholic Relief Services
Centre for the Study of Financial Innovation ...

In fact, it would be OK if it were the following:
CNRS Centre national de la recherche scientifique
CNUCED Conférence des Nations unies sur le commerce et le développement
CRS Catholic Relief Services
CSFI Centre for the Study of Financial Innovation
DEA ...

I obtained that using Foxit PhantomPDF OCR,

and I would appreciate finding an easy OCR tool for doing this.

...
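
When an OCR tool returns the two columns one after the other, as in the first paste above, the pairs can sometimes be rebuilt afterwards. Below is a minimal Python sketch of that idea. The helper name `pair_columns` is hypothetical, and it assumes the dump contains exactly N acronyms followed by exactly N definitions, with no definition wrapped onto a second line. If the OCR dropped or split a line, the pairing will be off and needs checking by eye.

```python
def pair_columns(lines):
    """Rebuild (acronym, definition) pairs from a column-by-column OCR dump.

    Assumes the first half of the non-blank lines is the acronym column and
    the second half is the definition column, in the same order.
    """
    cleaned = [line.strip() for line in lines if line.strip()]
    if len(cleaned) % 2:
        raise ValueError("odd number of lines; the two columns cannot be paired")
    half = len(cleaned) // 2
    return list(zip(cleaned[:half], cleaned[half:]))


if __name__ == "__main__":
    dump = [
        "CNRS",
        "CRS",
        "Centre national de la recherche scientifique",
        "Catholic Relief Services",
    ]
    for acronym, definition in pair_columns(dump):
        print(acronym, definition)
```

Raising an error on an odd line count is a deliberate choice: a silent mismatch would attach every definition to the wrong acronym.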
0
 
LVL 51

Assisted Solution

by:Joe Winograd, EE MVE
Joe Winograd, EE MVE earned 250 total points
ID: 40450183
OK, if the order of the text is important, here's a free way to do it. IrfanView is excellent (free!) imaging software that I've been using for many years:
http://www.irfanview.com/

I recommend the Brothersoft link to download IrfanView:
http://www.brothersoft.com/download-irfanview-6224.html

This will download a single install file called <iview438_setup.exe> with no adware and no junk!

And the Brothersoft link for the PlugIns, which are required for PDF support:
http://www.brothersoft.com/download-irfanview-all-plugins-164981.html

This will download a single install file called <irfanview_plugins_438_setup.exe> with no adware and no junk!

Install IrfanView first, then install the PlugIns. In addition, for OCR capability, there's a separate plug-in for that, also free (download and run it):
http://irfanview.info/plugins/kadmos/setup_kadmos_irfanview_us.exe

I just used IrfanView with the PDF and Kadmos OCR plug-ins on your file (all are free). I also downloaded the French language dictionary for Kadmos, which produced a much better result than the English dictionary. Here's a copy/paste of what it created from your doc:


CNRS             Centre national de la recherche scientifique
CNUCED              Conférence des Nations unies sur le commerce et le développement
CRS                 Catholic Relief Services
CSFI                Centre for the Study ofFinancial Innovation
DEA                   Data Envelopment Analysis
DG EUROPEAID Direction générale de développement et coopération européenne
DID                 Développement international Desjardins
EXCOM                   Executive Committee
FENU            Fonds d'équipement des Nations unies
FERT                        Formation pour l'épanouissement et le ~uveau de la terre
FFP                Fondo Financiero Privado
FIDA             Fonds international de développement agricole
FlNCA                          Foundation for International Community Assistance
FOROLACFR Foro Latinoamericano y del Caribe de Finanzas Rurales
FMO                 Banque néerlandaise de développement
FSLN             Front sandiniste de libération nationale
G20                Group ofthe 20 major economies
GRET                      Groupe de recherche et d'échange technologiques
GRI                Global Reporting Initiative
GTZ                          Gesellschaft fûr Technische Zusammenarbeit
IDS               Institute ofDevelopment Studies
IFI                  Institutions financières internationales
IMF                  Institution de Microfinance


Regards, Joe
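
For anyone who wants to load output like the above into a spreadsheet, here is a minimal Python sketch that splits each line into an acronym and a definition. The names `split_entry` and `to_tsv` are hypothetical, and the approach rests on one assumption: the two columns are separated by a run of two or more spaces. Lines with only a single space between the columns (such as the DG EUROPEAID line above) cannot be split reliably this way, so the sketch sets them aside for manual review instead of guessing.

```python
import re


def split_entry(line):
    """Split 'ACRONYM    definition' on the first run of 2+ whitespace characters.

    Returns (acronym, definition), or None when no such gap exists.
    """
    parts = re.split(r"\s{2,}", line.strip(), maxsplit=1)
    if len(parts) == 2:
        return parts[0], parts[1]
    return None


def to_tsv(lines):
    """Return (rows, unparsed): tab-separated rows plus lines needing manual review."""
    rows, unparsed = [], []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        entry = split_entry(line)
        if entry is None:
            unparsed.append(line.strip())
        else:
            rows.append("\t".join(entry))
    return rows, unparsed
```

The sketch does not try to correct OCR misreadings such as a lowercase "l" in place of a capital "I"; those still need proofreading.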
0
 
LVL 62

Expert Comment

by:☠ MASQ ☠
ID: 40450750
For a one-off OCR job I'd make a screenshot image and use the free OCR built into Google Docs (it's the engine they use for their document scanning projects):
https://support.google.com/drive/answer/176692?hl=en

As you've discovered, it's pointless simply trying PDF-to-text tools: they can't translate the Type3 fonts into Unicode, so you'll just get varying shades of gibberish!
0
 
LVL 51

Expert Comment

by:Joe Winograd, EE MVE
ID: 40471011
Hi LLaurent,
It's been 10 days since I documented a solution for you (using free tools only) in this post <http:#a40450183> and I'm wondering if you've had a chance to try it. It worked perfectly here (as the copy/paste at my last post shows), but if you're having any problems getting it to work, please let me know and I'll try to help you through it. Regards, Joe
0
 
LVL 59

Expert Comment

by:LeeTutor
ID: 40511668
I've requested that this question be deleted for the following reason:

The question has either no comments or not enough useful information to be called an "answer".
0
 
LVL 51

Expert Comment

by:Joe Winograd, EE MVE
ID: 40511669
There is definitely enough useful information to be called an "answer". My solution presented in <http:#a40450183> works perfectly. As the asker requested, it uses all free tools — the free IrfanView, the free PlugIns for PDF support, and the free plug-in for OCR, including its free French language dictionary. I ran the proposed solution on the asker's actual file. It worked perfectly — I even did a copy/paste of the results from the asker's actual file into my post. I'd like to know if anyone can explain how this doesn't exactly and precisely answer the question. Thanks, Joe
0
 
LVL 62

Expert Comment

by:☠ MASQ ☠
ID: 40511797
Agreed

There were two parts to the question - "I want to convert this pdf to text" & "I dont understand why I get no correct characters in text mode"

Looks like both had been completely covered

In fact by #40449715 the asker seems to have found an OCR solution themselves.
0
 
LVL 51

Expert Comment

by:Joe Winograd, EE MVE
ID: 40511833
MASQ makes an excellent point about <http:#a40449715>, where the asker seems to say that Nitro Pro's PDF Creator capability answers the question (but is too expensive and too big/sophisticated) and so does Foxit PhantomPDF (also an expensive, big, sophisticated product). That's when the asker (in <http:#a40449250>) made it clear that he wants a free product:
do you know any free OCR package
Regards, Joe
0
 
LVL 51

Expert Comment

by:Joe Winograd, EE MVE
ID: 40512064
Hi eenookami,

I recommend that it be closed by #2, with these specific comment IDs:

(1) http:#a40449113

MASQ had it right when he suggested OCR.

(2) http:#a40450183

I showed a specific solution with all free tools that worked perfectly on the asker's actual document.

Thanks, Joe
0
