Solved

Cannot convert this pdf to text

Posted on 2014-11-17
Last Modified: 2014-12-25
Hello
 
I want to convert this PDF to text, but I don't understand why I get no correct characters in text mode.
 
Any help appreciated.
 
Thanks in advance
 
LLaurent
Acronyms.pdf
Question by:LLaurent-59
 
LVL 62

Accepted Solution

by:
☠ MASQ ☠ earned 250 total points
ID: 40449113
You need to understand that the PDF document isn't displaying text, it's displaying a graphical representation of text. This looks like a PostScript file that has been converted to a .pdf using Ghostscript. The "characters" you see on the screen are drawn with an Adobe PostScript Type 3 font, and this font does not map to Unicode, so if you copy and paste it you just end up with random characters; you need a PostScript engine to interpret them.

Your quickest conversion would be to take a screenshot of this section and feed it, as a graphic, to a free OCR package for conversion to text.

What you see isn't always what you get!
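
As a rough illustration of the diagnosis above, here is a minimal Python sketch that scans a PDF's raw bytes for Type 3 font dictionaries and for `/ToUnicode` entries. The function name `font_report` and the key names in its result are made up for this example. It is only a heuristic: in PDFs that store their objects in compressed object streams the markers will not be visible in the raw bytes, so a proper PDF library would be more reliable.

```python
import re

def font_report(pdf_bytes):
    """Heuristic: count Type 3 fonts and ToUnicode maps visible in raw PDF bytes.

    Assumption: the font dictionaries are stored uncompressed. If they sit
    inside compressed object streams, both counts will be zero.
    """
    type3 = len(re.findall(rb"/Subtype\s*/Type3\b", pdf_bytes))
    to_unicode = len(re.findall(rb"/ToUnicode\b", pdf_bytes))
    return {
        "type3_fonts": type3,
        "to_unicode_maps": to_unicode,
        # Type 3 glyphs with no Unicode mapping cannot be copied as text.
        "likely_needs_ocr": type3 > 0 and to_unicode == 0,
    }
```

To try it on a real file, read the file in binary mode and pass the bytes in, for example `font_report(open("Acronyms.pdf", "rb").read())`. If `likely_needs_ocr` comes back true, plain PDF-to-text converters will probably produce gibberish and OCR is the practical route.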
 
LVL 91

Expert Comment

by:nobus
ID: 40449147
You can also try this online converter: http://convertonlinefree.com/PDFToTXTEN.aspx
 

Author Comment

by:LLaurent-59
ID: 40449250
Thanks MASQ,

That's what I imagined, but do you know any free OCR package?

Thanks in advance
 

Author Comment

by:LLaurent-59
ID: 40449253
Thanks NOBUS

but when I try this online converter, I get: "Text cannot be extracted..."
 
LVL 4

Expert Comment

by:Nice-Ghaza
ID: 40449297
Dear Sir, kindly use this software; it should solve your issue: Nitro Pro 8. Thanks
 
LVL 91

Expert Comment

by:nobus
ID: 40449546
 

Author Comment

by:LLaurent-59
ID: 40449709
Thanks again NOBUS

but again, this gives me an unbelievable result.
If you can read it or translate it, that would be amazing!
Acronyms-zamzar.txt
 

Author Comment

by:LLaurent-59
ID: 40449715
Thanks NICE-GHAZA

Nitro Pro is a very sophisticated PDF tool, but expensive too,
and I don't want to download such a big application.

I would like to find an easy OCR tool, as suggested by MASQ.
I found a solution using PDFCreator, and then Foxit PhantomPDF, which has an OCR tool.
 
LVL 4

Expert Comment

by:Nice-Ghaza
ID: 40449844
Dear Sir, use either of these small tools:
(1) ABBYY FineReader 8

(2) Able2Extract Professional 6



Thanks
 
LVL 52

Expert Comment

by:Joe Winograd, EE MVE
ID: 40449941
PDF-XChange Editor comes in Free and Pro versions:
http://www.tracker-software.com/product/pdf-xchange-editor

When you install it, select Free. Even the Free version has OCR and it handles your document perfectly. I just OCRed it with Accuracy set to High, and Output Type set to Create New Searchable PDF:

[Screenshot: PDF-XChange Editor Free OCR]
Attached is the PDF that it created. The text copy/pastes perfectly! Regards, Joe
Acronyms-via-PDF-XChange-Editor-OCR.pdf

 
LVL 52

Expert Comment

by:Joe Winograd, EE MVE
ID: 40449966
It just occurred to me that your doc has French, so I OCRed it again and selected French as the language:

[Screenshot: OCR Language choice]
Also, I should have mentioned that the OCR tool is on the Document menu:

[Screenshot: OCR menu]
Attached is the PDF that it created with French as the OCR language choice. Once again, the text copy/pastes perfectly!

Also, be sure to pick the Free Version when you run the installer:

[Screenshot: PDF-XChange Editor - Free or Pro]
Regards, Joe
Acronyms-OCR-French.pdf
 

Author Comment

by:LLaurent-59
ID: 40450100
Thanks JOE

I am glad to see it works with PDF-XChange.

Here is part of a select/copy/paste from the beginning of the resulting file:
CNRS
CNUCED
CRS
CSFI
DEA
Centre national de la recherche scientifique
Conference des Nations unies sur le commerce et le developpement
Catholic Relief Services
Centre for the Study of Financial Innovation ...

In fact, it would be OK if it were the following:
CNRS Centre national de la recherche scientifique
CNUCED Conférence des Nations unies sur le commerce et le développement
CRS Catholic Relief Services
CSFI Centre for the Study of Financial Innovation
DEA ...

I obtained it using Foxit PhantomPDF OCR,

and I would appreciate finding an easy OCR tool for doing this.

...
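
If an OCR tool returns the two columns one after the other, as in the first paste above, the lines can be re-paired afterwards. The following is only a minimal Python sketch under a strong assumption: the pasted block contains exactly as many acronym lines as definition lines, in the same order. The function name `interleave_columns` is made up for this example.

```python
def interleave_columns(lines):
    """Pair the first half of the lines (acronyms) with the second half (definitions).

    Assumption: the OCR output lists every acronym first, then every
    definition, with the same count and order in both halves.
    """
    rows = [ln.strip() for ln in lines if ln.strip()]  # drop blank lines
    if len(rows) % 2:
        raise ValueError("expected as many acronyms as definitions")
    half = len(rows) // 2
    return [f"{acronym} {definition}"
            for acronym, definition in zip(rows[:half], rows[half:])]
```

Called on the lines `CNRS`, `CRS`, `Centre national de la recherche scientifique`, `Catholic Relief Services`, it returns `CNRS Centre national de la recherche scientifique` and `CRS Catholic Relief Services`. If the OCR drops or merges a line, the counts no longer match and the function raises instead of silently mis-pairing.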
 
LVL 52

Assisted Solution

by:Joe Winograd, EE MVE
Joe Winograd, EE MVE earned 250 total points
ID: 40450183
OK, if the order of the text is important, here's a free way to do it. IrfanView is excellent (free!) imaging software that I've been using for many years:
http://www.irfanview.com/

I recommend the Brothersoft link to download IrfanView:
http://www.brothersoft.com/download-irfanview-6224.html

This will download a single install file called <iview438_setup.exe> with no adware and no junk!

And the Brothersoft link for the PlugIns, which are required for PDF support:
http://www.brothersoft.com/download-irfanview-all-plugins-164981.html

This will download a single install file called <irfanview_plugins_438_setup.exe> with no adware and no junk!

Install IrfanView first, then install the PlugIns. In addition, for OCR capability, there's a separate plug-in for that, also free (download and run it):
http://irfanview.info/plugins/kadmos/setup_kadmos_irfanview_us.exe

I just used IrfanView with the PDF and Kadmos OCR plug-ins on your file (all are free). I also downloaded the French language dictionary for Kadmos, which produced a much better result than the English dictionary. Here's a copy/paste of what it created from your doc:


CNRS             Centre national de la recherche scientifique
CNUCED              Conférence des Nations unies sur le commerce et le développement
CRS                 Catholic Relief Services
CSFI                Centre for the Study ofFinancial Innovation
DEA                   Data Envelopment Analysis
DG EUROPEAID Direction générale de développement et coopération européenne
DID                 Développement international Desjardins
EXCOM                   Executive Committee
FENU            Fonds d'équipement des Nations unies
FERT                        Formation pour l'épanouissement et le ~uveau de la terre
FFP                Fondo Financiero Privado
FIDA             Fonds international de développement agricole
FlNCA                          Foundation for International Community Assistance
FOROLACFR Foro Latinoamericano y del Caribe de Finanzas Rurales
FMO                 Banque néerlandaise de développement
FSLN             Front sandiniste de libération nationale
G20                Group ofthe 20 major economies
GRET                      Groupe de recherche et d'échange technologiques
GRI                Global Reporting Initiative
GTZ                          Gesellschaft fûr Technische Zusammenarbeit
IDS               Institute ofDevelopment Studies
IFI                  Institutions financières internationales
IMF                  Institution de Microfinance
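
For anyone who wants to post-process output like the above, here is a minimal Python sketch that splits each line into an acronym and a definition. It assumes the column gap usually survives as a run of two or more spaces, and otherwise falls back to treating the leading all-uppercase words as the acronym (which handles lines such as the DG EUROPEAID one). The function name `split_entry` is made up for this example, and the sketch does not correct OCR misreads inside the words themselves.

```python
import re

def split_entry(line):
    """Split one OCR'd line into (acronym, definition)."""
    line = line.strip()
    # Preferred: the column gap survives as two or more whitespace characters.
    parts = re.split(r"\s{2,}", line, maxsplit=1)
    if len(parts) == 2:
        return parts[0], parts[1]
    # Fallback: the leading run of all-uppercase tokens is taken as the acronym.
    tokens = line.split()
    i = 0
    while i < len(tokens) and tokens[i].isupper():
        i += 1
    if i == 0:
        i = 1  # no uppercase token found: treat the first word as the key
    return " ".join(tokens[:i]), " ".join(tokens[i:])
```

Applying `split_entry` to every non-blank line gives a list of pairs that can be written out one per line or loaded into a spreadsheet.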


Regards, Joe
 
LVL 62

Expert Comment

by:☠ MASQ ☠
ID: 40450750
For a one-off OCR job I'd make a screenshot image and use the free OCR built into Google Docs (it's the engine they use for their document scanning projects):
https://support.google.com/drive/answer/176692?hl=en

As you've discovered, it's pointless simply trying PDF-to-text tools: they can't translate the Type 3 fonts into Unicode, so you'll just get varying shades of gibberish!
 
LVL 52

Expert Comment

by:Joe Winograd, EE MVE
ID: 40471011
Hi LLaurent,
It's been 10 days since I documented a solution for you (using free tools only) in comment #40450183, and I'm wondering if you've had a chance to try it. It worked perfectly here (as the copy/paste in my last post shows), but if you're having any problems getting it to work, please let me know and I'll try to help you through it. Regards, Joe
 
LVL 59

Expert Comment

by:LeeTutor
ID: 40511668
I've requested that this question be deleted for the following reason:

The question has either no comments or not enough useful information to be called an "answer".
 
LVL 52

Expert Comment

by:Joe Winograd, EE MVE
ID: 40511669
There is definitely enough useful information to be called an "answer". My solution presented in comment #40450183 works perfectly. As the asker requested, it uses all free tools: the free IrfanView, the free PlugIns for PDF support, and the free plug-in for OCR, including its free French language dictionary. I ran the proposed solution on the asker's actual file. It worked perfectly; I even did a copy/paste of the results from the asker's actual file into my post. I'd like to know if anyone can explain how this doesn't exactly and precisely answer the question. Thanks, Joe
 
LVL 62

Expert Comment

by:☠ MASQ ☠
ID: 40511797
Agreed

There were two parts to the question: "I want to convert this pdf to text" and "I don't understand why I get no correct characters in text mode".

It looks like both have been completely covered.

In fact by #40449715 the asker seems to have found an OCR solution themselves.
 
LVL 52

Expert Comment

by:Joe Winograd, EE MVE
ID: 40511833
MASQ makes an excellent point about comment #40449715, where the asker seems to say that Nitro Pro's PDF Creator capability answers the question (but is too expensive and too big/sophisticated) and so does Foxit PhantomPDF (also an expensive, big, sophisticated product). That's when the asker (in comment #40449250) made it clear that he wants a free product:
"do you know any free OCR package"
Regards, Joe
 
LVL 52

Expert Comment

by:Joe Winograd, EE MVE
ID: 40512064
Hi eenookami,

I recommend that it be closed by #2, with these specific comment IDs:

(1) Comment #40449113

MASQ had it right when he suggested OCR.

(2) Comment #40450183

I showed a specific solution with all free tools that worked perfectly on the asker's actual document.

Thanks, Joe

Question has a verified solution.
