Status: Solved

Cannot convert this PDF to text

Hello
 
I want to convert this PDF to text, but I don't understand why I get no correct characters in text mode.

Any help appreciated.
 
Thanks in advance
 
LLaurent
Acronyms.pdf
Asked by LLaurent-59
 
☠ MASQ ☠ commented:
You need to understand that the PDF document isn't displaying text; it's displaying a graphical representation of text. This looks like a PostScript file that's been converted to a .pdf using Ghostscript. The "characters" you see on the screen are drawn with an Adobe PS Type3 font, and this font does not map to Unicode, so if you copy and paste it you just end up with random characters: the glyphs are only drawings, and nothing in the file tells a text extractor which letter each one represents.

Your quickest conversion would be to use a free OCR package: take a screenshot of the page and paste it into the OCR tool as an image for conversion to text.

What you see isn't always what you get!
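A minimal sketch (not something posted in this thread) of how one might check whether a PDF relies on Type3 fonts before trying PDF-to-text tools. It is a heuristic byte scan, not a PDF parser: it looks for `/Subtype /Type3` font entries in the raw file and inside Flate-compressed streams. The function name `uses_type3_fonts` is hypothetical; for a proper font listing, Poppler's `pdffonts` utility reports each font's type.

```python
import re
import zlib

# A font dictionary entry such as "/Subtype /Type3" (whitespace is optional in PDF syntax).
TYPE3_FONT = re.compile(rb"/Subtype\s*/Type3\b")
# The body of each stream object; DOTALL lets "." match newline bytes inside binary data.
STREAM_BODY = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def uses_type3_fonts(pdf_bytes: bytes) -> bool:
    """Heuristically report whether raw PDF bytes declare any Type3 font.

    Hypothetical helper, not a PDF parser. It checks the uncompressed parts
    of the file first, then tries to inflate every stream, because PDF 1.5+
    files can store font dictionaries inside Flate-compressed object streams.
    """
    if TYPE3_FONT.search(pdf_bytes):
        return True
    for match in STREAM_BODY.finditer(pdf_bytes):
        try:
            # decompressobj keeps trailing end-of-line bytes as unused_data instead of failing
            data = zlib.decompressobj().decompress(match.group(1))
        except zlib.error:
            continue  # not Flate-encoded (e.g. image data or another filter)
        if TYPE3_FONT.search(data):
            return True
    return False
```

Usage: `with open("Acronyms.pdf", "rb") as f: print(uses_type3_fonts(f.read()))`. A `True` result suggests that copy/paste and PDF-to-text converters may return gibberish for the reason explained above, and that OCR is the practical route.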
 
nobus commented:
You can also try this online converter: http://convertonlinefree.com/PDFToTXTEN.aspx
 
LLaurent-59 (Author) commented:
Thanks MASQ,

that's what I imagined, but do you know any free OCR package?

Thanks in advance
 
LLaurent-59 (Author) commented:
Thanks NOBUS,

but when I try this online converter I get: "Text cannot be extracted..."
 
Nice-Ghaza commented:
Dear Sir, kindly use this software: Nitro Pro 8. It will solve your issue. Thanks
 
LLaurent-59 (Author) commented:
Thanks again NOBUS,

but again, this gives me an incredible result.
If you can read it or translate it, amazing!!!
Acronyms-zamzar.txt
 
LLaurent-59 (Author) commented:
Thanks NICE-GHAZA,

Nitro Pro is a very sophisticated PDF tool, but expensive too,
and I don't want to download such a big application.

I would like to find an easy OCR tool, as suggested by MASQ.
I found a solution using PDFCreator and then Foxit PhantomPDF, which has an OCR tool.
 
Nice-Ghaza commented:
Dear Sir, use either of these small tools:
(1) AbbyyFineReader8

(2) Able2Extract.Professional.v6.0.0.0-NoPE

Thanks
 
Joe Winograd, EE MVE 2015&2016 (Developer) commented:
PDF-XChange Editor comes in Free and Pro versions:
http://www.tracker-software.com/product/pdf-xchange-editor

When you install it, select Free. Even the Free version has OCR and it handles your document perfectly. I just OCRed it with Accuracy set to High, and Output Type set to Create New Searchable PDF:

[Screenshot: PDF-XChange Editor Free OCR]
Attached is the PDF that it created. The text copy/pastes perfectly! Regards, Joe
Acronyms-via-PDF-XChange-Editor-OCR.pdf
 
Joe Winograd, EE MVE 2015&2016 (Developer) commented:
It just occurred to me that your doc is in French, so I OCRed it again and selected French as the language:

[Screenshot: OCR Language choice]
Also, I should have mentioned that the OCR tool is on the Document menu:

[Screenshot: OCR menu]
Attached is the PDF that it created with French as the OCR language choice. Once again, the text copy/pastes perfectly!

Also, be sure to pick the Free Version when you run the installer:

[Screenshot: PDF-XChange Editor - Free or Pro]
Regards, Joe
Acronyms-OCR-French.pdf
 
LLaurent-59 (Author) commented:
Thanks JOE,

I am glad to see it works with PDF-XChange.

Here is part of a select/copy/paste of the beginning of the results:
CNRS
CNUCED
CRS
CSFI
DEA
Centre national de la recherche scientifique
Conference des Nations unies sur le commerce et le developpement
Catholic Relief Services
Centre for the Study of Financial Innovation ...

In fact, it would be OK if it looked like the following:
CNRS Centre national de la recherche scientifique
CNUCED Conférence des Nations unies sur le commerce et le développement
CRS Catholic Relief Services
CSFI Centre for the Study of Financial Innovation
DEA ...

I obtained that layout using Foxit PhantomPDF OCR,

and I would like to find an easy OCR tool for doing this.

...
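If OCR returns the acronyms and their expansions as separate columns, as in the PDF-XChange result above, a small post-processing step can re-pair them into the wanted "ACRONYM Expansion" lines. A minimal sketch, assuming (hypothetically) that an acronym line contains no lowercase letters and that every expansion contains at least one; `pair_columns` is an illustrative name, not part of any tool mentioned in this thread.

```python
def pair_columns(lines: list[str], sep: str = " ") -> list[str]:
    """Re-join acronym/expansion pairs that OCR emitted column by column.

    Assumption: a line with no lowercase letters is an acronym ("CNRS",
    "G20", "DG EUROPEAID"); any other non-blank line is an expansion. The
    n-th acronym is paired with the n-th expansion, so this also works when
    the columns arrive in several alternating blocks, as long as each
    column keeps its own order.
    """
    acronyms: list[str] = []
    expansions: list[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines between OCR blocks
        if any(ch.islower() for ch in line):
            expansions.append(line)
        else:
            acronyms.append(line)
    if len(acronyms) != len(expansions):
        raise ValueError(
            f"{len(acronyms)} acronyms but {len(expansions)} expansions; check the OCR output"
        )
    return [f"{a}{sep}{e}" for a, e in zip(acronyms, expansions)]
```

Raising on a count mismatch is deliberate: a missed or merged OCR line would otherwise silently shift every pair after it.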
 
Joe Winograd, EE MVE 2015&2016 (Developer) commented:
OK, if the order of the text is important, here's a free way to do it. IrfanView is excellent (free!) imaging software that I've been using for many years:
http://www.irfanview.com/

I recommend the Brothersoft link to download IrfanView:
http://www.brothersoft.com/download-irfanview-6224.html

This will download a single install file called <iview438_setup.exe> with no adware and no junk!

And the Brothersoft link for the PlugIns, which are required for PDF support:
http://www.brothersoft.com/download-irfanview-all-plugins-164981.html

This will download a single install file called <irfanview_plugins_438_setup.exe> with no adware and no junk!

Install IrfanView first, then install the PlugIns. In addition, for OCR capability, there's a separate plug-in for that, also free (download and run it):
http://irfanview.info/plugins/kadmos/setup_kadmos_irfanview_us.exe

I just used IrfanView with the PDF and Kadmos OCR plug-ins on your file (all are free). I also downloaded the French language dictionary for Kadmos, which produced a much better result than the English dictionary. Here's a copy/paste of what it created from your doc:


CNRS             Centre national de la recherche scientifique
CNUCED              Conférence des Nations unies sur le commerce et le développement
CRS                 Catholic Relief Services
CSFI                Centre for the Study ofFinancial Innovation
DEA                   Data Envelopment Analysis
DG EUROPEAID Direction générale de développement et coopération européenne
DID                 Développement international Desjardins
EXCOM                   Executive Committee
FENU            Fonds d'équipement des Nations unies
FERT                        Formation pour l'épanouissement et le ~uveau de la terre
FFP                Fondo Financiero Privado
FIDA             Fonds international de développement agricole
FlNCA                          Foundation for International Community Assistance
FOROLACFR Foro Latinoamericano y del Caribe de Finanzas Rurales
FMO                 Banque néerlandaise de développement
FSLN             Front sandiniste de libération nationale
G20                Group ofthe 20 major economies
GRET                      Groupe de recherche et d'échange technologiques
GRI                Global Reporting Initiative
GTZ                          Gesellschaft fûr Technische Zusammenarbeit
IDS               Institute ofDevelopment Studies
IFI                  Institutions financières internationales
IMF                  Institution de Microfinance


Regards, Joe
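The Kadmos output above is close, but a few recognition errors remain (for example "ofFinancial", "ofthe", "FlNCA" with a lowercase L, and "~uveau"). Some follow a pattern that a regular-expression pass can fix, while the rest still need a manual read-through. A minimal sketch of one such rule; the name `split_glued_of` is hypothetical and the rule is tuned only to the errors visible in this output.

```python
import re

# Hypothetical clean-up rule for one recurring Kadmos error: "of" glued to the
# next word. Insert a space when a token starts with lowercase "of" and
# continues with a capital letter ("ofFinancial") or the word "the" ("ofthe").
GLUED_OF = re.compile(r"\bof(?=[A-Z]|the\b)")


def split_glued_of(text: str) -> str:
    """Return text with 'of' separated from the word it was glued to."""
    return GLUED_OF.sub("of ", text)
```

Words that merely start with "of", such as "often" or "offers", are left alone because the lookahead only fires on a capital letter or a standalone "the".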
 
☠ MASQ ☠ commented:
For a one-off OCR job, I'd take a screenshot and use the free OCR built into Google Docs (it's the engine they use for their document scanning projects):
https://support.google.com/drive/answer/176692?hl=en

As you've discovered, it's pointless simply trying PDF-to-text tools: they can't translate the Type3 fonts into Unicode, so you'll just get varying shades of gibberish!
 
Joe Winograd, EE MVE 2015&2016 (Developer) commented:
Hi LLaurent,
It's been 10 days since I documented a solution for you (using free tools only) in this post <http:#a40450183> and I'm wondering if you've had a chance to try it. It worked perfectly here (as the copy/paste at my last post shows), but if you're having any problems getting it to work, please let me know and I'll try to help you through it. Regards, Joe
 
LeeTutor (retired) commented:
I've requested that this question be deleted for the following reason:

The question has either no comments or not enough useful information to be called an "answer".
 
Joe Winograd, EE MVE 2015&2016 (Developer) commented:
There is definitely enough useful information to be called an "answer". My solution presented in <http:#a40450183> works perfectly. As the asker requested, it uses all free tools — the free IrfanView, the free PlugIns for PDF support, and the free plug-in for OCR, including its free French language dictionary. I ran the proposed solution on the asker's actual file. It worked perfectly — I even did a copy/paste of the results from the asker's actual file into my post. I'd like to know if anyone can explain how this doesn't exactly and precisely answer the question. Thanks, Joe
 
☠ MASQ ☠ commented:
Agreed.

There were two parts to the question: "I want to convert this PDF to text" and "I don't understand why I get no correct characters in text mode".

Looks like both have been completely covered.

In fact, by #40449715 the asker seems to have found an OCR solution themselves.
 
Joe Winograd, EE MVE 2015&2016 (Developer) commented:
MASQ makes an excellent point about <http:#a40449715>, where the asker seems to say that Nitro Pro's PDF Creator capability answers the question (but is too expensive and too big/sophisticated) and so does Foxit PhantomPDF (also an expensive, big, sophisticated product). That's when the asker (in <http:#a40449250>) made it clear that he wants a free product:
do you know any free OCR package
Regards, Joe
 
Joe Winograd, EE MVE 2015&2016 (Developer) commented:
Hi eenookami,

I recommend that it be closed by #2, with these specific comment IDs:

(1) http:#a40449113

MASQ had it right when he suggested OCR.

(2) http:#a40450183

I showed a specific solution with all free tools that worked perfectly on the asker's actual document.

Thanks, Joe