Solved

Garbled text when copying/pasting from a PDF generated via Internet Explorer 9 => Print to PDF

Posted on 2013-01-15
Last Modified: 2015-06-03
Info on our setup(s) used:

Windows 7 Enterprise SP1
IE9 (Ver: 9.0.8112.16421)
CutePDF (Ver: 3.0)
GPL Ghostscript (Ver: 8.15)
Adobe Reader (Ver: 9 - XI)


Hello all,

I've been racking my brain over an issue with printing from IE9 to PDF.
More precisely, the print-to-PDF part works fine, but as soon as you copy/paste content from the generated PDF, the pasted text shows up as a bunch of weird characters.
Such as:

%$!$  0  $ '

If you paste some content from this PDF (see the *.pdf attachment) into Word, you can see that the fonts in use have names like TT33Bt00 (see the *.docx attachment).

I believe the answer below is a good explanation of why this occurs:
(Found here: http://forums.adobe.com/thread/427945)

"It turns out that no usable encoding information is present (neither in the PDF nor in the embedded font data) to derive the meaning of the characters/glyphs that are displayed on the pages in the document.
 
The fonts actually are all embedded, but in a way that all encoding information has been removed. This is a typical example of a PDF that is syntactically fully compliant with the PDF spec but where important information about the meaning of the text in it has been thrown away during the process of making the PDF. As far as I can tell it would be very difficult to recover the encoding info. Strange as it may sound, the best option may be to convert the pages to pixels and then run OCR on them...."
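
The claim in the quote can be probed with a crude check. PDF font dictionaries may carry a /ToUnicode entry that maps character codes back to Unicode, and viewers use it for copy/paste. The Python sketch below simply counts font dictionaries and /ToUnicode entries in the raw bytes. It assumes the PDF's objects are stored uncompressed; the function name and the sample bytes (including the font names) are made up for illustration. A count of zero is consistent with the quote but does not prove it, since other encoding information may still be present.

```python
import re

def font_mapping_counts(pdf_bytes):
    """Return (font dictionaries found, /ToUnicode entries found).

    Crude byte scan: only sees objects that are not inside compressed streams.
    """
    # \b keeps /Type /FontDescriptor from being counted as a font dictionary
    fonts = len(re.findall(rb"/Type\s*/Font\b", pdf_bytes))
    maps = pdf_bytes.count(b"/ToUnicode")
    return fonts, maps

# Made-up sample: two fonts, only one of which has a /ToUnicode map
sample = (
    b"<< /Type /Font /Subtype /TrueType /BaseFont /ABCDEF+Arial >>\n"
    b"<< /Type /Font /Subtype /TrueType /BaseFont /GHIJKL+Arial /ToUnicode 9 0 R >>\n"
    b"<< /Type /FontDescriptor /FontName /ABCDEF+Arial >>\n"
)
print(font_mapping_counts(sample))
```

To try it on a real file, pass the bytes read from the PDF (opened in binary mode) instead of the sample.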


A possible solution for Adobe PDF printer users could be to uncheck the option 'Rely on system fonts only, do not use document fonts', as discussed here:
http://answers.microsoft.com/en-us/ie/forum/ie9-windows_other/ie9-printing-problems-text-is-garbled-when-trying/45457b91-5472-4cf2-951d-79553fff072b
And here:
http://helpx.adobe.com/acrobat/kb/missing-or-garbled-text-printing.html

I think CutePDF provides a similar option: under Printing Preferences - Advanced, the item 'TrueType Font' has two options, 'Substitute with device font' or 'Download as softfont'. Neither, however, gives the wanted result.

Everything works fine when using Chrome, Firefox, or IE8, so I think the problem is specific to IE9. Any hints, clues, or things I forgot to test are all welcome.

Thanks in advance.
Robert
Garbled-text.docx
Garbledtext.pdf
Question by:BankDelen
7 Comments
 
LVL 52

Accepted Solution

by:
Joe Winograd, EE MVE earned 375 total points
ID: 38779448
Hi Robert,
We had an extensive thread on a similar issue last month. I don't know if it will help you, but there are numerous ideas in it that are worth a read:
http://www.experts-exchange.com/Web_Development/Document_Imaging/Adobe_Acrobat/Q_27960233.html

Regards, Joe
 
LVL 3

Assisted Solution

by:IKtech
IKtech earned 25 total points
ID: 38779924
What about turning on Compatibility Mode in IE9 for the website? Have you tried that?
 
LVL 16

Assisted Solution

by:DansDadUK
DansDadUK earned 100 total points
ID: 38782143
I don't have an answer (I use Windows 8 (not 7), IE10 and Chrome (not IE9), and don't have any 'print to PDF' capability on those browsers).

Just a few comments:

"... fonts actually are all embedded, but in a way that all encoding information has been removed ..."
This is referred to as font obfuscation.
When printing documents (to real printers), where it is known that the target printer does not have printer-resident equivalents of the fonts used in the document, one choice in the printer driver is to download equivalents of the document fonts as printer-format soft fonts.
With large fonts, the source document may only use a small number of the characters in the font, so it makes sense to only download a subset of the source font to the printer.
Most printer drivers, when subsetting such soft fonts, will obfuscate them, by using a dynamically generated character encoding which doesn't keep any concept of ASCII (or Unicode) character encodings, but only makes sense in the context of the obfuscated soft font subset.
The main reason for this obfuscation is to protect the property rights of the font designer/vendor, by preventing the font from being easily copied to different formats, especially where the licence restrictions in the font (which is a form of software) allow only limited manipulation.
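
To make the effect concrete, here is a minimal Python sketch of the kind of re-encoding described above. It is not what any particular driver does: the function names are hypothetical, and starting the generated codes at 0x20 is just an assumption for the demo. Each distinct character gets a code in order of first use; if the code-to-character table is thrown away, reading the codes back as ASCII gives junk.

```python
def subset_encode(text):
    """Give each distinct character an arbitrary code in order of first use.

    Returns (codes, to_unicode), where to_unicode maps code -> character.
    """
    code_for = {}
    to_unicode = {}
    codes = []
    for ch in text:
        if ch not in code_for:
            code = 0x20 + len(code_for)  # assumption: generated codes start at 0x20
            code_for[ch] = code
            to_unicode[code] = ch
        codes.append(code_for[ch])
    return codes, to_unicode

def extract_text(codes, to_unicode=None):
    """Mimic copy/paste: use the mapping if it survived, else read codes as ASCII."""
    if to_unicode is None:
        return "".join(chr(c) for c in codes)
    return "".join(to_unicode[c] for c in codes)

codes, mapping = subset_encode("Hello all")
garbled = extract_text(codes)             # mapping discarded
recovered = extract_text(codes, mapping)  # mapping kept
print(repr(garbled))
print(repr(recovered))
```

The page still looks right in both cases, because the embedded subset font draws the correct glyph for each code. Only the text extraction is affected.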


I would guess that:
Something similar is occurring when 'print to PDF' is chosen instead of printing to a real printer.
The problem only occurs when font embedding is selected, and font subsetting is also chosen.
It may be the case that the problem will not occur if font subsetting is not selected - but (not having the same environment) I don't know if this is a valid selection - and, of course, this could considerably increase the size of the generated print stream or PDF.

Author Comment

by:BankDelen
ID: 38782756
@ Joe

Thanks for pointing me to that thread, Joe. At this point I'm checking out your suggestion to use doPDF. I still have to go over any known issues, incompatibilities or possible security flaws, but if this turns out well, I guess we'll switch to doPDF since it got the job done right out of the box.


@ IKtech

Thank you for your suggestion, IKtech, but I had indeed already tried it.


@ DansDadUK

Thank you for the time you invested to get me up to speed on what's really going on behind the scenes. Unfortunately, as you already suspected, our environment doesn't offer a font embedding or font subsetting option.


I'll be doing a background check on doPDF; in the meantime, any suggestions naturally remain welcome.
Thanks so far everyone, I'll try to get back to you no later than tomorrow.

Kind regards,
Robert
 

Author Closing Comment

by:BankDelen
ID: 38793493
I would've loved to see a solution that allowed us to just change some parameters in IE9 and keep our setup unchanged, but I guess we'll just 'doPDF'!

Thank you for your help, time and suggestions guys.
Cheers, Robert
 

Expert Comment

by:Vincent Carrier
ID: 40810754
We are deploying Internet Explorer 11 in my company, and we experienced the same issue, which can be reproduced easily. I resolved it by updating Ghostscript to the latest version.

All our computers are equipped with CutePDF Writer 3.0;
Users go on a Web site and print as PDF using CutePDF virtual printer;
Users then open the PDF file in Adobe Reader (we have version XI);
They select some text in the PDF and copy it in the clipboard (Ctrl+C);
They paste the copied text anywhere (Word, Notepad, etc.)

Before upgrading from IE8 to IE11, it worked fine. The formatting was not perfect but at least the text was there.
Once they use IE11, pasting results in garbage characters.

I found out that CutePDF relies on Ghostscript to produce the PDF. When we install CutePDF, Ghostscript 8.15 is installed along with it. We can see in the PDF file properties in Adobe Reader that the PDF Producer is "GPL Ghostscript 8.15". So I went to the Ghostscript web site and installed the newest Ghostscript package, version 9.16. As soon as I did, CutePDF started producing PDF files with this newer version of Ghostscript, and the text became copyable.
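
The Producer check can also be done without opening Adobe Reader, which is handy when verifying many machines. The Python sketch below is only a rough approach: it assumes the PDF's Info dictionary is stored uncompressed and that the Producer is written as a simple literal string without nested parentheses. The function name is hypothetical and the sample bytes are a stand-in for a real file.

```python
import re

def pdf_producer(pdf_bytes):
    """Return the /Producer literal string found in the raw PDF bytes, or None."""
    # Matches /Producer (...) and tolerates backslash-escaped characters inside
    match = re.search(rb"/Producer\s*\(((?:\\.|[^\\)])*)\)", pdf_bytes)
    if match is None:
        return None
    return match.group(1).decode("latin-1")

# Stand-in for the bytes of a PDF written by the old Ghostscript
sample = b"1 0 obj\n<< /Producer (GPL Ghostscript 8.15) >>\nendobj\n"
print(pdf_producer(sample))
```

For a real file, read it in binary mode and pass the bytes in. If the result still names the old Ghostscript version after the upgrade, the virtual printer is probably still picking up the old installation.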

You can download Ghostscript from here: http://ghostscript.com/download/gsdnld.html

Even on 64-bit systems, it's the 32-bit version of Ghostscript that must be installed.

Hope it helps.


V.
 
LVL 52

Expert Comment

by:Joe Winograd, EE MVE
ID: 40810784
Yes, "stale" copies of Ghostscript can cause grief with CutePDF. Here's an EE post from a year ago that discusses it:
http://www.experts-exchange.com/Software/Office_Productivity/Q_28433399.html#a40065809

Many PDF print drivers rely on Ghostscript, including Bullzip and CutePDF, two of the best. But that's one reason I like doPDF — it does not use Ghostscript. Regards, Joe