Solved

Garbled text when copying and pasting from a PDF generated via Internet Explorer 9 => Print to PDF

Posted on 2013-01-15
Last Modified: 2015-06-03
Info on our setup(s) used:

Windows 7 Enterprise (SP1)
IE9 (Ver: 9.0.8112.16421)
CutePDF (Ver: 3.0)
GPL Ghostscript (Ver: 8.15)
Adobe Reader (Ver: 9 - XI)


Hello all,

I've been racking my brain over an issue with printing from IE9 to PDF.
More precisely, the print to PDF part works fine, but as soon as you copy and paste content from the generated PDF, the pasted text shows up as a string of meaningless characters, such as:

%$!$  0  $ '

If you paste some content from this PDF (see the *.pdf attachment) into Word, you can see that the fonts in use have names such as TT33Bt00 (see the *.docx attachment).

I believe the answer below to be a good explanation on why this occurs:
(Found here: http://forums.adobe.com/thread/427945)

"It turns out that no usable encoding information is present (neither in the PDF nor in the embedded font data) to derive the meaning of the characters/glyphs that are displayed on the pages in the document.
 
The fonts actually are all embedded, but in a way that all encoding information has been removed. This is a typical example of a PDF that is syntactically fully compliant with the PDF spec but where important information about the meaning of the text in it has been thrown away during the process of making the PDF. As far as I can tell it would be very difficult to recover the encoding info. Strange as it may sound the best option may be to convert the pages to pixels and then run OCR on them...."
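
To make the quoted explanation more concrete: a PDF viewer derives copyable text mainly from a font's /ToUnicode map or, roughly speaking, from a standard /Encoding entry. The Python sketch below counts how many font dictionaries in a file carry those keys. It is only a heuristic: it assumes the font objects are stored uncompressed (it does not decode object streams), and the function name `font_encoding_report` and the sample bytes are made up for illustration.

```python
import re

def font_encoding_report(pdf_bytes: bytes) -> dict:
    """Heuristically count font dictionaries and their encoding-related keys."""
    # Grab the body of every "N G obj ... endobj" block (uncompressed objects only).
    objects = re.findall(rb"\d+\s+\d+\s+obj(.*?)endobj", pdf_bytes, re.DOTALL)
    # Keep font dictionaries; \b stops /Font from matching /FontDescriptor.
    fonts = [o for o in objects if re.search(rb"/Type\s*/Font\b", o)]
    return {
        "fonts": len(fonts),
        "with_tounicode": sum(b"/ToUnicode" in f for f in fonts),
        "with_encoding": sum(b"/Encoding" in f for f in fonts),
    }

# Made-up sample: one font without any mapping, one font with both keys.
sample = (
    b"1 0 obj\n<< /Type /Font /Subtype /TrueType /BaseFont /TT33Bt00 >>\nendobj\n"
    b"2 0 obj\n<< /Type /Font /Subtype /TrueType /BaseFont /Arial"
    b" /ToUnicode 5 0 R /Encoding /WinAnsiEncoding >>\nendobj\n"
    b"3 0 obj\n<< /Type /FontDescriptor /FontName /Arial >>\nendobj\n"
)
report = font_encoding_report(sample)
print(report)
```

For a real file, the bytes could be read with `open("Garbledtext.pdf", "rb").read()`. A report showing fonts but no `with_tounicode` entries would be consistent with the explanation quoted above, though not proof of it.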


A possible solution for users of the Adobe PDF printer could be to uncheck the option 'Rely on system fonts only, do not use document fonts', as discussed here:
http://answers.microsoft.com/en-us/ie/forum/ie9-windows_other/ie9-printing-problems-text-is-garbled-when-trying/45457b91-5472-4cf2-951d-79553fff072b
And here:
http://helpx.adobe.com/acrobat/kb/missing-or-garbled-text-printing.html

I think CutePDF provides a similar option: under Printing Preferences - Advanced, the item TrueType Font has 2 options: 'Substitute with device font' or 'download as softfont'. Neither of them, however, produces the desired result.

Everything works just fine when using Chrome, Firefox, or IE8, so I think it is reasonable to conclude that the problem is specific to IE9. Any hints, clues, or things I forgot to test are all welcome.

Thanks in advance.
Robert
Garbled-text.docx
Garbledtext.pdf
Question by:BankDelen
7 Comments
 
LVL 51

Accepted Solution

by:
Joe Winograd, EE MVE earned 375 total points
Hi Robert,
We had an extensive thread on a similar issue last month. I don't know if it will help you, but there are numerous ideas in it that are worth a read:
http://www.experts-exchange.com/Web_Development/Document_Imaging/Adobe_Acrobat/Q_27960233.html

Regards, Joe
 
LVL 3

Assisted Solution

by:IKtech
IKtech earned 25 total points
What about turning on compatibility mode in IE9 for the website? Have you tried that?
 
LVL 16

Assisted Solution

by:DansDadUK
DansDadUK earned 100 total points
I don't have an answer (I use Windows 8 (not 7), IE10 and Chrome (not IE9), and don't have any 'print to PDF' capability on those browsers).

Just a few comments:

"... fonts actually are all embedded, but in a way that all encoding information has been removed ..."
This is referred to as font obfuscation.
When printing documents (to real printers), where it is known that the target printer does not have printer-resident equivalents of the fonts used in the document, one choice in the printer driver is to download equivalents of the document fonts as printer-format soft fonts.
With large fonts, the source document may only use a small number of the characters in the font, so it makes sense to only download a subset of the source font to the printer.
Most printer drivers, when subsetting such soft fonts, will obfuscate them, by using a dynamically generated character encoding which doesn't keep any concept of ASCII (or Unicode) character encodings, but only makes sense in the context of the obfuscated soft font subset.
The main reason for this obfuscation is to protect the property rights of the font designer/vendor by preventing the font from being easily copied to different formats, especially where the licence restrictions in the font (which is a form of software) allow only limited manipulation.
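
A toy Python model of the subsetting behaviour described above may make the effect clearer. It is not how any particular printer driver works: the function names and the choice to hand out codes sequentially from 0x20 are assumptions made purely for illustration.

```python
def build_subset_encoding(text, first_code=0x20):
    """Assign arbitrary sequential codes to characters in order of first use."""
    mapping = {}
    for ch in text:
        if ch not in mapping:
            mapping[ch] = first_code + len(mapping)
    return mapping

def encode(text, mapping):
    """Encode text as the byte codes that would select glyphs in the subset font."""
    return bytes(mapping[ch] for ch in text)

def naive_copy(codes):
    """Interpret the codes directly as characters, as if no mapping were available."""
    return codes.decode("latin-1")

def copy_with_mapping(codes, mapping):
    """Recover the text using the reverse of the subset encoding."""
    reverse = {code: ch for ch, code in mapping.items()}
    return "".join(reverse[code] for code in codes)

text = "Hello all"
mapping = build_subset_encoding(text)
codes = encode(text, mapping)
garbled = naive_copy(codes)
recovered = copy_with_mapping(codes, mapping)
print(repr(garbled))    # characters unrelated to the original text
print(repr(recovered))  # the original text, recovered via the mapping
```

In this model the page still displays correctly, because each code selects the right glyph in the subset font. Only the route back from code to character is lost when the mapping is not written into the file.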


I would guess that:
Something similar is occurring when 'print to PDF' is chosen instead of printing to a real printer.
The problem only occurs when font embedding is selected, and font subsetting is also chosen.
It may be the case that the problem will not occur if font subsetting is not selected - but (not having the same environment) I don't know if this is a valid selection - and, of course, this could considerably increase the size of the generated print stream or PDF.

Author Comment

by:BankDelen
@ Joe

Thanks for pointing me to that thread, Joe. At this point I'm checking out your suggestion to use doPDF. I still have to go over any known issues, incompatibilities or possible security flaws, but if this turns out well, I guess we'll switch to doPDF, since it got the job done right out of the box.


@ IKtech

Thank you for your suggestion, IKtech, but I had indeed already tried it.


@ DansDadUK

Thank you for the time you invested in getting me up to speed on what's really going on behind the scenes. As you already suspected, my environment differs from yours: unfortunately, it offers no font embedding or font subsetting option.


I'll be doing a background check on doPDF; in the meantime, any suggestions naturally remain welcome.
Thanks so far everyone, I'll try to get back to you no later than tomorrow.

Kind regards,
Robert
 

Author Closing Comment

by:BankDelen
I would've loved to see a solution that allowed us to just change some parameters in IE9 and keep our setup unchanged, but I guess we'll just 'doPDF'!

Thank you for your help, time and suggestions guys.
Cheers, Robert
 

Expert Comment

by:Vincent Carrier
We are deploying Internet Explorer 11 in my company, and we experienced the same issue which can be reproduced easily. I resolved the issue by updating Ghostscript to the latest version.

All our computers are equipped with CutePDF Writer 3.0;
Users go on a Web site and print as PDF using CutePDF virtual printer;
Users then open the PDF file in Adobe Reader (we have version XI);
They select some text in the PDF and copy it to the clipboard (Ctrl+C);
They paste the copied text anywhere (Word, Notepad, etc.)

Before upgrading from IE8 to IE11, it worked fine. The formatting was not perfect but at least the text was there.
Once they use IE11, the paste results in garbage characters.

I found out that CutePDF relies on Ghostscript to produce the PDF. When we install CutePDF, Ghostscript 8.15 is installed along with it. The PDF file properties in Adobe Reader show that the PDF Producer is "GPL Ghostscript 8.15". So I went to the Ghostscript web site and installed the newest Ghostscript package, version 9.16. As soon as I did, CutePDF started producing PDF files with this newer version of Ghostscript, and the text became copyable.
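
The Producer check described above can also be scripted instead of opening each file in Adobe Reader. The Python sketch below is only a rough heuristic: it assumes the PDF stores /Producer as a plain, uncompressed literal string without nested parentheses, the sample bytes are made up, and `find_producer` and `ghostscript_version` are hypothetical helper names, not part of any library.

```python
import re

def find_producer(pdf_bytes: bytes):
    """Return the first /Producer literal string found in the raw bytes, or None."""
    match = re.search(rb"/Producer\s*\((.*?)\)", pdf_bytes, re.DOTALL)
    return match.group(1).decode("latin-1") if match else None

def ghostscript_version(producer):
    """Extract (major, minor) from a string such as 'GPL Ghostscript 8.15'."""
    if producer is None:
        return None
    match = re.search(r"Ghostscript\s+(\d+)\.(\d+)", producer)
    return (int(match.group(1)), int(match.group(2))) if match else None

# Made-up Info dictionary fragment for illustration.
sample = b"<< /Producer (GPL Ghostscript 8.15) /Title (example) >>"
producer = find_producer(sample)
version = ghostscript_version(producer)
# Flag files produced by a Ghostscript older than the 9.16 mentioned above.
is_old = version is not None and version < (9, 16)
print(producer, version, is_old)
```

Run against PDFs produced before and after the upgrade, a check like this could confirm which Ghostscript version CutePDF is actually picking up.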

You can download Ghostscript from here: http://ghostscript.com/download/gsdnld.html

Even on 64-bit systems, it's the 32-bit version of Ghostscript that must be installed.

Hope it helps.


V.
 
LVL 51

Expert Comment

by:Joe Winograd, EE MVE
Yes, "stale" copies of Ghostscript can cause grief with CutePDF. Here's an EE post from a year ago that discusses it:
http://www.experts-exchange.com/Software/Office_Productivity/Q_28433399.html#a40065809

Many PDF print drivers rely on Ghostscript, including Bullzip and CutePDF, two of the best. But that's one reason I like doPDF — it does not use Ghostscript. Regards, Joe