• Status: Solved
  • Priority: Medium
  • Security: Public

Garbled text when copying and pasting from a PDF generated via Internet Explorer 9 => Print to PDF

Info on our setup(s) used:

Windows 7 Enterprise SP1
IE9              (Ver: 9.0.8112.16421)
CutePDF          (Ver: 3.0)
GPL Ghostscript  (Ver: 8.15)
Adobe Reader     (Ver: 9 - XI)


Hello all,

I've been racking my brain over an issue with printing from IE9 to PDF.
More precisely, the print to PDF part works fine, but as soon as you copy/paste content from the generated PDF, the pasted text shows up as a string of unrelated characters,
such as:

%$!$  0  $ '

If you paste some content from this PDF (see the *.pdf attachment) into Word, for example, you can see that the fonts being used have names like TT33Bt00 (see the *.docx attachment).

I believe the answer below is a good explanation of why this occurs
(found here: http://forums.adobe.com/thread/427945):

"It turns out that no usable encoding information is present (neither in the PDF nor in the embedded font data) to derive the meaning of the characters/glyphs that are displayed on the pages in the document.
 
The fonts actually are all embedded, but in a way that all encoding information has been removed. This is a typical example of a PDF that is syntactically fully compliant with the PDF spec but where important information about the meaning of the text in it has been thrown away during the process of making the PDF. As far as I can tell it would be very difficult to recover the encoding info. Strange as it may sound, the best option may be to convert the pages to pixels and then run OCR on them...."
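
As an aside, one place where a PDF normally keeps this kind of encoding information is a font's /ToUnicode entry. Below is a minimal Python sketch, not something from the Adobe thread, that counts font dictionaries and /ToUnicode entries by scanning the raw bytes of a file. It assumes the font dictionaries are stored uncompressed (as in older PDF 1.4-style output), and the name font_encoding_report is made up for illustration. A missing /ToUnicode entry is only a hint: a font can still be copyable if it uses a standard encoding.

```python
import re


def font_encoding_report(pdf_bytes: bytes) -> dict:
    """Count font dictionaries and /ToUnicode entries in raw PDF bytes.

    Assumption: objects are not packed into compressed object streams,
    so the dictionary keys are visible as plain bytes.
    """
    # "/Type /Font" followed by a word boundary, so /FontDescriptor is not counted.
    fonts = len(re.findall(rb"/Type\s*/Font\b", pdf_bytes))
    to_unicode = len(re.findall(rb"/ToUnicode\b", pdf_bytes))
    return {
        "fonts": fonts,
        "to_unicode": to_unicode,
        # Fonts exist, but none of them points at a ToUnicode map.
        "missing_tounicode": fonts > 0 and to_unicode == 0,
    }


if __name__ == "__main__":
    sample = (
        b"1 0 obj << /Type /Font /Subtype /TrueType /BaseFont /TT33Bt00 >> endobj "
        b"2 0 obj << /Type /FontDescriptor /FontName /TT33Bt00 >> endobj"
    )
    print(font_encoding_report(sample))
```

To try it on a real file, read it with open(path, "rb").read() and pass the bytes in.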


A possible solution for Adobe PDF printer users could be to uncheck the option 'Rely on system fonts only, do not use document fonts', as discussed here:
http://answers.microsoft.com/en-us/ie/forum/ie9-windows_other/ie9-printing-problems-text-is-garbled-when-trying/45457b91-5472-4cf2-951d-79553fff072b
And here:
http://helpx.adobe.com/acrobat/kb/missing-or-garbled-text-printing.html

I think CutePDF provides a similar option: under Printing Preferences > Advanced, the TrueType Font item has two options, 'Substitute with device font' or 'download as softfont'. Neither setting, however, gives the desired result.

Everything works fine when using Chrome, Firefox, or IE8, so I think one may conclude that the problem is specific to IE9. Any hints, clues, or things I forgot to test are all welcome.

Thanks in advance.
Robert
Garbled-text.docx
Garbledtext.pdf
Asked by: BankDelen
 
Joe Winograd, EE Fellow 2017, MVE 2016, MVE 2015DeveloperCommented:
Hi Robert,
We had an extensive thread on a similar issue last month. I don't know if it will help you, but there are numerous ideas in it that are worth a read:
http://www.experts-exchange.com/Web_Development/Document_Imaging/Adobe_Acrobat/Q_27960233.html

Regards, Joe
 
IKtechCommented:
What about turning on compatibility mode in IE9 for the website? Have you tried that?
 
DansDadUKCommented:
I don't have an answer (I use Windows 8 (not 7), IE10 and Chrome (not IE9), and don't have any 'print to PDF' capability on those browsers).

Just a few comments:

... fonts actually are all embedded, but in a way that all encoding information has been removed ...
This is referred to as font obfuscation.
When printing documents (to real printers), where it is known that the target printer does not have printer-resident equivalents of the fonts used in the document, one choice in the printer driver is to download equivalents of the document fonts as printer-format soft fonts.
With large fonts, the source document may only use a small number of the characters in the font, so it makes sense to only download a subset of the source font to the printer.
Most printer drivers, when subsetting such soft fonts, will obfuscate them by using a dynamically generated character encoding which doesn't keep any concept of ASCII (or Unicode) character encodings, but only makes sense in the context of the obfuscated soft font subset.
The main reason for this obfuscation is to protect the property rights of the font designer/vendor, by preventing the font from being easily copied to different formats, especially where the licence restrictions in the font (which is a form of software) allow only limited manipulation.
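
To make the effect concrete, here is a toy Python sketch of such a dynamically generated encoding. It is an illustration only, not how any particular driver works: it assumes codes are handed out in order of first appearance, starting at 0x21, and the names subset_encode, naive_copy, and decode_with_map are made up. It shows why text copied without the mapping comes out as unrelated characters, and why the original text is recoverable only if the mapping is kept.

```python
def subset_encode(text: str, first_code: int = 0x21):
    """Assign each distinct character a code in order of first appearance.

    Toy model (assumption): real drivers choose codes in their own way.
    Only works while the codes stay below 256.
    """
    code_for = {}
    codes = []
    for ch in text:
        if ch not in code_for:
            code_for[ch] = first_code + len(code_for)
        codes.append(code_for[ch])
    return bytes(codes), code_for


def naive_copy(codes: bytes) -> str:
    """What a viewer gets if it treats the subset codes as ordinary characters."""
    return codes.decode("latin-1")


def decode_with_map(codes: bytes, code_for: dict) -> str:
    """Recover the text, which is possible only when the mapping was kept."""
    char_for = {code: ch for ch, code in code_for.items()}
    return "".join(char_for[c] for c in codes)


if __name__ == "__main__":
    codes, mapping = subset_encode("Hello")
    print(naive_copy(codes))                # garbled: !"##$
    print(decode_with_map(codes, mapping))  # Hello
```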


I would guess that:
Something similar is occurring when 'print to PDF' is chosen instead of printing to a real printer.
The problem only occurs when font embedding is selected, and font subsetting is also chosen.
It may be the case that the problem will not occur if font subsetting is not selected - but (not having the same environment) I don't know if this is a valid selection - and, of course, this could considerably increase the size of the generated print stream or PDF.
 
BankDelenAuthor Commented:
@ Joe

Thanks for pointing me to that thread, Joe. At this point I'm checking out your suggestion to use doPDF. I still have to go over any known issues, incompatibilities, or possible security flaws, but if this turns out well, I guess we'll switch to doPDF, since it got the job done right out of the box.


@ IKtech

Thank you for your suggestion, IKtech, but I had indeed already tried it.


@ DansDadUK

Thank you for the time you invested in getting me up to speed on what's really going on behind the scenes. Unfortunately, as you already suspected, my environment doesn't offer a font embedding or font subsetting option.


I'll be doing a background check on doPDF; in the meantime, any suggestions naturally remain welcome.
Thanks so far everyone, I'll try to get back to you no later than tomorrow.

Kind regards,
Robert
 
BankDelenAuthor Commented:
I would've loved to see a solution that allowed us to just change some parameters in IE9 and keep our setup unchanged, but I guess we'll just 'doPDF'!

Thank you for your help, time and suggestions guys.
Cheers, Robert
 
Vincent CarrierDesktop architectCommented:
We are deploying Internet Explorer 11 in my company, and we experienced the same issue, which can be reproduced easily with the steps below. I resolved the issue by updating Ghostscript to the latest version.

All our computers are equipped with CutePDF Writer 3.0;
Users go on a Web site and print as PDF using CutePDF virtual printer;
Users then open the PDF file in Adobe Reader (we have version XI);
They select some text in the PDF and copy it in the clipboard (Ctrl+C);
They paste the copied text anywhere (Word, Notepad, etc.)

Before upgrading from IE8 to IE11, it worked fine. The formatting was not perfect but at least the text was there.
Once they use IE11, the paste results in garbage characters.

I found out that CutePDF relies on Ghostscript to produce the PDF. When we install CutePDF, Ghostscript 8.15 is installed along with it. We can see in the PDF file properties in Adobe Reader that the PDF Producer is "GPL Ghostscript 8.15". So I went to the Ghostscript web site and installed the newest Ghostscript package, version 9.16. As soon as I did, CutePDF started producing PDF files with this newer version of Ghostscript, and the text became copyable.
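
To check which Ghostscript version actually produced a given file without opening Adobe Reader, the Producer string can be read from the PDF itself. The following is a rough Python sketch under stated assumptions: the document information dictionary is stored uncompressed, and the Producer value is a simple literal string without nested or escaped parentheses. The name pdf_producer is a hypothetical helper, not part of any library.

```python
import re
from typing import Optional


def pdf_producer(pdf_bytes: bytes) -> Optional[str]:
    """Return the /Producer string found in raw PDF bytes, or None.

    Assumption: the value is a plain literal string such as
    /Producer (GPL Ghostscript 8.15), with no nested parentheses.
    """
    match = re.search(rb"/Producer\s*\(([^)]*)\)", pdf_bytes)
    if match is None:
        return None
    return match.group(1).decode("latin-1")


if __name__ == "__main__":
    sample = b"<< /Producer (GPL Ghostscript 8.15) /Creator (PScript5.dll) >>"
    print(pdf_producer(sample))
```

Running this on a file printed before and after the Ghostscript upgrade should show whether CutePDF has picked up the newer version.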

You can download Ghostscript from there: http://ghostscript.com/download/gsdnld.html

Even on 64-bit systems, it's the 32-bit version of Ghostscript that must be installed.

Hope it helps.


V.
 
Joe Winograd, EE Fellow 2017, MVE 2016, MVE 2015DeveloperCommented:
Yes, "stale" copies of Ghostscript can cause grief with CutePDF. Here's an EE post from a year ago that discusses it:
http://www.experts-exchange.com/Software/Office_Productivity/Q_28433399.html#a40065809

Many PDF print drivers rely on Ghostscript, including Bullzip and CutePDF, two of the best. But that's one reason I like doPDF — it does not use Ghostscript. Regards, Joe