Garbled text in PDF using Copy, Find to Search

Hello,
I have an Access 2003 application (Access 2003 front end with Access 2000 back-end data) from which I generate reports. The reports are previewed in Access and then generated as PDF docs (some of my users have pdfFactory Pro, while others simply have Adobe). In either case the PDF document seems to have something odd internally. I expect to be able to highlight some text, copy it, open the Find dialog, and paste that text to search for it. Instead, what gets pasted is somewhat garbled text (definitely NOT what you see in the document itself, nor in the highlighted text that was copied).

For example, this text appears in the doc:
1,3 BUTYLENE GLYCOL
but when pasted into the Find dialog it becomes:
1,3 BSUuTbsYtaLnEceN:E GLYCOL

Note: I had to type the first line of the example by hand; the second line was pasted directly from the doc.

These PDF docs are of little to no value if they cannot be searched. Does anyone have any idea what is going on or how to compensate for this?

I have tried to ensure that all fonts used are embedded and that only one font is used in any given report. When I check the properties of the document I find the following:

Arial
Type: TrueType
Encoding: Ansi
Actual Font: ArialMT
Actual Font Type: TrueType

TimesNewRoman (Embedded Subset)
Type: TrueType
Encoding: Ansi

TimesNewRoman, Bold (Embedded Subset)
Type: TrueType
Encoding: Ansi


Arial is listed even in reports where there is no reference to the Arial font (at least none that I have been able to find).

I have also noticed that if I double-click on a word that I would like to search for, I may or may not get only that word highlighted. Again, by way of example, if I double-click on the word "BUTYLENE" in the doc, only BUTYLEN is actually highlighted. If I then copy and paste, I get: BSUuTbsYtaLnEceN

Any help in resolving this would be greatly appreciated!
TIA,
James
JamesDF Asked:

captain Commented:
Hi

OK, I have no idea how this happens at the moment, but if you unscramble your paste, the following is notable.

If you pull the letters of BUTYLENE GLYCOL out of the pasted string BSUuTbsYtaLnEceN:E GLYCOL, it separates into:

Substance:
BUTYLENE GLYCOL

So I assume that when the PDF is created, the word "Substance:" gets mashed up with "BUTYLENE GLYCOL", producing the string you copy.
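
The unscrambling above can be checked mechanically. The following is only a minimal Python sketch of that check, not anything from the PDF tools involved; the function name `split_interleaved` is made up for illustration. It walks the pasted string, removes the visible text in order, and returns whatever is left over.

```python
from typing import Optional


def split_interleaved(pasted: str, visible: str) -> Optional[str]:
    """Remove the characters of `visible`, in order, from `pasted`.

    Returns the leftover characters, or None if `visible` is not found
    as a subsequence of `pasted`. Matching is greedy (first possible
    position), which is enough here because the two interleaved strings
    share no characters.
    """
    leftover = []
    pos = 0  # index of the next character of `visible` still to be matched
    for ch in pasted:
        if pos < len(visible) and ch == visible[pos]:
            pos += 1
        else:
            leftover.append(ch)
    if pos != len(visible):
        return None
    return "".join(leftover)


if __name__ == "__main__":
    # The two pasted strings quoted in the question.
    print(split_interleaved("BSUuTbsYtaLnEceN:E GLYCOL", "BUTYLENE GLYCOL"))
    print(split_interleaved("BSUuTbsYtaLnEceN", "BUTYLEN"))
```

For the first pasted string the leftover is "Substance:", and for the double-click example it is "Substance", which suggests two overlapping pieces of text rather than a font or encoding problem.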

Is there any way you could post both this page in the source file format and the PDF? I am certain that the clue to the answer is in the source format.

capt.
Jeffrey Coachman, MIS Liaison, Commented:
I have seen this issue as well.
1. In early PDFs you could not even copy/paste or search, so make sure you have the latest versions of both of your PDF software packages.
2. I have had odd results with copy/paste in PDFs too.
It can be tricky getting a PDF to "act" like a Word doc (copy/paste) unless you have the full version of Adobe. I too get odd fonts when pasting into Word, and have trouble highlighting.

<These PDF docs are of little to no value if they cannot be searched. Does anyone have any idea what is going on or how to compensate for this?>

Again, PDF was never touted as being 100% compatible with copy/paste/highlight to Word (or any other format, for that matter).
And with Adobe and MS being competitors on many fronts, this may never be "perfect".

Also note that not every formatting aspect can be copied and pasted faithfully, so make sure your report is as basic/simple as possible (regarding formatting).

Some alternatives:
1. Why search the PDF anyway? Why not just search the data as it is in the database?
2. Create two outputs, one to .rtf (Word) and the other to .PDF.
3. Buy the full version of Adobe, and you can save the PDF as a Word doc.
4. Buy a PDF-to-Word converter.


JeffCoachman
Jim P. Commented:
I ran into something similar way back. There were multiple issues.

One issue was how close the title box was to the text box.
Another was which PDF "printer" was being used. Our company was using three different PDF programs, and one would give us crap output. We switched to PDFCreator and that solved most of our issues.

JamesDF (Author) Commented:
Thanks to all for getting back to me. I am not in a position at the moment to test or act on any of these suggestions, and I will not be until next week. These responses definitely give me a couple of things to try when I return.

A quick comment: although it seems reasonable to simply search the database, it is not easily accessible by all. In addition, it is a repository for the information, which is published in its final form as PDF for others to use as a reference.

Thanks again. I will get back to this next week...
JamesDF (Author) Commented:
I am back and have had a chance to dig into this a little further. Although the suggestions above were thought-provoking and got me poking around, it turned out that the problem was being created within the Access report itself. Captainreiss noticed the same thing I did: the garbled text appeared to contain the text 'Substance' embedded within it. The field involved, not surprisingly, was [Substance].

I tried simply moving and resizing things within the report to see if I could determine what was happening. It was then that I discovered that the textbox used to display the field [Substance] had an associated label (containing the default text 'Substance') sitting behind it. The label was not visible in design mode (until the textbox was moved or resized) and was not visible in the 'printed' (PDF) document because it was in the background. The generated PDF document did indeed have that text embedded in it, and that was what was causing the garbled text in the copy/paste operation. Once the unnecessary associated label was deleted, copy/paste into Find in the PDF worked exactly as it should have.
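
For anyone hunting for the same kind of hidden label, the check comes down to finding labels whose rectangle overlaps a textbox. Below is a rough Python sketch of that idea only. It does not talk to Access: the `Control` class, the control names, and the positions are all hypothetical, and the Left/Top/Width/Height values (which Access stores in twips) would have to be copied out of the report design by hand or by some export of your own.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Control:
    name: str
    kind: str    # "Label" or "TextBox"
    left: int    # all four measurements in twips
    top: int
    width: int
    height: int


def overlaps(a: Control, b: Control) -> bool:
    """True if the two rectangles share any area (touching edges do not count)."""
    return (a.left < b.left + b.width and b.left < a.left + a.width
            and a.top < b.top + b.height and b.top < a.top + a.height)


def find_overlapping_labels(controls: List[Control]) -> List[Tuple[str, str]]:
    """Return (label name, textbox name) pairs whose rectangles overlap."""
    labels = [c for c in controls if c.kind == "Label"]
    boxes = [c for c in controls if c.kind == "TextBox"]
    return [(lb.name, tb.name) for lb in labels for tb in boxes if overlaps(lb, tb)]


if __name__ == "__main__":
    # Hypothetical layout: names and positions are made up for illustration.
    report = [
        Control("Substance_Label", "Label", 1440, 0, 1440, 300),
        Control("Substance", "TextBox", 1440, 0, 2880, 300),
        Control("CAS_Label", "Label", 0, 400, 1440, 300),
        Control("CAS", "TextBox", 1440, 400, 2880, 300),
    ]
    print(find_overlapping_labels(report))
```

In this made-up layout only the label that sits under the [Substance] textbox is reported; the second label merely touches its textbox along one edge and is left alone.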
I appreciate folks having a look and offering suggestions, but none actually resolved the problem. Not wanting to slight anyone's efforts, I have no idea how to assign any points to this. It seems that technically none are earned. I am open to reasonable arguments to the contrary... James
Jeffrey Coachman, MIS Liaison, Commented:
Then as far as I can tell, you solved this issue yourself with Captainreiss's help.

So you can accept your own post above as the solution and assign some points to Captainreiss for assisting...
JamesDF (Author) Commented:
Apparently I have to provide a reason for accepting my own solution: it is the solution!
Microsoft Access
