Solved

Can anyone scan/convert this document into Word?

Posted on 2014-04-29
17
453 Views
Last Modified: 2014-05-05
I've been using http://www.abbyyonline.com/  to convert PDF files for a course into Word to make them searchable - the PDF files are not searchable.

I've never had any problems until the document attached which just doesn't seem to want to convert.

If someone could convert the PDF document into a searchable Word document I'd appreciate it, as it makes looking things up much easier obviously.
0
Comment
Question by:purplesoup
17 Comments
 
LVL 55

Expert Comment

by:Joe Winograd, EE MVE 2015&2016
ID: 40031093
Graham,
I think that conversion to a searchable form for private use of the individual with no intent to distribute or transmit is permitted under Fair Use. I'm not an intellectual property lawyer, but since the University has provided the document as a PDF file to students, it seems within Fair Use to make it a searchable PDF file or a searchable Word file. If you agree, I will post a searchable PDF file and a searchable Word file, which I had already created and was about to post when I saw your comment. I certainly do not want to be in violation of copyright law, so please let me know if you think this comes under the Fair Use provision.

purplesoup,
While we're waiting to hear back from Graham (who is an EE Admin), you should contact the copyright holder (The Open University) and request written permission to create a searchable PDF and/or searchable Word file. If you get written permission, that will remove the need to make an interpretation based on the Fair Use provision.

Regards, Joe
0
 

Author Comment

by:purplesoup
ID: 40031161
Well there have been a number of forum posts on the topic in which students - on the OU website - share converted PDFs, however there aren't any of this particular file. Tutors participate in the forums and have apologised that these documents are so old they aren't searchable and have thanked students who share searchable versions of the documents - which seems a sensible attitude.

I can post copies of the forum posts and links to already converted files, but I expect the "admin" person wouldn't like that.

Alternatively perhaps Joe you could email the files to bilbo039-eeATyahoo.co.uk - a temporary email address I've just set up?
0
 

Author Comment

by:purplesoup
ID: 40031162
Incidentally Graham could you point out exactly which policy you are referring to?
0

 
LVL 55

Expert Comment

by:Joe Winograd, EE MVE 2015&2016
ID: 40031181
In case Graham is gone for the evening, I'll jump in with an answer to your question. He's talking about the Experts Exchange Terms of Use:
http://www.experts-exchange.com/terms.jsp

Take a look at Article 6, Code of Conduct:
You shall not engage in any of the following activities, which are strictly prohibited under the Code of Conduct.
and then
11. Posting any content that infringes any third party’s intellectual property rights or violates any confidentiality Agreements, contracts of employment, licenses, “Terms of Use”, or copyright.
Regards, Joe
0
 
LVL 76

Accepted Solution

by:
GrahamSkan earned 500 total points
ID: 40031397
The reason that PDF converters would have difficulty with the document is that it comprises an image. The text is not digitised.

This means that it would require a fully functional OCR program to interpret it. I haven't used ABBYY, but, from my experience with Nuance's Omnipage, I suspect that it would be a lengthy process.

The image quality is good, but there are many mathematical formulae with special symbols that would require manual intervention.
0
 

Author Comment

by:purplesoup
ID: 40032138
I've requested that this question be deleted for the following reason:

I believe this question may have infringed the EE third-party copyright policy so please delete it.
0
 
LVL 55

Expert Comment

by:Joe Winograd, EE MVE 2015&2016
ID: 40032176
Graham,
You're welcome...happy to help...and always want to comply with copyright law and, more generally, EE's Terms of Use.

On the technical front, I tried it with three Nuance products, which did well overall, but, of course, were not perfect. I used the latest version of PaperPort Professional (14.5), which has the OmniPage OCR engine under the covers (OP18 capture SDK), to create a searchable PDF. I then tried the latest version of OmniPage Professional, which is really OP19 in the numbering sequence (but Nuance decided to call it OmniPage Ultimate, for whatever reason), to OCR it and create a searchable DOCX file. I then used Nuance's latest-and-greatest piece of software, Power PDF Advanced — also to OCR it and create a searchable DOCX file.

You're right about the formulas. For example, this formula:

[formula image]

was OCRed as follows:

PP14
H = {e,cio} = Cl UooC1

OP19
H = {e, cro} = CiU croCi

PPA
H = {e,cro} = Ci U croCi

I told all three products to ignore errors, so the process was fast, but I'm sure you're right that it would take a long time to intervene in order to get all of the formulas correct. I also have ABBYY FineReader (and some other OCR products), but figured those three tests were enough.
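
As a rough way to quantify how far the three OCR results above diverge, the sketch below compares them pairwise with Python's standard `difflib` module. The strings are copied from the outputs above; the `similarity` and `pairwise_similarity` helpers and the `ocr_outputs` name are illustrative assumptions, not part of any Nuance or ABBYY tooling. Because the original formula image is not reproduced here, this only measures agreement between engines, not accuracy against the true formula.

```python
from difflib import SequenceMatcher
from itertools import combinations

# OCR results for the same formula, copied from the three tests above.
ocr_outputs = {
    "PP14": "H = {e,cio} = Cl UooC1",
    "OP19": "H = {e, cro} = CiU croCi",
    "PPA": "H = {e,cro} = Ci U croCi",
}


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio between two strings (hypothetical helper)."""
    return SequenceMatcher(None, a, b).ratio()


def pairwise_similarity(outputs: dict[str, str]) -> dict[tuple[str, str], float]:
    """Score every pair of engine outputs against each other."""
    return {
        (x, y): similarity(outputs[x], outputs[y])
        for x, y in combinations(outputs, 2)
    }


if __name__ == "__main__":
    for (x, y), score in pairwise_similarity(ocr_outputs).items():
        print(f"{x} vs {y}: {score:.2f}")
```

Where the engines disagree noticeably on a span, as they do on this formula, that span is probably worth checking by hand against the original PDF.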

I presume the screen capture of that one formula falls under Fair Use, but if you disagree, feel free to remove it. Regards, Joe
0
 
LVL 76

Expert Comment

by:GrahamSkan
ID: 40032797
Joe,
Thanks for your additional information and support.

purplesoup,
Thank you for accepting the situation. I wish you luck with your studies.
0
 
LVL 55

Expert Comment

by:Joe Winograd, EE MVE 2015&2016
ID: 40032818
Graham,
You're welcome...happy to help. Regards, Joe
0
 

Author Comment

by:purplesoup
ID: 40032917
Graham, if it's possible, perhaps it would be fairer if Joe got the points?

Note on formulas - I've been using the conversions just to search for keywords, I don't trust the formulas and go back to the original PDF if I need to check.
0
 

Author Comment

by:purplesoup
ID: 40032919
Also Graham if you just wanted to remove the attached file and leave this discussion that would be fine too.
0
 
LVL 76

Expert Comment

by:GrahamSkan
ID: 40041487
Thank you, Netminder. I'll remember that in future.
0
 
LVL 55

Expert Comment

by:Joe Winograd, EE MVE 2015&2016
ID: 40043738
Netminder,
Thanks for doing that.

purplesoup,
Don't worry about reassigning the points. I'm fine with Graham getting them.

Regards, Joe
0
