Solved

Can anyone scan/convert this document into Word?

Posted on 2014-04-29
17
439 Views
Last Modified: 2014-05-05
I've been using http://www.abbyyonline.com/ to convert PDF files for a course into Word to make them searchable, since the PDF files themselves are not searchable.

I never had any problems until the attached document, which just doesn't seem to want to convert.

If someone could convert the PDF document into a searchable Word document I'd appreciate it, as it makes looking things up much easier obviously.
Question by:purplesoup
17 Comments
 
LVL 51

Expert Comment

by:Joe Winograd, EE MVE
ID: 40031093
Graham,
I think that conversion to a searchable form for private use of the individual with no intent to distribute or transmit is permitted under Fair Use. I'm not an intellectual property lawyer, but since the University has provided the document as a PDF file to students, it seems within Fair Use to make it a searchable PDF file or a searchable Word file. If you agree, I will post a searchable PDF file and a searchable Word file, which I have already created and was about to post when I saw your comment. I certainly do not want to be in violation of copyright law, so please let me know if you think this comes under the Fair Use provision.

purplesoup,
While we're waiting to hear back from Graham (who is an EE Admin), you should contact the copyright holder (The Open University) and request written permission to create a searchable PDF and/or searchable Word file. If you get written permission, that will remove the need to make an interpretation based on the Fair Use provision.

Regards, Joe
0
 

Author Comment

by:purplesoup
ID: 40031161
Well there have been a number of forum posts on the topic in which students - on the OU website - share converted PDFs, however there aren't any of this particular file. Tutors participate in the forums and have apologised that these documents are so old they aren't searchable and have thanked students who share searchable versions of the documents - which seems a sensible attitude.

I can post copies of the forum posts and links to already converted files, but I expect the "admin" person wouldn't like that.

Alternatively perhaps Joe you could email the files to bilbo039-eeATyahoo.co.uk - a temporary email address I've just set up?
0
 

Author Comment

by:purplesoup
ID: 40031162
Incidentally, Graham, could you point out exactly which policy you are referring to?
0
 
LVL 51

Expert Comment

by:Joe Winograd, EE MVE
ID: 40031181
In case Graham is gone for the evening, I'll jump in with an answer to your question. He's talking about the Experts Exchange Terms of Use:
http://www.experts-exchange.com/terms.jsp

Take a look at Article 6, Code of Conduct:
You shall not engage in any of the following activities, which are strictly prohibited under the Code of Conduct.
and then
11. Posting any content that infringes any third party’s intellectual property rights or violates any confidentiality Agreements, contracts of employment, licenses, “Terms of Use”, or copyright.
Regards, Joe
0
 
LVL 76

Accepted Solution

by:
GrahamSkan earned 500 total points
ID: 40031397
The reason that PDF converters would have difficulty with the document is that its pages are images. The text is not digitised, so there is no text for a converter to extract.

This means that it would require a fully functional OCR program to interpret it. I haven't used ABBYY, but, from my experience with Nuance's Omnipage, I suspect that it would be a lengthy process.

The image quality is good, but there are many mathematical formulae with special symbols that would require manual intervention.
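
For anyone who wants to check up front whether a PDF is image-only, here is a minimal sketch of one crude heuristic: a PDF with real text normally declares at least one font, so a file with no `/Font` entry anywhere is probably just scanned page images. The function name `looks_image_only` is hypothetical, the check is an assumption rather than how any OCR product decides, and it can be wrong in either direction (for example with encrypted files or unusual stream filters).

```python
import re
import zlib

# Matches the raw bytes between "stream" and "endstream" in a PDF file.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def looks_image_only(pdf_bytes: bytes) -> bool:
    """Heuristic only: return True when no font resource can be found."""
    # Uncompressed page dictionaries expose /Font directly.
    if b"/Font" in pdf_bytes:
        return False
    # Newer PDFs may hide dictionaries inside Flate-compressed streams,
    # so try to inflate each stream and look again.
    for match in STREAM_RE.finditer(pdf_bytes):
        try:
            inflated = zlib.decompressobj().decompress(match.group(1))
        except zlib.error:
            continue  # not Flate data (e.g. a JPEG page image); skip it
        if b"/Font" in inflated:
            return False
    return True
```

To try it on a file, read the bytes first, e.g. `looks_image_only(open("course.pdf", "rb").read())`, where `course.pdf` is a placeholder name. A `True` result suggests the file needs OCR before any PDF-to-Word converter can produce searchable text.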
0
 

Author Comment

by:purplesoup
ID: 40032138
I've requested that this question be deleted for the following reason:

I believe this question may have infringed the EE third-party copyright policy so please delete it.
0

 
LVL 51

Expert Comment

by:Joe Winograd, EE MVE
ID: 40032176
Graham,
You're welcome...happy to help...and always want to comply with copyright law and, more generally, EE's Terms of Use.

On the technical front, I tried it with three Nuance products, which did well overall, but, of course, were not perfect. I used the latest version of PaperPort Professional (14.5), which has the OmniPage OCR engine under the covers (OP18 capture SDK), to create a searchable PDF. I then tried the latest version of OmniPage Professional, which is really OP19 in the numbering sequence (but Nuance decided to call it OmniPage Ultimate, for whatever reason), to OCR it and create a searchable DOCX file. I then used Nuance's latest-and-greatest piece of software, Power PDF Advanced — also to OCR it and create a searchable DOCX file.

You're right about the formulas. For example, this formula:

[Image: screen capture of the formula]

was OCRed as follows:

PP14
H = {e,cio} = Cl UooC1

OP19
H = {e, cro} = CiU croCi

PPA
H = {e,cro} = Ci U croCi

I told all three products to ignore errors, so the process was fast, but I'm sure you're right that it would take a long time to intervene in order to get all of the formulas correct. I also have ABBYY FineReader (and some other OCR products), but figured those three tests were enough.
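
As a rough illustration of how the lines needing manual attention could be flagged automatically, the sketch below compares the three OCR results quoted above with Python's standard `difflib`. The helper names (`normalize`, `similarity`, `pairwise_agreement`) are made up for this example, and ignoring whitespace is an assumption about what counts as a real difference. Agreement is only a hint: two engines can agree and still both be wrong, especially when they share OCR technology.

```python
from difflib import SequenceMatcher
from itertools import combinations

# OCR results for the same formula, copied from the comparison above.
ocr_outputs = {
    "PP14": "H = {e,cio} = Cl UooC1",
    "OP19": "H = {e, cro} = CiU croCi",
    "PPA": "H = {e,cro} = Ci U croCi",
}


def normalize(text: str) -> str:
    """Remove all whitespace so spacing alone is not counted as a difference."""
    return "".join(text.split())


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio between two OCR strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def pairwise_agreement(outputs: dict) -> dict:
    """Score every pair of engines against each other."""
    return {
        (x, y): similarity(outputs[x], outputs[y])
        for x, y in combinations(outputs, 2)
    }


for pair, score in pairwise_agreement(ocr_outputs).items():
    print(pair, round(score, 2))
```

Any line where the engines disagree is a candidate for checking against the original page image. Lines where they agree still cannot be trusted blindly for formulas.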

I presume the screen capture of that one formula falls under Fair Use, but if you disagree, feel free to remove it. Regards, Joe
0
 
LVL 76

Expert Comment

by:GrahamSkan
ID: 40032797
Joe,
Thanks for your additional information and support.

purplesoup,
Thank you for accepting the situation. I wish you luck with your studies.
0
 
LVL 51

Expert Comment

by:Joe Winograd, EE MVE
ID: 40032818
Graham,
You're welcome...happy to help. Regards, Joe
0
 

Author Comment

by:purplesoup
ID: 40032917
Graham, if possible, perhaps it would be fairer if Joe got the points?

Note on formulas: I've been using the conversions just to search for keywords. I don't trust the formulas and go back to the original PDF if I need to check.
0
 

Author Comment

by:purplesoup
ID: 40032919
Also, Graham, if you just wanted to remove the attached file and leave this discussion, that would be fine too.
0
 
LVL 76

Expert Comment

by:GrahamSkan
ID: 40041487
Thank you, Netminder. I'll remember that in future.
0
 
LVL 51

Expert Comment

by:Joe Winograd, EE MVE
ID: 40043738
Netminder,
Thanks for doing that.

purplesoup,
Don't worry about reassigning the points. I'm fine with Graham getting them.

Regards, Joe
0
