Solved

edit text from pdf on microsoft word

Posted on 2014-04-24
Last Modified: 2014-05-06
I scanned a hardcover book using a neat.com OCR scanner.
The output is a .pdf.

The goal is to update an old book (make a new edition by adding words).

How can I convert the .pdf to .doc?

I want to edit it in Microsoft Word.


I can open the PDF with Adobe Acrobat 11.0.06.70.


Or maybe there is another program that converts PDF to text.

What about paragraph breaks?
Question by:rgb192
19 Comments
 
LVL 23

Expert Comment

by:tailoreddigital
ID: 40021640
You need some OCR software. I last used TopOCR years ago. It worked OK. After a bit of research, it looks like PDF-XChange Viewer might be the way to go. I've used this software for other PDF functions but didn't know it could handle OCR until now. From the threads I've read, it looks like it might work well for you:

http://www.tracker-software.com/product/pdf-xchange-viewer

http://steveshank.com/cgi-bin/article.pl?aid=440
0
 
LVL 5

Expert Comment

by:Billy Roth
ID: 40021648
Depending on how the Neat software creates the PDF, you could use a professional version of Acrobat to "edit" the PDF and copy the text out to a doc.
0
 
LVL 51

Expert Comment

by:Joe Winograd, EE MVE
ID: 40021685
For PDF to Word, I've had good (not perfect) results with this free online tool:
http://www.pdftoword.com/

If you prefer a local install, I've also had good (also not perfect) results with this free tool:
http://www.boxoft.com/pdf-to-word/

You may get better results with non-free products. I've gotten better (but still not perfect) results with Nuance's PDF Converter Professional (current version is 8):
http://www.nuance.com/products/pdf-converter-professional/index.htm

However, you'll notice that the above link redirects to a new Nuance product called Power PDF Standard. There's also a Power PDF Advanced version, which has a free trial:
http://www.nuance.com/for-business/document-imaging-and-scanning/power-pdf-converter/index.htm#resources

PDF Converter Professional 8 is still available at Amazon for only $40:
http://www.amazon.com/Nuance-Communications-Inc-M109A-G00-8-0-Professional/dp/B0084PK8CS/

The first link in this post is to the (free) Nitro cloud. Nitro is a well-known name in PDF tools and their Nitro Pro (current version is 8) has a PDF to Word feature:
http://www.nitropdf.com/pro/features/convert-export

There's a free trial, but at $120 it is not cheap to buy. I've never used it, so I can't vouch for its performance. However, it uses the same engine as the online tool, which I have used and is very good, so I would expect the same of Nitro Pro.

One more non-free product (but reasonably priced at $39) is CAD-KAS's PDF to Word:
http://www.cadkas.com/downengpdf9.php

I haven't used this product, but I have used their PDF Editor Objects, which is excellent. Based on the quality of PDF Editor Objects, I think that their PDF to Word is worth a try, and there's a free trial:
http://www.cadkas.com/pdf2word!.exe

I've been on previous threads here at EE where other experts have recommended these three (free) online tools:

http://www.convertpdftoword.org
http://www.pdfonline.com/pdf-to-word-converter
http://www.wondershare.net/pdf-converter/pdf-to-word-converter.html

I can't personally vouch for these, but based on the positive comments from other members, I'm passing them along for your consideration.

No matter which way you go, keep in mind that PDF-to-Word conversion is tricky business – maintaining the formatting/layout is tough stuff! I haven't found anything that is perfect, and results vary from one document to the next. So my suggestion is to put some, or all, of these products on your short list for evaluation. Try them on your doc! Compare the resulting Word files to see which, if any, of the tools produces Word files that are satisfactory.

Re the comment about OCR, it's possible that it has already been OCR'ed, since the Neat software supports OCR. But if you did not OCR it during scanning, then you either need to re-scan with OCR or run OCR against the already-scanned document. There are many OCR packages out there — some free, some not free. In fact, your Adobe Acrobat (not Adobe Reader) has OCR — it calls it Recognize Text. There are also scanning/imaging products with built-in OCR. I can make recommendations for all of this, but first let me know if the Neat software has already OCR'ed the document. You can tell by trying to copy the text in Adobe Acrobat or Reader and seeing if you can paste it into a text product, like Notepad or Word. Regards, Joe
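The copy-and-paste check described above can also be roughed out in code. The following is only a minimal Python sketch, not part of any tool mentioned in this thread: the function name `looks_like_real_text` and both thresholds are assumptions chosen for illustration. It takes whatever text came across when pasting and guesses whether it is real words (so OCR has been done) or nothing/garbage (so the pages are probably still images).

```python
import string


def looks_like_real_text(sample, min_words=20, min_ratio=0.6):
    """Heuristic guess: does the pasted sample look like real words?

    min_words and min_ratio are arbitrary illustrative thresholds.
    """
    tokens = sample.split()
    if len(tokens) < min_words:
        # Little or nothing was copied: likely image-only pages.
        return False

    def word_like(token):
        # Drop surrounding punctuation, allow internal apostrophes and hyphens.
        core = token.strip(string.punctuation)
        return core.replace("'", "").replace("-", "").isalpha()

    hits = sum(1 for token in tokens if word_like(token))
    return hits / len(tokens) >= min_ratio
```

Paste a page or two into a text file, read it into a string, and pass it to the function. A result of `False` suggests running OCR (for example, Acrobat's Recognize Text) before trying any converter.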
0
 
LVL 1

Expert Comment

by:ProTechComputing
ID: 40021751
If your document scan hasn't been OCR'd - you'll need to do that before proceeding with any of these other suggestions.

-IF- the scanned document -HAS- been OCR'd - and if you have Word 2013, you should be able to open and edit the text with Word.

Alternately, if you have Adobe Acrobat X or XI (NOT Acrobat Reader XI), you can open the .pdf and do a "Save As" and save it directly to a Word format (.docx).

All of the preceding suggestions will work as well - but nothing works as well as current software that does what you want/need it to right out of the box.
0
 

Author Comment

by:rgb192
ID: 40023999
> Alternately, if you have Adobe Acrobat X or XI (NOT Acrobat Reader XI), you can open the .pdf and do a "Save As" and save it directly to a Word format (.docx).

I tried this for .docx and .rtf but got two large files of around 100 MB each.
No words to edit, just pictures.
0
 

Author Comment

by:rgb192
ID: 40024004
I would try the other conversion tools, but I already have Acrobat XI

alternatively I copy paste all the words into notepad++ and have 9,000 lines of words with no formatting.
0
 
LVL 5

Expert Comment

by:Billy Roth
ID: 40024006
Have you tried some of the alternate paste options when you try to paste into word?
0
 
LVL 1

Expert Comment

by:ProTechComputing
ID: 40024008
I can only assume that the pictures you're getting are 'pictures of text'.  

If that's the case - you need to go back to step one and run optical character recognition software to convert those 'pictures of text' into -actual- text.

OCR software can save this actual text in .doc or .pdf format.

Most OCR software I've used in the past does a poor job of retaining the formatting of scanned text, so you will probably end up doing a large amount of re-formatting no matter which process you use.
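One way to confirm the 'pictures of text' guess without paging through a 100 MB file: a .docx is a ZIP archive, with the body text in `word/document.xml` and embedded images under `word/media/`. The Python sketch below is an illustration under that assumption; `summarize_docx` is a hypothetical helper name, and the character count is approximate (XML entities are counted as written).

```python
import re
import zipfile


def summarize_docx(path):
    """Return (characters of body text, bytes of embedded images) for a .docx."""
    with zipfile.ZipFile(path) as archive:
        xml = archive.read("word/document.xml").decode("utf-8", errors="replace")
        # Body text lives in <w:t> elements; the pattern skips <w:tab/>, <w:tbl>, etc.
        text = "".join(re.findall(r"<w:t(?:\s[^>]*)?>([^<]*)</w:t>", xml))
        image_bytes = sum(
            info.file_size
            for info in archive.infolist()
            if info.filename.startswith("word/media/")
        )
    return len(text), image_bytes
```

If the first number is near zero and the second is close to the file size, the export contains page images rather than editable text, and OCR has to happen before the conversion.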
0
 
LVL 5

Expert Comment

by:Billy Roth
ID: 40024011
it must already be ocr'd by neat.com, which is why he can paste it into notepad++.
0
 
LVL 1

Expert Comment

by:ProTechComputing
ID: 40024026
> it must already be ocr'd by neat.com, which is why he can paste it into notepad++.

If that's the case, he should be able to paste it into Word just as easily.

Again - formatting will remain an issue no matter which path he chooses.
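For the copy-and-paste route, at least the paragraph breaks can sometimes be recovered with a small script before the text goes into Word. This is a minimal Python sketch and rests on one big assumption: that the pasted text keeps a blank line between paragraphs, which depends on the OCR output. The function name `reflow` is made up for illustration. It joins hard-wrapped lines into one line per paragraph and re-joins words hyphenated at a line end; bold, centering, and bullets are not recovered.

```python
import re


def reflow(raw):
    """Join hard-wrapped lines into paragraphs; blank lines mark paragraph breaks."""
    paragraphs = []
    for block in re.split(r"\n\s*\n", raw.replace("\r\n", "\n")):
        lines = [line.strip() for line in block.split("\n") if line.strip()]
        if not lines:
            continue
        text = lines[0]
        for line in lines[1:]:
            if text.endswith("-") and line[0].islower():
                # Re-join a word that was hyphenated at the line end.
                text = text[:-1] + line
            else:
                text += " " + line
        paragraphs.append(text)
    return "\n\n".join(paragraphs)
```

Run it over the text saved from Notepad++ and paste the result into Word. If the pasted text has no blank lines at all, everything collapses into a single paragraph, and a dedicated PDF-to-Word converter is the better option.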
0
 
LVL 51

Expert Comment

by:Joe Winograd, EE MVE
ID: 40024540
> alternatively I copy paste all the words into notepad++ and have 9,000 lines of words with no formatting.

If you're seeing words (not garbage), then the Neat software is performing OCR, which makes perfect sense, since that is Neat's default. The lack of formatting also makes sense, as that is not Neat's strength. As I mentioned above, in PDF-to-Word conversion, maintaining the formatting/layout is tough stuff! That's why I suggested trying some different tools on your particular document to see which, if any, produces Word files that are satisfactory. Since you already have Acrobat XI, try that on the scanned document:

Open (the scanned doc)
Tools
Recognize Text
In This Document

Even though Neat has already OCR'ed it, Acrobat will OCR it again with its own OCR engine. If you look at Properties, you should see the PDF Producer change from whatever Neat uses to Adobe Acrobat 11. Then save it to Word:

File
Save As
Microsoft Word
Word Document

This may do a decent job of retaining the formatting. But if you're not satisfied, then you'll have to try some of the specialized PDF-to-Word tools already mentioned. You can easily try the online ones yourself — I suggest extracting a few pages from the book to test them before uploading the entire book. If you'd like to post a few pages here (two or three, perhaps), I'll run them through several local tools that I have. Regards, Joe
0
 

Author Comment

by:rgb192
ID: 40026138
> If that's the case - you need to go back to step one and run optical character recognition software to convert those 'pictures of text' into -actual- text.

I do not understand.

I can copy and paste the entire PDF. I originally thought it was pictures of text, but now I know that it is text.

> Have you tried some of the alternate paste options when you try to paste into word?

Could you give examples, please?
0
 

Author Comment

by:rgb192
ID: 40026139
> I suggest extracting a few pages from the book to test them before uploading the entire book. If you'd like to post a few pages here (two or three, perhaps), I'll run them through several local tools that I have. Regards, Joe

How do I export and save only a couple of pages to show you?

> Open (the scanned doc)
> Tools
> Recognize Text
> In This Document
>
> Even though Neat has already OCR'ed it, Acrobat will OCR it again with its own OCR engine. If you look at Properties, you should see the PDF Producer change from whatever Neat uses to Adobe Acrobat 11. Then save it to Word:

[Attached image: acrobat-ocr-engine]
0
 
LVL 51

Expert Comment

by:Joe Winograd, EE MVE
ID: 40026353
> how to only export and save a couple pages to show you?

Tools>Pages>Extract

[Attached image: extract pages]
0
 
LVL 51

Expert Comment

by:Joe Winograd, EE MVE
ID: 40040888
Are you having any problem creating an extract? If so, let me know and I'll try to help you through it.
0
 

Author Comment

by:rgb192
ID: 40043183
**Copyrighted materials removed by Netminder 5 May 2014**
0
 
LVL 51

Accepted Solution

by:
Joe Winograd, EE MVE earned 500 total points
ID: 40043288
You posted the whole book (242 pages). I'm going to request that an EE Admin delete it, as it violates the copyright. I believe that a very small part of it, perhaps 1-3 pages, could be posted under the Fair Use provision, but certainly not the whole book.

I did take just one page and processed it with Nuance Power PDF Advanced. Since you said earlier that the problem is formatting ("I copy paste all the words into notepad++ and have 9,000 lines of words with no formatting"), I picked a page that has numerous formatting features: bold centered text, multiple paragraphs, blank lines, and a list of several bullet points. As you can see in the attached Word doc, it was converted very well. Regards, Joe
finalPdf-page14.doc
0
 

Author Closing Comment

by:rgb192
ID: 40043513
I am very sorry about the copyright violation. I posted the entire file; I meant to post the shortened version.

looks great.

thanks

I have a followup question.
0
 
LVL 51

Expert Comment

by:Joe Winograd, EE MVE
ID: 40043744
Netminder,
Thanks for removing the copyrighted material.

rgb192,
Don't worry about it — mistakes happen. EE Admin took care of it promptly, so all is well. I saw your follow-up question and just replied to it.

Regards, Joe
0
