Can text information within a PDF be copied from OpenOffice and pasted into WordPress?

Hello and Good Afternoon Everyone,

       I am wondering if I can possibly copy text information from a PDF file within OpenOffice and paste it into WordPress.  

       Thanks

       George
GMartinAsked:
Dave BaldwinFixer of ProblemsCommented:
You can certainly copy and paste from Adobe Reader which I would use instead of LibreOffice for that.  At times you may find it to be a tedious process because the text is not necessarily in order.  And although it may appear to be continuous, it always seems to be in blocks.
Dave BaldwinFixer of ProblemsCommented:
I should note also that there are many PDFs that are created with images and you can't copy and paste text from images.
Paul SauvéRetiredCommented:
Can you save the document from OpenOffice in a  format other than pdf? Like ODT or something similar?

If so, post the document (or part of it) here & I will test...
GMartinAuthor Commented:
Hi

            The PDF file loads perfectly fine.  While I do see many editing tools on the side bar and at the top of OpenOffice, I am not sure how I can highlight, copy, and paste text.   I am assuming a PDF file can be edited without it being converted to another format.  

             With regards to the ODT format, it is not an option.  However, I do see ODF Drawing and ODF Drawing Template in addition to OpenOffice XML Drawing and OpenOffice XML Drawing Template.  

              On a side note of possible importance, this 8 page document was originally scanned and saved as a PDF file.   While it is entirely text, I am wondering if somehow the document is being interpreted as a picture file by OpenOffice.   Just a thought here though because I could be wrong.

              George
GMartinAuthor Commented:
Hi

           On a side note, I do have Adobe Acrobat Reader DC.  While I can draw a box around text, I am still unable to copy, cut, or delete it.  

            George
Dave BaldwinFixer of ProblemsCommented:
>>document was originally scanned and saved as a PDF file
Then it is likely an image and not text.  Scanned documents are images that have to be OCR'd to be converted to text.  (OCR = Optical Character Recognition).
GMartinAuthor Commented:
Hi

           I believe this upcoming question is relevant and ties into this post.  If I want to convert the scanned document to an OCR, what can I do to go about achieving this goal with respect to utilities and steps?

           Thanks

           George
Dave BaldwinFixer of ProblemsCommented:
I used this program, Tesseract Open Source OCR (https://github.com/tesseract-ocr/tesseract), though it is a command-line program.  I tried a GUI program but it did not perform well.  You will probably have to copy and paste the images from the PDF to image files.  I don't think Tesseract Open Source OCR will read PDF files.
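
For anyone comfortable with a little scripting, here is a rough sketch of those two steps (get page images out of the PDF, then run Tesseract on each image). It is only one possible approach, not something tested in this thread. It assumes the `pdftoppm` tool from Poppler and `tesseract` are both installed and on the PATH; the function names, the `page` file prefix, and the 300 DPI setting are placeholders chosen for illustration.

```python
import subprocess
from pathlib import Path


def pdf_to_png_command(pdf_path, image_prefix, dpi=300):
    """Build the Poppler command that renders each PDF page to a PNG file."""
    return ["pdftoppm", "-r", str(dpi), "-png", str(pdf_path), str(image_prefix)]


def ocr_command(image_path, language="eng"):
    """Build the Tesseract command; the text is written to <image name>.txt."""
    output_base = str(Path(image_path).with_suffix(""))
    return ["tesseract", str(image_path), output_base, "-l", language]


def ocr_scanned_pdf(pdf_path, work_dir):
    """Run both steps. Assumes pdftoppm and tesseract are installed."""
    work_dir = Path(work_dir)
    work_dir.mkdir(parents=True, exist_ok=True)
    # Step 1: one PNG per page, named page-1.png, page-2.png, ...
    subprocess.run(pdf_to_png_command(pdf_path, work_dir / "page"), check=True)
    # Step 2: OCR each page image into a matching .txt file.
    for image in sorted(work_dir.glob("page-*.png")):
        subprocess.run(ocr_command(image), check=True)
    return sorted(work_dir.glob("page-*.txt"))
```

Calling `ocr_scanned_pdf("scan.pdf", "ocr_out")` (both names hypothetical) would leave one text file per page in the `ocr_out` folder, ready to open in a text editor.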
GMartinAuthor Commented:
If you think it would simplify what I am trying to accomplish, should I consider purchasing Adobe Acrobat Pro?  I am willing to invest into a program which can convert my PDF's to OCR's in simple terms and easy to follow steps.  

By the way, once the PDF is converted to an OCR, can it be accessible for editing purposes within MS Word or a web based application such as WordPress?

Thanks

George
Dave BaldwinFixer of ProblemsCommented:
OCR is a program that converts the text in an image into a text file.  Then it can be edited by any text editor or word processor.  I do hope you are not using Microsoft Word to generate HTML files.  It is stuck in 1999 for HTML formatting plus it adds a lot of Microsoft unique formatting items that no one else uses.  Use a plain text or code editor like Notepad++ http://notepad-plus-plus.org/ .
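
If you ever want clean HTML from the OCR'd text rather than whatever a word processor produces, a minimal sketch in Python could look like the following. The function name is made up for illustration, and it assumes paragraphs in the text file are separated by blank lines; it only wraps them in `<p>` tags and escapes special characters.

```python
import html


def text_to_html_paragraphs(text):
    """Wrap blank-line-separated paragraphs in <p> tags, escaping &, < and >."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    return "\n".join(f"<p>{html.escape(p)}</p>" for p in paragraphs)
```

The result is plain markup with no editor-specific formatting, which can be pasted into a code or HTML view.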
Dave BaldwinFixer of ProblemsCommented:
And... you should expect to almost always do some hand editing of files that have been converted by OCR.  OCR software is at best imperfect.
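
Some of the mechanical cleanup can be scripted before the hand editing starts. The Python below is only a sketch under simple assumptions (paragraphs separated by blank lines, words split across lines with a hyphen); the function name is hypothetical, and it will not catch misread characters, which still need a human eye.

```python
import re


def tidy_ocr_text(raw):
    """Apply a few mechanical clean-ups to OCR output before hand editing."""
    text = raw.replace("\r\n", "\n")
    # Rejoin words hyphenated across a line break: "docu-\nment" -> "document".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Turn single line breaks inside a paragraph into spaces; keep blank lines.
    text = re.sub(r"(?<!\n)\n(?!\n)", " ", text)
    # Collapse runs of spaces and tabs into one space.
    text = re.sub(r"[ \t]+", " ", text)
    # Trim each paragraph and separate paragraphs with exactly one blank line.
    paragraphs = [p.strip() for p in re.split(r"\n{2,}", text)]
    return "\n\n".join(p for p in paragraphs if p)
```

Note that the hyphen rule also merges genuinely hyphenated words that happen to break at a line end, so the output still needs a read-through.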
Paul SauvéRetiredCommented:
Please have a look at this - this version is free and has OCR included. PDF-XChange Viewer
PDF-XChange Viewer provides a host of useful features to go along with superior functionality in letting you view and mark up PDFs with ease. The software will open PDFs saved on your computer or you can download them directly with the app. Navigating the PDF once it's open is straightforward, as well, and there are many navigation and editing tools available on the toolbar across the top of the window.
http://download.cnet.com/PDF-XChange-Viewer/3000-10743_4-10598377.html
GMartinAuthor Commented:
Hello

            I have downloaded and installed it.  At this point, can you provide the necessary steps for converting the text within the PDF as an OCR to be edited?

            Thank you

             George
Paul SauvéRetiredCommented:
PDF-XChange Viewer?

I'll be back in a few!
GMartinAuthor Commented:
Yes, it is PDF-XChange Viewer.  Thank you

George
Paul SauvéRetiredCommented:
Left to right, top to bottom: [image: OCR PDF-XChange Viewer]
GMartinAuthor Commented:
Hello and Good Morning

                 Thank you very much for your illustrations given here Paul.  I did download and install PDF-XChange Viewer using your link.   While I have not yet tried out your easy to follow steps provided within the illustrations, I am wondering if I can carry out these steps using the free version.  It seems like there are many tools which are only available within the Pro version.

                   George
Paul SauvéRetiredCommented:
Hi George - In fact, I did what I showed above using the free (portable) version.

But I'm afraid I may have led you astray... Please go here to download PDF-XChange Editor  (not PDF-XChange Viewer, as I mentioned above): http://www.tracker-software.com/product/pdf-xchange-editor
Paul SauvéRetiredCommented:
NOTE - I went HERE: http://www.tracker-software.com/product/pdf-xchange-viewer and it says that OCR is a FREE feature - so you are good to go with PDF-XChange Viewer.

Sorry for all this confusion.
Paul SauvéRetiredCommented:
Please forgive my confusion. I did NOT read the information on the site carefully enough. I sometimes tend to get a little distracted (read dyslexic).

I finally figured out what you should be using.

I recommended using PDF-XChange Viewer. In fact, you should uninstall PDF-XChange Viewer and install PDF-XChange Editor, since this has replaced the former.

So, please use this link to download the latest version: http://www.tracker-software.com/product/pdf-xchange-editor

I apologize for these multiple posts and any grief I may have caused you. I'm not usually this mixed up.

Please note that the illustrations above were created using the free download of PDF-XChange Editor (portable version). The advantage of the portable version is that you can put it on a USB stick and use it on any of your PC's/laptops you may have.

Again, so sorry for the mix-up and I hope this final post is clear.
GMartinAuthor Commented:
Hello and Good Evening,

            I successfully downloaded and installed PDF-XChange Editor Free Edition.  Using your illustrations, I was able to use the necessary steps for getting started with converting the PDF document to OCR for letter recognition (Document, then, OCR Pages).   At this point, I now wish to highlight some text and delete it.  Could you possibly give me some guidance on the necessary steps for accomplishing this goal?  

            This is my first time exposure to PDF-XChange Editor.  As such, I am struggling with the basics of editing my PDF file.  Hopefully, this will smoothen out as I become more familiar and experienced with this utility.

             George
Paul SauvéRetiredCommented:
Original question:
>>I am wondering if I can possibly copy text information from a PDF file within OpenOffice and paste it into WordPress.  

Above:
>>  At this point, I now wish to highlight some text and delete it

I really thought the intent was to copy to OpenOffice and paste it into WordPress...

Use the Tools menu (at the top) ―> Basic Tools ―> Select Text Tool.

The cursor will change to the I form and you select text with it. Paste the text to your OpenOffice Write document and do your editing. Once you are happy with the results, copy and paste to WordPress.
Paul SauvéRetiredCommented:
Hi George,

I'm not sure if you realize it, but PDF files are not really for editing. They are a 'printed' electronic version of files that may not be easily read unless the reader has the program with which the original file was created in order to open it.

For example, some people don't have a program to open a spreadsheet. So you save it as PDF and send that off. To modify the PDF file, you modify the original and save the modified version as PDF once again.

So you can add comments and so on, but you can't really edit them.

If you are scanning documents to your PC, you should have an option to scan using OCR as well. That way, you don't have to do it after the fact.

I have software with my all-in-one Brother printer that lets me scan directly into MS Word or WordPad...
GMartinAuthor Commented:
Hi Paul

         Thank you Paul for your follow up.   After going through much time trying to figure this out, I have recently come to the same conclusion.  I have a wireless HP Deskjet 2543 printer which has software which allows me to scan it to PDF.  However, that takes me back to my original problem.  At this point, I believe I need to open up a new post requesting recommendations for software which will allow an HP Deskjet 2543 printer to scan directly to MS Office.

          George
Dave BaldwinFixer of ProblemsCommented:
All scans are originally images.  You will need OCR capability to convert documents to text if you want to edit and format them in Microsoft Word.  Some HP devices come with OCR software.
Paul SauvéRetiredCommented:
Hi George,

From your question of 2016-01-10: http://www.experts-exchange.com/questions/28912862/how-can-I-scan-documents-using-my-HP-Deskjet-2543-and-import-them-into-WordPress.html, it seems that you have updated the software & drivers. But there doesn't seem to be an OCR feature with this particular all-in-one printer!

Nevertheless, I may have found a solution:
About FreeOCR
FreeOCR is a free Optical Character Recognition Software for Windows and supports scanning from most Twain scanners and can also open most scanned PDF's and multi page Tiff images as well as popular image file formats. FreeOCR outputs plain text and can export directly to Microsoft Word format.
at http://www.paperfile.net/
OR
SimpleOCR Info
Do you dread having to retype that document you are holding in your hand? If only you had the electronic file, your life would be so much easier. With SimpleOCR, you could easily and accurately convert that paper document into editable electronic text for use in any application including Word and WordPerfect.
at http://simpleocr.com/OCR-Freeware

GMartinAuthor Commented:
Hello and Good Evening Everyone,

           After trying out a variety of different programs, I found a combination of things which allowed the scanning of text documents to be editable.  Using  Adobe Acrobat Professional XI in conjunction with this nice YouTube training video https://www.youtube.com/watch?v=GsUZb_YOx9o , I am now able to convert my PDF's to MS Word documents for editing purposes.  

            Thanks again everyone for your suggestions and resourceful links.  

            George