rgb192

- end of lines

This question is a followup to
https://www.experts-exchange.com/questions/28429437/delete-page-numbers.html

I scanned in an old book using neat.com OCR.

Many hyphens (-) separate words at the end of a line.

example
sep-
arating

This looks good in an old book but not in a .doc file.

Note: the file was a .pdf before it became a .doc, but a .doc solution would be easier because I already have that file.
Alternatively, I could easily do an Acrobat or Nuance PDF solution and then convert to .doc.
GrahamSkan

If the lines are wrapping to the next line, and Word is automatically hyphenating to neaten up the appearance, then you can remove the option via the Hyphenation dropdown on the Page layout tab.

If, however, the lines are being terminated early by being divided into separate paragraphs, then you might be able to use Find and Replace:

Find: -^p
Replace:
(Nothing)

If that doesn't work satisfactorily, can you post a sample document portion please?
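
For anyone working on a plain-text export rather than inside Word, the same idea (delete a hyphen that sits directly before a line break) can be sketched in Python. This is only an illustration, not part of the Word procedure; the function name `join_hyphenated_breaks` is made up for this example, and it assumes each line break in the export corresponds to a Word paragraph mark.

```python
def join_hyphenated_breaks(text: str) -> str:
    """Delete every hyphen that is immediately followed by a line break."""
    # Normalise Windows (\r\n) and bare \r line endings to \n first
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Plain-text counterpart of Find: -^p / Replace: (nothing)
    return text.replace("-\n", "")


sample = "sep-\narating words\nat end of line"
print(join_hyphenated_breaks(sample))
```

Like the Word search, this joins only the two halves of the broken word; the other line breaks are left alone.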
Hi rgb192,

The problem is that there's no way to distinguish between the word being hyphenated and being a compound word. For example, look at the one page from your previous question — OCR fixed some of the hyphenations:

sup-port
princi-ples
informa-tion
heal-ing

But it did not fix:

SELF-HEALTH
well-being

It's good that it didn't fix those, because those are compound words and the hyphen should remain. Even if those were split across lines, such as "well-" at the end of one line and "being" at the beginning of the next, the hyphen should remain. So it's a tricky issue. And words like "re-creation" and "recreation" make it even trickier.

Btw, I can't explain why OCR handled "sup-port", "princi-ples", "informa-tion", and "heal-ing" correctly on that page, but did not handle "Smother-ing", "re-currence", "ho-listic", and "be-cause". Regards, Joe
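
One common workaround for the compound-word problem is a word-list check: drop the hyphen only when the joined result is a known word. The sketch below is a rough heuristic under that assumption, not what any OCR product is known to do; `dehyphenate` and `known_words` are hypothetical names, and a real word list would have to be supplied.

```python
import re


def dehyphenate(text: str, known_words: set[str]) -> str:
    """Join words split across a line break, keeping the hyphen when unsure."""

    def fix(match: re.Match) -> str:
        left, right = match.group(1), match.group(2)
        joined = left + right
        if joined.lower() in known_words:
            return joined            # e.g. "be" + "cause" -> "because"
        return f"{left}-{right}"     # e.g. "well" + "being" -> "well-being"

    # word part, hyphen, optional spaces, line break, optional spaces, word part
    pattern = r"([A-Za-z]+)-[ \t]*\r?\n[ \t]*([A-Za-z]+)"
    return re.sub(pattern, fix, text)


words = {"because", "holistic", "recurrence"}   # stand-in for a real dictionary
print(dehyphenate("be-\ncause of well-\nbeing", words))
```

The "re-creation" versus "recreation" case stays ambiguous under this approach: if "recreation" is in the word list, the hyphen is dropped even where the author meant "re-creation".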
rgb192

ASKER

Find: -^p
Replace:
(Nothing)

This seems like a good idea, but which text editor can I use it in?

And do I copy and paste the result back into Microsoft Word or Nuance PDF?

maybe it can cure
"Smother-ing", "re-currence", "ho-listic", and "be-cause"
That is specific to Microsoft Word.

Have you posted a sample Word document anywhere? If not, it might help to see a bit of what you are dealing with.
Thank you.
Each line is, in Word terms, a paragraph. Also, the hyphenation has introduced a space after the hyphen, so that my suggestion won't work. Instead, put a space after the hyphen in the Find, so that it becomes

Find: - ^p
Replace:
(Still nothing)

I will try to find a way to join the lines of the original paragraphs together so that the text flows as intended (It may need some VBA coding).
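
Outside Word, the two steps described here (remove a hyphen plus trailing space at a line end, then join the lines of each paragraph) might look like the Python sketch below. It is not the VBA mentioned above; `reflow` is a hypothetical name, and the code assumes the text has been exported as plain text with blank lines between the original paragraphs.

```python
import re


def reflow(text: str) -> str:
    """Remove line-end hyphens and join each paragraph's lines with spaces."""
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Hyphen, optional spaces or tabs, then a line break: join the word halves
    text = re.sub(r"-[ \t]*\n", "", text)
    # Assumption: a blank line marks the boundary between original paragraphs
    paragraphs = re.split(r"\n\s*\n", text)
    joined = (
        " ".join(line.strip() for line in p.split("\n") if line.strip())
        for p in paragraphs
    )
    return "\n\n".join(joined)


sample = "The prin- \nciples of\nheal- \ning.\n\nNext para-\ngraph."
print(reflow(sample))
```

If the export has no blank lines between paragraphs, the boundaries would have to be inferred some other way, for example from short lines or sentence-ending punctuation.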
Hi Graham,
Attached is a one-page Word doc and the one-page PDF file from which it was created (both posted under Fair Use from a 242-page copyrighted book). It's interesting that the PDF-to-Word conversion program (Nuance's Power PDF Advanced) pieced together some of the hyphenated words but not others. Regards, Joe
finalPdf-page14.doc
finalPdf-page14.pdf
ASKER CERTIFIED SOLUTION
GrahamSkan
rgb192

ASKER

Find: ^31
Replace:
(Nothing)

That removed some of them.

Please tell me if there is another find and replace to try, or if that is all that can be done.

some-dashes-removed.docx
SOLUTION
rgb192

ASKER

-^31

returned 0 results in some-dashes-removed.docx, using Microsoft Word 2007
You entered the wrong string. I'll say it again...please read it carefully this time. You should enter:

- ^13

To be clear, that's a normal hyphen (dash) followed by a normal space followed by a caret (Shift-6) and then the number 13.

You forgot the space after the hyphen and you entered 31 instead of 13.
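
To see why the two strings behave so differently, it can help to spell them out as character codes. The Python sketch below is only an illustration: it assumes that `^13` in Word's Find box stands for character code 13 (a carriage return, which Word uses as its paragraph mark) and that `^31` stands for character code 31 (which Word appears to use for an optional hyphen). The names are made up for this example.

```python
PARAGRAPH_MARK = chr(13)    # what ^13 is assumed to match
OPTIONAL_HYPHEN = chr(31)   # what ^31 is assumed to match


def remove_line_end_hyphens(text: str) -> str:
    """Counterpart of Find: '- ^13' (hyphen, space, code 13), Replace: nothing."""
    return text.replace("- " + PARAGRAPH_MARK, "")


sample = "Smother- " + PARAGRAPH_MARK + "ing"
print(remove_line_end_hyphens(sample))      # Smothering
print(("-" + OPTIONAL_HYPHEN) in sample)    # False: the mistyped string matches nothing
```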
rgb192

ASKER

I think all the hyphens (-) are gone.

Thanks.
You're welcome. That's great news! Cheers, Joe