PurpleSlade


Fix Line breaks to be spaces from cut and pasted PDF text

We have a situation where a user pastes text (usually from a PDF or a PowerPoint) into a Java application. The following is an example of text that was cut and pasted from a PDF:

[Screenshot]

On the backend it's saved as an XML file, and (in Notepad) you can see that the text on either side of those line breaks runs together. Unfortunately, the same thing happens when the records are used in other applications.

[Screenshot]

However, if I cut and paste that text from Notepad into something like an Outlook e-mail, the line breaks from the first screenshot reappear, so they are still there.

What I am wondering is if there is a way to figure out what that character is and make it a space rather than a linebreak.  I have thousands of records to traverse and was hoping for a Perl or Java solution because I already have programs written to traverse the records in those languages.
CEHJ

You probably need to do the following to give us the best chance of helping:

a. saving first using a FileOutputStream and attaching an example
b. showing the code where you deal with the pasted text
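
For step (a), a minimal sketch of saving the pasted text through a FileOutputStream so the raw bytes can be attached and inspected. The class name, method name, sample string and output file name are all made up for illustration, and UTF-8 is an assumed encoding; the application's real encoding may differ.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: writes pasted text to disk byte-for-byte.
final class PastedTextDump {

    // Writes the text as UTF-8 (an assumption) without any newline translation.
    static void save(String pastedText, String path) {
        try (FileOutputStream out = new FileOutputStream(path)) {
            out.write(pastedText.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Sample input only: U+000B (vertical tab) stands in for the unknown break character.
        save("first line\u000Bsecond line", "pasted-sample.bin");
    }
}
```

The resulting file can then be opened in a hex editor or attached to the question.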
To be really sure what character(s) it is, open the file in an editor that shows the values in hex and determine it from there. There are online editors and ones you can download; I use Notepad++, which has a hex editor plugin.
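
If a hex editor isn't handy, the same check can be roughed out in Java by listing the code points of the stored string. This is only a sketch: the class and method names are invented here, and the sample string with U+000B is an assumption, not the asker's actual data.

```java
// Hypothetical helper: lists every code point of a string in hex.
final class CodePointDump {

    // Returns the code points as space-separated U+XXXX values.
    static String describe(String text) {
        StringBuilder sb = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // Sample input only; prints: U+0061 U+000B U+0062
        System.out.println(describe("a\u000Bb"));
    }
}
```

Any value that isn't a printable character (for example U+000A, U+000B or U+000D) is a candidate for the mystery line break.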
Once you know the hex code, you can put it directly into a regex as a hex escape, e.g. \x0A for the character with hex value 0A (linefeed). If they are standard newline characters, you may be able to use the ubiquitous \n instead.
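
As a sketch of that replacement in Java (one of the two languages the asker mentioned), something along these lines might work. The class and method names are hypothetical. The first method relies on the \R matcher available in java.util.regex since Java 8, which covers \r\n, \n, \x0B, \x0C, \r, \x85, \u2028 and \u2029; whether the unknown character is among those is an assumption until the hex check confirms it. The second method shows the hex-escape form for a single known character.

```java
import java.util.regex.Pattern;

// Hypothetical helper: turns line-break characters into spaces.
final class LineBreakFixer {

    // \R treats \r\n as one unit, so a Windows line ending becomes a single space.
    private static final Pattern ANY_BREAK = Pattern.compile("\\R");

    // Replaces every line-break sequence matched by \R with one space.
    static String breaksToSpaces(String text) {
        return ANY_BREAK.matcher(text).replaceAll(" ");
    }

    // Variant for one specific character identified in hex, here 0A (linefeed).
    static String linefeedsToSpaces(String text) {
        return text.replaceAll("\\x0A", " ");
    }

    public static void main(String[] args) {
        // Sample input only; prints: one two three
        System.out.println(breaksToSpaces("one\r\ntwo\u000Bthree"));
    }
}
```

If the hex check turns up a different value, swapping it into the hex escape in the second method should be enough.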
ASKER CERTIFIED SOLUTION
tdlewis

This solution is only available to members.
PurpleSlade

ASKER

This works great, thanks.
Hang on - where did Java come into this - just at the front end?