We have a situation where a user will paste text (usually from a PDF or a PowerPoint) into a Java application. The following is an example of text that was cut and pasted from a PDF:
On the backend it's saved as an XML file, and when you open it in Notepad you can see that the text on either side of those line breaks runs together. Unfortunately, the same thing happens when the records are used in other applications.
However, if I copy that text from Notepad and paste it into something like an Outlook e-mail, the line breaks from the first screenshot show up again, so they are still there.
What I am wondering is whether there is a way to figure out what that character is and replace it with a space rather than a line break. I have thousands of records to traverse and was hoping for a Perl or Java solution, because I already have programs written to traverse the records in those languages.
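
To make the goal concrete, here is a rough Java sketch of the two steps: identifying the character and swapping it for a space. The class and method names (`LineBreakCleaner`, `findHiddenCharacters`, `replaceLineBreaks`) are made up for illustration, and it assumes the record text has already been read into a `String`. Whether the culprit is a vertical tab (U+000B), a Unicode line separator (U+2028), or something else is exactly what the first method is meant to reveal; the sample string in `main` is invented, not taken from a real record.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, named for illustration only.
class LineBreakCleaner {

    // Lists every control character, line separator, or paragraph separator
    // in the text, with its index and Unicode code point. Tabs and ordinary
    // CR/LF are control characters too, so they are reported as well.
    static List<String> findHiddenCharacters(String text) {
        List<String> found = new ArrayList<>();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            int type = Character.getType(cp);
            if (type == Character.CONTROL
                    || type == Character.LINE_SEPARATOR
                    || type == Character.PARAGRAPH_SEPARATOR) {
                found.add(String.format("index %d: U+%04X", i, cp));
            }
            i += Character.charCount(cp);
        }
        return found;
    }

    // Replaces each run of line-break characters with a single space.
    // \R matches CR LF, LF, VT, FF, CR, NEL (U+0085), U+2028 and U+2029.
    static String replaceLineBreaks(String text) {
        return text.replaceAll("\\R+", " ");
    }

    public static void main(String[] args) {
        // Invented sample: a vertical tab standing in for the pasted break.
        String sample = "first line\u000Bsecond line";
        System.out.println(findHiddenCharacters(sample)); // [index 10: U+000B]
        System.out.println(replaceLineBreaks(sample));    // first line second line
    }
}
```

The `\R` pattern needs Java 8 or later. If the XML stores the break as a character reference such as `&#10;` rather than as a raw character, a replace on the raw file text would not see it, so the value would have to be read through an XML parser first.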