PurpleSlade asked:
Converting line breaks to spaces in text cut and pasted from a PDF
We have a situation where a user pastes text (usually from a PDF or a PowerPoint) into a Java application. The following is an example of text that was cut and pasted from a PDF:
On the back end the text is saved as an XML file, and when that file is opened in Notepad the lines on either side of each line break run together. Unfortunately, the same thing happens when the text is used in other applications.
However, if I copy that text from Notepad into something like an Outlook e-mail, the line breaks from the first screenshot reappear, so the characters are still there.
Is there a way to figure out what that character is and replace it with a space rather than a line break? I have thousands of records to traverse and was hoping for a Perl or Java solution, because I already have programs in those languages that traverse the records.
To be certain which character(s) they are, open the text in an editor that can display the values in hex and check from there. There are online editors as well as ones you can download; I use Notepad++, which has a hex editor plugin.
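If a hex editor isn't handy, a few lines of Java can report much the same information. This is a minimal sketch, not code from the thread: the class `CharDump` and method `describeBreaks` are hypothetical names, and it assumes the pasted text is already available as a Java `String`. It lists the position and Unicode code point of every control, whitespace, or space-like character other than an ordinary space, which covers CR, LF, U+2028 LINE SEPARATOR and non-breaking spaces.

```java
import java.util.ArrayList;
import java.util.List;

public class CharDump {
    // Hypothetical helper: returns "index N: U+XXXX" for each suspicious character.
    static List<String> describeBreaks(String text) {
        List<String> found = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            // Skip the ordinary space; report control chars (CR, LF, ...),
            // Unicode whitespace (incl. U+2028/U+2029) and space-like chars (e.g. U+00A0).
            if (c != ' ' && (Character.isWhitespace(c)
                    || Character.isISOControl(c)
                    || Character.isSpaceChar(c))) {
                found.add(String.format("index %d: U+%04X", i, (int) c));
            }
        }
        return found;
    }

    public static void main(String[] args) {
        String sample = "first line\r\nsecond\nthird\u2028fourth";
        for (String line : describeBreaks(sample)) {
            System.out.println(line);
        }
    }
}
```

For the sample string this prints `index 10: U+000D`, `index 11: U+000A`, `index 18: U+000A` and `index 24: U+2028`, which are the same values a hex editor would show as 0D, 0A and so on.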
Once you know the hex code, you can put it directly into a regex by its hex value, e.g. \x0A for a hex 0A character (linefeed). If they are standard newline characters, you may be able to use the ubiquitous \n instead.
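In Java, the regex approach described above might look like the following sketch. It assumes, without confirmation from the thread, that the dump shows CR LF pairs, bare CR or LF, or U+2028. One plausible explanation for the Notepad symptom is bare LF (0A) characters, which older versions of Windows Notepad did not display as line breaks. The class and method names are hypothetical; adjust the character class to whatever the hex dump actually reveals.

```java
import java.util.regex.Pattern;

public class BreaksToSpaces {
    // CR LF is matched as one unit first so it becomes a single space;
    // then any lone CR (0D), LF (0A) or U+2028 LINE SEPARATOR.
    // Hex escapes as suggested in the answer; Java also accepts \x{hhhh}.
    private static final Pattern BREAK =
            Pattern.compile("\\x0D\\x0A|[\\x0A\\x0D\\x{2028}]");

    // Hypothetical helper: replaces every matched line break with one space.
    static String breaksToSpaces(String text) {
        return BREAK.matcher(text).replaceAll(" ");
    }

    public static void main(String[] args) {
        System.out.println(breaksToSpaces("Text copied\r\nfrom a\nPDF"));
        // Java 8+ alternative: \R matches any Unicode line-break sequence,
        // including CR LF as a single unit.
        System.out.println("a\r\nb\u2028c".replaceAll("\\R", " "));
    }
}
```

This prints `Text copied from a PDF` and `a b c`. The explicit pattern keeps the replacement limited to the characters you actually found; `\R` is broader and may also replace form feeds or vertical tabs you meant to keep.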
ASKER CERTIFIED SOLUTION
ASKER
This works great, thanks.
Hang on, where did Java come into this? Just at the front end? Could you help by:
a. saving the text first using a FileOutputStream and attaching an example
b. showing the code where you handle the pasted text
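For item (a), a minimal sketch of saving the pasted text with a `FileOutputStream` is shown below. It assumes the text is held in a Java `String` and writes it as UTF-8 bytes with no line-ending translation, so a hex editor shows exactly which break characters the application received. The class `SavePasted` and the file name `pasted-sample.txt` are hypothetical.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class SavePasted {
    // Writes the text byte-for-byte as UTF-8; a LF shows up as 0A,
    // a CR LF pair as 0D 0A, and U+2028 as E2 80 A8 in a hex editor.
    static void save(String text, String path) throws IOException {
        // try-with-resources closes (and flushes) the stream.
        try (OutputStream out = new FileOutputStream(path)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample and file name for illustration only.
        save("line one\nline two", "pasted-sample.txt");
    }
}
```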