Fix Line breaks to be spaces from cut and pasted PDF text

We have a situation where a user pastes text (usually from a PDF or a PowerPoint) into a Java application. The following is an example of text that was cut and pasted from a PDF:

[Screenshot 1]

On the backend it's saved as an XML file, and in Notepad you can see that the text on either side of those line breaks runs together. Unfortunately, the same thing happens when the records are used in other applications.

[Screenshot 2]

However, if I cut and paste that text from Notepad into something like an Outlook e-mail, the line breaks from the first screenshot reappear, so they are still there.

What I am wondering is if there is a way to figure out what that character is and make it a space rather than a linebreak.  I have thousands of records to traverse and was hoping for a Perl or Java solution because I already have programs written to traverse the records in those languages.
Asked by: PurpleSlade

CEHJ commented:
To give us the best chance of helping, you probably need to:

a. save the pasted text first using a FileOutputStream and attach an example
b. show the code where you deal with the pasted text

lwadwell commented:
To be really sure which character(s) are involved, open the file in an editor that shows the values in hex and work from there. There are online hex editors and ones you can download; I use Notepad++, which has a hex editor plugin.
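
If no hex editor is handy, the same check can be done in code. The sketch below is an illustration only and not something posted in the thread: it lists every code point of a string in U+XXXX form, so an unexpected separator such as U+000B or U+2028 stands out. The class name CodePointDump and the sample string are hypothetical.

```java
class CodePointDump {
    /** Returns each code point of the input as U+XXXX, separated by spaces. */
    public static String dump(String text) {
        StringBuilder out = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(String.format("U+%04X", cp));
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // Hypothetical sample: two lines joined by a Windows-style CR LF pair.
        System.out.println(dump("ab\r\ncd"));
    }
}
```

Running it against a short excerpt of one affected record should be enough to see which separator the paste actually produced.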

lwadwell commented:
Once you know the hex code, you can specify it directly in a regex by its hex value, e.g. \x0A for hex 0A (line feed). If they are standard newline characters, you may be able to use the ubiquitous \n instead.
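
As a rough Java illustration of the hex-escape approach, the sketch below replaces only line-break characters and leaves other whitespace alone. It assumes the breaks turn out to be CR (0D), LF (0A), or a CR LF pair; the class and method names are hypothetical. The second method uses \R, the linebreak matcher available in java.util.regex since Java 8, which also covers less common separators such as U+2028.

```java
import java.util.regex.Pattern;

class LineBreakReplacer {
    // CR LF is listed first so a Windows line ending becomes one space, not two.
    private static final Pattern HEX_BREAK = Pattern.compile("\\x0D\\x0A|[\\x0A\\x0D]");

    // \R matches CR LF as a unit, plus LF, VT, FF, CR, NEL, U+2028 and U+2029.
    private static final Pattern ANY_BREAK = Pattern.compile("\\R");

    /** Replaces CR, LF and CR LF with a single space each. */
    public static String hexBreaksToSpaces(String text) {
        return HEX_BREAK.matcher(text).replaceAll(" ");
    }

    /** Replaces any line break recognised by \R with a single space. */
    public static String anyBreaksToSpaces(String text) {
        return ANY_BREAK.matcher(text).replaceAll(" ");
    }

    public static void main(String[] args) {
        System.out.println(hexBreaksToSpaces("one\r\ntwo\nthree"));
    }
}
```

Targeting only the break characters keeps tabs and intentional double spaces intact, which may matter if the records contain aligned text.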
 
tdlewis commented:
You don't really need to know what the end-of-line character is. The following statement replaces every run of whitespace, line breaks included, with a single space.
$pastedText =~ s/\s+/ /g;
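
Since the question also asked about Java, a rough counterpart to the Perl statement might look like the sketch below. It is an assumption-laden illustration, not code from the thread: the class name WhitespaceCollapser is hypothetical, and the UNICODE_CHARACTER_CLASS flag is added so that \s also matches non-ASCII whitespace such as U+2028, which Java's default \s does not.

```java
import java.util.regex.Pattern;

class WhitespaceCollapser {
    // With UNICODE_CHARACTER_CLASS, \s matches Unicode whitespace, not only ASCII.
    private static final Pattern WHITESPACE_RUN =
            Pattern.compile("\\s+", Pattern.UNICODE_CHARACTER_CLASS);

    /** Replaces every run of whitespace (spaces, tabs, line breaks) with one space. */
    public static String collapse(String text) {
        return WHITESPACE_RUN.matcher(text).replaceAll(" ");
    }

    public static void main(String[] args) {
        // Prints: first line second line third
        System.out.println(collapse("first line\r\nsecond\tline   third"));
    }
}
```

Like the Perl version, this flattens tabs and repeated spaces as well as line breaks, so it suits records where only the words matter.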

 
PurpleSlade (author) commented:
This works great, thanks.
 
CEHJ commented:
Hang on, where did Java come into this? Just at the front end?
