Convert line breaks to spaces in text cut and pasted from a PDF

Posted on 2012-08-22
Last Modified: 2012-08-23
We have a situation where a user pastes text (usually from a PDF or a PowerPoint) into a Java application. The following is an example of text that was cut and pasted from a PDF:
[Screenshot 1]
On the back end it is saved as an XML file, and in Notepad the text on either side of those line breaks appears to run together. Unfortunately, the breaks cause problems when the records are used in other applications.
[Screenshot 2]
However, if I cut and paste that text from Notepad into something like an Outlook e-mail, the line breaks from the first screenshot show up again, so they are still there.

What I am wondering is if there is a way to figure out what that character is and make it a space rather than a linebreak.  I have thousands of records to traverse and was hoping for a Perl or Java solution because I already have programs written to traverse the records in those languages.
Question by: PurpleSlade

    Expert Comment

    To give us the best chance of helping, you probably need to do the following:

    a. Save the pasted text first using a FileOutputStream and attach an example file.
    b. Show the code where you deal with the pasted text.

    Expert Comment

    To be really sure what the character(s) are, open the file in an editor that shows the values in hex and identify them from there. There are online hex editors and ones you can download; I use Notepad++, which has a hex editor plugin.
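
If a hex editor is not handy, the same check can be done in code. The following is a minimal sketch, not part of the original answer: the class name `CharInspector` and method `describeInvisibles` are hypothetical, and it assumes the pasted text is already available as a Java `String`. It lists the code point of every whitespace or control character other than a plain space.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reports the invisible characters found in a string.
final class CharInspector {

    /** Returns "U+XXXX" for each whitespace or control code point except U+0020. */
    static List<String> describeInvisibles(String text) {
        List<String> found = new ArrayList<>();
        text.codePoints().forEach(cp -> {
            boolean invisible = Character.isWhitespace(cp)
                    || Character.isSpaceChar(cp)
                    || Character.isISOControl(cp);
            if (invisible && cp != ' ') {
                found.add(String.format("U+%04X", cp));
            }
        });
        return found;
    }

    public static void main(String[] args) {
        // Sample input with a Windows-style line ending (CR followed by LF).
        System.out.println(describeInvisibles("first line\r\nsecond line"));
    }
}
```

Running this over one of the saved records should show whether the break is a plain line feed (U+000A), a carriage return plus line feed, or something less common.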

    Expert Comment

    Once you know the hex code, you can put it directly into a regex by its hex value, e.g. \x0A for a hex 0A character (line feed). If they are standard newline characters, you may be able to use the ubiquitous \n instead.
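
Since the question also asks about Java, here is a small sketch of the hex-escape approach using `String.replaceAll`. It assumes the offending character turns out to be a line feed, optionally preceded by a carriage return; the class and method names are hypothetical.

```java
// Hypothetical helper: replaces LF or CR+LF with a single space using hex escapes.
final class HexEscapeReplace {

    static String replaceLineFeeds(String text) {
        // \x0D is carriage return, \x0A is line feed; the CR is optional.
        // A carriage return on its own is left untouched by this pattern.
        return text.replaceAll("\\x0D?\\x0A", " ");
    }

    public static void main(String[] args) {
        System.out.println(replaceLineFeeds("one\ntwo\r\nthree"));
    }
}
```

If the hex dump shows a different value, only the two hex digits in the pattern need to change.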

    Accepted Solution

    You don't really need to know what the end of line character is. The following statement should replace it with a space.
    $pastedText =~ s/\s+/ /g;
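
For the Java programs mentioned in the question, a rough equivalent of the Perl statement above might look like the sketch below. The class name `WhitespaceCollapser` and method `collapse` are hypothetical. Like the Perl version, it collapses every run of whitespace (spaces and tabs as well as line breaks) into one space, so it is only suitable if that is acceptable for the records.

```java
import java.util.regex.Pattern;

// Hypothetical helper: collapses each run of whitespace into a single space.
final class WhitespaceCollapser {

    // By default Java's \s matches only space, tab, LF, VT, FF and CR.
    // Compiling with Pattern.UNICODE_CHARACTER_CLASS would widen it to
    // Unicode whitespace such as U+00A0 or U+2028 if the records need that.
    private static final Pattern WHITESPACE_RUN = Pattern.compile("\\s+");

    static String collapse(String pastedText) {
        return WHITESPACE_RUN.matcher(pastedText).replaceAll(" ");
    }

    public static void main(String[] args) {
        System.out.println(collapse("Line one\r\nline  two\n\tline three"));
    }
}
```

Leading or trailing whitespace is reduced to a single space rather than removed; call `trim()` on the result if that matters.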



    Author Closing Comment

    This works great, thanks.

    Expert Comment

    Hang on - where did Java come into this - just at the front end?
