sudhakar_koundinya

Regular expression for CRLF-DOT-CRLF

Hello Guys,

What is the regular expression for splitting a String into tokens, taking "\r\n.\r\n" as the token separator? The separator should still be there in the token.

thanks
sudhakar
ozo

/(?<=\r\n\,\r\n)/
Sorry, typo, I meant
/(?<=\r\n\.\r\n)/
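
For anyone trying this from Java: the pattern above is written in Perl-style /.../ notation, so the slashes are dropped and the backslash before the dot is doubled inside a Java string literal. A minimal sketch follows; the class and method names are made up for illustration. The (?<=...) construct is a zero-width lookbehind, so it matches the empty position immediately after CRLF-DOT-CRLF, and split() cuts there without consuming the separator.

```java
class LookbehindSplitDemo {
    // Zero-width lookbehind: matches the empty position right after CR LF "." CR LF.
    // "\r\n" in the Java literal becomes real CR/LF characters, which the regex matches literally.
    static final String AFTER_SEPARATOR = "(?<=\r\n\\.\r\n)";

    // Hypothetical helper: splits after each separator, leaving it at the end of the token.
    static String[] splitKeepingSeparator(String content) {
        return content.split(AFTER_SEPARATOR);
    }

    public static void main(String[] args) {
        for (String part : splitKeepingSeparator("abc\r\n.\r\ndef")) {
            // Show CR and LF as visible escapes when printing.
            System.out.println("<" + part.replace("\r", "\\r").replace("\n", "\\n") + ">");
        }
        // prints:
        // <abc\r\n.\r\n>
        // <def>
    }
}
```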
This

        String parts[] = "abc\r\n.\r\ndef".split("\r\n[.]\r\n");
        System.out.println("<" + parts[0] + ">");
        System.out.println("<" + parts[1] + ">");

prints out
<abc>
<def>

But I'm not sure about your requirement:
>> the separator should still be there in the token
tomboshell

You don't even need a regular expression for that. A StringTokenizer is what you need, and the tokenizer is faster. Just set it to return the delimiters as tokens.

StringTokenizer sTok = new StringTokenizer(string, "\r\n.\r\n", true);
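
One caveat is worth checking before relying on this call: StringTokenizer treats its delimiter argument as a set of single characters, not as one multi-character separator. With returnDelims set to true, each delimiter character comes back as its own one-character token, so the input is also cut at every lone '.', '\r' or '\n'. The sketch below (class and method names are illustrative only) makes that visible.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class TokenizerDelimiterDemo {
    // Collects every token, including the delimiter characters returned as tokens.
    static List<String> tokens(String content) {
        List<String> result = new ArrayList<>();
        // The second argument is a set of delimiter characters: '\r', '\n' and '.'.
        StringTokenizer tok = new StringTokenizer(content, "\r\n.\r\n", true);
        while (tok.hasMoreTokens()) {
            result.add(tok.nextToken());
        }
        return result;
    }

    public static void main(String[] args) {
        // The dot inside "a.b" is treated as a delimiter too.
        List<String> result = tokens("a.b\r\n.\r\nc");
        System.out.println(result.size()); // prints 9
    }
}
```

So an email body containing an ordinary full stop or line break would be broken up as well, which is probably not what is wanted here.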
 

ASKER

zzynx,

The first part is okay with me. I am stuck on the second point only.

Let me explain the scenario here.

We are working on a Java to MS Exchange Server communication project where I get the emails as XML files. There is a possibility of getting multiple emails in a single XML file. Using Xalan, we are able to transform this XML file into RFC 822/2822 standard emails.

The output we get here is something similar to an Outlook Express file.

So to split the entire transformed content into individual emails, I need to take "\r\n.\r\n" as the separator. But it should still be there in the token after splitting the content.

I hope that is clear. If not, let me know and I will try to explain our operations using sample examples.

Regards
Sudhakar


Of course, but
1) the author asked to use split()
2) I quote from the API docs:
    StringTokenizer is a legacy class that is retained for compatibility reasons although its use is discouraged in new code.
    It is recommended that anyone seeking this functionality use the split method of String or the java.util.regex package instead.
>> But this should be still there in token after splitting the content.
Of course, you could add the token afterwards, no?

        String parts[] = "abc\r\n.\r\ndef".split("\r\n[.]\r\n");
        for (int i=0; i<parts.length-1; i++)
           parts[i] += "\r\n.\r\n";
        System.out.println("<" + parts[0] + ">");
        System.out.println("<" + parts[1] + ">");
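
One edge case to watch with this approach: if the content itself ends with "\r\n.\r\n" (as the last email in a batch presumably would), split() drops the trailing empty string, so the loop above never re-appends the separator to the final part. A possible variation, sketched here with made-up class and method names, passes a limit of -1 so that trailing empty strings are kept:

```java
import java.util.ArrayList;
import java.util.List;

class AppendSeparatorDemo {
    static final String SEPARATOR = "\r\n.\r\n";

    // Hypothetical helper: splits on CRLF-DOT-CRLF and puts the separator back on each piece.
    static List<String> splitAndRestore(String content) {
        // Limit -1 keeps trailing empty strings, so a separator at the very end is not lost.
        String[] parts = content.split("\r\n[.]\r\n", -1);
        List<String> messages = new ArrayList<>();
        // Every part except the last one was followed by a separator in the input.
        for (int i = 0; i < parts.length - 1; i++) {
            messages.add(parts[i] + SEPARATOR);
        }
        // The last part had no separator after it; keep it only if it is not empty.
        String last = parts[parts.length - 1];
        if (!last.isEmpty()) {
            messages.add(last);
        }
        return messages;
    }

    public static void main(String[] args) {
        List<String> messages = splitAndRestore("abc\r\n.\r\ndef\r\n.\r\n");
        System.out.println(messages.size()); // prints 2
    }
}
```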
Usage of StringTokenizer is restricted in the current project :(
ASKER CERTIFIED SOLUTION
ozo
That last one seems to be very right.
Could you also explain that regular expression, ozo?