Regular expression for CRLF-DOT-CRLF

Hello Guys,

What is the regular expression for splitting a String into tokens, taking "\r\n.\r\n" as the token separator, where the separator should still be there in the token?

thanks
sudhakar

ozo

/(?<=\r\n\,\r\n)/

ozo

Sorry, typo, I meant
/(?<=\r\n\.\r\n)/
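A minimal sketch of how that lookbehind pattern could be used with String.split() in Java. The pattern is zero-width: it matches the empty position immediately after "\r\n.\r\n", so nothing is consumed and the separator stays at the end of the preceding token. The class and method names here are hypothetical, chosen only for illustration.

```java
public class LookbehindSplit {
    // Zero-width lookbehind: matches the empty position right after CRLF "." CRLF.
    // "\\." is a literal dot; \r and \n reach the regex engine as raw characters.
    static final String AFTER_SEPARATOR = "(?<=\r\n\\.\r\n)";

    // Hypothetical helper: splits after each separator, keeping it in the token.
    static String[] splitAfterSeparator(String content) {
        return content.split(AFTER_SEPARATOR);
    }

    public static void main(String[] args) {
        String[] parts = splitAfterSeparator("abc\r\n.\r\ndef\r\n.\r\n");
        for (String part : parts) {
            // Make the control characters visible when printing.
            System.out.println("<" + part.replace("\r", "\\r").replace("\n", "\\n") + ">");
        }
    }
}
```

Because the match is an empty position, no text is dropped by split(); each token except possibly the last ends with the separator.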

zzynx

This

String parts[] = "abc\r\n.\r\ndef".split("\r\n[.]\r\n");
System.out.println("<" + parts[0] + ">");
System.out.println("<" + parts[1] + ">");

prints out
<abc>
<def>

But I'm not sure about your requirement:
>> the separator should still be there in the token

tomboshell

You don't even need a regular expression for that. A StringTokenizer is what you need, and the tokenizer is faster. Just set it to return the delimiters as tokens.

StringTokenizer sTok = new StringTokenizer(string, "\r\n.\r\n", true);
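One caveat worth checking before relying on this: the second argument of StringTokenizer is a set of delimiter characters, not a single multi-character separator. The sketch below (class and method names are hypothetical) collects the tokens so the behaviour can be inspected; with the delimiter string above, every '\r', '\n' and '.' splits the input on its own, including dots inside ordinary text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class TokenizerDelimiters {
    // Hypothetical helper: returns every token the tokenizer produces.
    static List<String> tokens(String content) {
        // The delimiter argument is read as the character set {'\r', '\n', '.'};
        // true means each delimiter character is returned as its own token.
        StringTokenizer tokenizer = new StringTokenizer(content, "\r\n.\r\n", true);
        List<String> result = new ArrayList<>();
        while (tokenizer.hasMoreTokens()) {
            result.add(tokenizer.nextToken());
        }
        return result;
    }

    public static void main(String[] args) {
        // The dot inside "a.b" is treated as a delimiter too.
        List<String> result = tokens("a.b\r\n.\r\nc");
        System.out.println(result.size() + " tokens");
    }
}
```

For mail content, where dots and line breaks occur throughout the text, this produces far more tokens than one per message.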

sudhakar_koundinya

ASKER

zzynx,

The first part is okay with me. I am stuck on the second point only.

I would like to explain the scenario here.

We are working on a Java / MS Exchange Server communication project where I get the emails as XML files. There is a possibility of getting multiple emails in a single XML file. Using Xalan, we are able to transform this XML file into RFC 822/2822 standard emails.

The output we get here is similar to an Outlook Express file.

So, to split the entire transformed content into individual emails, I need to take "\r\n.\r\n" as the separator. But the separator should still be there in the token after splitting the content.

Hope you understand. If not, let me know and I will try to explain our operations using sample examples.

Regards
Sudhakar

zzynx

Of course but
1) the author asked to use split()
2) I quote from the API docs:
StringTokenizer is a legacy class that is retained for compatibility reasons although its use is discouraged in new code.
It is recommended that anyone seeking this functionality use the split method of String or the java.util.regex package instead.

zzynx

>> But this should be still there in token after splitting the content.
Of course, you could add the token afterwards, no?

String parts[] = "abc\r\n.\r\ndef".split("\r\n[.]\r\n");
for (int i = 0; i < parts.length - 1; i++) {
    parts[i] += "\r\n.\r\n";
}
System.out.println("<" + parts[0] + ">");
System.out.println("<" + parts[1] + ">");
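One thing to watch with the re-append approach: split() with the default limit drops trailing empty strings, so if the content itself ends with "\r\n.\r\n" the loop above never restores the separator on the last part. A possible variation, sketched under the assumption that every message should keep its terminator, is to split with a limit of -1 and drop only a final empty remainder. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class ReappendSeparator {
    static final String SEP = "\r\n.\r\n";

    // Hypothetical helper: splits on the separator and puts it back on each part.
    static List<String> splitKeepingSeparator(String content) {
        // A limit of -1 keeps trailing empty strings, which tells us
        // whether the content ended with the separator.
        String[] parts = content.split("\r\n[.]\r\n", -1);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < parts.length; i++) {
            boolean last = (i == parts.length - 1);
            if (!last) {
                // Every part except the last was followed by a separator.
                result.add(parts[i] + SEP);
            } else if (!parts[i].isEmpty()) {
                // Trailing text with no separator after it is kept as-is.
                result.add(parts[i]);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (String part : splitKeepingSeparator("abc\r\n.\r\ndef\r\n.\r\n")) {
            System.out.println("<" + part.replace("\r", "\\r").replace("\n", "\\n") + ">");
        }
    }
}
```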

sudhakar_koundinya

ASKER

Usage of StringTokenizer is restricted in the current project :(

ASKER CERTIFIED SOLUTION

ozo

zzynx

That last one seems to be exactly right.
Do you also know the explanation of that regular expression, ozo?

SOLUTION