PearlJamFanatic

split string containing \r\n in Java

I am using the code below to read from a socket:
BufferedReader in = new BufferedReader(new InputStreamReader(connection.getInputStream()));

while ((fromAB = in.readLine()) != null) {
    switch (fromAB.trim().toUpperCase().split("\\|")[1].split("\\")[0]) {
    case "SHORT_ENTRY":
        .
        .
        .
    }
}



But this fails to remove \r\n

The following is read from the socket

ESZ16|SHORT_ENTRY\r\n



I want "SHORT_ENTRY" after the split.
CEHJ

But this fails to remove \r\n
There won't be any such characters. They will already have been removed by BufferedReader.readLine.

Also here
switch (fromAB.trim().toUpperCase().split("\\|")[1].split("\\")[0]){


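The second regex doesn't make any sense: "\\" is a lone backslash once the string literal is unescaped, which is not a valid pattern. The first split alone should get you SHORT_ENTRY if the line is as you say it is (it WON'T have line feeds, though).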
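To illustrate the point about the two regexes, here is a minimal Java sketch. The names SplitDemo, scenarioOf and isInvalidRegex are made up for this example and are not from the original code. It assumes the line arrives as ESZ16|SHORT_ENTRY with the line terminator already stripped by readLine().

```java
import java.util.regex.PatternSyntaxException;

// Hypothetical demo class; not part of the asker's program.
final class SplitDemo {

    // Mirrors the thread's expression: the field after the first '|'.
    static String scenarioOf(String line) {
        return line.trim().toUpperCase().split("\\|")[1];
    }

    // Returns true if String.split rejects the given regex.
    static boolean isInvalidRegex(String regex) {
        try {
            "a".split(regex);
            return false;
        } catch (PatternSyntaxException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(scenarioOf("ESZ16|SHORT_ENTRY")); // prints SHORT_ENTRY
        System.out.println(isInvalidRegex("\\"));   // regex is a lone backslash
        System.out.println(isInvalidRegex("\\\\")); // regex is an escaped backslash
    }
}
```

The single split on "\\|" is enough here; the second split only matters if the data really contains a backslash.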
Shane Krueger

Try changing "\\" to "\\\\": the backslash has to be escaped once for the Java string literal and again for the regex.

switch (fromAB.trim().toUpperCase().split("\\|")[1].split("\\\\")[0]){
PearlJamFanatic

ASKER

I am writing from VBScript. I use a library called Chilkat. This is the VB code:
success = socket.Connect("localhost", AFL("PortNumber"), ssl, maxWaitMillisec)   
If (success <> 1) Then
    outFile.WriteLine(socket.LastErrorText)
    WScript.Quit
End if

success = socket.SendString(AFL("MWSymbol")&"|"&message&"\r\n")



I was told to use \r\n when using readline() in java. When I remove \r\n from the VB message it works. I don't know what is happening. Is \r\n not needed? Is it possible that VB is not sending a carriage return and newline, but the literal characters '\r\n' instead? I am really confused.
I was told to use \r\n when using readline() in java.
I don't really know what that means. readLine() reads a line of text. A line ends at a line terminator: \n, \r, or \r\n. readLine() does NOT return the terminator.
ASKER CERTIFIED SOLUTION
Shane Krueger

Sorry, I should have read your code again. You've already created a buffer, and in.readLine is what separates each line from the next (assuming, of course, that you don't recreate the buffer upon each read). I don't know Java, but in .NET ReadLine would automatically strip the CRLF from the end of the string. You will need to change your VBScript to use vbCrLf instead of "\r\n", however: VBScript does not interpret backslash escapes, so "\r\n" is sent as four literal characters.
If the message contains a literal \r\n (there probably shouldn't be one ;) ), then the situation is different.
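The difference between a real CRLF and the four literal characters \r\n can be sketched in Java as follows. This is only an illustration: it reads from a StringReader instead of a socket, and the class and method names (LiteralVsRealCrLf, firstLine) are made up for the example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical demo class; a StringReader stands in for the socket stream.
final class LiteralVsRealCrLf {

    // Returns what readLine() yields for the given wire data.
    static String firstLine(String wireData) {
        try (BufferedReader in = new BufferedReader(new StringReader(wireData))) {
            return in.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Real carriage return + line feed, as vbCrLf would send.
        String real = "ESZ16|SHORT_ENTRY\r\n";
        // Backslash, r, backslash, n: what "\r\n" in a VBScript string sends.
        String literal = "ESZ16|SHORT_ENTRY\\r\\n";

        System.out.println(firstLine(real));    // terminator is stripped
        System.out.println(firstLine(literal)); // the four characters stay in the line
    }
}
```

In the second case the scenario field still ends in the literal characters after the split, so it would not match case "SHORT_ENTRY" in the switch.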
OK, so I tried &vbCrLf. My VBScript code now reads as below:

success = socket.SendString(AFL("MWSymbol")&"|"&message&vbCrLF)



And my Java code reads as below

while ((fromAB = in.readLine()) != null) {
    scenario = fromAB.trim().toUpperCase().split("\\|")[1];
    switch (scenario) {
        .
        .
        .
    }
}



This is working. So the vbCrLf is consumed, I suppose, in Java by in.readLine() and I am not required to process it. Shane, then how do i scan for it in java?
while (!t.isInterrupted()) {
    Socket connection = socket.accept();

    String fromAB = "";
    BufferedReader in = new BufferedReader(
            new InputStreamReader(connection.getInputStream()));

    while ((fromAB = in.readLine()) != null) {

        scenario = fromAB.trim().toUpperCase().split("\\|")[1];

        switch (scenario) {

        case "LONG_ENTRY":
            .
            .
            .
        }
    }
}



Is the buffer getting recreated? I think it is, because after reading a message in.readLine() would return null, and on the next iteration the inner while loop would exit, which would mean the BufferedReader is recreated. Is that right?
then how do i scan for it in java?
Why would you want to anyway? That's just marking the end of the line with a carriage return linefeed pair ...
Well, I don't want to scan for it. Shane said he would scan for it; that's why I asked.
Is the buffer getting recreated?
Which buffer? I mean there IS a buffer behind the scenes, but not one that you need to worry about.
in.readLine scans for it and parses based on it - it does everything you need it to
Sorry I didn't make that clear - but with the last code sample, you should be golden.
What happens when a message is consumed by readLine()? What will in.readLine() return in the next iteration? Will it return null?
Why would it be 'consumed'? The only thing that gets confused is linefeeds
So are you saying all messages remain in the BufferedReader, even the ones that have been read?
In .NET, Readline will block and wait until another CRLF is received, however long that is.  (I'm assuming java is the same.)  Then it will return the data between the last CRLF and this one.  So this would be an example of the java program's operation:

> execution occurs until Readline
> Readline freezes operation and waits for data
> receives ESZ16|SHOR
> keeps waiting, as no CRLF has been received yet
> receives T_ENTRY\r\n   (assuming that \r\n is a carriage return and line feed, not actually \r\n)
> Readline returns ESZ16|SHORT_ENTRY    (note that the CRLF is gone)
> your code executes, parsing the SHORT_ENTRY string from the rest
> your code goes back to Readline, which starts waiting again
> this time ESZ16|ENTRY_2\r\nESZ16|ENTRY_3\r\n is received
> Readline returns ESZ16|ENTRY_2 but leaves the rest in the buffer   (again, note that the CRLF is gone)
> your code executes, parsing ENTRY_2
> your code goes back to Readline
> Readline does not wait at all, but can see that ESZ16|ENTRY_3\r\n is still in the buffer
> Readline immediately returns ESZ16|ENTRY_3   (again, note that the CRLF is gone)
> your code executes, parsing ENTRY_3
> your code goes back to Readline
> Readline waits for more data...

With the buffer, broken or delayed messages all get parsed correctly, and in the correct order.
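The walk-through above can be tried out in Java with a small sketch. ChunkedReader is a made-up stand-in for the socket stream that hands data to BufferedReader one chunk at a time; it cannot show the blocking itself, only how fragments are joined and how several lines in one chunk are split. All names here are assumptions for the example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a socket stream: delivers at most one chunk per read.
final class ChunkedReader extends Reader {
    private final String[] chunks;
    private int chunkIndex;
    private int pos;

    ChunkedReader(String... chunks) {
        this.chunks = chunks;
    }

    @Override
    public int read(char[] cbuf, int off, int len) {
        // Move past chunks that have been fully delivered.
        while (chunkIndex < chunks.length && pos == chunks[chunkIndex].length()) {
            chunkIndex++;
            pos = 0;
        }
        if (chunkIndex == chunks.length) {
            return -1; // end of stream, like a closed connection
        }
        String chunk = chunks[chunkIndex];
        int n = Math.min(len, chunk.length() - pos);
        chunk.getChars(pos, pos + n, cbuf, off);
        pos += n;
        return n;
    }

    @Override
    public void close() {
        // Nothing to release in this in-memory stand-in.
    }

    // Reads every line BufferedReader produces from the given chunks.
    static List<String> readLines(String... chunks) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(new ChunkedReader(chunks))) {
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        System.out.println(readLines(
                "ESZ16|SHOR", "T_ENTRY\r\n", "ESZ16|ENTRY_2\r\nESZ16|ENTRY_3\r\n"));
    }
}
```

With the three chunks from the walk-through, readLines should give the three complete messages in order, each without its CRLF.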
The only thing that gets confused is linefeeds
Of course, that should have said 'consumed' ;)

> receives ESZ16|SHOR
> keeps waiting, as no CRLF has been received yet
> receives T_ENTRY\r\n   (assuming that \r\n is a carriage return and line feed, not actually \r\n)
my java code is not waiting for vbCrLf. It parses through. Why?
For those of you who seem to be confused by the word consumed: it depends on how you look at it.  The underlying code behind the readLine method operates in this general fashion:

read data from the socket into a buffer
look for CR, LF, or CRLF
if one is found,
    save the text before it to a temp variable
    delete that text from the buffer, along with the CR/LF/CRLF
    return it from the function, without the CR/LF/CRLF
otherwise, keep reading



I was indicating that the returned string gets consumed from the underlying buffer, along with the CRLF, which is not returned.

From a higher level, you could say that the string is returned and the CRLF is consumed.
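For anyone who wants to see that general fashion as code, here is a rough Java sketch. It is not the actual BufferedReader implementation, just a simplified model under that description; LineBuffer, feed and pollLine are names invented for the example, and feed plays the role of data arriving from the socket.

```java
// Hypothetical, simplified model of the buffer behind readLine().
final class LineBuffer {
    private final StringBuilder buffer = new StringBuilder();
    // Set when a line ended in CR at the very end of the buffered data,
    // so that a following LF is treated as part of the same terminator.
    private boolean skipLeadingLf;

    // Appends newly received data to the buffer.
    void feed(String chunk) {
        buffer.append(chunk);
    }

    // Returns the next complete line without its terminator,
    // or null if no terminator has been buffered yet.
    String pollLine() {
        if (skipLeadingLf && buffer.length() > 0) {
            if (buffer.charAt(0) == '\n') {
                buffer.deleteCharAt(0);
            }
            skipLeadingLf = false;
        }
        for (int i = 0; i < buffer.length(); i++) {
            char c = buffer.charAt(i);
            if (c == '\r' || c == '\n') {
                String line = buffer.substring(0, i);
                int end = i + 1;
                if (c == '\r') {
                    if (end < buffer.length()) {
                        if (buffer.charAt(end) == '\n') {
                            end++; // CRLF counts as one terminator
                        }
                    } else {
                        skipLeadingLf = true; // LF may arrive in the next chunk
                    }
                }
                buffer.delete(0, end); // consume the line and its terminator
                return line;
            }
        }
        return null;
    }
}
```

Unlike the real readLine(), pollLine returns null when no complete line is available instead of blocking, so null here does not mean end of stream.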
while ((fromAB = in.readLine()) != null)



When will in.readLine() return null? Can someone explain with an example?
It will return null when the end of the stream is reached
shane I don't think readline() is blocking and waiting for \r\n in java. My code is working with or without the vbCrLf. and that's what makes me thoroughly confused.
what constitutes end of stream?

what constitutes end of stream?
The end of the stream is reached (for instance) when the sender closes the connection
shane I don't think readline() is blocking and waiting for \r\n in java. My code is working with or without the vbCrLf. and that's what makes me thoroughly confused.

Then my best guess is that SendString is appending a CRLF automatically, which seems unusual.
But any of the 4 ideas I had above still could apply...
Yes the socket is closed from vbscript after every send.
That's why.  readLine won't wait any more once the socket is closed, and just returns whatever is remaining in the buffer.
No need to change your code, however, as long as you understand that. Your code will still reassemble a message that arrives in several TCP segments, although for short messages over localhost that is very unlikely in practice.
As a best-practice measure, I recommend leaving the vbCrLf in the code

Also, if you were to send two messages at the same time before closing the socket from vbscript, they would still both parse correctly so long as the vbCrLf was in there.
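Here is a small loopback sketch of that behaviour in Java, in case it helps. It assumes a sender that writes some bytes and then closes the socket, like the VBScript does; the class name CloseEndsLineDemo and the method sendAndReceive are invented for the example, and both ends run in one thread only to keep it short.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo: sender and receiver on the loopback interface.
final class CloseEndsLineDemo {

    // Sends the payload, closes the sender, and returns every line the
    // receiver got before readLine() returned null.
    static List<String> sendAndReceive(String payload) {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket sender = new Socket(server.getInetAddress(), server.getLocalPort());
             Socket connection = server.accept()) {

            OutputStream out = sender.getOutputStream();
            out.write(payload.getBytes(StandardCharsets.US_ASCII));
            out.flush();
            sender.close(); // plays the role of socket.Close in the VBScript

            List<String> lines = new ArrayList<>();
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(connection.getInputStream(), StandardCharsets.US_ASCII));
            String line;
            while ((line = in.readLine()) != null) { // null once the stream has ended
                lines.add(line);
            }
            return lines;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sendAndReceive("ESZ16|SHORT_ENTRY"));       // no terminator, one line
        System.out.println(sendAndReceive("ESZ16|A\r\nESZ16|B\r\n"));  // two separate lines
        System.out.println(sendAndReceive("ESZ16|AESZ16|B"));          // runs together as one line
    }
}
```

The last call is the reason to keep the terminator: without it, two messages sent before the close cannot be told apart.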
Yes, I will append vbCrLf to my message. One question, though: can the end-of-stream signal (socket.close) reach the Java code before the full message is received? You are saying that TCP messages can sometimes be fragmented, so is this a possibility?
An extra vbCrLf will only result in an empty String getting produced in Java - not a problem if you allow for that
An extra vbCrLf will only result in an empty String getting produced in Java - not a problem if you allow for that
no not extra just the one.
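On allowing for an empty String: one possible way to guard the parsing in Java is sketched below. ScenarioParser and parseScenario are hypothetical names, and returning null for lines that cannot be parsed is just one choice among several.

```java
// Hypothetical helper; not part of the asker's program.
final class ScenarioParser {

    // Returns the upper-cased field after the first '|',
    // or null for blank lines and lines without such a field.
    static String parseScenario(String line) {
        if (line == null) {
            return null;
        }
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return null; // e.g. produced by an extra CRLF
        }
        String[] fields = trimmed.toUpperCase().split("\\|");
        return fields.length >= 2 ? fields[1] : null;
    }

    public static void main(String[] args) {
        System.out.println(parseScenario("ESZ16|SHORT_ENTRY")); // prints SHORT_ENTRY
        System.out.println(parseScenario(""));                  // prints null
    }
}
```

In the read loop the caller would skip the line when parseScenario returns null, instead of hitting an ArrayIndexOutOfBoundsException on [1].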
would you recommend keeping the socket connection open from vbscript?
Actually, not typically.  Since the buffer will be empty when the socket is closed, readLine shouldn't return an empty string, but just return null to indicate that the socket has been closed.  It should only return the remaining data if there's any leftover in the buffer to be returned.
Hard to say - it depends.  Typically I would say yes.  If you could explain how this would be used, I could give a better suggestion.
What I fear is that the end of stream (socket.close) might reach the Java code before the complete message. In TCP anything can happen, right?
Well I am using it for interprocess communication between a vbscript application and a java application. A message will typically be sent every 2-3 minutes.
Below is the full vb code. This function is called every 2-3 minutes.
Sub SendToMWVB(message)
Dim fso, outFile
Set fso = CreateObject("Scripting.FileSystemObject")
Set outFile = fso.CreateTextFile("output.txt", True)

set socket = CreateObject("Chilkat_9_5_0.Socket")

success = socket.UnlockComponent("Anything for 30-day trial")
If (success <> 1) Then
    outFile.WriteLine(socket.LastErrorText)
    WScript.Quit
End If

'  Connect to port 5555 of localhost.
'  The string "localhost" is for testing on a single computer.
'  It would typically be replaced with an IP hostname, such
'  as "www.chilkatsoft.com".
ssl = 0
maxWaitMillisec = 20000
'MsgBox AFL("PortNumber")&" "&AFL("MWSymbol")&"|"&AFL("global_message_to_send")&"\r\n"
success = socket.Connect("localhost", AFL("PortNumber"), ssl, maxWaitMillisec)   
If (success <> 1) Then
    outFile.WriteLine(socket.LastErrorText)
    WScript.Quit
End if

success = socket.SendString(AFL("MWSymbol")&"|"&message&vbCrLF)
'&"\r\n")
If (success <> 1) Then
    outFile.WriteLine(socket.LastErrorText)
    WScript.Quit
End If
socket.Close(20000)  
'SendToMWVB=success
End sub


thx
Well I am using it for interprocess communication between a vbscript application and a java application. A message will typically be sent every 2-3 minutes.

As the old saying goes, "If it ain't broke, don't fix it!"

Bearing that in mind, generally I would not close the connection between transmissions.  That way you can monitor for a disconnection of the stream - like if someone closed one of the two applications.  It also means you need to repair the connection in case it fails, or notify the user that the connection failed.  A good example would be if it was an interactive game.  There's a major problem if the connection fails -- you'd want to know (on both ends), and can't just ignore it.  

If it's just a notification, where timeliness or dropped packets don't matter so much, then leave it as-is.  So if this code monitors temperature and is posting updates to a server, it doesn't matter much if a few packets are dropped.  The code as-is will pretty much self-heal in case one of the programs is closed or crashes.  So it would be better this way.

I guess the biggest difference is that the java side can't monitor for a disconnection if the stream is closed after every transmission.  You'd need to write a timer or something.  Secondly, I wouldn't want hundreds or thousands of ports being opened and closed every hour.  Seems like 'bad code' even if it works correctly.  And I don't like to write 'bad code'.

You might actually consider UDP.  If it's just a notification where guaranteed delivery is not required, then UDP is a much better choice.  Only a single packet is sent every time you send an update - rather than the dozen or so going back and forth as it is now.  But then unless you manually code a UDP response, you'll have absolutely no idea if the message went through or not.
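If you want to experiment with the UDP idea, a minimal Java sketch might look like this. It sends one datagram over the loopback interface and reads it back in the same process; UdpNotifyDemo and sendAndReceive are made-up names, the 512-byte buffer and 2-second timeout are arbitrary assumptions, and a real receiver would of course run in the other application.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

// Hypothetical demo: one datagram per message, no connection and no line terminator.
final class UdpNotifyDemo {

    // Sends the message as a single datagram and returns what the receiver got.
    static String sendAndReceive(String message) {
        try (DatagramSocket receiver = new DatagramSocket(0, InetAddress.getLoopbackAddress());
             DatagramSocket sender = new DatagramSocket()) {

            receiver.setSoTimeout(2000); // give up after 2 s; UDP gives no delivery guarantee

            byte[] payload = message.getBytes(StandardCharsets.US_ASCII);
            sender.send(new DatagramPacket(payload, payload.length,
                    receiver.getLocalAddress(), receiver.getLocalPort()));

            byte[] buffer = new byte[512]; // assumed upper bound on message size
            DatagramPacket packet = new DatagramPacket(buffer, buffer.length);
            receiver.receive(packet);
            return new String(packet.getData(), packet.getOffset(), packet.getLength(),
                    StandardCharsets.US_ASCII);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sendAndReceive("ESZ16|SHORT_ENTRY"));
    }
}
```

Each datagram is its own message, so no vbCrLf is needed to mark the end, but as said above the sender gets no indication of whether the message arrived.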