PearlJamFanatic
split string containing \r\n in Java
I am using the below code to read from socket
BufferedReader in = new BufferedReader(new InputStreamReader(connection.getInputStream()));
while ((fromAB = in.readLine()) != null) {
    switch (fromAB.trim().toUpperCase().split("\\|")[1].split("\\")[0]){
        case "SHORT_ENTRY":
        .
        .
        .
    }
}
But this fails to remove \r\n
The following is read from the socket
ESZ16|SHORT_ENTRY\r\n
I want "SHORT_ENTRY" after the split.
Try changing \\ to \\\\ -- escaping it for the string, then for the regex
switch (fromAB.trim().toUpperCase().split("\\|")[1].split("\\\\")[0]){
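The double escaping can be checked in isolation. This is only a minimal sketch, assuming the message really does contain the four literal characters \r\n; the class name EscapeDemo and the helper field are made up for illustration.

```java
class EscapeDemo {
    // Returns the text after the first '|' and before the first literal backslash.
    static String field(String line) {
        // "\\|" is the regex \| (a literal pipe).
        // "\\\\" is the regex \\ (one literal backslash).
        return line.split("\\|")[1].split("\\\\")[0];
    }

    public static void main(String[] args) {
        // The Java literal "\\r\\n" is four characters: backslash, r, backslash, n.
        String literal = "ESZ16|SHORT_ENTRY\\r\\n";
        System.out.println(field(literal)); // prints SHORT_ENTRY
    }
}
```

A single "\\" is not a valid regex on its own (it is a dangling escape), which is why the original split("\\") cannot work.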
ASKER
I am writing from VBScript. I use a library called chilkat. This is the VB code.
success = socket.Connect("localhost", AFL("PortNumber"), ssl, maxWaitMillisec)
If (success <> 1) Then
    outFile.WriteLine(socket.LastErrorText)
    WScript.Quit
End If
success = socket.SendString(AFL("MWSymbol")&"|"&message&"\r\n")
I was told to use \r\n when using readLine() in Java. When I remove \r\n from the VB message it works. I don't know what is happening. Is \r\n not needed? Is it possible that VB is not sending a carriage return and newline, but is instead sending the literal characters '\r\n'? I am really confused.
"I was told to use \r\n when using readline() in java."
I don't really know what that means. readLine() reads a line of text. The end of a line is marked by a line terminator (\n, \r, or \r\n). It does NOT return the terminator as well.
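To see that behaviour without a socket, here is a small sketch that runs the same read loop over a StringReader. The class name ReadLineDemo and the helper lines are invented for the example; only BufferedReader.readLine() comes from the thread.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class ReadLineDemo {
    // Reads every line from the text with the same loop the socket code uses.
    static List<String> lines(String text) {
        List<String> out = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = in.readLine()) != null) {
                out.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        // These are real CR and LF characters, not the literal text \r\n.
        for (String line : lines("ESZ16|SHORT_ENTRY\r\nESZ16|LONG_ENTRY\n")) {
            // The brackets make it visible that no terminator is left on the line.
            System.out.println("[" + line + "]");
        }
    }
}
```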
Sorry, I should have read your code again. You've already created a buffer, and in.ReadLine is what separates each line from the next. (Assuming of course that you don't recreate the buffer upon each read). I don't know java, but in .NET ReadLine would automatically strip the CRLF from the end of the string. You will need to change your VBScript to use vbCrLf instead of "\r\n", however.
If there is a literal \r\n contained in the message [ there probably shouldn't be if there is ;) ] then the situation becomes different
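VBScript has no backslash escapes, so "\r\n" in a VBScript string is sent as four ordinary characters. If the Java side ever had to tolerate that, one possible approach (a sketch only; LiteralCrLfDemo and stripLiteralCrLf are hypothetical names) is to strip the literal suffix after readLine() has returned:

```java
class LiteralCrLfDemo {
    // The four characters backslash, r, backslash, n, as a VBScript "\r\n" string sends them.
    static final String LITERAL_CRLF = "\\r\\n";

    // Removes one trailing literal \r\n if present; otherwise returns the line unchanged.
    static String stripLiteralCrLf(String line) {
        return line.endsWith(LITERAL_CRLF)
                ? line.substring(0, line.length() - LITERAL_CRLF.length())
                : line;
    }

    public static void main(String[] args) {
        String sentWithLiteral = "ESZ16|SHORT_ENTRY\\r\\n"; // "\r\n" written in VBScript
        String sentWithVbCrLf = "ESZ16|SHORT_ENTRY";        // what readLine() returns after vbCrLf
        System.out.println(stripLiteralCrLf(sentWithLiteral)); // prints ESZ16|SHORT_ENTRY
        System.out.println(stripLiteralCrLf(sentWithVbCrLf));  // prints ESZ16|SHORT_ENTRY
    }
}
```

Fixing the sender to use vbCrLf is still the cleaner option; this only shows what the receiver is actually looking at in each case.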
ASKER
Ok, so i tried &vbCrLf. My VBScript code read as below
success = socket.SendString(AFL("MWSymbol")&"|"&message&vbCrLF)
And my Java code reads as below
while ((fromAB = in.readLine()) != null) {
    scenario=fromAB.trim().toUpperCase().split("\\|")[1];
    switch (scenario){
        .
        .
        .
    }
}
This is working. So the vbCrLf is consumed, I suppose, in Java by in.readLine() and I am not required to process it. Shane, then how do I scan for it in Java?
ASKER
while (!t.isInterrupted()) {
    Socket connection = socket.accept();
    String fromAB = "";
    BufferedReader in = new BufferedReader(
            new InputStreamReader(connection.getInputStream()));
    while ((fromAB = in.readLine()) != null) {
        scenario=fromAB.trim().toUpperCase().split("\\|")[1];
        switch (scenario){
            case "LONG_ENTRY":
        }
    }
}
Is the buffer getting recreated? I think it is, because after reading a message the BufferedReader in would contain null, and in the next iteration the while loop would exit, which would mean the buffer is recreated. Is that right?
"then how do i scan for it in java?"
Why would you want to anyway? That's just marking the end of the line with a carriage return linefeed pair ...
ASKER
Well, I don't want to scan for it. Shane said he would scan for it. That's why I asked.
"Is the buffer getting recreated?"
Which buffer? I mean there IS a buffer behind the scenes, but not one that you need to worry about.
in.ReadLine scans for it and parses based on it - it does everything you need it to
Sorry I didn't make that clear - but with the last code sample, you should be golden.
ASKER
what happens when a message is consumed by readline()? what will in.readLine() return in the next iteration? will it return null?
Why would it be 'consumed'? The only thing that gets confused is linefeeds
ASKER
So are you saying all messages remain in the BufferedReader? Even the ones that have been read?
In .NET, Readline will block and wait until another CRLF is received, however long that is. (I'm assuming java is the same.) Then it will return the data between the last CRLF and this one. So this would be an example of the java program's operation:
> execution occurs until Readline
> Readline freezes operation and waits for data
> receives ESZ16|SHOR
> keeps waiting, as no CRLF has been received yet
> receives T_ENTRY\r\n (assuming that \r\n is a carriage return and line feed, not actually \r\n)
> Readline returns ESZ16|SHORT_ENTRY (note that the CRLF is gone)
> your code executes, parsing the SHORT_ENTRY string from the rest
> your code goes back to Readline, which starts waiting again
> this time ESZ16|ENTRY_2\r\nESZ16|ENTRY_3\r\n is received
> Readline returns ESZ16|ENTRY_2 but leaves the rest in the buffer (again, note that the CRLF is gone)
> your code executes, parsing ENTRY_2
> your code goes back to Readline
> Readline does not wait at all, but can see that ESZ16|ENTRY_3\r\n is still in the buffer
> Readline immediately returns ESZ16|ENTRY_3 (again, note that the CRLF is gone)
> your code executes, parsing ENTRY_3
> your code goes back to Readline
> Readline waits for more data...
With the buffer, broken or delayed messages all get parsed correctly, and in the correct order.
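The reassembly part of that walk-through can be tried out in Java without a network. This is a rough sketch under stated assumptions: a SequenceInputStream of byte arrays stands in for the socket so that each fragment arrives as a separate read, and the names FragmentDemo and readFragments are made up. It shows the reassembly and ordering only, not the blocking.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.SequenceInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class FragmentDemo {
    // Feeds each fragment to the reader as a separate chunk and collects the lines returned.
    static List<String> readFragments(String... fragments) {
        List<InputStream> chunks = new ArrayList<>();
        for (String fragment : fragments) {
            chunks.add(new ByteArrayInputStream(fragment.getBytes(StandardCharsets.US_ASCII)));
        }
        InputStream stream = new SequenceInputStream(Collections.enumeration(chunks));
        List<String> lines = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(stream, StandardCharsets.US_ASCII))) {
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Same fragments as in the walk-through: one split message, then two glued together.
        List<String> lines = readFragments(
                "ESZ16|SHOR", "T_ENTRY\r\n", "ESZ16|ENTRY_2\r\nESZ16|ENT", "RY_3\r\n");
        System.out.println(lines); // prints [ESZ16|SHORT_ENTRY, ESZ16|ENTRY_2, ESZ16|ENTRY_3]
    }
}
```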
"The only thing that gets confused is linefeeds"
Of course, that should have said 'consumed' ;)
ASKER
my java code is not waiting for vbCrLf. It parses through. Why?
> receives ESZ16|SHOR
> keeps waiting, as no CRLF has been received yet
> receives T_ENTRY\r\n (assuming that \r\n is a carriage return and line feed, not actually \r\n)
For those of you who seem to be confused by the word consumed: it depends on how you look at it. The underlying code behind the readLine method operates in this general fashion:
read data from the socket into a buffer
look for CR, LF, or CRLF
if there is any,
save to a temp variable
delete it from the buffer, along with the CR/LF/CRLF
return it from the function, without the CR/LF/CRLF
I was indicating that the returned string gets consumed from the underlying buffer, along with the CRLF, which is not returned.
From a higher level, you could say that the string is returned and the CRLF is consumed.
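For anyone who prefers code to pseudocode, the steps above might look roughly like this in Java. It is a simplified sketch, not the actual BufferedReader source: LineBuffer, append and nextLine are invented names, null here means "no complete line buffered yet" (the real readLine() would block instead), and a CRLF that is split across two appends is not handled.

```java
class LineBuffer {
    private final StringBuilder buffer = new StringBuilder();

    // Stands in for "read data from the socket into a buffer".
    void append(String received) {
        buffer.append(received);
    }

    // Returns the next complete line without its terminator, or null if none is buffered.
    String nextLine() {
        for (int i = 0; i < buffer.length(); i++) {
            char c = buffer.charAt(i);
            if (c == '\r' || c == '\n') {
                String line = buffer.substring(0, i); // save to a temp variable
                int end = i + 1;
                // CR immediately followed by LF counts as one terminator.
                if (c == '\r' && end < buffer.length() && buffer.charAt(end) == '\n') {
                    end++;
                }
                buffer.delete(0, end); // delete the line and its CR/LF/CRLF from the buffer
                return line;           // return it without the CR/LF/CRLF
            }
        }
        return null;
    }

    public static void main(String[] args) {
        LineBuffer b = new LineBuffer();
        b.append("ESZ16|SHOR");
        System.out.println(b.nextLine()); // prints null (no terminator yet)
        b.append("T_ENTRY\r\n");
        System.out.println(b.nextLine()); // prints ESZ16|SHORT_ENTRY
    }
}
```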
ASKER
while ((fromAB = in.readLine()) != null)
when will in.readLine() return null. can someone explain with an example?
It will return null when the end of the stream is reached
ASKER
shane I don't think readline() is blocking and waiting for \r\n in java. My code is working with or without the vbCrLf. and that's what makes me thoroughly confused.
ASKER
what constitutes end of stream?
The end of the stream is reached (for instance) when the sender closes the connection
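Here is a small loopback sketch of that case: the sender connects, writes one message and closes, and the reader's loop ends because readLine() returns null. Everything in it is an assumption for illustration (the class name EndOfStreamDemo, the helper sendAndReceive, an ephemeral port instead of the real one, and both ends living in one Java process rather than VBScript plus Java).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class EndOfStreamDemo {
    // Sends the payload over a loopback connection, closes the sending side,
    // and returns every line read before readLine() returned null.
    static List<String> sendAndReceive(String payload) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        List<String> lines = new ArrayList<>();
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket sender = new Socket(loopback, server.getLocalPort());
             Socket connection = server.accept()) {
            sender.getOutputStream().write(payload.getBytes(StandardCharsets.US_ASCII));
            sender.close(); // closing the sender is what ends the stream for the reader

            BufferedReader in = new BufferedReader(new InputStreamReader(
                    connection.getInputStream(), StandardCharsets.US_ASCII));
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
            // Reaching this point means readLine() returned null: end of stream.
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        System.out.println(sendAndReceive("ESZ16|SHORT_ENTRY\r\n")); // prints [ESZ16|SHORT_ENTRY]
    }
}
```

If the sender kept the connection open instead, the loop would not end; readLine() would simply wait for the next line.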
"shane I don't think readline() is blocking and waiting for \r\n in java. My code is working with or without the vbCrLf. and that's what makes me thoroughly confused."
Then my best guess is that sendstring is appending a CRLF automatically....which seems unusual.
But any of the 4 ideas I had above still could apply...
ASKER
Yes the socket is closed from vbscript after every send.
That's why. readLine won't wait any more once the socket is closed, and just returns whatever is remaining in the buffer.
No need to change your code, however, as long as you understand that. Your code will still reconstruct a fragmented TCP packet -- although in practice that will never happen.
As a best-practice measure, I recommend leaving the vbCrLf in the code
Also, if you were to send two messages at the same time before closing the socket from vbscript, they would still both parse correctly so long as the vbCrLf was in there.
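Both points can be checked with a StringReader standing in for a connection that has already been closed by the sender. A minimal sketch, with CloseWithoutTerminatorDemo and lines as made-up names:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class CloseWithoutTerminatorDemo {
    // Treats the text as everything the sender wrote before closing, and reads it line by line.
    static List<String> lines(String received) {
        List<String> out = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(new StringReader(received))) {
            String line;
            while ((line = in.readLine()) != null) {
                out.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        // One message, no terminator: end of stream still hands back the leftover text.
        System.out.println(lines("ESZ16|SHORT_ENTRY")); // prints [ESZ16|SHORT_ENTRY]
        // Two messages with CRLF: parsed separately.
        System.out.println(lines("ESZ16|ENTRY_2\r\nESZ16|ENTRY_3\r\n")); // prints [ESZ16|ENTRY_2, ESZ16|ENTRY_3]
        // Two messages without CRLF: they run together into a single line.
        System.out.println(lines("ESZ16|ENTRY_2ESZ16|ENTRY_3")); // prints [ESZ16|ENTRY_2ESZ16|ENTRY_3]
    }
}
```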
ASKER
Yes, I will append vbCrLf to my message. One question, though: can the end of stream (socket.close) signal reach the Java code before the full message is received? You are saying that TCP messages can sometimes be fragmented, so is this a possibility?
An extra vbCrLf will only result in an empty String getting produced in Java - not a problem if you allow for that
ASKER
"An extra vbCrLf will only result in an empty String getting produced in Java - not a problem if you allow for that"
No, not extra, just the one.
ASKER
would you recommend keeping the socket connection open from vbscript?
Actually, not typically. Since the buffer will be empty when the socket is closed, readLine shouldn't return an empty string, but just return null to indicate that the socket has been closed. It should only return the remaining data if there's any leftover in the buffer to be returned.
Hard to say - it depends. Typically I would say yes. If you could explain how this would be used, I could give a better suggestion.
ASKER
What I fear is that the end of stream (socket.close) could reach the Java code before the complete message. In TCP anything can happen, right?
ASKER
Well I am using it for interprocess communication between a vbscript application and a java application. A message will typically be sent every 2-3 minutes.
ASKER
Below is the full vb code. This function is called every 2-3 minutes.
Sub SendToMWVB(message)
    Dim fso, outFile
    Set fso = CreateObject("Scripting.FileSystemObject")
    Set outFile = fso.CreateTextFile("output.txt", True)
    set socket = CreateObject("Chilkat_9_5_0.Socket")
    success = socket.UnlockComponent("Anything for 30-day trial")
    If (success <> 1) Then
        outFile.WriteLine(socket.LastErrorText)
        WScript.Quit
    End If
    ' Connect to port 5555 of localhost.
    ' The string "localhost" is for testing on a single computer.
    ' It would typically be replaced with an IP hostname, such
    ' as "www.chilkatsoft.com".
    ssl = 0
    maxWaitMillisec = 20000
    'MsgBox AFL("PortNumber")&" "&AFL("MWSymbol")&"|"&AFL("global_message_to_send")&"\r\n"
    success = socket.Connect("localhost", AFL("PortNumber"), ssl, maxWaitMillisec)
    If (success <> 1) Then
        outFile.WriteLine(socket.LastErrorText)
        WScript.Quit
    End If
    success = socket.SendString(AFL("MWSymbol")&"|"&message&vbCrLF)
    '&"\r\n")
    If (success <> 1) Then
        outFile.WriteLine(socket.LastErrorText)
        WScript.Quit
    End If
    socket.Close(20000)
    'SendToMWVB=success
End Sub
ASKER
thx
"Well I am using it for interprocess communication between a vbscript application and a java application. A message will typically be sent every 2-3 minutes."
As the old saying goes, "If it ain't broke, don't fix it!"
Bearing that in mind, generally I would not close the connection between transmissions. That way you can monitor for a disconnection of the stream - like if someone closed one of the two applications. It also means you need to repair the connection in case it fails, or notify the user that the connection failed. A good example would be if it was an interactive game. There's a major problem if the connection fails -- you'd want to know (on both ends), and can't just ignore it.
If it's just a notification, where timeliness or dropped packets don't matter so much, then leave as-is. So if this code monitors temperature and is posting updates to a server, it doesn't matter much if a few packets are dropped. The code as-is will pretty much self-heal in case one of the programs is closed or crashes. So it would be better this way.
I guess the biggest difference is that the java side can't monitor for a disconnection if the stream is closed after every transmission. You'd need to write a timer or something. Secondly, I wouldn't want hundreds or thousands of ports being opened and closed every hour. Seems like 'bad code' even if it works correctly. And I don't like to write 'bad code'.
You might actually consider UDP. If it's just a notification where guaranteed delivery is not required, then UDP is a much better choice. Only a single packet is sent every time you send an update - rather than the dozen or so going back and forth as it is now. But then unless you manually code a UDP response, you'll have absolutely no idea if the message went through or not.
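In case it helps to picture the UDP option, here is a rough Java-only sketch of one datagram carrying one message over loopback. The names UdpNotifyDemo and sendAndReceive, the ephemeral port, the 512-byte buffer and the 2-second timeout are all assumptions for the example, and the VBScript sending side is not covered at all.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

class UdpNotifyDemo {
    // Sends one message as a single datagram over loopback and returns what the receiver got.
    static String sendAndReceive(String message) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (DatagramSocket receiver = new DatagramSocket(new InetSocketAddress(loopback, 0));
             DatagramSocket sender = new DatagramSocket()) {
            // UDP gives no delivery guarantee, so do not wait forever.
            receiver.setSoTimeout(2000);

            byte[] data = message.getBytes(StandardCharsets.US_ASCII);
            sender.send(new DatagramPacket(data, data.length, loopback, receiver.getLocalPort()));

            byte[] buf = new byte[512];
            DatagramPacket packet = new DatagramPacket(buf, buf.length);
            receiver.receive(packet); // throws SocketTimeoutException if nothing arrives in time
            return new String(packet.getData(), packet.getOffset(), packet.getLength(),
                    StandardCharsets.US_ASCII);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Each datagram is already one message, so no line terminator is needed.
        System.out.println(sendAndReceive("ESZ16|SHORT_ENTRY")); // prints ESZ16|SHORT_ENTRY
    }
}
```

As noted above, the sender gets no indication of whether the datagram arrived unless a reply is coded by hand.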
Also, here the second regex doesn't make any sense. The first alone should get you SHORT_ENTRY if the line is as you say it is (it WON'T have line feeds, though).
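A sketch of the single-split version, with a guard so that an empty line or a line without a pipe does not throw ArrayIndexOutOfBoundsException. ScenarioParser and scenario are hypothetical names, and returning Optional is just one way to signal "nothing to parse".

```java
import java.util.Locale;
import java.util.Optional;

class ScenarioParser {
    // Returns the second '|'-separated field in upper case, or empty if there is none.
    static Optional<String> scenario(String line) {
        String[] parts = line.trim().toUpperCase(Locale.ROOT).split("\\|");
        return parts.length >= 2 ? Optional.of(parts[1]) : Optional.empty();
    }

    public static void main(String[] args) {
        // readLine() has already removed the CRLF, so one split is enough.
        System.out.println(scenario("ESZ16|SHORT_ENTRY").orElse("(none)")); // prints SHORT_ENTRY
        // An empty line, e.g. from an extra vbCrLf, is simply skipped.
        System.out.println(scenario("").orElse("(none)"));                  // prints (none)
    }
}
```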