mbormann
asked on
need help in fast parsing
hello,
I have got a file under 100KB guarenteed and I have got to pick up parts of it and store some parts,lets say structure is
START=TYPE1=BLAH
...
...
END=TYPE1=BLAH
...
...
START=TYPE2=BLAH
...
...
END=TYPE2=BLAH
It is always guaranteed that one type of block is contiguos i.e there will be no overlapping START,END blocks.
But there are millions of 'StART=' blocks
I have done using simple indexOf() and also teh Boyer-Moore algorithm from http://mindprod.com/products.html
search on keyword 'search' :-)
But i finally think that I will have to use something like this
http://www.cs.princeton.edu/~appel/modern/java/JLex/
Now has anybody used something like this?I think this is what Lex and Yacc r abt.
I was thinking of searching for these 'START=' using this RegEx stuff and then sometimes depending on some conditions, I have store this chunk of data between the delimiters in a Data Structure in Memory.
Right now I have done hard coding and storing then as a String in a pre defined string Array.
I feel that am going astray and would like some advice before going completely bonkers.
Can't use readLine() since I have to pick up the data and store it for later modification
Any suggestions in how to implement using the regex stuff would be welcome
is this of any help?
http://www.quateams.com/oro/software/OROMatcher1.1.html
the exact structure of record is something like this with attached algorithm using this regex code AND sample data file.
basically I have a record like this
START=TYPE1=200=2 ---->header
.......
.......
....... -----> body
.......
END=TYPE1=200=2 ------> footer
which gives me output like
'START=TYPE1=' only doesnt give me the '200=2' at end
my code will go like
while(matcher doesnt rech end of file)
{
(1)look at header of each record ,a sort of look ahead
(2) if(Type 1)
{
if(body also present)
{
pick up whole record & process
}
else
{
discard record
}
}
else if(Type 2)
{
if(body also present)
{
pick up whole record & process
}
else
{
discard record
}
}
else if (type 3)
{
if(body also present)
{
pick up whole record & process
}
else
{
discard record
}
}
...and so on
}
Sample data file is
START=type1=200=0
END=type1=200=0
START=type1=200=0
Name=mickey
data=0xQv4gBjAAEIQwoAAQIAA AABABABqqq qqqqqqqqqq qqqqqqqIAA AAQEIATMzM zMzMzMAAAA
END=type1=200=0
START=type1=200=1
END=type1=200=1
START=type1=200=1
Name=GueesssMan
data=WL8vsAABCEMKAAECAAAAU QAQUaqqqqq qqqqqqqqqq qqqqiAAAFE BCFEzMzMzM zMzAAAAUQI
Q
END=type1=200=1
START=type2=200=0
Name=maximumsecurity
END=type2=200=0
START=type1=200=2
END=type1=200=2
START=type1=200=2
Name=mickey2
data=0xQv4gBjAAEIQwoAAQIAA AABABABqqq qqqqqqqqqq qqqqqqqIAA AAQEIATMzM zMzMzMAAAA
END=type1=200=2
START=type2=200=1
Name=conn2
END=type2=200=1
I have got a file under 100KB guarenteed and I have got to pick up parts of it and store some parts,lets say structure is
START=TYPE1=BLAH
...
...
END=TYPE1=BLAH
...
...
START=TYPE2=BLAH
...
...
END=TYPE2=BLAH
It is always guaranteed that one type of block is contiguos i.e there will be no overlapping START,END blocks.
But there are millions of 'StART=' blocks
I have done using simple indexOf() and also teh Boyer-Moore algorithm from http://mindprod.com/products.html
search on keyword 'search' :-)
But i finally think that I will have to use something like this
http://www.cs.princeton.edu/~appel/modern/java/JLex/
Now has anybody used something like this?I think this is what Lex and Yacc r abt.
I was thinking of searching for these 'START=' using this RegEx stuff and then sometimes depending on some conditions, I have store this chunk of data between the delimiters in a Data Structure in Memory.
Right now I have done hard coding and storing then as a String in a pre defined string Array.
I feel that am going astray and would like some advice before going completely bonkers.
Can't use readLine() since I have to pick up the data and store it for later modification
Any suggestions in how to implement using the regex stuff would be welcome
is this of any help?
http://www.quateams.com/oro/software/OROMatcher1.1.html
the exact structure of record is something like this with attached algorithm using this regex code AND sample data file.
basically I have a record like this
START=TYPE1=200=2 ---->header
.......
.......
....... -----> body
.......
END=TYPE1=200=2 ------> footer
which gives me output like
'START=TYPE1=' only doesnt give me the '200=2' at end
my code will go like
while(matcher doesnt rech end of file)
{
(1)look at header of each record ,a sort of look ahead
(2) if(Type 1)
{
if(body also present)
{
pick up whole record & process
}
else
{
discard record
}
}
else if(Type 2)
{
if(body also present)
{
pick up whole record & process
}
else
{
discard record
}
}
else if (type 3)
{
if(body also present)
{
pick up whole record & process
}
else
{
discard record
}
}
...and so on
}
Sample data file is
START=type1=200=0
END=type1=200=0
START=type1=200=0
Name=mickey
data=0xQv4gBjAAEIQwoAAQIAA
END=type1=200=0
START=type1=200=1
END=type1=200=1
START=type1=200=1
Name=GueesssMan
data=WL8vsAABCEMKAAECAAAAU
Q
END=type1=200=1
START=type2=200=0
Name=maximumsecurity
END=type2=200=0
START=type1=200=2
END=type1=200=2
START=type1=200=2
Name=mickey2
data=0xQv4gBjAAEIQwoAAQIAA
END=type1=200=2
START=type2=200=1
Name=conn2
END=type2=200=1
ASKER
the modified code if it's of any help
import java.io.*;
import com.oroinc.text.regex.*;
public final class streamInputExample
{
public static final void main(String args[]) { String regex = "START=\\w+\\W+"; Perl5Matcher matcher;
Perl5Compiler compiler;
Perl5Pattern pattern = null;
Perl5StreamInput input;
MatchResult result;
Reader file = null;
// Create Perl5Compiler and Perl5Matcher instances.
compiler = new Perl5Compiler();
matcher = new Perl5Matcher();
// Attempt to compile the pattern. If the pattern is not valid,
// report the error and exit.
try { pattern = (Perl5Pattern)compiler.com pile(regex ,Perl5Comp iler.CASE_ INSENSITIV E_MASK);
} catch(MalformedPatternExce ption e) {
System.err.println("Bad pattern.");
System.err.println(e.getMe ssage());
System.exit(1);
}
// Open input file.
try {
file = new BufferedReader(new FileReader("TestFile.txt") );
} catch(IOException e) {
System.err.println("Error opening streamInputExample.txt.");
System.err.println(e.getMe ssage());
System.exit(1);
}
// Create a Perl5StreamInput instance to search the input stream.
input = new Perl5StreamInput(file);
// We need to put the search loop in a try block because when searching
// a Perl5StreamInput instance, an IOException may occur, and it must be
// caught.
try { // Loop until there are no more matches left.
while(matcher.contains(inp ut, pattern)) {
// Since we're still in the loop, fetch match that was found.
result = matcher.getMatch();
System.out.println("result .toString( )="+result .toString( )); System.out.println("Begin offset: " + result.beginOffset(0));
System.out.println("End offset: " + result.endOffset(0));
}
} catch(IOException e) { System.err.println("Error occurred while reading file.");
System.err.println(e.getMe ssage());
System.exit(1);
}
}
}
import java.io.*;
import com.oroinc.text.regex.*;
public final class streamInputExample
{
public static final void main(String args[]) { String regex = "START=\\w+\\W+"; Perl5Matcher matcher;
Perl5Compiler compiler;
Perl5Pattern pattern = null;
Perl5StreamInput input;
MatchResult result;
Reader file = null;
// Create Perl5Compiler and Perl5Matcher instances.
compiler = new Perl5Compiler();
matcher = new Perl5Matcher();
// Attempt to compile the pattern. If the pattern is not valid,
// report the error and exit.
try { pattern = (Perl5Pattern)compiler.com
} catch(MalformedPatternExce
System.err.println("Bad pattern.");
System.err.println(e.getMe
System.exit(1);
}
// Open input file.
try {
file = new BufferedReader(new FileReader("TestFile.txt")
} catch(IOException e) {
System.err.println("Error opening streamInputExample.txt.");
System.err.println(e.getMe
System.exit(1);
}
// Create a Perl5StreamInput instance to search the input stream.
input = new Perl5StreamInput(file);
// We need to put the search loop in a try block because when searching
// a Perl5StreamInput instance, an IOException may occur, and it must be
// caught.
try { // Loop until there are no more matches left.
while(matcher.contains(inp
// Since we're still in the loop, fetch match that was found.
result = matcher.getMatch();
System.out.println("result
System.out.println("End offset: " + result.endOffset(0));
}
} catch(IOException e) { System.err.println("Error occurred while reading file.");
System.err.println(e.getMe
System.exit(1);
}
}
}
hello mbormann,
I'm not used to Perl5 syntax but this pattern seems to work :
String regex ="START=" + args[0] + "=" + args[1] + "(\\n|.)*END=" + args[0] + "=" + args[1];
with this TestFile :
START=TYPE1=BLAH
type one
hello world
123456
END=TYPE1=BLAH
START=TYPE2=BLAH
type two
9876545645343243
$$$$$$$$$$$$$thvr$hvr$$vth rzthvrhrs$ vrthhrz$jh
END=TYPE2=BLAH
START=TYPE3=BLAH type 3 END=TYPE3=BLAH
I run : java streamInputExample TYPE2 BLAH
I'm not used to Perl5 syntax but this pattern seems to work :
String regex ="START=" + args[0] + "=" + args[1] + "(\\n|.)*END=" + args[0] + "=" + args[1];
with this TestFile :
START=TYPE1=BLAH
type one
hello world
123456
END=TYPE1=BLAH
START=TYPE2=BLAH
type two
9876545645343243
$$$$$$$$$$$$$thvr$hvr$$vth
END=TYPE2=BLAH
START=TYPE3=BLAH type 3 END=TYPE3=BLAH
I run : java streamInputExample TYPE2 BLAH
ASKER
sorry pyxide I was not clear enough in my explanation ,perhaps u would check out my question in Perl section?
https://www.experts-exchange.com/jsp/qShow.jsp?ta=perl&qid=10244086
and the BLAH is not constant it is a fixed number followed by a variable number.
like
START=TYPE1=200=2
another
START=TYPE1=200=6
etc...
thanks
P.s I too am not used to Perl5 Syntax
I have to make this stuff entirely dynamic and chain LIGHTNING,tell me am I on right track on this RegEx stuff?
https://www.experts-exchange.com/jsp/qShow.jsp?ta=perl&qid=10244086
and the BLAH is not constant it is a fixed number followed by a variable number.
like
START=TYPE1=200=2
another
START=TYPE1=200=6
etc...
thanks
P.s I too am not used to Perl5 Syntax
I have to make this stuff entirely dynamic and chain LIGHTNING,tell me am I on right track on this RegEx stuff?
mb, is there any reason you can't read the file something like this? You mentioned you can't use readline, but I'm not sure why?
import java.io.*;
public class te {
public static void main(String args[])
throws FileNotFoundException, IOException {
if (args.length == 1) {
BufferedReader in = new BufferedReader(new FileReader(args[0]));
String s = "";
String currentType = "";
StringBuffer sb = new StringBuffer();
while ( (s = in.readLine()) != null ) {
if ( s.startsWith("START=") ) {
boolean reachedEnd = false;
// read data
while (!reachedEnd && (s != null) ) {
s = in.readLine();
if ( (s != null) && (!s.startsWith("END=")) ) {
sb.append("\n" + s);
} else if (s == null) {
reachedEnd = true;
}
}
sb.append("\n\n");
}
}
System.out.println("\n\n" + sb.toString());
} else {
System.out.println("You must provide a file to search");
}
}
}
I used this test file....
START=TYPE1=BLAH
lhglsdfjg
sl;kfgjsfd
;
;lskfhg
END=TYPE1=BLAH
START=TYPE1=BLAH
lhglsdfjg
sl;kfgjsfd
;
;lskfhg
END=TYPE1=BLAH
START=TYPE1=BLAH
lhglsdfjg
sl;kfgjsfd
;
;lskfhg
END=TYPE1=BLAH
START=TYPE1=BLAH
lhglsdfjg
sl;kfgjsfd
;
;lskfhg
END=TYPE1=BLAH
START=TYPE1=BLAH
lhglsdfjg
sl;kfgjsfd
;
;lskfhg
END=TYPE1=BLAH
START=TYPE1=BLAH
lhglsdfjg
sl;kfgjsfd
;
;lskfhg
END=TYPE1=BLAH
START=TYPE1=BLAH
lhglsdfjg
sl;kfgjsfd
;
;lskfhg
END=TYPE1=BLAH
START=TYPE1=BLAH
lhglsdfjg
sl;kfgjsfd
;
;lskfhg
END=TYPE1=BLAH
START=TYPE1=BLAH
lhglsdfjg
sl;kfgjsfd
;
;lskfhg
END=TYPE1=BLAH
START=TYPE1=BLAH
lhglsdfjg
sl;kfgjsfd
;
;lskfhg
END=TYPE1=BLAH
START=TYPE1=BLAH
lhglsdfjg
sl;kfgjsfd
;
;lskfhg
END=TYPE1=BLAH
START=TYPE1=BLAH
lhglsdfjg
sl;kfgjsfd
;
;lskfhg
END=TYPE1=BLAH
START=TYPE1=BLAH
lhglsdfjg
sl;kfgjsfd
;
;lskfhg
END=TYPE1=BLAH
import java.io.*;
public class te {
public static void main(String args[])
throws FileNotFoundException, IOException {
if (args.length == 1) {
BufferedReader in = new BufferedReader(new FileReader(args[0]));
String s = "";
String currentType = "";
StringBuffer sb = new StringBuffer();
while ( (s = in.readLine()) != null ) {
if ( s.startsWith("START=") ) {
boolean reachedEnd = false;
// read data
while (!reachedEnd && (s != null) ) {
s = in.readLine();
if ( (s != null) && (!s.startsWith("END=")) ) {
sb.append("\n" + s);
} else if (s == null) {
reachedEnd = true;
}
}
sb.append("\n\n");
}
}
System.out.println("\n\n" + sb.toString());
} else {
System.out.println("You must provide a file to search");
}
}
}
I used this test file....
START=TYPE1=BLAH
lhglsdfjg
sl;kfgjsfd
;
;lskfhg
END=TYPE1=BLAH
START=TYPE1=BLAH
lhglsdfjg
sl;kfgjsfd
;
;lskfhg
END=TYPE1=BLAH
START=TYPE1=BLAH
lhglsdfjg
sl;kfgjsfd
;
;lskfhg
END=TYPE1=BLAH
START=TYPE1=BLAH
lhglsdfjg
sl;kfgjsfd
;
;lskfhg
END=TYPE1=BLAH
START=TYPE1=BLAH
lhglsdfjg
sl;kfgjsfd
;
;lskfhg
END=TYPE1=BLAH
START=TYPE1=BLAH
lhglsdfjg
sl;kfgjsfd
;
;lskfhg
END=TYPE1=BLAH
START=TYPE1=BLAH
lhglsdfjg
sl;kfgjsfd
;
;lskfhg
END=TYPE1=BLAH
START=TYPE1=BLAH
lhglsdfjg
sl;kfgjsfd
;
;lskfhg
END=TYPE1=BLAH
START=TYPE1=BLAH
lhglsdfjg
sl;kfgjsfd
;
;lskfhg
END=TYPE1=BLAH
START=TYPE1=BLAH
lhglsdfjg
sl;kfgjsfd
;
;lskfhg
END=TYPE1=BLAH
START=TYPE1=BLAH
lhglsdfjg
sl;kfgjsfd
;
;lskfhg
END=TYPE1=BLAH
START=TYPE1=BLAH
lhglsdfjg
sl;kfgjsfd
;
;lskfhg
END=TYPE1=BLAH
START=TYPE1=BLAH
lhglsdfjg
sl;kfgjsfd
;
;lskfhg
END=TYPE1=BLAH
ASKER
Jod,
this has to be as lightning fast as possible in Java with very less garbage objects created ,u no that for that each readLine() a new strigBuffer is done.
Also i tried indexOf() and BoyerMoore's indexOf() ,but that hasn't got a facility of indexOf(string,int beginIndex) which would have been super for me.
My senior colleague said try to use the above stuff like i am doing,I saw that JLex is sort of complicated right now,OROinc's stuff is easier to implement in a shorter time which is a BIG constraint right now for us.
hope i get a solution soon.
this has to be as lightning fast as possible in Java with very less garbage objects created ,u no that for that each readLine() a new strigBuffer is done.
Also i tried indexOf() and BoyerMoore's indexOf() ,but that hasn't got a facility of indexOf(string,int beginIndex) which would have been super for me.
My senior colleague said try to use the above stuff like i am doing,I saw that JLex is sort of complicated right now,OROinc's stuff is easier to implement in a shorter time which is a BIG constraint right now for us.
hope i get a solution soon.
I use OROMatcher in my grep tool and it works fine :)
What exactly u want to know?
What exactly u want to know?
ASKER
vladi,
pls lookat my question in Perl section
https://www.experts-exchange.com/jsp/qShow.jsp?ta=perl&qid=10244086
I have a problem with specifing the Regex string if u look at the code above which i have given with
String regex = "START=\\w+\\W+";
just tell me the regex string.if u have any clarfications pls ask,else am mailing u ,is it OK?
pls lookat my question in Perl section
https://www.experts-exchange.com/jsp/qShow.jsp?ta=perl&qid=10244086
I have a problem with specifing the Regex string if u look at the code above which i have given with
String regex = "START=\\w+\\W+";
just tell me the regex string.if u have any clarfications pls ask,else am mailing u ,is it OK?
>> u no that for that each readLine() a new strigBuffer is done
Yes, I think we've been here before, with previous questions.
There is also JCC which is a Java program which can produce a praser based on a specification you give it. There are some details here:
http://www.suntest.com/JavaCC/
though how optimised the code it produces is, I am not to sure.
You do also have:
regionMatches
public boolean regionMatches(int toffset,
String other,
int ooffset,
int len)
Tests if two string regions are equal.
A substring of this String object is compared to a substring of the argument other. The result is true if these substrings represent identical character sequences. The substring of this String object to be compared begins at index toffset and has length len. The substring of other to be compared begins at index ooffset and has length len. The result is false if and only if at least one of the following is true:
toffset is negative.
ooffset is negative.
toffset+len is greater than the length of this String object.
ooffset+len is greater than the length of the other argument.
There is some nonnegative integer k less than len such that: this.charAt(toffset+k) != other.charAt(ooffset+k)
Parameters:
toffset - the starting offset of the subregion in this string.
other - the string argument.
ooffset - the starting offset of the subregion in the string argument.
len - the number of characters to compare.
Returns:
true if the specified subregion of this string exactly matches the specified subregion of the string argument; false otherwise.
Throws:
NullPointerException - if other is null.
for example...
Yes, I think we've been here before, with previous questions.
There is also JCC which is a Java program which can produce a praser based on a specification you give it. There are some details here:
http://www.suntest.com/JavaCC/
though how optimised the code it produces is, I am not to sure.
You do also have:
regionMatches
public boolean regionMatches(int toffset,
String other,
int ooffset,
int len)
Tests if two string regions are equal.
A substring of this String object is compared to a substring of the argument other. The result is true if these substrings represent identical character sequences. The substring of this String object to be compared begins at index toffset and has length len. The substring of other to be compared begins at index ooffset and has length len. The result is false if and only if at least one of the following is true:
toffset is negative.
ooffset is negative.
toffset+len is greater than the length of this String object.
ooffset+len is greater than the length of the other argument.
There is some nonnegative integer k less than len such that: this.charAt(toffset+k) != other.charAt(ooffset+k)
Parameters:
toffset - the starting offset of the subregion in this string.
other - the string argument.
ooffset - the starting offset of the subregion in the string argument.
len - the number of characters to compare.
Returns:
true if the specified subregion of this string exactly matches the specified subregion of the string argument; false otherwise.
Throws:
NullPointerException - if other is null.
for example...
ASKER
Edited text of question.
ASKER
Jod,
We had a long and heated discussion for abt 2 hrs and finally he concluded we need to do this kinda stuff after examinign ALL options.If I am not able to implement it quickly then its back to good old String and rest of the optimistaions involved but as u said I am also convinced that RegEx is the one to achieve lightning speed.
thanks
We had a long and heated discussion for abt 2 hrs and finally he concluded we need to do this kinda stuff after examinign ALL options.If I am not able to implement it quickly then its back to good old String and rest of the optimistaions involved but as u said I am also convinced that RegEx is the one to achieve lightning speed.
thanks
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
I haven't used it myself, but it looks promising - it seems to be particularly aimed at parsing info when you don't need or want to read the whole file in.
Don't use perl either so can't tell you much about the specifics of perl string matching either here...
Don't use perl either so can't tell you much about the specifics of perl string matching either here...
ASKER
pyxide ,
heartfelt thanks man for trying but I will definitely see along the pattern u pointed out and will then post final working code over here tommorrow.
amigo muchas gracias
Jod ,i too am learning.But believe me from what I have seen this is chain n greased lightning rolled into one.
heartfelt thanks man for trying but I will definitely see along the pattern u pointed out and will then post final working code over here tommorrow.
amigo muchas gracias
Jod ,i too am learning.But believe me from what I have seen this is chain n greased lightning rolled into one.
ASKER
I am trying with this
String st_regex_start_block ="START=(.+?)=([0-9]+)=([0 -9]+)";
String st_regex_end_block ="((\\n.*?)*?)(\\nEND=(.+? )=[0-9]+=[ 0-9]+){1}? ";
the empty body works pefectly man and I think you r a gennius in the way u picked the RegEx String ,I can decide on the type which code to execute and the last variable constant number.
THANKS
it fails with only 1 thing ,is a little strange i think
INPUT
-----
START=TYPEa=200=0
DataLine 1
DataLine 2
DataLine 3
DataLine 4
DataLine 5
JavaWebServer version...izzitOK=Yes
END=TYPEa=200=0
OUTPUT
------
type [TYPEa]
cons [200]
variable [0]
--------------------
body :[
DataLine 1
DataLine 2
DataLine 3
DataLine 4
DataLine 5
]avaWebServer version...izzitOK=Yes
--------------------
,as u can see the ']' which denotes the end of the record gets printed and then the last line minus the first char 'J' gets printed.
I am testing on WinNT4.0 which has line separator as \r\n,i think this is significant.
when i put
System.out.println(" body :[" + result.group(1) + "]***");
then the result is again of the last line
]***WebServer version...izzitOK=Yes
what is happening here?I have to stop for today and maybe come back later eveningish.
Thanks man Thanks
String st_regex_start_block ="START=(.+?)=([0-9]+)=([0
String st_regex_end_block ="((\\n.*?)*?)(\\nEND=(.+?
the empty body works pefectly man and I think you r a gennius in the way u picked the RegEx String ,I can decide on the type which code to execute and the last variable constant number.
THANKS
it fails with only 1 thing ,is a little strange i think
INPUT
-----
START=TYPEa=200=0
DataLine 1
DataLine 2
DataLine 3
DataLine 4
DataLine 5
JavaWebServer version...izzitOK=Yes
END=TYPEa=200=0
OUTPUT
------
type [TYPEa]
cons [200]
variable [0]
--------------------
body :[
DataLine 1
DataLine 2
DataLine 3
DataLine 4
DataLine 5
]avaWebServer version...izzitOK=Yes
--------------------
,as u can see the ']' which denotes the end of the record gets printed and then the last line minus the first char 'J' gets printed.
I am testing on WinNT4.0 which has line separator as \r\n,i think this is significant.
when i put
System.out.println(" body :[" + result.group(1) + "]***");
then the result is again of the last line
]***WebServer version...izzitOK=Yes
what is happening here?I have to stop for today and maybe come back later eveningish.
Thanks man Thanks
ASKER
Adjusted points to 300
I noticed the little strange thing too:
When i look at output in DOS window (NT 4)
this happens :
]avaWebServer version...izzitOK=Yes
But when I send output in a file, it's OK
An hexadecimal editor shows that the group begins with a line feed character and ends with a return character, so I think DOS console puts the character(s) following 0x0D at the beginning of the current line, erasing initials character in the buffer, before printing it.
Anyway, this should'nt bother in a Java string.
type [TYPEa]
cons [200]
variable [0]
--------------------
body :[ (0x0A) DataLine 1
....
....
JavaWebServer version...izzitOK=Yes (0x0D)]
--------------------
One thing : regexps work, but can be enhanced.
for example, i assume you don't need the "(\\nEND=(.+?)=[0-9]+=[0-9 ]+){1}?" group.
In Oro documentation, I saw an extended expression like this
(?:regexp) Groups things like "()" but doesn't cause the group match to be saved.
But I don't have time at the moment to play sorcerer's apprentice.
Good luck
When i look at output in DOS window (NT 4)
this happens :
]avaWebServer version...izzitOK=Yes
But when I send output in a file, it's OK
An hexadecimal editor shows that the group begins with a line feed character and ends with a return character, so I think DOS console puts the character(s) following 0x0D at the beginning of the current line, erasing initials character in the buffer, before printing it.
Anyway, this should'nt bother in a Java string.
type [TYPEa]
cons [200]
variable [0]
--------------------
body :[ (0x0A) DataLine 1
....
....
JavaWebServer version...izzitOK=Yes (0x0D)]
--------------------
One thing : regexps work, but can be enhanced.
for example, i assume you don't need the "(\\nEND=(.+?)=[0-9]+=[0-9
In Oro documentation, I saw an extended expression like this
(?:regexp) Groups things like "()" but doesn't cause the group match to be saved.
But I don't have time at the moment to play sorcerer's apprentice.
Good luck
ASKER
I see that u can now collect a free T-Shirt since u have crossed 5000 .
Again muchas gracias amigo.
Again muchas gracias amigo.
Free T Shirt? Wot free T Shirt....
How are you finding this package mb? Is it any good?
How are you finding this package mb? Is it any good?
ASKER
Jod,
U r entitled to a free T-Shirt after u reach 5000,15000,30000,50000,750 00,100000
points .
This goes on & on after 100000 u dont get anything
Look at my question on T-Shirt here.
https://www.experts-exchange.com/jsp/qShow.jsp?ta=commspt&qid=10233717
also look at
https://www.experts-exchange.com/info/tshirt.htm
And yes I have been doing some other stuff but from what I heard from the others its faster ,as these com.oroinc Guys are not loading the whole file into memroy they r using a char[] buffer,and pyxide already parsed the way we want it.
Now we have to convince our clients to use this as they generally have a problem accepting third party stuff.
Again (sigh!) didja read the question on Multiple inheritance?
I brought Scott Meyers 2 books 'Effective C++' & 'More 'Effective C++',am trying to get hold of Arthur J Riel's 'Obect Oriented Design Heuristics'
If u hvae time look at these questions i asked in the past week or so.
https://www.experts-exchange.com/jsp/qShow.jsp?ta=cplusprog&qid=10243109
https://www.experts-exchange.com/jsp/qShow.jsp?ta=cplusprog&qid=10245184
https://www.experts-exchange.com/jsp/qShow.jsp?ta=cplusprog&qid=10243045
Also if u have anything to say pls say so over there!
Thanks n cheers
U r entitled to a free T-Shirt after u reach 5000,15000,30000,50000,750
points .
This goes on & on after 100000 u dont get anything
Look at my question on T-Shirt here.
https://www.experts-exchange.com/jsp/qShow.jsp?ta=commspt&qid=10233717
also look at
https://www.experts-exchange.com/info/tshirt.htm
And yes I have been doing some other stuff but from what I heard from the others its faster ,as these com.oroinc Guys are not loading the whole file into memroy they r using a char[] buffer,and pyxide already parsed the way we want it.
Now we have to convince our clients to use this as they generally have a problem accepting third party stuff.
Again (sigh!) didja read the question on Multiple inheritance?
I brought Scott Meyers 2 books 'Effective C++' & 'More 'Effective C++',am trying to get hold of Arthur J Riel's 'Obect Oriented Design Heuristics'
If u hvae time look at these questions i asked in the past week or so.
https://www.experts-exchange.com/jsp/qShow.jsp?ta=cplusprog&qid=10243109
https://www.experts-exchange.com/jsp/qShow.jsp?ta=cplusprog&qid=10245184
https://www.experts-exchange.com/jsp/qShow.jsp?ta=cplusprog&qid=10243045
Also if u have anything to say pls say so over there!
Thanks n cheers
mb,
btw have you seen this:
http://www.cacas.org/java/gnu/regexp/
Just as an alternative to try perhaps.
btw have you seen this:
http://www.cacas.org/java/gnu/regexp/
Just as an alternative to try perhaps.
ASKER
Jod,
I surely dont like to try out the GNU Java code as it has been a bad experience for me.My colleagues who do C++ say it's good.
It used to break after 65536 bytes and used to gimme a Error packet everytime.
That's the reson I didn't even look at their implementation,yes if u want to tinker around it's fine.
Compare the TFTP implementation of theirs and NetComponents from OROInc ,ORO code is blazingly fast and stable ,though IBM has removed the 32MB limit from their implementation,I couldnt use it as its meant for a totally commandline based utility and what I wanted was satisfied mainly by ORO.
Jod ,this is the second time inside a couple of months that I have used ORO code and its been a good experience for me and I trust them and have started recommending that to all my friends who r intesredt.
I surely dont like to try out the GNU Java code as it has been a bad experience for me.My colleagues who do C++ say it's good.
It used to break after 65536 bytes and used to gimme a Error packet everytime.
That's the reson I didn't even look at their implementation,yes if u want to tinker around it's fine.
Compare the TFTP implementation of theirs and NetComponents from OROInc ,ORO code is blazingly fast and stable ,though IBM has removed the 32MB limit from their implementation,I couldnt use it as its meant for a totally commandline based utility and what I wanted was satisfied mainly by ORO.
Jod ,this is the second time inside a couple of months that I have used ORO code and its been a good experience for me and I trust them and have started recommending that to all my friends who r intesredt.
ASKER
which is right now
String regex = "START=\\w+\\W+";
I want to pick up whole string from
START=........
....
....
....
END=........
what should that string be?
Thanks