Shiju S
Avoiding Greedy matches and capturing Multiple lines using Regular Expressions (VB)
Hi
I need to extract some lines of text between a specified "start" and "end". I am using Microsoft VBScript Regular Expressions 5.5; here is my VB code:
'-------------------------
Dim objReg As New RegExp
Dim objMatchCol As MatchCollection
Dim sSourceString As String
With objReg
    .Global = True
    .IgnoreCase = True
    .Pattern = "start(.*?)end"   ' lazy: stop at the first "end"
    sSourceString = "start : Welcome to Experts-Exchnge : end"
    Set objMatchCol = .Execute(sSourceString)
End With
'-------------------------
This works when "start" and "end" are on the same line, but it does not match when the text between them spans multiple lines. See the following text:
'-------------------------
start
: Welcome
to
Experts-Exchnge
:
end
'-------------------------
How can I get these lines between 'start' and 'end' as a single match?
Please help
Thanks
Shiju
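
The behaviour described in the question can be reproduced with a small script. This is a minimal sketch, assuming it is run with cscript and that the RegExp object is created late-bound via CreateObject("VBScript.RegExp") rather than through the VB6 project reference used above. It shows that "." does not match a line feed, so the pattern stops working as soon as a line break sits between "start" and "end".

```vbscript
Option Explicit

Dim re, singleLine, multiLine
Set re = CreateObject("VBScript.RegExp")
re.Global = True
re.IgnoreCase = True
re.Pattern = "start(.*?)end"

singleLine = "start : Welcome to Experts-Exchnge : end"
multiLine = "start" & vbCrLf & ": Welcome" & vbCrLf & "to" & vbCrLf & _
            "Experts-Exchnge" & vbCrLf & ":" & vbCrLf & "end"

' Everything is on one line, so the lazy ".*?" can reach "end".
WScript.Echo re.Execute(singleLine).Count   ' prints 1

' "." cannot cross the line feed, so "end" is never reached.
WScript.Echo re.Execute(multiLine).Count    ' prints 0
```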
Which OS are you using?
Oh, it's Microsoft...
Another way to solve this is to process the text line by line with a Perl script instead of matching it with a single regular expression.
Install the free ActivePerl for Windows from
http://www.activestate.com/Products/ActivePerl/
Copy the following to a .pl file, say startend.pl:
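# Print the lines that lie between a "start" line and an "end" line.
$inSection = 0;
while (<STDIN>)
{
    # Leave the section before printing, so the "end" line is not printed.
    if (m/end/)
    {
        $inSection = 0;
    }
    if ($inSection)
    {
        print;
    }
    # Enter the section after printing, so the "start" line is not printed.
    if (m/start/)
    {
        $inSection = 1;
    }
}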
Assume the source file is called temp.txt and run the following command:
perl startend.pl < temp.txt
You may need to give the full path if the Perl script or temp.txt is not in the current directory.
The required section is printed to STDOUT, which can also be redirected to a file.
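
The same line-by-line idea can also be sketched in VBScript, without installing Perl. This is only an illustration under assumptions: the function name LinesBetween is hypothetical, the text is assumed to be already in a string, and InStr is used with its default case-sensitive comparison to mirror the Perl m/start/ and m/end/ tests.

```vbscript
Option Explicit

' Returns the lines that lie strictly between a line containing "start"
' and the next line containing "end". LinesBetween is a hypothetical helper.
Function LinesBetween(ByVal sourceText)
    Dim allLines, currentLine, inSection, collected
    ' Normalise CRLF to LF so Split yields clean lines.
    allLines = Split(Replace(sourceText, vbCrLf, vbLf), vbLf)
    inSection = False
    collected = ""
    For Each currentLine In allLines
        ' Same order as the Perl script: close, then collect, then open.
        If InStr(currentLine, "end") > 0 Then inSection = False
        If inSection Then collected = collected & currentLine & vbCrLf
        If InStr(currentLine, "start") > 0 Then inSection = True
    Next
    LinesBetween = collected
End Function

Dim sample
sample = "intro" & vbCrLf & "start" & vbCrLf & ": Welcome" & vbCrLf & _
         "to" & vbCrLf & "end" & vbCrLf & "outro"

' Echoes the two collected lines, ": Welcome" and "to".
WScript.Echo LinesBetween(sample)
```

Like the Perl version, this treats any line that merely contains "end" (for example "Friend") as the end of the section.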
ASKER
Hello
Thank you all for posting comments.
Hi sgartner,
I tried your code
"^start((.|\n)*)end$"
but this doesn't seem to solve it. It works only if the entire match is within a single line.
The sample text I gave in the question can be in the middle of a large string. I need to find all such matches in the entire string.
Hi ozo,
I tried your code
.Pattern ="start([\s\S]*?)end"
but this has the same effect: it only matches within a single line.
I am using VB6 on Windows 2000.
Hoping for more comments.
Shiju
Shiju,
The ^ and $ anchors force the match to span the whole string, which is how your samples looked. Just remove those two characters and it should work.
Scott
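
The effect of the anchors can be checked with a short script. This is a sketch assuming cscript, a late-bound CreateObject("VBScript.RegExp"), and the default Multiline = False, under which ^ and $ match only at the very start and end of the whole string. The sample string and its vbLf line breaks are made up for illustration.

```vbscript
Option Explicit

Dim re, embedded
Set re = CreateObject("VBScript.RegExp")
re.Global = True
re.IgnoreCase = True

' The start/end block sits in the middle of a larger string.
embedded = "header" & vbLf & "start" & vbLf & ": Welcome" & vbLf & _
           "end" & vbLf & "footer"

' Anchored: the string begins with "header", so ^start cannot match.
re.Pattern = "^start((.|\n)*)end$"
WScript.Echo re.Execute(embedded).Count   ' prints 0

' Unanchored: the block is found wherever it occurs.
re.Pattern = "start((.|\n)*)end"
WScript.Echo re.Execute(embedded).Count   ' prints 1
```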
ASKER
Hi Ozo,
Thank you for your code; I am accepting your answer.
The code you gave works well. I am sorry that I posted a comment saying it was not working; it was my mistake, I used an invalid string for verification.
It took only 20 milliseconds to run against my source string with the pattern you gave:
.Pattern ="start([\s\S]*?)end"
Hi Scott,
Thank you for your post; I accept your answer as a supporting one.
After removing ^ and $ the match was greedy,
but adding a ? to your code made it match successfully:
.Pattern ="^start((.|\n)*?)end$"
This pattern solved my problem, but it took 90 milliseconds to run against my source string.
Regards
Shiju
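
The difference between the greedy and the lazy form of the accepted pattern can be seen on a string that holds two blocks. This is a minimal sketch, assuming cscript and a late-bound CreateObject("VBScript.RegExp"); the sample text is invented for illustration and is not the asker's data.

```vbscript
Option Explicit

Dim re, source, matches, m
Set re = CreateObject("VBScript.RegExp")
re.Global = True
re.IgnoreCase = True

' Two start/end blocks separated by unrelated text.
source = "start" & vbCrLf & "first" & vbCrLf & "end" & vbCrLf & _
         "noise" & vbCrLf & _
         "start" & vbCrLf & "second" & vbCrLf & "end"

' Greedy: [\s\S]* runs to the last "end", swallowing both blocks.
re.Pattern = "start([\s\S]*)end"
WScript.Echo re.Execute(source).Count   ' prints 1

' Lazy: [\s\S]*? stops at the nearest "end", one match per block.
re.Pattern = "start([\s\S]*?)end"
Set matches = re.Execute(source)
WScript.Echo matches.Count              ' prints 2

' SubMatches(0) holds the text captured between "start" and "end".
For Each m In matches
    WScript.Echo Trim(Replace(m.SubMatches(0), vbCrLf, " "))
Next
```

[\s\S] matches any character, including carriage returns and line feeds, which is why it can span lines where "." cannot.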