Simple perl regular expression help: Searching between a new line?

I am designing an HTML optimizer program, and I want to remove all extra unneeded and 's.

When $html =
foo bar
testcool
foo2
bar2

And my code snippet is:
while ($html =~ s!(|\s+)! !i)
{
print $html;
}

My output looks like:
foo bar
test cool

... but why was "foo2
bar2" ignored? Doesn't the \s+ mean that it matches any whitespace: blank spaces, newlines, etc.?

ASKER CERTIFIED SOLUTION

ozo

sprockston

ASKER

Well, my complete code was:

open (FILE, "testfile.html") || die "Could not open the file <$!>";
while (my $html = <FILE>)
{
# Remove all blank (empty) lines from the html file.
$html =~ s/(^|\n)[\n\s]*/$1/g;

 while ($html =~ s!(|\s+)! !i)
 {
 print $html;
 }
}
close (FILE);

sprockston

ASKER

...is the way I am reading in the file to a variable the reason why it doesn't work?

ozo

Assuming you left $/ at its default value of "\n", each read from <FILE> returns a single line, so
$html will not contain
foo bar
testcool
foo2
bar2
when you read the first line of <FILE>, $html will contain
foo bar
then after
$html =~ s!(|\s+)! !i
it will become
foo bar
when you read the second line of <FILE>
$html will contain
testcool
then after
$html =~ s!(|\s+)! !i
it will become
test cool
when you read the third line of <FILE>
$html will contain
foo2
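and
$html =~ s!(|\s+)! !i
will fail, so the body of the inner while loop never runs and nothing is printed
and on the last line of
<FILE>
$html will contain
bar2
and again
$html =~ s!(|\s+)! !i
will fail, so again nothing is printed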
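
The walk-through above can be checked with a small sketch. It reads the same text twice from an in-memory filehandle: once with $/ at its default of "\n", and once with $/ undefined so the whole input arrives as a single string. The sample text, the read_records helper, and the $spans_lines pattern are all made up for illustration; the pattern is not the asker's regex, whose markup was lost from the post.

```perl
use strict;
use warnings;

# Hypothetical sample input: plain text with a line break in the middle.
my $text = "foo2\nbar2\n";

# Illustrative pattern (an assumption, not the asker's regex):
# a digit, a newline, then a letter, i.e. text that spans two lines.
my $spans_lines = qr/\d\n[a-z]/;

# Read every record from an in-memory filehandle and return them as a list.
sub read_records {
    my ($data, $slurp) = @_;
    local $/ = $slurp ? undef : "\n";   # undef: read the whole input at once
    open my $fh, '<', \$data or die "Could not open in-memory file: $!";
    my @records = <$fh>;
    close $fh;
    return @records;
}

my @lines = read_records($text, 0);   # default $/: one record per line
my @whole = read_records($text, 1);   # $/ undefined: one record in total

# grep in scalar context counts the records the pattern matches.
my $line_hits  = grep { $_ =~ $spans_lines } @lines;
my $whole_hits = grep { $_ =~ $spans_lines } @whole;

print "records line by line: ", scalar(@lines), ", matches: $line_hits\n";
print "records when slurped: ", scalar(@whole), ", matches: $whole_hits\n";
```

Read line by line, no single record contains both "foo2" and "bar2", so a pattern that needs to cross the newline has nothing to match. Once the input is slurped into one string, the same pattern can match across the line break.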