ventumsolve

asked on

detecting lines with "wrong" linebreak

I have a very long XML file and have now spotted some lines that are not well-formed. There seem to be line breaks inside some of the tag content, so I have something like:

<Value> bla bla </Value>
<Value> bad
line </Value>
<Value> bla bla </Value>

I'm looking for a regexp to detect the bad line and chomp it, so that I end up with:

<Value> bla bla </Value>
<Value> bad line </Value>
<Value> bla bla </Value>

I tried the code below, but it didn't work: it chomped all lines.
Note that some of the good lines might also have a whitespace character after the last ">".

foreach $line (@lines) {
   print $line;
   if (!($line =~ m/>$/)) {
      print LOG "HIT LINE $line";
      chomp($line);
      print OUT $line;
   } else {
      print OUT $line;
   }  
}


ASKER CERTIFIED SOLUTION
mjcoyne

$data = join("", @lines);
$data =~ s/\n|\cM//g;           ## remove all line breaks
$data =~ s/\<\/.+?\>/$&\n/g;    ## re-add them
print $data;
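
To try this on the sample from the question, here is a small self-contained sketch. It is a variation on the snippet above, not the original: deleting a line break outright turns "bad" + "line" into "badline" unless the first line happened to end in a space, so this version swaps each break for a single space before splitting after the closing tags. The sub name rejoin_lines and the inline sample data are made up for illustration, and it assumes every element belongs on its own line and that closing tags are written in full (</Value>).

```perl
use strict;
use warnings;

# Hypothetical helper: takes the lines of the file, returns the repaired text.
sub rejoin_lines {
    my @lines = @_;
    my $data  = join('', @lines);

    # Replace every line break (LF, CRLF or bare CR), together with any
    # whitespace around it, by a single space.
    $data =~ s/\s*[\r\n]+\s*/ /g;

    # Start a new line after each closing tag, dropping the whitespace
    # that followed it.
    $data =~ s{(</[^>]+>)\s*}{$1\n}g;

    return $data;
}

# Sample data (assumed): one value broken across two lines,
# and one good line with trailing whitespace and a CRLF ending.
print rejoin_lines(
    "<Value> bla bla </Value>\n",
    "<Value> bad\n",
    "line </Value>\n",
    "<Value> bla bla </Value> \r\n",
);
```

Like the original, this puts a break after every closing tag, so it only suits files where each line is meant to hold exactly one element.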


ventumsolve

ASKER

Thanks.
I would still have liked to see what was wrong with my anchor m/>$/, though.
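
On the anchor: the thread does not say what the file really contains, so this is a guess, but two things point the same way. The accepted snippet strips \cM (carriage returns), and the question mentions trailing whitespace after the last ">". In Perl, $ matches at the end of the string or just before a final "\n", so m/>$/ only succeeds when ">" is the very last character before the newline. With CRLF line endings every line ends in ">\r\n", the "\r" sits between ">" and "\n", the match fails, and every line gets chomped. A minimal sketch under that assumption (ends_with_tag and rejoin_broken are made-up names):

```perl
use strict;
use warnings;

# Hypothetical helper: true if the line ends in '>' once trailing
# whitespace (space, tab, \r, \n) is ignored.
sub ends_with_tag {
    my ($line) = @_;
    return $line =~ />\s*$/ ? 1 : 0;
}

# Line-by-line version of the loop from the question: lines that do not
# end in a tag have their line break replaced by one space, so they run
# into the following line.
sub rejoin_broken {
    my @out;
    for my $line (@_) {
        my $copy = $line;
        unless (ends_with_tag($copy)) {
            $copy =~ s/\s*[\r\n]+$/ /;
        }
        push @out, $copy;
    }
    return join('', @out);
}

print rejoin_broken(
    "<Value> bla bla </Value>\r\n",
    "<Value> bad\r\n",
    "line </Value> \r\n",
);
```

Compared with the join-and-split approach, this keeps the original line endings and trailing whitespace of the good lines, and it still works when a value is broken over more than two lines.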