Solved

Detecting lines with a "wrong" line break

Posted on 2008-10-15
Last Modified: 2010-03-05
I have a very long XML file and have spotted some lines that are not well-formed. There seem to be line breaks inside the content of some tags, so I have something like:

<Value> bla bla </Value>
<Value> bad
line </Value>
<Value> bla bla </Value>

I'm looking for a regexp to detect the bad line and chomp it, so that I end up with:

<Value> bla bla </Value>
<Value> bad line </Value>
<Value> bla bla </Value>

I tried the code below, but it didn't work: it chomped all lines.
Beware that some of the good lines might also have a whitespace character after the last ">".

foreach $line (@lines) {
   print $line;
   if (!($line =~ m/>$/)) {
      print LOG "HIT LINE $line";
      chomp($line);
      print OUT $line;
   } else {
      print OUT $line;
   }  
}


Question by:ventumsolve

Accepted Solution

by: mjcoyne
ID: 22719221
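#!/usr/bin/perl
use strict;
use warnings;

my @lines = <DATA>;

# A line that does not start with "<" continues the previous line,
# so drop the newline at the end of that previous line.
# Start at 1 (the first line has no predecessor) and include the last line.
for my $i (1 .. $#lines) {
    if ($lines[$i] !~ /^</) {
        chomp($lines[$i-1]);
    }
}

print @lines;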

__DATA__
<Value> bla bla </Value>
<Value> bad
line </Value>
<Value> bla bla </Value>

Expert Comment

by:RSLE
ID: 22721409

$data = join("", @lines);
$data =~ s/\n|\cM//g;           ## remove all line breaks
$data =~ s/\<\/.+?\>/$&\n/g;    ## re-add them
print $data;
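
The two substitutions above rebuild every line break, which assumes each closing tag should end a line. A narrower variant, offered only as a sketch, removes just the breaks that are not followed by "<". It relies on the same assumption as the accepted solution (a continuation line never starts with "<"), and the sample string is made up for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample input; the second element is split across two lines.
my $data = "<Value> bla bla </Value>\n"
         . "<Value> bad \n"
         . "line </Value>\n"
         . "<Value> bla bla </Value>\n";

# Remove a line break (LF or CRLF) only when the next line does not
# start with "<" and the break is not the very end of the input.
(my $fixed = $data) =~ s/\r?\n(?!<|\z)//g;

print $fixed;
```

Because only the offending breaks are touched, lines that hold several elements or other markup keep their original layout.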


Author Comment

by:ventumsolve
ID: 22747994
Thanks.
I would have liked to see what was wrong with my anchor m/>$/.
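
The thread never answers this, so here is one likely explanation. In Perl, `$` matches at the end of the string or just before a final newline, so m/>$/ does match a line ending in ">\n". It stops matching as soon as anything sits between the ">" and the newline: the trailing whitespace mentioned in the question, or the "\r" of a CRLF line ending (an assumption, since the file's line endings are not stated). In that case every line is treated as a hit and chomped. A minimal sketch with made-up sample lines, comparing the original test with one that allows trailing whitespace:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up sample lines: LF ending, CRLF ending, trailing space, broken line.
my @lines = (
    "<Value> a </Value>\n",
    "<Value> b </Value>\r\n",
    "<Value> c </Value> \n",
    "<Value> bad\n",
);

# Original test: ">" must be directly followed by the end of the line.
my @strict   = map { />$/    ? 1 : 0 } @lines;

# Tolerant test: whitespace (including "\r") may follow the ">".
my @tolerant = map { />\s*$/ ? 1 : 0 } @lines;

print "strict:   @strict\n";     # strict:   1 0 0 0
print "tolerant: @tolerant\n";   # tolerant: 1 1 1 0
```

With the tolerant pattern only the broken line fails the test. Note that chomp removes just the "\n" by default, so on CRLF input a stray "\r" would remain in the joined line.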
