Solved

Perl regex line matching question

Posted on 2010-11-14
Last Modified: 2012-05-10
Hello,

I am trying to eliminate lines from an array (originally a file), deleting certain lines based on matches that span several lines (array elements).

This is an example of what I want to match against. The comments next to the lines to delete are not in the real text.

(It's G-code for a CNC machine.)

N108 G01 X40    Z-0.013
N109 G01 X40.4 Z-3.2
N110 G01 X40.8 Z-3.2   # Delete
N111 G01 X41.2 Z-3.2
N112 G01 X41.6 Z-0.013
N113 G01 X42    Z-3.2
N114 G01 X42.4 Z-3.2
N115 G01 X42.8 Z-0.013
N116 G01 X43.2 Z-3.2
N117 G01 X43.6 Z-0.013
N118 G01 X44    Z-3.2
N119 G01 X44.4 Z-0.013
N120 G01 X44.8 Z-3.2
N121 G01 X45.2 Z-0.013
N122 G01 X45.6 Z-0.013 # Delete
N123 G01 X46    Z-0.013 # Delete
N124 G01 X46.4 Z-0.013 # Delete
N125 G01 X46.8 Z-0.013

What I need is this: if the last couple of characters of three consecutive lines are the same, delete the middle line. Applied repeatedly, this eliminates big chunks of lines and leaves only the first and last line of each run whose last two characters all match.

 - Lines containing the letter Y must be kept and never deleted, even if they fall within a run of lines that would otherwise be deleted.

There are 70,000+ lines of G-code, so it needs to be reasonably efficient.

I have posted the part of my Perl logic that I'm currently using.

The code below eliminates every line whose ending matches the previous line's, but this didn't have the desired effect on the CNC machine, so I need to leave a couple of lines in the G-code.



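my $new_gcode = process_gcode($gcode);

sub process_gcode {
    my ($gcode) = @_;
    my @shorter;
    my $last = '';
    my @gcode = split /\n/, $gcode;
    $gcode = "";

    foreach my $line (@gcode) {
        # Key on the last two characters of the line, ignoring a trailing CR
        my ($current) = $line =~ /(..)\r?$/;
        $current = '' unless defined $current;

        # Keep the line if its ending differs from the previous line's
        # (string comparison with ne, not numeric !=) or if it contains a Y
        if ($current ne $last or $line =~ /Y/) {
            push @shorter, $line;
            $gcode .= "$line\n";   # split removed the newline, so put it back
        }
        $last = $current;
    }
    #return "@shorter";
    return $gcode;
}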


Question by:timbo007
9 Comments
 
LVL 35

Expert Comment

by:Terry Woods
ID: 34133264
Try using a replace pattern:
^([^\r\n]*\s)(\S+)(\r\n)[^\r\n]*\s\2\r\n([^\r\n]*\s\2\r\n)
With:
$1$2$3$4
in multiline mode.

I guess it would be this in perl:
s/^([^\r\n]*\s)(\S+)(\r\n)[^\r\n]*\s\2\r\n([^\r\n]*\s\2\r\n)/$1$2$3$4/m
 
LVL 35

Expert Comment

by:Terry Woods
ID: 34133269
You may need to re-run it until you get no further replacements, so depending on what you want to use it for the performance may or may not be a problem.
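The "re-run until no further replacements" idea can be sketched as a loop. This is only an illustration under some assumptions: collapse_runs is a hypothetical helper name, the pattern is loosened to accept either \n or \r\n line endings, and [ \t] is used in place of \s so the separator cannot match a newline. Like the pattern above, it does not treat lines containing Y specially.

```perl
use strict;
use warnings;

# Hypothetical helper: repeat the three-line substitution until a pass
# makes no replacement. Each pass drops the middle line of any three
# consecutive lines that end in the same whitespace-delimited token.
sub collapse_runs {
    my ($text) = @_;
    1 while $text =~ s/^([^\r\n]*[ \t])(\S+)(\r?\n)[^\r\n]*[ \t]\2\r?\n([^\r\n]*[ \t]\2\r?\n)/$1$2$3$4/mg;
    return $text;
}

my $sample = join '', map { "$_\n" }
    'N121 G01 X45.2 Z-0.013',
    'N122 G01 X45.6 Z-0.013',
    'N123 G01 X46 Z-0.013',
    'N124 G01 X46.4 Z-0.013',
    'N125 G01 X46.8 Z-3.2';

print collapse_runs($sample);
```

Each pass can only shorten a run by a limited amount, so long runs need several passes over the whole text. That is where the performance concern comes from.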
 
LVL 84

Expert Comment

by:ozo
ID: 34134155
$_='N108 G01 X40    Z-0.013
N109 G01 X40.4 Z-3.2
N110 G01 X40.8 Z-3.2   # Delete
N111 G01 X41.2 Z-3.2
N112 G01 X41.6 Z-0.013
N113 G01 X42    Z-3.2
N114 G01 X42.4 Z-3.2
N115 G01 X42.8 Z-0.013
N116 G01 X43.2 Z-3.2
N117 G01 X43.6 Z-0.013
N118 G01 X44    Z-3.2
N119 G01 X44.4 Z-0.013
N120 G01 X44.8 Z-3.2
N121 G01 X45.2 Z-0.013
N122 G01 X45.6 Z-0.013 # Delete
N123 G01 X46    Z-0.013 # Delete
N124 G01 X46.4 Z-0.013 # Delete
N125 G01 X46.8 Z-0.013
';
s/\s*#\s*Delete\s*\n/\n/g;
s/(.*(..\n))(.*\2)+(.*\2)/$1$4/g;
print;
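For readers who want to adapt the run-collapsing substitution, here is the same pattern spelled out with the /x modifier. It is a sketch, not a replacement: thin_runs is a hypothetical wrapper name, and the middle group is made non-capturing, so the last line of the run becomes group 3 instead of group 4. It assumes the lines have no trailing whitespace or comments.

```perl
use strict;
use warnings;

# Hypothetical wrapper: keep the first and last line of every run of three
# or more consecutive lines that share the same last two characters.
sub thin_runs {
    my ($text) = @_;
    $text =~ s{
        ( .* ( ..\n ) )   # group 1: first line of a run; group 2: its last two characters plus newline
        (?: .* \2 )+      # one or more middle lines with the same ending (dropped)
        ( .* \2 )         # group 3: last line of the run (kept)
    }{$1$3}gx;
    return $text;
}

my $gcode = join '', map { "$_\n" }
    'N108 G01 X40 Z-0.013',
    'N109 G01 X40.4 Z-3.2',
    'N110 G01 X40.8 Z-3.2',
    'N111 G01 X41.2 Z-3.2',
    'N112 G01 X41.6 Z-0.013';

print thin_runs($gcode);
```

Because the quantified group is greedy, a whole run is collapsed in a single match, so one pass over the text is enough.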
 

Author Comment

by:timbo007
ID: 34134400
Hi, thanks for your responses.

I have tried both options and they work, but neither handles lines that contain a 'Y'. This is the tricky part, I guess :)

For example, if a line contains the letter 'Y', I expect that line to be kept (while the lines on either side of it are still deleted). I'm sorry I didn't make this clear in the original question.

e.g:

N120 G01 X44.8 Z-3.2                                                            
N121 G01 X45.2 Z-0.013                                                          
N122 G01 X45.6 Z-0.013 # Delete                                                
N123 G01 X46    Z-0.013 # Delete                                                
N124 G01 X46.4 Z-0.013 # Delete                                                
N125 G01 X46.8 Z-0.013      
N120 G01 X44.8 Z-3.2                                                            
N121 G01 X45.2 Z-0.013                                                          
N122 G01 X45.6 Z-0.013 # Delete                                                
N123 G01 X46    Z-0.013 # Delete                                                
N124 G01 X46.4 Z-0.013 # Delete                                                
N122 G01 X45.6 Z-0.013 # Delete                                                
N123 G01 Y46    Z-0.013 # Don't delete this line as it has a 'Y'                                            
N124 G01 X46.4 Z-0.013 # Delete                                                
N122 G01 X45.6 Z-0.013 # Delete                                                
N123 G01 X46    Z-0.013 # Delete                                                
N124 G01 X46.4 Z-0.013 # Delete                                                
N125 G01 X46.8 Z-0.013      


 
LVL 84

Assisted Solution

by:ozo
ozo earned 250 total points
ID: 34134838
$_="                                                                                                            
N120 G01 X44.8 Z-3.2                                                                                            
N121 G01 X45.2 Z-0.013                                                                                          
N122 G01 X45.6 Z-0.013 # Delete                                                                                
N123 G01 X46    Z-0.013 # Delete                                                                                
N124 G01 X46.4 Z-0.013 # Delete                                                                                
N125 G01 X46.8 Z-0.013                                                                                          
N120 G01 X44.8 Z-3.2                                                                                            
N121 G01 X45.2 Z-0.013                                                                                          
N122 G01 X45.6 Z-0.013 # Delete                                                                                
N123 G01 X46    Z-0.013 # Delete                                                                                
N124 G01 X46.4 Z-0.013 # Delete                                                                                
N122 G01 X45.6 Z-0.013 # Delete                                                                                
N123 G01 Y46    Z-0.013 # Don't delete this line as it has a 'Y'                                                
N124 G01 X46.4 Z-0.013 # Delete                                                                                
N122 G01 X45.6 Z-0.013 # Delete                                                                                
N123 G01 X46    Z-0.013 # Delete                                                                                
N124 G01 X46.4 Z-0.013 # Delete                                                                                
N125 G01 X46.8 Z-0.013                                                                                          
";
s/\s*?(#.*)?\n/\n/g;
s/(.*(..\n))([^Y\n]*\2|(.*\2))+(.*\2)/$1$4$5/g;
print;
 
LVL 4

Accepted Solution

by:
boocko earned 250 total points
ID: 34134915
If the file is too big to load in memory, you can try line-by-line processing:

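#!perl
use 5.010;
use strict;

my $ln;   # "last number": trailing value of the previous line
my $ll;   # "last line": most recent line held back, not yet printed
my $pp;   # "previous printed?": false while a line is being held back
while (<DATA>) {   # while (not foreach) reads one line at a time
      s/\s*#.*//;  # just for removing comments
      my ($c,$num) = /([XYxy]).*?([\d+\-.]+)$/;
      # A Y line is always printed
      if (uc($c) eq 'Y') {$ln=$num; $pp=1; print; next;}
      # Same trailing number as the previous line: hold the line back
      if ($num eq $ln) {$ll = $_; $pp=0; next;}
      # Number changed: print the held line (end of the run), then this one
      else {$ln=$num; unless ($pp) {print $ll}; print $_; $pp=1;}
}
print $ll if defined $ll and not $pp;   # flush a line still held at the end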

__DATA__
N120 G01 X44.8 Z-3.2
N121 G01 X45.2 Z-0.013
N122 G01 X45.6 Z-0.013 # Delete
N123 G01 X46    Z-0.013 # Delete
N124 G01 X46.4 Z-0.013 # Delete
N125 G01 X46.8 Z-0.013
N120 G01 X44.8 Z-3.2
N121 G01 X45.2 Z-0.013
N122 G01 X45.6 Z-0.013 # Delete
N123 G01 X46    Z-0.013 # Delete
N124 G01 X46.4 Z-0.013 # Delete
N122 G01 X45.6 Z-0.013 # Delete
N123 G01 Y46    Z-0.013 # Don't delete this line as it has a 'Y'
N124 G01 X46.4 Z-0.013 # Delete
N122 G01 X45.6 Z-0.013 # Delete
N123 G01 X46    Z-0.013 # Delete
N124 G01 X46.4 Z-0.013 # Delete
N125 G01 X46.8 Z-0.013
 

Author Closing Comment

by:timbo007
ID: 34135136
Thanks heaps. Both of those give the exact results when run over 75,000 lines of code, so that's promising :) Boocko's answer is much faster, though, so it is the better option for me.

However, I am having trouble understanding Boocko's logic, in case I want to modify it later. Are you able to elaborate on what it does?

Both answers are superb :)
 
LVL 4

Expert Comment

by:boocko
ID: 34135174
Sorry for the dirty and cryptic hack :-)
Think of $ln as "last number", $ll as "last line" and $pp as "is previous printed?".
First, find the number at the end of the line (the second capture group), as well as the character X or Y, which goes into $c.
If it's Y, print the line, remember that it's printed, save the number as the last number, then go to the next iteration.
If the number is the same as the "last number" from the previous line, remember the line, mark it as not printed yet and go on.
If the number is different from the one in the previous line, print the previous line if it hasn't been printed yet, print the current line, mark it as printed, and save the number.
After the loop, the held line is printed so that the final line of the file is not lost.
Hope it helps, thanks for the points.
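In case it helps with later tinkering, the same state machine can be sketched as a function with longer names. This is an illustration under assumptions, not the accepted code: thin_gcode, $last_value and $held are hypothetical names, the input is a list of lines with comments and line endings already removed, and the kept lines are returned instead of printed.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the lines to keep, in their original order.
sub thin_gcode {
    my @lines = @_;
    my @kept;
    my $last_value = '';   # trailing number of the previous line
    my $held;              # latest line of the current run, not yet emitted
    for my $line (@lines) {
        my ($axis, $value) = $line =~ /([XYxy]).*?([\d+\-.]+)\s*$/;
        $axis  //= '';
        $value //= '';
        if (uc($axis) eq 'Y') {          # Y lines are always kept
            push @kept, $line;
            $last_value = $value;
            undef $held;                 # the line held before a Y line is dropped
            next;
        }
        if ($value eq $last_value) {     # same ending as before: hold it back
            $held = $line;
            next;
        }
        push @kept, $held if defined $held;   # the run ended: emit its last line
        undef $held;
        push @kept, $line;
        $last_value = $value;
    }
    push @kept, $held if defined $held;       # flush a run that reaches the end
    return @kept;
}

print "$_\n" for thin_gcode(
    'N1 G01 X1 Z-3.2',
    'N2 G01 X2 Z-0.013',
    'N3 G01 X3 Z-0.013',
    'N4 G01 Y4 Z-0.013',
    'N5 G01 X5 Z-0.013',
    'N6 G01 X6 Z-0.013',
);
```

Using an undefined $held in place of the $pp flag means a stale line can never be emitted twice. The trade-off is that the whole input and output live in memory, so for very large files the streaming loop remains the better fit.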
 

Author Comment

by:timbo007
ID: 34135214
That's great thanks