timbo007 asked:

Perl regex line matching question..

Hello,

I am trying to eliminate certain lines from an array (originally a file), based on matches that span several lines (array elements).

This is an example of what I want to match against (the comments next to the lines to delete aren't in the real text):

(it's G-Code for a CNC machine)

N108 G01 X40    Z-0.013
N109 G01 X40.4 Z-3.2
N110 G01 X40.8 Z-3.2   # Delete
N111 G01 X41.2 Z-3.2
N112 G01 X41.6 Z-0.013
N113 G01 X42    Z-3.2
N114 G01 X42.4 Z-3.2
N115 G01 X42.8 Z-0.013
N116 G01 X43.2 Z-3.2
N117 G01 X43.6 Z-0.013
N118 G01 X44    Z-3.2
N119 G01 X44.4 Z-0.013
N120 G01 X44.8 Z-3.2
N121 G01 X45.2 Z-0.013
N122 G01 X45.6 Z-0.013 # Delete
N123 G01 X46    Z-0.013 # Delete
N124 G01 X46.4 Z-0.013 # Delete
N125 G01 X46.8 Z-0.013

What I need is this: if the last couple of characters of 3 consecutive lines are the same, delete the middle line. That way, in a big run of lines that all end in the same 2 characters, everything is eliminated except the first and last line of the run.

 - There are also lines containing the letter Y, which I want to keep and never delete, even if they fall among lines that would otherwise be deleted.

There are 70000+ lines of G-Code, so it needs to be reasonably efficient.

I have posted part of my Perl logic here which I'm using..

The code below eliminates every line that ends the same way as the previous one, but this didn't have the desired effect on the CNC machine, so I need to leave a couple of lines in the G-Code.



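my $new_gcode = process_gcode($gcode);

sub process_gcode {
    my ($gcode) = @_;
    my @shorter;
    my $last = '';
    my @gcode = split /\n/, $gcode;
    $gcode = "";

    foreach my $line (@gcode) {
        # Take the last two characters before the (optional) trailing \r
        my ($current) = $line =~ /(..)\r?$/;
        $current = '' unless defined $current;

        # Keep the line if its ending differs from the previous line's ending,
        # or if it contains a Y (string comparison, so use ne rather than !=)
        if ($current ne $last or $line =~ /Y/) {
            push @shorter, $line;
            $gcode .= "$line\n";    # split removed the \n, so put it back
        }
        $last = $current;
    }
    #return "@shorter";
    return $gcode;
}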


Terry Woods

Try using a replace pattern:
^([^\r\n]*\s)(\S+)(\r\n)[^\r\n]*\s\2\r\n([^\r\n]*\s\2\r\n)
With:
$1$2$3$4
in multiline mode.

I guess it would be this in Perl:
s/^([^\r\n]*\s)(\S+)(\r\n)[^\r\n]*\s\2\r\n([^\r\n]*\s\2\r\n)/$1$2$3$4/m
You may need to re-run it until you get no further replacements, so depending on what you want to use it for, performance may or may not be a problem.
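
As a rough sketch of the "re-run it until you get no further replacements" idea, the substitution above can be wrapped in a `1 while` loop. The helper name `thin_runs` and the sample lines are only for illustration, and the sketch assumes the text uses \r\n line endings, as the pattern requires. It does not handle the 'Y' lines.

```perl
use strict;
use warnings;

# Hypothetical helper (name is an assumption): repeats the single
# substitution until it no longer matches anywhere in the text.
sub thin_runs {
    my ($text) = @_;
    # Each pass removes the middle line of the first triple of lines that
    # end in the same token; "1 while" restarts from the top every time.
    1 while $text =~ s/^([^\r\n]*\s)(\S+)(\r\n)[^\r\n]*\s\2\r\n([^\r\n]*\s\2\r\n)/$1$2$3$4/m;
    return $text;
}

# Sample taken from the question, with the "# Delete" comments left out.
my $sample = join '', map { $_ . "\r\n" } (
    'N120 G01 X44.8 Z-3.2',
    'N121 G01 X45.2 Z-0.013',
    'N122 G01 X45.6 Z-0.013',
    'N123 G01 X46 Z-0.013',
    'N124 G01 X46.4 Z-0.013',
    'N125 G01 X46.8 Z-0.013',
);

print thin_runs($sample);   # leaves N120, N121 and N125
```

Because every pass rescans from the start of the text, this is likely to be slow on 70000+ lines, which is the performance caveat mentioned above.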
$_='N108 G01 X40    Z-0.013
N109 G01 X40.4 Z-3.2
N110 G01 X40.8 Z-3.2   # Delete
N111 G01 X41.2 Z-3.2
N112 G01 X41.6 Z-0.013
N113 G01 X42    Z-3.2
N114 G01 X42.4 Z-3.2
N115 G01 X42.8 Z-0.013
N116 G01 X43.2 Z-3.2
N117 G01 X43.6 Z-0.013
N118 G01 X44    Z-3.2
N119 G01 X44.4 Z-0.013
N120 G01 X44.8 Z-3.2
N121 G01 X45.2 Z-0.013
N122 G01 X45.6 Z-0.013 # Delete
N123 G01 X46    Z-0.013 # Delete
N124 G01 X46.4 Z-0.013 # Delete
N125 G01 X46.8 Z-0.013
';
s/\s*#\s*Delete\s*\n/\n/g;
s/(.*(..\n))(.*\2)+(.*\2)/$1$4/g;
print;
timbo007 (ASKER)

Hi, thanks for your responses,

I have tried both options and they work, but neither deals with the case where there is a 'Y' on a line. This is the tricky part, I guess :)

So, for example, if there is the letter 'Y' on a line, I would expect that line to be kept (but the lines either side of it would be deleted). I am sorry if I didn't make this very clear in the original question.

e.g:

N120 G01 X44.8 Z-3.2                                                            
N121 G01 X45.2 Z-0.013                                                          
N122 G01 X45.6 Z-0.013 # Delete                                                
N123 G01 X46    Z-0.013 # Delete                                                
N124 G01 X46.4 Z-0.013 # Delete                                                
N125 G01 X46.8 Z-0.013      
N120 G01 X44.8 Z-3.2                                                            
N121 G01 X45.2 Z-0.013                                                          
N122 G01 X45.6 Z-0.013 # Delete                                                
N123 G01 X46    Z-0.013 # Delete                                                
N124 G01 X46.4 Z-0.013 # Delete                                                
N122 G01 X45.6 Z-0.013 # Delete                                                
N123 G01 Y46    Z-0.013 # Don't delete this line as it has a 'Y'                                            
N124 G01 X46.4 Z-0.013 # Delete                                                
N122 G01 X45.6 Z-0.013 # Delete                                                
N123 G01 X46    Z-0.013 # Delete                                                
N124 G01 X46.4 Z-0.013 # Delete                                                
N125 G01 X46.8 Z-0.013      

SOLUTION
ozo

This solution is only available to members.

ASKER CERTIFIED SOLUTION

This solution is only available to members.

Thanks heaps, both of those get the exact results when going over 75000 lines of code, so that's promising :) But Boocko's answer is much faster, and that is the better option for me.

However, I am having trouble understanding Boocko's logic, in case I want to mess with it later. Are you able to elaborate on what it does?

Both answers are superb :)
Sorry for the dirty & cryptic hack :-)
Think of $ln as "last number", $ll as "last line" and $pp as "is previous printed?".
So, first find the number at the end of the line with ([\d+-.]+)$, as well as the character $c, which is X or Y.
If it's Y, print the line, remember that it's printed and save the last number, then go to the next iteration.
If the number is the same as the "last number" from the previous line, remember the line, mark it as not printed yet and go on.
If the number is different from the one in the previous line, print the previous line if it's not printed yet, print the current line, mark it as printed, and save the number.
The last line is always printed.
Hope it helps, thanks for the points.
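
The original code is not shown in this thread, so the following is only a reconstruction of the steps described above, not the actual solution. The names `filter_gcode` and `@sample` are made up for illustration, the lines are assumed to be already stripped of \r\n, numbers are compared as strings, and keeping lines that have no trailing number is an extra assumption.

```perl
use strict;
use warnings;

# Hypothetical reconstruction of the described single-pass filter.
sub filter_gcode {
    my @lines = @_;
    my @out;
    my ($ln, $ll);    # $ln: last number, $ll: last (held-back) line
    my $pp = 1;       # "is previous printed?" (true means nothing is held back)

    for my $line (@lines) {
        my ($num) = $line =~ /([\d+-.]+)$/;   # trailing number, e.g. -0.013
        if (!defined $num) {                  # assumption: keep lines without one
            push @out, $ll unless $pp;
            push @out, $line;
            ($pp, $ln) = (1, undef);
            next;
        }
        if ($line =~ /Y/) {                   # Y lines are always kept
            push @out, $line;
            ($pp, $ln) = (1, $num);
            next;
        }
        if (defined $ln && $num eq $ln) {     # same number as before: hold the line back
            ($ll, $pp) = ($line, 0);
            next;
        }
        push @out, $ll unless $pp;            # number changed: emit the end of the run
        push @out, $line;
        ($pp, $ln) = (1, $num);
    }
    push @out, $ll unless $pp;                # the last line is always kept
    return @out;
}

my @sample = (
    'N120 G01 X44.8 Z-3.2',
    'N121 G01 X45.2 Z-0.013',
    'N122 G01 X45.6 Z-0.013',
    'N123 G01 Y46 Z-0.013',
    'N124 G01 X46.4 Z-0.013',
    'N125 G01 X46.8 Z-0.013',
);
print "$_\n" for filter_gcode(@sample);   # keeps N120, N121, N123 and N125
```

Each line is looked at once and only one line is ever held back, which would explain why this approach is much faster than re-running a multi-line regex over the whole text.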
That's great thanks