  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 419

Perl regex line matching question..

Hello,

I am trying to remove lines from an array (read from a file), where whether a line gets deleted depends on matches across several consecutive lines (array elements).

This is an example of what I want to match against (the comments marking the lines to delete aren't in the real text).

It's G-Code for a CNC machine:

N108 G01 X40    Z-0.013
N109 G01 X40.4 Z-3.2
N110 G01 X40.8 Z-3.2   # Delete
N111 G01 X41.2 Z-3.2
N112 G01 X41.6 Z-0.013
N113 G01 X42    Z-3.2
N114 G01 X42.4 Z-3.2
N115 G01 X42.8 Z-0.013
N116 G01 X43.2 Z-3.2
N117 G01 X43.6 Z-0.013
N118 G01 X44    Z-3.2
N119 G01 X44.4 Z-0.013
N120 G01 X44.8 Z-3.2
N121 G01 X45.2 Z-0.013
N122 G01 X45.6 Z-0.013 # Delete
N123 G01 X46    Z-0.013 # Delete
N124 G01 X46.4 Z-0.013 # Delete
N125 G01 X46.8 Z-0.013

What I need is this: if the last couple of characters of three consecutive lines are the same, delete the middle line. That way a long run of lines with the same ending is reduced to just its first and last line.

 - Lines containing the letter Y must always be kept, even if they fall within a run of lines that would otherwise be deleted.

There are 70,000+ lines of G-Code, so it needs to be reasonably efficient.

I have posted the part of my Perl logic that I'm using.

The code below eliminates every line whose ending matches the previous line's, but that didn't have the desired effect on the CNC machine, so I need to leave a couple of lines in the G-Code.



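my $new_gcode = process_gcode($gcode);

sub process_gcode {
    my ($gcode) = @_;
    my @shorter;
    my $last = '';
    my @gcode = split /\n/, $gcode;
    $gcode = "";

    foreach my $line (@gcode) {
        # Take the last two characters of the line (ignoring a trailing \r)
        my ($current) = $line =~ /(..)\r?$/;
        $current = '' unless defined $current;

        # Keep the line if its ending differs from the previous line's,
        # or if it contains a Y. "ne" compares as strings; "!=" is numeric.
        if ($current ne $last or $line =~ /Y/) {
            push @shorter, $line;
            $gcode .= "$line\n";   # split removed the newline, so put it back
        }
        $last = $current;
    }
    #return "@shorter";
    return $gcode;
}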

timbo007
Asked:
timbo007
2 Solutions
 
Terry WoodsIT GuruCommented:
Try using a replace pattern:
^([^\r\n]*\s)(\S+)(\r\n)[^\r\n]*\s\2\r\n([^\r\n]*\s\2\r\n)
With:
$1$2$3$4
in multiline mode.

I guess it would be this in perl:
s/^([^\r\n]*\s)(\S+)(\r\n)[^\r\n]*\s\2\r\n([^\r\n]*\s\2\r\n)/$1$2$3$4/m
 
Terry WoodsIT GuruCommented:
You may need to re-run it until you get no further replacements, so depending on what you want to use it for the performance may or may not be a problem.
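For reference, the "re-run until no further replacements" idea can be sketched as a loop. This is only an illustration under some assumptions: the wrapper name drop_middle_lines is made up, \r?\n is used instead of \r\n so that both Windows and Unix line endings match, every line (including the last) is assumed to end with a newline, and the Y rule from the question is not handled here.

```perl
use strict;
use warnings;

# Hypothetical wrapper (the name is not from this thread).
# Repeats the substitution until a pass makes no replacement.
sub drop_middle_lines {
    my ($text) = @_;

    # One pass: wherever three consecutive lines share the same last
    # whitespace-separated field, drop the middle line.
    # /m lets ^ match at every line start; /g applies it across the text.
    1 while $text =~ s/^([^\r\n]*\s)(\S+)(\r?\n)[^\r\n]*\s\2\r?\n([^\r\n]*\s\2\r?\n)/$1$2$3$4/mg;

    return $text;
}
```

It would be called as my $new_gcode = drop_middle_lines($gcode); where $gcode holds the whole file. Each successful pass removes at least one line, so the loop always terminates, but a long run of identical endings needs several passes over the whole text, which is where the performance concern comes from.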
 
ozoCommented:
$_='N108 G01 X40    Z-0.013
N109 G01 X40.4 Z-3.2
N110 G01 X40.8 Z-3.2   # Delete
N111 G01 X41.2 Z-3.2
N112 G01 X41.6 Z-0.013
N113 G01 X42    Z-3.2
N114 G01 X42.4 Z-3.2
N115 G01 X42.8 Z-0.013
N116 G01 X43.2 Z-3.2
N117 G01 X43.6 Z-0.013
N118 G01 X44    Z-3.2
N119 G01 X44.4 Z-0.013
N120 G01 X44.8 Z-3.2
N121 G01 X45.2 Z-0.013
N122 G01 X45.6 Z-0.013 # Delete
N123 G01 X46    Z-0.013 # Delete
N124 G01 X46.4 Z-0.013 # Delete
N125 G01 X46.8 Z-0.013
';
# Strip the "# Delete" markers, which are not in the real file
s/\s*#\s*Delete\s*\n/\n/g;
# For each run of three or more lines sharing the same last two characters,
# keep only the first ($1) and last ($4) line of the run
s/(.*(..\n))(.*\2)+(.*\2)/$1$4/g;
print;
 
timbo007Author Commented:
Hi, thanks for your responses.

I have tried both options and they work, but neither handles lines containing a 'Y'. This is the tricky part, I guess :)

For example, if there is a letter 'Y' on a line, I would expect that line to be kept (but the lines on either side would still be deleted). I am sorry if I didn't make this very clear in the original question.

e.g:

N120 G01 X44.8 Z-3.2                                                            
N121 G01 X45.2 Z-0.013                                                          
N122 G01 X45.6 Z-0.013 # Delete                                                
N123 G01 X46    Z-0.013 # Delete                                                
N124 G01 X46.4 Z-0.013 # Delete                                                
N125 G01 X46.8 Z-0.013      
N120 G01 X44.8 Z-3.2                                                            
N121 G01 X45.2 Z-0.013                                                          
N122 G01 X45.6 Z-0.013 # Delete                                                
N123 G01 X46    Z-0.013 # Delete                                                
N124 G01 X46.4 Z-0.013 # Delete                                                
N122 G01 X45.6 Z-0.013 # Delete                                                
N123 G01 Y46    Z-0.013 # Don't delete this line as it has a 'Y'                                            
N124 G01 X46.4 Z-0.013 # Delete                                                
N122 G01 X45.6 Z-0.013 # Delete                                                
N123 G01 X46    Z-0.013 # Delete                                                
N124 G01 X46.4 Z-0.013 # Delete                                                
N125 G01 X46.8 Z-0.013      

 
ozoCommented:
$_="                                                                                                            
N120 G01 X44.8 Z-3.2                                                                                            
N121 G01 X45.2 Z-0.013                                                                                          
N122 G01 X45.6 Z-0.013 # Delete                                                                                
N123 G01 X46    Z-0.013 # Delete                                                                                
N124 G01 X46.4 Z-0.013 # Delete                                                                                
N125 G01 X46.8 Z-0.013                                                                                          
N120 G01 X44.8 Z-3.2                                                                                            
N121 G01 X45.2 Z-0.013                                                                                          
N122 G01 X45.6 Z-0.013 # Delete                                                                                
N123 G01 X46    Z-0.013 # Delete                                                                                
N124 G01 X46.4 Z-0.013 # Delete                                                                                
N122 G01 X45.6 Z-0.013 # Delete                                                                                
N123 G01 Y46    Z-0.013 # Don't delete this line as it has a 'Y'                                                
N124 G01 X46.4 Z-0.013 # Delete                                                                                
N122 G01 X45.6 Z-0.013 # Delete                                                                                
N123 G01 X46    Z-0.013 # Delete                                                                                
N124 G01 X46.4 Z-0.013 # Delete                                                                                
N125 G01 X46.8 Z-0.013                                                                                          
";
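# Strip trailing whitespace and any "# ..." comment from every line
s/\s*?(#.*)?\n/\n/g;
# As before, but a line containing Y inside a run is captured in $4 and kept
# (if a run has several such lines, $4 holds only the last one)
s/(.*(..\n))([^Y\n]*\2|(.*\2))+(.*\2)/$1$4$5/g;
print;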
 
boockoCommented:
If the file is too big to load in memory, you can try line-by-line processing:

#!perl
use 5.010;
use strict;

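my $ln;      # "last number": trailing number of the previous line
my $ll;      # "last line": most recent line held back, not yet printed
my $pp = 1;  # "is previous printed?" (nothing is held back at the start)
while (<DATA>) {   # while, not foreach, so only one line is read at a time
      s/\s*#.*//;  # just for removing comments
      my ($c,$num) = /([XYxy]).*?([\d+-.]+)$/;
      if (uc($c) eq 'Y') {$ln=$num; $pp=1; print; next;}
      if (defined $ln and $num eq $ln) {$ll = $_; $pp=0; next;}
      else {$ln=$num; unless ($pp) {print $ll}; print $_; $pp=1;}
}
print $ll unless $pp;  # flush the held line only if it has not been printed yet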

__DATA__
N120 G01 X44.8 Z-3.2
N121 G01 X45.2 Z-0.013
N122 G01 X45.6 Z-0.013 # Delete
N123 G01 X46    Z-0.013 # Delete
N124 G01 X46.4 Z-0.013 # Delete
N125 G01 X46.8 Z-0.013
N120 G01 X44.8 Z-3.2
N121 G01 X45.2 Z-0.013
N122 G01 X45.6 Z-0.013 # Delete
N123 G01 X46    Z-0.013 # Delete
N124 G01 X46.4 Z-0.013 # Delete
N122 G01 X45.6 Z-0.013 # Delete
N123 G01 Y46    Z-0.013 # Don't delete this line as it has a 'Y'
N124 G01 X46.4 Z-0.013 # Delete
N122 G01 X45.6 Z-0.013 # Delete
N123 G01 X46    Z-0.013 # Delete
N124 G01 X46.4 Z-0.013 # Delete
N125 G01 X46.8 Z-0.013
 
timbo007Author Commented:
Thanks heaps. Both of those give exactly the right results when run over 75,000 lines of code, so that's promising :) But boocko's answer is much faster, so it is the better option for me.

However, I am having trouble understanding boocko's logic, in case I want to modify it later. Are you able to elaborate on what it does?

Both answers are superb :)
 
boockoCommented:
Sorry for the dirty & cryptic hack :-)
Think of $ln as "last number", $ll as "last line" and $pp as "is previous printed?".
First, find the number at the end of the line with ([\d+-.]+)$, and capture the first X or Y character in $c.
If it's Y, print the line, remember that it's printed, save the last number, then go to the next iteration.
If the number is the same as the "last number" from the previous line, remember the line, mark it as not printed yet, and go on.
If the number is different from the previous line's, print the previous line if it hasn't been printed yet, print the current line, mark it as printed, and save the number.
The last line is always printed.
Hope it helps, thanks for the points.
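For anyone who wants to modify this later, the same idea can be sketched as a subroutine with longer names. This is a sketch under assumptions, not the exact code posted above: the name thin_gcode and the variable names are made up, it takes a list of lines and returns the kept ones instead of printing, and it treats any line that contains a Y (after the comment is stripped) as a Y line. One deliberate difference: if a Y line's number differs from the run being held, the held line is output first instead of being dropped.

```perl
use strict;
use warnings;

# Hypothetical helper (name and variable names are not from this thread).
# Keeps the first and last line of each run of equal trailing numbers,
# and always keeps lines containing Y.
sub thin_gcode {
    my @lines = @_;
    my @kept;
    my $last_value;   # trailing number of the previous line
    my $held_line;    # latest line of the current run, not yet output

    for my $line (@lines) {
        (my $clean = $line) =~ s/\s*#.*//;   # drop a trailing "# ..." comment
        $clean =~ s/\s+$//;                  # drop trailing whitespace/newline

        my ($value) = $clean =~ /([\d+.\-]+)$/;   # number at the end of the line
        $value = '' unless defined $value;
        my $same = defined $last_value && $value eq $last_value;

        if ($clean =~ /Y/i) {
            # Y lines are always kept. A held line from the same run is
            # dropped; a held line from a different run is flushed first.
            push @kept, $held_line if !$same && defined $held_line;
            $held_line = undef;
            push @kept, $clean;
        }
        elsif ($same) {
            # Same ending as the previous line: hold it back, it may be
            # the last line of the run.
            $held_line = $clean;
        }
        else {
            # New ending: flush the end of the previous run, keep this line.
            push @kept, $held_line if defined $held_line;
            $held_line = undef;
            push @kept, $clean;
        }
        $last_value = $value;
    }

    # The end of the final run has not been output yet.
    push @kept, $held_line if defined $held_line;
    return @kept;
}
```

To stay line-by-line on a large file, the same three branches can run inside a while loop over the file handle, printing where this version pushes onto @kept.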
 
timbo007Author Commented:
That's great thanks