Link to home
Start Free TrialLog in
Avatar of hadrons
hadrons

asked on

Need help with a subrountine (involves using the uniq() function and regular expression)

Hi, I have a script that involves using a subroutine to modify and return a value and in its original state it works as expected, but I've been stymied at making two improvements to the returning results:
1) having only value return if there are duplications of that string (so if "Mariá" appears 5x, I only want to see it once and not 5x)

2) if a pattern appears just one, I want it deleted (if it appears more than once in a string, its fine)

So this is the base script:
my @files = glob("*_mismatch_htmlcodes.txt");      
foreach my $file(@files) {            
open FILE, '<:encoding(UTF-8)', ("$file") or warn "Can't open $file: $!";  
open NEW, '>>:encoding(UTF-8)', ("$file._mismatch_htmlcodes_egrep.txt") or warn "Cannot open file for write: $!";  

system ("echo $file");

select (NEW);

while (<FILE>) {
        chomp ($_);
        @new_field = split(/$\//,$_);
        $new_field = $_;
 
if (/EGREP/) { ## I want to print values with just this string in it
       print &CleanHTML6($_);
       print $/;

}
       
    } # while loop

} ## for each

print $NEW;
close(OLD);
close(NEW);


sub CleanHTML6 {
    my ($value) = @_;

    $value =~ s/THIS/THAT/g;


    return $value;
}

It works fine, but I tried to use the uniq() to consolidate the results:

use List::MoreUtils qw(uniq);
my @files = glob("*_mismatch_htmlcodes.txt");      
foreach my $file(@files) {            
open FILE, '<:encoding(UTF-8)', ("$file") or warn "Can't open $file: $!";  
open NEW, '>>:encoding(UTF-8)', ("$file._mismatch_htmlcodes_egrep.txt") or warn "Cannot open file for write: $!";  

system ("echo $file");

select (NEW);

while (<FILE>) {
        chomp ($_);
        @new_field = split(/$\//,$_);
        $new_field = $_;
 
if (/EGREP/) {
       print &CleanHTML6($_);
       print $/;

}
       
    } # while loop

} ## for each

print $NEW;
close(OLD);
close(NEW);


sub CleanHTML6 {
    my ($value) = @_;
    my @unique_value = uniq @value;

    $unique_value =~ s/THIS/THAT/g;


    return $unique_value;
}

And it just returns blank lines. Also, its here:

$unique_value =~ s/THIS//g;
that a regular expression that can delete the value if it finds the pattern THIS only once on the line; thanks








Avatar of ozo
ozo
Flag of United States of America image

with
  use warnings; use diagnostics;
Name "main::value" used only once: possible typo at line 33.
Name "main::OLD" used only once: possible typo at  line 27.
    (W once) Typographical errors often show up as unique variable
    names.  If you had a good reason for having a unique name, then
    just mention it again somehow to suppress the message.  The our
    declaration is also provided for this purpose.
   
    NOTE: This warning detects package symbols that have been used
    only once.  This means lexical variables will never trigger this
    warning.  It also means that all of the package variables $c, @c,
    %c, as well as *c, &c, sub c{}, c(), and c (the filehandle or
    format) are considered the same; if a program uses $c only once
    but also uses any of the others it will not trigger this warning.
    Symbols beginning with an underscore and symbols using special
    identifiers (q.v. perldata) are exempt from this warning.
Global symbol "@value" requires explicit package name (did you forget to declare "my @value"?) at line 33.
Global symbol "$unique_value" requires explicit package name (did you forget to declare "my $unique_value"?) at  line 35.
Global symbol "$unique_value" requires explicit package name (did you forget to declare "my $unique_value"?) at  line 38.
Use of uninitialized value $unique_value in substitution (s///) at line 35, <FILE> line 5
    (W uninitialized) An undefined value was used as if it were already
    defined.  It was interpreted as a "" or a 0, but maybe it was a mistake.
    To suppress this warning assign a defined value to your variables.
   
    To help you figure out what was undefined, perl will try to tell you
    the name of the variable (if any) that was undefined.  In some cases
    it cannot do this, so it also tells you what operation you used the
    undefined value in.  Note, however, that perl optimizes your program
    and the operation displayed in the warning may not necessarily appear
    literally in your program.  For example, "that $foo" is usually
    optimized into "that " . $foo, and the warning will refer to the
    concatenation (.) operator, even though there is no . in
    your program.
Use of uninitialized value $NEW in print at line 26, <FILE> line 6.

Did you mean to say
   return @unique_value;
?

@unique_value is an array. $unique_value is a scalar that was never initialized
Also, @value is an array that doesn't seem to have been initialized. Did you mean to use the scalar $value?

Avatar of hadrons
hadrons

ASKER

If I read your post correctly, I should be doing something closer to this?

    my @unique_value = uniq @_;
    my $unique_value (@unique_value );
    $unique_value =~ s/THIS/THAT/g;

The above didn't run, but this did:

    my @unique_value = uniq @_;

    $unique_value =~ s/THIS/THAT/g;

But just blank lines were returned


    return $unique_value;

Where did you set $unique_value?
Avatar of hadrons

ASKER

I wasn't able to do this through the subrountine, but I did modifiy the WHILE statement:

my %seen;
while (<FILE>) {
        chomp ($_);
        @new_field = split(/$\//,$_);
        $new_field = $_;
 

       print if $seen{&CleanHTML($_)}++ == 1;
       print $/;

       
    } # while loop

Works great ... it does keep linebreaks in where a dup was found (so the first Mari&#225; is there, but other cases would be a blank line) ... I can probably find a way to eliminate them
On edit: corrected the issue with removing the blank lines

ASKER CERTIFIED SOLUTION
Avatar of hadrons
hadrons

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial