asked on

Need help with a subrountine (involves using the uniq() function and regular expression)

Hi, I have a script that involves using a subroutine to modify and return a value and in its original state it works as expected, but I've been stymied at making two improvements to the returning results:
1) having only value return if there are duplications of that string (so if "Mariá" appears 5x, I only want to see it once and not 5x)

2) if a pattern appears just one, I want it deleted (if it appears more than once in a string, its fine)

So this is the base script:
my @files = glob("*_mismatch_htmlcodes.txt");
foreach my $file(@files) {
open FILE, '<:encoding(UTF-8)', ("$file") or warn "Can't open $file: $!";
open NEW, '>>:encoding(UTF-8)', ("$file._mismatch_htmlcodes_egrep.txt") or warn "Cannot open file for write: $!";

system ("echo $file");

select (NEW);

while (<FILE>) {
chomp ($_);
@new_field = split(/$\//,$_);
$new_field = $_;

if (/EGREP/) { ## I want to print values with just this string in it
print &CleanHTML6($_);
print $/;

}

} # while loop

} ## for each

print $NEW;
close(OLD);
close(NEW);

sub CleanHTML6 {
my ($value) = @_;

$value =~ s/THIS/THAT/g;

return $value;
}

It works fine, but I tried to use the uniq() to consolidate the results:

use List::MoreUtils qw(uniq);
my @files = glob("*_mismatch_htmlcodes.txt");
foreach my $file(@files) {
open FILE, '<:encoding(UTF-8)', ("$file") or warn "Can't open $file: $!";
open NEW, '>>:encoding(UTF-8)', ("$file._mismatch_htmlcodes_egrep.txt") or warn "Cannot open file for write: $!";

system ("echo $file");

select (NEW);

while (<FILE>) {
chomp ($_);
@new_field = split(/$\//,$_);
$new_field = $_;

if (/EGREP/) {
print &CleanHTML6($_);
print $/;

}

} # while loop

} ## for each

print $NEW;
close(OLD);
close(NEW);

sub CleanHTML6 {
my ($value) = @_;
my @unique_value = uniq @value;

$unique_value =~ s/THIS/THAT/g;

return $unique_value;
}

And it just returns blank lines. Also, its here:

$unique_value =~ s/THIS//g;
that a regular expression that can delete the value if it finds the pattern THIS only once on the line; thanks

ozo

with
use warnings; use diagnostics;
Name "main::value" used only once: possible typo at line 33.
Name "main::OLD" used only once: possible typo at line 27.
(W once) Typographical errors often show up as unique variable
names. If you had a good reason for having a unique name, then
just mention it again somehow to suppress the message. The our
declaration is also provided for this purpose.

NOTE: This warning detects package symbols that have been used
only once. This means lexical variables will never trigger this
warning. It also means that all of the package variables $c, @c,
%c, as well as *c, &c, sub c{}, c(), and c (the filehandle or
format) are considered the same; if a program uses $c only once
but also uses any of the others it will not trigger this warning.
Symbols beginning with an underscore and symbols using special
identifiers (q.v. perldata) are exempt from this warning.
Global symbol "@value" requires explicit package name (did you forget to declare "my @value"?) at line 33.
Global symbol "$unique_value" requires explicit package name (did you forget to declare "my $unique_value"?) at line 35.
Global symbol "$unique_value" requires explicit package name (did you forget to declare "my $unique_value"?) at line 38.
Use of uninitialized value $unique_value in substitution (s///) at line 35, <FILE> line 5
(W uninitialized) An undefined value was used as if it were already
defined. It was interpreted as a "" or a 0, but maybe it was a mistake.
To suppress this warning assign a defined value to your variables.

To help you figure out what was undefined, perl will try to tell you
the name of the variable (if any) that was undefined. In some cases
it cannot do this, so it also tells you what operation you used the
undefined value in. Note, however, that perl optimizes your program
and the operation displayed in the warning may not necessarily appear
literally in your program. For example, "that $foo" is usually
optimized into "that " . $foo, and the warning will refer to the
concatenation (.) operator, even though there is no . in
your program.
Use of uninitialized value $NEW in print at line 26, <FILE> line 6.

Did you mean to say
return @unique_value;
?

@unique_value is an array. $unique_value is a scalar that was never initialized
Also, @value is an array that doesn't seem to have been initialized. Did you mean to use the scalar $value?

hadrons

ASKER

If I read your post correctly, I should be doing something closer to this?

my @unique_value = uniq @_;
my $unique_value (@unique_value );
$unique_value =~ s/THIS/THAT/g;

The above didn't run, but this did:

my @unique_value = uniq @_;

$unique_value =~ s/THIS/THAT/g;

But just blank lines were returned

return $unique_value;

ozo

Where did you set $unique_value?

hadrons

ASKER

I wasn't able to do this through the subrountine, but I did modifiy the WHILE statement:

my %seen;
while (<FILE>) {
chomp ($_);
@new_field = split(/$\//,$_);
$new_field = $_;

print if $seen{&CleanHTML($_)}++ == 1;
print $/;

} # while loop

Works great ... it does keep linebreaks in where a dup was found (so the first Mariá is there, but other cases would be a blank line) ... I can probably find a way to eliminate them
On edit: corrected the issue with removing the blank lines

ASKER CERTIFIED SOLUTION

hadrons

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial