Asked by stakor

Perl Substitution Table

I need to convert chunks of HTML that have made it into my text file into plain text. I am thinking of creating a text file with two columns: one with the HTML entity (for example "&#39;") and the other with the text to insert in its place ("'"). Is there a way to have Perl go through a text file and use another text file as the translation reference? I am not sure how many things I will need to change, which is why I was thinking of using a text file: it could handle anywhere from 2 to 1000 substitutions.

Any thoughts on how to do this? I cannot install any modules on the system.
wilcoxon

Sure. Something like this should work (it assumes the two columns in translate.txt are separated by tabs).
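use strict;
use warnings;

# Read the translation table: one "old<TAB>new" pair per line.
open my $txt, '<', 'translate.txt' or die "could not open translate.txt: $!";
my %xlate;
while (<$txt>) {
    chomp;
    next unless /\S/;    # skip blank lines
    my ($old, $new) = split /\t+/;
    unless (defined $new) {    # no tab on this line
        warn "translate.txt line $.: no tab separator, skipped\n";
        next;
    }
    $xlate{$old} = $new;
}
close $txt;

# Apply every substitution to each line of the input file.
my $fil = shift or die "Usage: $0 file_with_bad_text\n";
open my $in, '<', $fil or die "could not open $fil: $!";
while (<$in>) {
    foreach my $pat (keys %xlate) {
        s{$pat}{$xlate{$pat}}g; # possibly \b$pat\b or \Q$pat - see what works for you
    }
    print;
}
close $in;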

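The \Q option mentioned in the code comment only changes the result when a table key contains regex metacharacters; the entities in this thread (&#39; and &amp;quot;) happen not to contain any. A minimal sketch, using a hypothetical key "1.5" that is not from the thread, showing the difference:

```perl
use strict;
use warnings;

# A hypothetical table entry whose key contains a regex metacharacter (".").
my $pat  = '1.5';
my $text = 'version 1x5 and version 1.5';

# Without \Q, "." matches any character, so "1x5" is rewritten as well.
(my $loose = $text) =~ s{$pat}{one-point-five}g;

# With \Q...\E, the key is matched as literal text.
(my $strict = $text) =~ s{\Q$pat\E}{one-point-five}g;

print "$loose\n";    # version one-point-five and version one-point-five
print "$strict\n";   # version 1x5 and version one-point-five
```

For a table of literal strings like HTML entities, quoting every key is the safer default.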

stakor (ASKER)

This seems close, but the markup characters are disappearing instead of being replaced.

The translate.txt file:
&#39;   '
&amp;quot; "



The source file:
 boss&#39;s
as &amp;quot;This Test&amp;quot;.


ASKER CERTIFIED SOLUTION
wilcoxon

This solution is only available to members.
stakor (ASKER)

The output now looks like:

boss's
as &amp;quot;This Test&amp;quot;.

The program looks like:
use strict;
use warnings;
open TXT, 'translate.txt' or die "could not open translate.txt: $!";
my %xlate;
while (<TXT>) {
    chomp;
    my ($old, $new) = split /\t+/;
    $xlate{$old} = $new;
}
close TXT;
my $fil = shift or die "Usage: $0 file_with_bad_text\n";
open IN, $fil or die "could not open $fil: $!";
while (<IN>) {
    foreach my $pat (keys %xlate) {
        s{$pat}{$xlate{$pat}}g; # possibly \b$pat\b or \Q$pat - see what works for you
    }
    print;
}
close IN;



The translation file looks like:
&#39;   '
&amp;quot; "



So the ' now comes out right, but the " does not yet.
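One possible cause, assuming the first line of translate.txt is tab-separated and the second uses spaces (the listing cannot show which): split /\t+/ finds no tab on the second line, so the whole line becomes the hash key, the replacement is undef, and nothing in the source file matches. A minimal sketch comparing that split with a split on any whitespace, using two hypothetical table lines:

```perl
use strict;
use warnings;

# Two hypothetical table lines: the first uses a tab, the second a single space.
my @lines = ("&#39;\t'", "&amp;quot; \"");

my (%by_tab, %by_space);
for my $line (@lines) {
    # Split on tabs only, as the posted program does.
    my ($old, $new) = split /\t+/, $line;
    $by_tab{$old} = $new;    # $new is undef when the line contains no tab

    # Split on any run of whitespace, keeping at most two fields.
    ($old, $new) = split /\s+/, $line, 2;
    $by_space{$old} = $new;
}

print join(', ', sort keys %by_tab), "\n";     # &#39;, &amp;quot; "
print join(', ', sort keys %by_space), "\n";   # &#39;, &amp;quot;
```

With a limit of 2, a replacement that itself contains spaces is kept whole instead of being cut at the first space.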
See the Perl FAQ entry:
perldoc -q "How do I efficiently match many regular expressions at once?"
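The idea behind that FAQ, avoiding a separate regex pass per pattern, can be applied here by joining all the keys into one alternation and looking each match up in the hash. A minimal sketch with the table hard-coded instead of read from translate.txt (the translate function name is hypothetical); longer keys are listed first so that an entry like &amp;quot; wins over its prefix &amp;:

```perl
use strict;
use warnings;

# Table hard-coded for the sketch; a real script would load it from translate.txt.
my %xlate = (
    '&#39;'      => "'",
    '&amp;quot;' => '"',
    '&amp;'      => '&',
);

# One regex: every key quoted with quotemeta, longest first,
# because Perl tries alternatives left to right at each position.
my $alt = join '|', map { quotemeta($_) } sort { length $b <=> length $a } keys %xlate;
my $re  = qr/($alt)/;

sub translate {
    my ($text) = @_;
    $text =~ s/$re/$xlate{$1}/g;   # one pass; replaced text is not rescanned
    return $text;
}

print translate(q{boss&#39;s}), "\n";                         # boss's
print translate(q{as &amp;quot;This Test&amp;quot;.}), "\n";  # as "This Test".
```

Because the input is scanned only once, output produced by one replacement is never fed into another (&amp;lt; becomes &lt;, not <).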
stakor (ASKER)

I found this:

http://perldoc.perl.org/5.10.1/perlfaq6.html#How-do-I-efficiently-match-many-regular-expressions-at-once%3f

But honestly, I am not that good at Perl yet. I think I can accomplish what I need with a series of sed commands, so I will see if I can get that to work for this project.
You can certainly do it with a series of sed commands.
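One difference from a chain of sed commands: sed applies its substitutions in the order written, while keys %xlate returns them in no particular order, which matters when entries overlap (for example &amp; and &amp;quot;). A minimal sketch that keeps the file order by storing precompiled pairs in an array; the sample table, the in-memory filehandle standing in for translate.txt, and the load_table/apply_pairs names are all hypothetical:

```perl
use strict;
use warnings;

# Sample table standing in for translate.txt (tab- or space-separated).
my $table = "&amp;quot;\t\"\n&#39;   '\n&amp;\t&\n";

# Load the pairs in file order, compiling each pattern once.
sub load_table {
    my ($fh) = @_;
    my @pairs;
    while (my $line = <$fh>) {
        $line =~ s/\r?\n\z//;                       # strip LF or CRLF
        my ($old, $new) = split /\s+/, $line, 2;   # tabs or spaces
        next unless defined $new && length $old;   # skip blank or malformed lines
        push @pairs, [ qr/\Q$old\E/, $new ];
    }
    return @pairs;
}

# Apply every substitution in order, like a chain of sed commands.
sub apply_pairs {
    my ($text, @pairs) = @_;
    for my $p (@pairs) {
        my ($re, $new) = @$p;
        $text =~ s/$re/$new/g;
    }
    return $text;
}

open my $fh, '<', \$table or die "cannot open in-memory table: $!";
my @pairs = load_table($fh);
close $fh;

print apply_pairs(q{as &amp;quot;This Test&amp;quot;. boss&#39;s}, @pairs), "\n";
# as "This Test". boss's
```

Moving the &amp; line to the top of the table would instead turn &amp;quot; into &quot;, which the later &amp;quot; rule no longer matches.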

Did you ever get the code I provided working completely?  If not, are you still interested in it?  If so, I'll take a look sometime "soon" (I've been busy lately).
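#!/usr/bin/perl
use strict;
use warnings;
@ARGV or die "Usage: $0 file_with_bad_text\n";
-r 'translate.txt' or die "cannot read translate.txt\n";

# Compile the whole table into one anonymous sub: each line of
# translate.txt ("old", whitespace, "new") becomes s{\Qold\E}{new}g;
# note: entries containing } $ @ or \ would need extra escaping.
my $xlate;
{
    local @ARGV = qw(translate.txt);    # temporarily let <> read the table
    my $s = '';
    while (<>) {
        $s .= "s{\\Q$1\\E}{$2}g;" if /(\S+)\s+(\S+)/;
    }
    # $! is not a reliable error flag after reading, so check eval instead
    $xlate = eval "sub { $s }" or die "could not compile translation table: $@";
}

# With @ARGV restored, <> now reads the file(s) named on the command line.
while (<>) {
    $xlate->();    # runs all substitutions on $_
    print;
}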
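Since part of the leftover HTML consists of numeric character references such as &#39;, those can be decoded by one rule rather than one table line each, using only core Perl. A minimal sketch; the decode_numeric name is hypothetical, it handles the decimal (&#39;) and hex (&#x27;) forms, and it leaves named entities such as &amp;quot; to the translation table:

```perl
use strict;
use warnings;

# Hypothetical helper: decode decimal (&#39;) and hex (&#x27;) character references.
sub decode_numeric {
    my ($text) = @_;
    $text =~ s/&#(\d+);/chr($1)/ge;
    $text =~ s/&#[xX]([0-9a-fA-F]+);/chr(hex $1)/ge;
    return $text;
}

print decode_numeric(q{boss&#39;s and boss&#x27;s}), "\n";   # boss's and boss's
```

References above 255 (for example &#8217;) produce wide characters, so the output handle would then need binmode STDOUT, ':encoding(UTF-8)'.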