
Perl Substitution Table

I need to translate chunks of HTML that have made it into my text file into plain text. I am thinking of creating a text file with two columns: one with the HTML, such as "&#39;", and the other with the text to be inserted, such as "'". Is there a way to have Perl go through a text file and use a second text file as a translation reference? I am not sure how many things I will need to change, which is why I was thinking of using a text file, as it could handle anywhere from 2 to 1000 changes.

Any thoughts on how to do this, without the ability to install any modules on the system?

wilcoxon

Sure. Something like this should work (assumes text columns are separated by tabs).

use strict;
use warnings;

# Load the translation table: one "old<TAB>new" pair per line.
open my $txt, '<', 'translate.txt' or die "could not open translate.txt: $!";
my %xlate;
while (<$txt>) {
    chomp;
    my ($old, $new) = split /\t+/;
    $xlate{$old} = $new;
}
close $txt;

my $fil = shift or die "Usage: $0 file_with_bad_text\n";
open my $in, '<', $fil or die "could not open $fil: $!";
while (<$in>) {
    # Apply each substitution in turn to the current line.
    foreach my $pat (keys %xlate) {
        s{$pat}{$xlate{$pat}}g; # possibly \b$pat\b or \Q$pat - see what works for you
    }
    print;
}
close $in;
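
If the columns in translate.txt might be separated by spaces rather than tabs, the table loader could split on any whitespace instead. The following is only a sketch of that idea: parse_table is a hypothetical helper name, and the sample rows are assumed to match the two-column layout described in the question.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original answer): turn table lines
# into a hash of old => new, accepting tabs or spaces between the columns.
sub parse_table {
    my @lines = @_;
    my %xlate;
    for my $line (@lines) {
        my $copy = $line;
        $copy =~ s/^\s+//;                   # drop leading whitespace
        $copy =~ s/\s+$//;                   # drop trailing whitespace and newline
        next unless length $copy;            # skip blank lines
        # Split into at most two fields at the first run of whitespace,
        # so a replacement that itself contains spaces is kept whole.
        my ($old, $new) = split /\s+/, $copy, 2;
        next unless defined $new;            # skip rows with no second column
        $xlate{$old} = $new;
    }
    return %xlate;
}

my %xlate = parse_table("&#39;   '\n", "&amp;quot; \"\n", "\n");
print "$_ => $xlate{$_}\n" for sort keys %xlate;
```

In a real script the lines would come from reading translate.txt rather than from a literal list.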


stakor

ASKER

This seems to be close, but I am seeing the markup characters disappear instead.

The translate.txt file:

&#39;   '
&amp;quot; "


The Source file:

 boss&#39;s
as &amp;quot;This Test&amp;quot;.


ASKER CERTIFIED SOLUTION

wilcoxon


stakor

ASKER

The output now looks like:

boss's
as &quot;This Test&quot;.

The program looks like:

use strict;
use warnings;
open TXT, 'translate.txt' or die "could not open translate.txt: $!";
my %xlate;
while (<TXT>) {
    chomp;
    my ($old, $new) = split /\t+/;
    $xlate{$old} = $new;
}
close TXT;
my $fil = shift or die "Usage: $0 file_with_bad_text\n";
open IN, $fil or die "could not open $fil: $!";
while (<IN>) {
    foreach my $pat (keys %xlate) {
        s{$pat}{$xlate{$pat}}g; # possibly \b$pat\b or \Q$pat - see what works for you
    }
    print;
}
close IN;


The Translate text looks like:

&#39;   '
&amp;quot; "


So, the 's looks good, but the " does not yet.

ozo

see
perldoc -q "How do I efficiently match many regular expressions at once?"
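
One common way to match many fixed strings at once, and not necessarily the approach that FAQ entry recommends, is to join the escaped keys into a single alternation and look up each match in the hash. The sketch below assumes a small hard-coded table in place of translate.txt; translate is a hypothetical function name.

```perl
use strict;
use warnings;

# Hypothetical table; in practice this would be loaded from translate.txt.
my %xlate = (
    '&#39;'      => "'",
    '&amp;quot;' => '"',
);

# Escape each key so it matches literally, and try longer keys first so a
# long entry is not shadowed by a shorter entry that is a prefix of it.
my $alternation = join '|',
    map  { quotemeta($_) }
    sort { length($b) <=> length($a) || $a cmp $b } keys %xlate;
my $re = qr/($alternation)/;

# Replace every table key found in the text with its table value.
sub translate {
    my ($text) = @_;
    $text =~ s/$re/$xlate{$1}/g;
    return $text;
}

print translate("boss&#39;s\n");
print translate("as &amp;quot;This Test&amp;quot;.\n");
```

Because the text is scanned once and replaced text is not rescanned, the result does not depend on the order in which the hash keys happen to come back.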

stakor

ASKER

I have found:

http://perldoc.perl.org/5.10.1/perlfaq6.html#How-do-I-efficiently-match-many-regular-expressions-at-once%3f

But honestly, I am not that good at Perl yet. I think I can accomplish what I need with a set of sed commands, so I will see if I can get that to work for this project.

wilcoxon

You can certainly do it with a series of sed commands.

Did you ever get the code I provided working completely? If not, are you still interested in it? If so, I'll take a look sometime "soon" (I've been busy lately).

ozo

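#!/usr/bin/perl
use strict;
use warnings;

@ARGV or die "Usage: $0 file_with_bad_text\n";

# Build one subroutine holding an s{...}{...}g statement per table row.
# Table entries are pasted into generated code, so a '}', '$' or '@' in a
# row would break or alter the generated substitution.
my $xlate;
{
    local @ARGV = qw(translate.txt);
    my $s = '';
    while (<>) {
        # First field is the text to find, second is its replacement.
        $s .= "s{\\Q$1\\E}{$2}g;" if /(\S+)\s+(\S+)/;
    }
    die "could not read translate.txt: $!" if $!;
    $xlate = eval "sub{$s}" or die "could not compile translation table: $@";
}

# Apply every substitution to each line of the remaining input file(s).
while (<>) {
    $xlate->();
    print;
}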
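
To see what the approach above generates, the sketch below builds the same kind of code string from two sample rows, prints it, and then compiles and applies it. The rows and input lines are copied from the samples earlier in the thread; the variable names are illustrative only.

```perl
use strict;
use warnings;

# Sample rows copied from the translate.txt shown earlier in the thread.
my @rows = ("&#39;   '", '&amp;quot; "');

# Build the source text of the subroutine, one s{...}{...}g per row.
my $code = '';
for my $row (@rows) {
    $code .= "s{\\Q$1\\E}{$2}g;" if $row =~ /(\S+)\s+(\S+)/;
}
print "sub{$code}\n";    # show the generated Perl before compiling it

# Compile it; eval returns undef and sets $@ if the generated code is invalid.
my $xlate = eval "sub{$code}" or die "generated code did not compile: $@";

for my $line ("boss&#39;s\n", "as &amp;quot;This Test&amp;quot;.\n") {
    local $_ = $line;    # the generated substitutions operate on $_
    $xlate->();
    print;
}
```

Printing the generated string first is a cheap way to check that each table row turned into the substitution that was intended.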