Solved

Perl Substitution Table

Posted on 2014-02-18
8
304 Views
Last Modified: 2014-02-20
I need to translate chunks of html that have made it into my text file, into just text. I am thinking of creating a text file that has two columns in it. One, with the html "'", and the other with the text to be inserted "'". Is there a way to have perl go through a text file, and use a text file as a translation reference. I am not sure how many things I am going to need to change, which is why I was thinking of using a text file, as it could handle 2 - 1000 changes...

Any thoughts on how to do this, without the ability to install any modules on the system?
0
Comment
Question by:stakor
  • 3
  • 3
  • 2
8 Comments
 
LVL 26

Expert Comment

by:wilcoxon
ID: 39869448
Sure.  Something like this should work (assumes text columns are separated by tabs).
use strict;
use warnings;
use File::Copy qw(mv);
open TXT, 'translate.txt' or die "could not open translate.txt: $!";
my %xlate;
while (<TXT>) {
    chomp;
    my ($old, $new) = split /\t+/;
    $xlate{$old} = $new;
}
close TXT;
my $fil = shift or die "Usage: $0 file_with_bad_text\n";
open IN, $fil or die "could not open $fil: $!";
while (<IN>) {
    foreach my $pat (keys %xlate) {
        s{$pat}{$xlate{$path}}g; # possibly \b$pat\b or \Q$pat - see what works for you
    }
    print;
}
close IN;

Open in new window

0
 

Author Comment

by:stakor
ID: 39869477
This seems to be close, but I am seeing the mark up characters disappear, instead.

The translate.txt file:
&#39;   '
&amp;quot; "

Open in new window


The Source file:
 boss&#39;s
as &amp;quot;This Test&amp;quot;.

Open in new window

0
 
LVL 26

Accepted Solution

by:
wilcoxon earned 500 total points
ID: 39869486
I see one typo that should have crashed the code (unless you removed the use strict and/or use warnings).

Line 16 should be $xlate{$pat} - not $xlate{$path}

Also, you can remove line 3 (use File::Copy qw(mv)) - I didn't end up using it.
0
NAS Cloud Backup Strategies

This article explains backup scenarios when using network storage. We review the so-called “3-2-1 strategy” and summarize the methods you can use to send NAS data to the cloud

 

Author Comment

by:stakor
ID: 39869547
The output now looks like:

boss's
as &amp;quot;This Test&amp;quot;.

The program looks like:
use strict;
use warnings;
open TXT, 'translate.txt' or die "could not open translate.txt: $!";
my %xlate;
while (<TXT>) {
    chomp;
    my ($old, $new) = split /\t+/;
    $xlate{$old} = $new;
}
close TXT;
my $fil = shift or die "Usage: $0 file_with_bad_text\n";
open IN, $fil or die "could not open $fil: $!";
while (<IN>) {
    foreach my $pat (keys %xlate) {
        s{$pat}{$xlate{$pat}}g; # possibly \b$pat\b or \Q$pat - see what works for you
    }
    print;
}
close IN;

Open in new window


The Translate text looks like:
&#39;   '
&amp;quot; "

Open in new window


So, the 's looks good, but the " does not yet.
0
 
LVL 84

Expert Comment

by:ozo
ID: 39869551
see
perldoc -q "How do I efficiently match many regular expressions at once?"
0
 

Author Comment

by:stakor
ID: 39869563
I have found:

http://perldoc.perl.org/5.10.1/perlfaq6.html#How-do-I-efficiently-match-many-regular-expressions-at-once%3f

But honestly am not that good at perl yet. I think I can accomplish what I need with a set of sed commands. So, I will see if I can get that to work out for this project.
0
 
LVL 26

Expert Comment

by:wilcoxon
ID: 39875909
You can certainly do it with a series of sed commands.

Did you ever get the code I provided working completely?  If not, are you still interested in it?  If so, I'll take a look sometime "soon" (I've been busy lately).
0
 
LVL 84

Expert Comment

by:ozo
ID: 39876005
#!/usr/bin/perl
use strict;
use warnings;
@ARGV or die "Usage: $0 file_with_bad_text\n";
my $xlate;
{local @ARGV=qw(translate.txt);
  my $s;
  while( <> ){
      $s.="s{\\Q$1\\E}{$2}g;" if /(\S+)\s+(\S+)/;
  }
  die if $!;
  $xlate = eval "sub{$s}"
}
while( <> ){
    &$xlate;
    print;
}
0

Featured Post

DevOps Toolchain Recommendations

Read this Gartner Research Note and discover how your IT organization can automate and optimize DevOps processes using a toolchain architecture.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

Title # Comments Views Activity
binary to char / hexadecimal 5 114
hard perl script 16 158
syslog unix file 20 73
Perl regex to replace any capital letters not preceded by ">" 6 121
On Microsoft Windows, if  when you click or type the name of a .pl file, you get an error "is not recognized as an internal or external command, operable program or batch file", then this means you do not have the .pl file extension associated with …
Many time we need to work with multiple files all together. If its windows system then we can use some GUI based editor to accomplish our task. But what if you are on putty or have only CLI(Command Line Interface) as an option to  edit your files. I…
Explain concepts important to validation of email addresses with regular expressions. Applies to most languages/tools that uses regular expressions. Consider email address RFCs: Look at HTML5 form input element (with type=email) regex pattern: T…

823 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question