Solved

Perl Substitution Table

Posted on 2014-02-18
Last Modified: 2014-02-20
I need to translate chunks of HTML that have made it into my text file into plain text. I am thinking of creating a text file with two columns: one with the HTML entity "&#39;", and the other with the text to be inserted "'". Is there a way to have Perl go through a text file and use a second text file as a translation reference? I am not sure how many things I will need to change, which is why I was thinking of using a text file, as it could handle anywhere from 2 to 1000 changes.

Any thoughts on how to do this, without the ability to install any modules on the system?
Question by:stakor
 
LVL 26

Expert Comment

by:wilcoxon
ID: 39869448
Sure.  Something like this should work (assumes text columns are separated by tabs).
use strict;
use warnings;
use File::Copy qw(mv);
open TXT, 'translate.txt' or die "could not open translate.txt: $!";
my %xlate;
while (<TXT>) {
    chomp;
    my ($old, $new) = split /\t+/;
    $xlate{$old} = $new;
}
close TXT;
my $fil = shift or die "Usage: $0 file_with_bad_text\n";
open IN, $fil or die "could not open $fil: $!";
while (<IN>) {
    foreach my $pat (keys %xlate) {
        s{$pat}{$xlate{$path}}g; # possibly \b$pat\b or \Q$pat - see what works for you
    }
    print;
}
close IN;

 

Author Comment

by:stakor
ID: 39869477
This seems to be close, but I am seeing the mark up characters disappear, instead.

The translate.txt file:
&#39;   '
&amp;quot; "



The Source file:
 boss&#39;s
as &amp;quot;This Test&amp;quot;.

 
LVL 26

Accepted Solution

by:
wilcoxon earned 2000 total points
ID: 39869486
I see one typo that should have crashed the code (unless you removed the use strict and/or use warnings).

Line 16 should be $xlate{$pat} - not $xlate{$path}

Also, you can remove line 3 (use File::Copy qw(mv)) - I didn't end up using it.

Author Comment

by:stakor
ID: 39869547
The output now looks like:

boss's
as &amp;quot;This Test&amp;quot;.

The program looks like:
use strict;
use warnings;
open TXT, 'translate.txt' or die "could not open translate.txt: $!";
my %xlate;
while (<TXT>) {
    chomp;
    my ($old, $new) = split /\t+/;
    $xlate{$old} = $new;
}
close TXT;
my $fil = shift or die "Usage: $0 file_with_bad_text\n";
open IN, $fil or die "could not open $fil: $!";
while (<IN>) {
    foreach my $pat (keys %xlate) {
        s{$pat}{$xlate{$pat}}g; # possibly \b$pat\b or \Q$pat - see what works for you
    }
    print;
}
close IN;



The Translate text looks like:
&#39;   '
&amp;quot; "



So, the 's looks good, but the " does not yet.
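
One possible explanation, offered as an assumption because the posted files cannot be inspected directly: the script splits each table line on tabs only, so if the second line of translate.txt separates its columns with a space, split /\t+/ returns the whole line as $old and leaves $new undefined, and that pattern never matches the source text. A minimal sketch that accepts any whitespace between the columns and quotes the pattern with \Q...\E follows. The helper names load_table and translate are hypothetical, and the table and sample lines are hard-coded copies of the ones posted above.

```perl
use strict;
use warnings;

# Hypothetical helper: turn "old<whitespace>new" lines into a lookup hash.
sub load_table {
    my (@lines) = @_;
    my %xlate;
    for my $line (@lines) {
        # First column is a run of non-whitespace; the rest of the line
        # (trimmed) is the replacement. Lines that do not fit are skipped.
        my ($old, $new) = $line =~ /^\s*(\S+)\s+(\S.*?)\s*$/ or next;
        $xlate{$old} = $new;
    }
    return \%xlate;
}

# Hypothetical helper: apply every table entry to one string.
sub translate {
    my ($text, $xlate) = @_;
    for my $pat (keys %$xlate) {
        # \Q...\E makes the key match literally, not as a regex.
        $text =~ s/\Q$pat\E/$xlate->{$pat}/g;
    }
    return $text;
}

# One tab-separated entry and one space-separated entry, as in the thread.
my $table = load_table("&#39;\t'\n", "&amp;quot; \"\n");
print translate("boss&#39;s\n", $table);
print translate("as &amp;quot;This Test&amp;quot;.\n", $table);
```

Because the entries are applied one after another in hash order, this sketch assumes that no replacement text produces something another entry would match.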
 
LVL 84

Expert Comment

by:ozo
ID: 39869551
see
perldoc -q "How do I efficiently match many regular expressions at once?"
 

Author Comment

by:stakor
ID: 39869563
I have found:

http://perldoc.perl.org/5.10.1/perlfaq6.html#How-do-I-efficiently-match-many-regular-expressions-at-once%3f

But honestly, I am not that good at Perl yet. I think I can accomplish what I need with a set of sed commands, so I will see if I can get that to work for this project.
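
For readers in the same position, the general idea of matching many fixed strings at once can be sketched roughly as follows: join all the keys into a single alternation, then do one substitution per line and look the match up in the hash. This is an illustrative sketch rather than the FAQ's own code; build_translator is a hypothetical name, and the table is hard-coded here instead of being read from translate.txt.

```perl
use strict;
use warnings;

# Hypothetical helper: build a closure that replaces every key of the
# table in a single pass over the text.
sub build_translator {
    my ($xlate) = @_;
    my @keys = keys %$xlate;
    # With an empty table there is nothing to replace.
    return sub { return $_[0] } unless @keys;

    # Longest keys first, so a short key cannot win over a longer key
    # that begins with the same characters. quotemeta makes each key literal.
    my $alt = join '|',
        map  { quotemeta($_) }
        sort { length($b) <=> length($a) || $a cmp $b } @keys;
    my $re = qr/($alt)/;

    return sub {
        my ($text) = @_;
        $text =~ s/$re/$xlate->{$1}/g;
        return $text;
    };
}

my %xlate = ('&#39;' => "'", '&amp;quot;' => '"');
my $fix = build_translator(\%xlate);
print $fix->("boss&#39;s as &amp;quot;This Test&amp;quot;.\n");
```

Since each position in the text is matched only once, replacement text is never rescanned, which avoids the ordering question that arises when the entries are applied one by one.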
 
LVL 26

Expert Comment

by:wilcoxon
ID: 39875909
You can certainly do it with a series of sed commands.

Did you ever get the code I provided working completely?  If not, are you still interested in it?  If so, I'll take a look sometime "soon" (I've been busy lately).
 
LVL 84

Expert Comment

by:ozo
ID: 39876005
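#!/usr/bin/perl
use strict;
use warnings;
@ARGV or die "Usage: $0 file_with_bad_text\n";
my $xlate;
{
  # Temporarily point <> at translate.txt; the caller's @ARGV is restored when this block exits.
  local @ARGV=qw(translate.txt);
  my $s;
  while( <> ){
      # Each "old new" line becomes one s{\Qold\E}{new}g; statement (columns split on any whitespace).
      $s.="s{\\Q$1\\E}{$2}g;" if /(\S+)\s+(\S+)/;
  }
  die if $!;
  # Compile all the substitutions into a single sub that operates on $_.
  $xlate = eval "sub{$s}"
}
while( <> ){
    &$xlate;   # apply every substitution to the current line
    print;
}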


Question has a verified solution.
