
Perl Substitution Table

Posted on 2014-02-18
Last Modified: 2014-02-20
I need to translate chunks of HTML that have made it into my text file back into plain text. I am thinking of creating a text file with two columns: one with the HTML entity (for example "&#39;") and the other with the text to insert in its place ("'"). Is there a way to have Perl go through a text file and use another text file as a translation reference? I am not sure how many things I am going to need to change, which is why I was thinking of using a text file, as it could handle anywhere from 2 to 1000 changes...

Any thoughts on how to do this, without the ability to install any modules on the system?
Question by:stakor

Expert Comment

by:wilcoxon
ID: 39869448
Sure.  Something like this should work (assumes text columns are separated by tabs).
use strict;
use warnings;
use File::Copy qw(mv);
open TXT, 'translate.txt' or die "could not open translate.txt: $!";
my %xlate;
while (<TXT>) {
    chomp;
    my ($old, $new) = split /\t+/;
    $xlate{$old} = $new;
}
close TXT;
my $fil = shift or die "Usage: $0 file_with_bad_text\n";
open IN, $fil or die "could not open $fil: $!";
while (<IN>) {
    foreach my $pat (keys %xlate) {
        s{$pat}{$xlate{$path}}g; # possibly \b$pat\b or \Q$pat - see what works for you
    }
    print;
}
close IN;
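The comment on the substitution line suggests \Q$pat. A short sketch of why that matters, using a hypothetical translate_line helper and a made-up table: without \Q, each key is treated as a regular expression, so a metacharacter such as . matches more than intended. The entity keys in this thread happen to contain no regex metacharacters, so the difference only shows up with keys like the made-up "a.b" below.

```perl
use strict;
use warnings;

# Hypothetical helper: apply every key => value pair in %$table to $line.
# \Q...\E quotes regex metacharacters so each key is matched literally.
sub translate_line {
    my ($line, $table) = @_;
    for my $old (sort keys %$table) {    # sorted only to make runs repeatable
        $line =~ s/\Q$old\E/$table->{$old}/g;
    }
    return $line;
}

# Same loop without \Q: each key is interpreted as a pattern.
sub translate_line_unquoted {
    my ($line, $table) = @_;
    for my $old (sort keys %$table) {
        $line =~ s/$old/$table->{$old}/g;
    }
    return $line;
}

# Made-up table: "a.b" is meant literally, but "." is a regex wildcard.
my %table = ('a.b' => 'X', '&#39;' => "'");

print translate_line("a.b axb boss&#39;s", \%table), "\n";           # X axb boss's
print translate_line_unquoted("a.b axb boss&#39;s", \%table), "\n";  # X X boss's
```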


Author Comment

by:stakor
ID: 39869477
This seems close, but instead of being replaced, the markup characters are disappearing.

The translate.txt file:
&#39;   '
&amp;quot; "


The source file:
 boss&#39;s
as &amp;quot;This Test&amp;quot;.


Accepted Solution

by:wilcoxon (earned 500 total points)
ID: 39869486
I see one typo that should have crashed the code (unless you removed the use strict and/or use warnings).

Line 16 should be $xlate{$pat} - not $xlate{$path}

Also, you can remove line 3 (use File::Copy qw(mv)) - I didn't end up using it.

Author Comment

by:stakor
ID: 39869547
The output now looks like:

boss's
as &amp;quot;This Test&amp;quot;.

The program looks like:
use strict;
use warnings;
open TXT, 'translate.txt' or die "could not open translate.txt: $!";
my %xlate;
while (<TXT>) {
    chomp;
    my ($old, $new) = split /\t+/;
    $xlate{$old} = $new;
}
close TXT;
my $fil = shift or die "Usage: $0 file_with_bad_text\n";
open IN, $fil or die "could not open $fil: $!";
while (<IN>) {
    foreach my $pat (keys %xlate) {
        s{$pat}{$xlate{$pat}}g; # possibly \b$pat\b or \Q$pat - see what works for you
    }
    print;
}
close IN;


The translate.txt file looks like:
&#39;   '
&amp;quot; "


So the 's now looks good, but the " is not being replaced yet.
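One likely cause, judging from how translate.txt displays above (three spaces after &#39;, a single space after &amp;quot;): the first line is tab-separated and the second is not. split /\t+/ then returns the whole second line as one field, so the &amp;quot; entry never becomes a usable key, and since it never matches, no warning appears either. This is an inference, not something confirmed in the thread. A sketch that shows the difference, with a hypothetical parse_pair that splits on the first run of any whitespace:

```perl
use strict;
use warnings;

# Hypothetical parser: split a table line on the first run of whitespace (tabs or
# spaces), keeping at most two fields. Also strips a trailing "\r" from
# Windows-edited files. Returns an empty list unless the line starts with a key
# followed by whitespace and a value.
sub parse_pair {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;
    my ($old, $new) = split /\s+/, $line, 2;
    return unless defined $old && length $old && defined $new;
    return ($old, $new);
}

# Assumed contents of translate.txt: first line tab-separated, second space-separated.
my @lines = ("&#39;\t'", "&amp;quot; \"");

# split /\t+/ finds no tab in the second line, so it yields a single field.
my @fields = split /\t+/, $lines[1];
print scalar(@fields), "\n";    # 1

# Splitting on any whitespace recovers both pairs.
my %xlate = map { parse_pair($_) } @lines;
print join(' | ', map { "$_ => $xlate{$_}" } sort keys %xlate), "\n";
# &#39; => ' | &amp;quot; => "
```

A trade-off of whitespace separation: a replacement that is itself a space (or starts with one) cannot be expressed in this format, which is one reason the original code chose tabs.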

Expert Comment

by:ozo
ID: 39869551
see
perldoc -q "How do I efficiently match many regular expressions at once?"
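One widely used way to match many fixed strings at once (a sketch in the spirit of that FAQ entry, not its own code): quote each key, join them into a single alternation, and look up the replacement for whatever matched. Longer keys are tried first so a key that is a prefix of another (here a made-up &amp; entry) cannot win, and because each position is rewritten only once, the result does not depend on hash order. build_matcher and translate_all are hypothetical names.

```perl
use strict;
use warnings;

# Hypothetical helper: one regex that matches any key of %$table literally.
# Longest keys come first so "&amp;quot;" wins over its prefix "&amp;".
sub build_matcher {
    my ($table) = @_;
    return qr/(?!)/ unless %$table;    # empty table: a pattern that never matches
    my $alt = join '|',
        map { quotemeta($_) }
        sort { length($b) <=> length($a) || $a cmp $b } keys %$table;
    return qr/($alt)/;
}

# Replace every match in a single left-to-right pass; text inserted by a
# replacement is never re-scanned, so the result does not depend on hash order.
sub translate_all {
    my ($line, $table, $re) = @_;
    $line =~ s/$re/$table->{$1}/g;
    return $line;
}

my %xlate = ('&#39;' => "'", '&amp;quot;' => '"', '&amp;' => '&');
my $re = build_matcher(\%xlate);
print translate_all("boss&#39;s said &amp;quot;Hi&amp;quot; &amp; left", \%xlate, $re), "\n";
# boss's said "Hi" & left
```

With the loop-over-keys approach, the same table could turn &amp;quot; into &quot; if the &amp; key happened to be processed first; the single pass avoids that chaining.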

Author Comment

by:stakor
ID: 39869563
I have found:

http://perldoc.perl.org/5.10.1/perlfaq6.html#How-do-I-efficiently-match-many-regular-expressions-at-once%3f

But honestly, I am not that good at Perl yet. I think I can accomplish what I need with a set of sed commands, so I will see if I can get that to work for this project.

Expert Comment

by:wilcoxon
ID: 39875909
You can certainly do it with a series of sed commands.

Did you ever get the code I provided working completely?  If not, are you still interested in it?  If so, I'll take a look sometime "soon" (I've been busy lately).

Expert Comment

by:ozo
ID: 39876005
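#!/usr/bin/perl
use strict;
use warnings;
@ARGV or die "Usage: $0 file_with_bad_text\n";
my $xlate;
{
  # Build Perl source with one s{\Q...\E}{...}g; per line of translate.txt,
  # e.g. "&#39;   '" becomes s{\Q&#39;\E}{'}g;
  open my $tbl, '<', 'translate.txt' or die "could not open translate.txt: $!";
  my $s = '';
  while( <$tbl> ){
      # any whitespace (tabs or spaces) separates the two columns
      $s.="s{\\Q$1\\E}{$2}g;" if /(\S+)\s+(\S+)/;
  }
  close $tbl;
  # compile the generated substitutions into one anonymous sub
  $xlate = eval "sub{$s}" or die "could not compile translations: $@";
}
while( <> ){
    $xlate->();   # rewrites $_ in place
    print;
}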
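The script above generates Perl source for one s/// per table line and compiles it with a string eval. A variant that avoids string eval, sketched under the same assumptions (whitespace-separated two-column table, substitutions applied in file order): precompile each key with qr/\Q...\E/ and keep the pairs in an array. The table is read from a filehandle so the sketch can run on an in-memory string; load_pairs and apply_pairs are hypothetical names.

```perl
use strict;
use warnings;

# Hypothetical loader: read a two-column table (any whitespace between columns)
# and return [ [ qr/literal key/, replacement ], ... ] in file order.
sub load_pairs {
    my ($fh) = @_;
    my @pairs;
    while (my $line = <$fh>) {
        next unless $line =~ /(\S+)\s+(\S+)/;    # same two-column rule as the script above
        my ($old, $new) = ($1, $2);
        push @pairs, [ qr/\Q$old\E/, $new ];
    }
    return \@pairs;
}

# Apply each substitution in table order. $new is inserted as plain text and
# no string eval is involved, so odd characters in the table cannot break the code.
sub apply_pairs {
    my ($line, $pairs) = @_;
    for my $pair (@$pairs) {
        my ($re, $new) = @$pair;
        $line =~ s/$re/$new/g;
    }
    return $line;
}

# Demo on an in-memory copy of the table from the thread.
my $table = "&#39;\t'\n&amp;quot; \"\n";
open my $fh, '<', \$table or die "could not open in-memory table: $!";
my $pairs = load_pairs($fh);
close $fh;
print apply_pairs("boss&#39;s\n", $pairs);                          # boss's
print apply_pairs("as &amp;quot;This Test&amp;quot;.\n", $pairs);  # as "This Test".
```

To use it on real files, open translate.txt instead of the in-memory string and loop over <> as in the script above.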