Go Premium for a chance to win a PS4. Enter to Win

x
?
Solved

perl soundex function help

Posted on 2011-02-26
5
Medium Priority
?
401 Views
Last Modified: 2012-05-11
I am attempting to write a soundex function.
With the below code here is the result when i enter Euler:
U
1
1000
It is grabbing the the last number rather than the corresponding letter to number
Any help woild be appreciated

THE SOUNDEX ALGORITHM
 
The "Soundex" method of transforming names to a code that groups names/words that sound alike was originally  
1.  Retain the first letter of the name for use in step 5.
 
2.  Assign the following numbers to the letters in the word (case insensitive):
 
LETTERS TO CONVERT  CODE TO USE
B F P V b f p v  1
C G J K Q S X Z c g j k q s x z  2
D T d t  3
L l  4
M N m n  5
R r  6
 
3.  If two or more letters with the same code were adjacent in the original name (before step 2), including the
first and any subsequent letters (see "Lloyd” example below), omit all but the first.
 
4.  Drop all occurrences of A, E, H, I, O, U W, and Y in other positions.   Also drop any non-alphabetic
characters such as the dash (-) and apostrophe (').
 
5.  Convert to the form "letter, digit, digit, digit, digit" by adding trailing zeros (if there are less than four digits), or
by dropping rightmost digits (if there are more than four) where the "letter” comes from step 1.  Your “letter”
should be capitalized.
 
For example, the names Euler, Gauss, Hilbert, Knuth, Lloyd, and Lukasieicz have the respective codes E4600, G2000, H4163, K5300, L3000, L2220.  Notice especially how Lloyd is converted to L3000 and how Euler converted to E4600 and not E6000.
my $temp = uc shift;    # gets the name passed to the function
#  chomp $temp;        # and uppercases it

    $temp =~ s/^[^a-z]//; # remove first character if not letter
  
    my ($f, $fc) = ($temp =~ /(\w)(\w+)/); #save the 1st char
  
 
    my $xx = ($fc =~ s/AEHIOUWY//);#remove letters

    my $temp2 = ( $xx =~ tr/BFPVCGJKQSXZDTLMNR/111122222222334556/); #replace remaining letters with numbers
 
     my $voo = ($temp2 =~ /^(.)/);  # Grab the numbers
     my $foo = ($voo =~  s/^$fc+//); #replace any leading spaces
     my $boo = ($foo =~  tr///cs);  #tran non-alpha chars to spaces
     my $zoo = ($boo =~  tr/0//d); #delete all zeros
     my $sdxV = ($zoo =~ $f . $zoo . '000'); #concatinate the 1st letter with corresponding number and add 3 zeros if needed
     
    my $sdxValue =  ($sdxV =~ s/^(.{4}).*/$1/);
   print "$sdxValue\n";

print "$f\n";
print "$voo\n";
print "$sdxValue\n";

Open in new window

0
Comment
Question by:fac66
  • 3
  • 2
5 Comments
 
LVL 16

Expert Comment

by:sjklein42
ID: 34989380
This may not be the only bug, but you have alrady upcased $temp on input, and so you need to either use a /i on the search or use upper-case A-Z, like this:

 $temp =~ s/^[^A-Z]//; # remove first character if not letter

Open in new window

0
 

Author Comment

by:fac66
ID: 34989491
Yes you are correct i caught that early on.
Maybe you can help me with another problem.

I need to replace duplicate letters with only one for example Lloyd is returning 4300
where is thould return 3000. If you read above explaining the soundex conversion rules.
None of these work:

my $temp2 = ($fc =~ s/(\w)\1/$1/g);
 #$fc =~ s/(\w)\1/$1/;
 #my $temp2 = ($fc =~ s/(.)(?=.*?\1)/$1/g);
 #my $temp2 = ($fc =~ tr/a-zA-Z/a-zA-Z/s);
 
my $temp = uc shift;    # gets the name passed to the function

    $temp =~ s/^[^A-Z]//; # remove first character if not letter
    my ($f, $fc) = ($temp =~ /(\w)(\w+)/);
    print "$f\n";
    #print "$fc\n";
    $fc =~ tr/AEHIOUWYBFPVCGJKQSXZDTLMNR/00000000111122222222334556/;
    $fc =~  s/0+//g; #gets rid of extra 0's

#this section I am trying to eliminate duplicate letters
    my $temp2 = ($fc =~ s/(\w)\1/$1/g);
    #$fc =~ s/(\w)\1/$1/;
    #my $temp2 = ($fc =~ s/(.)(?=.*?\1)/$1/g);
    #my $temp2 = ($fc =~ tr/a-zA-Z/a-zA-Z/s);
    #print "$temp2\n";
     print "$fc\n";
    if ($fc =~ /^(.{4}).*/g)
        {
          print "$fc\n";
       }
          else 
        {
                $fc = $fc."0"x(4 - length($fc));
                print "$fc\n";
         }

Open in new window

0
 
LVL 16

Accepted Solution

by:
sjklein42 earned 2000 total points
ID: 34989535
I made several changes.  You were trying to get rid out doubled letters too late.

Plus there is no regex (that I could find) that would match doubled letters anywhere in the string.  The \1 hack only gets rid of double letters at the start of the string, I believe.

my $temp = uc shift;    # gets the name passed to the function

$temp =~ s/^[^A-Z]//;   # remove first character if not letter

# Eliminate doubled-letters

@x = split(//,$temp);
$temp = '';
$lastx = '';

foreach $x ( @x)
{
    if ( ( $lastx eq '' ) || ( $x ne $lastx ) )
    {
        $temp .= $x;
        $lastx = $x;
    }
}
print STDERR "$temp\n";

# Split into first letter and the rest

my ($f, $fc) = ($temp =~ /(\w)(\w+)/);
print "$f\n";

# soundex weighting

$fc =~ tr/AEHIOUWYBFPVCGJKQSXZDTLMNR/00000000111122222222334556/;
$fc =~  s/0+//g; #gets rid of extra 0's
print "$fc\n";

# just the first four

if ($fc =~ /^(.{4}).*/g)
{
    print "$fc\n";
}
else 
{
    $fc = $fc."0"x(4 - length($fc));
    print "$fc\n";
}

Open in new window

0
 
LVL 16

Expert Comment

by:sjklein42
ID: 34989570
I think this:

$fc = substr($fc.'000',0,4);    # first four digits, pad with zeroes on right if needed
print "$fc\n";

Open in new window


can replace this:

if ($fc =~ /^(.{4}).*/g)
{
    print "$fc\n";
}
else 
{
    $fc = $fc."0"x(4 - length($fc));
    print "$fc\n";
} 

Open in new window

0
 

Author Comment

by:fac66
ID: 34989585
Thank you that worked
0

Featured Post

Industry Leaders: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

A year or so back I was asked to have a play with MongoDB; within half an hour I had downloaded (http://www.mongodb.org/downloads),  installed and started the daemon, and had a console window open. After an hour or two of playing at the command …
There are many situations when we need to display the data in sorted order. For example: Student details by name or by rank or by total marks etc. If you are working on data driven based projects then you will use sorting techniques very frequently.…
Explain concepts important to validation of email addresses with regular expressions. Applies to most languages/tools that uses regular expressions. Consider email address RFCs: Look at HTML5 form input element (with type=email) regex pattern: T…
Six Sigma Control Plans

916 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question