perl soundex function help

I am attempting to write a soundex function.
With the below code here is the result when i enter Euler:
U
1
1000
It is grabbing the the last number rather than the corresponding letter to number
Any help woild be appreciated

THE SOUNDEX ALGORITHM

The "Soundex" method of transforming names to a code that groups names/words that sound alike was originally
1.  Retain the first letter of the name for use in step 5.

2.  Assign the following numbers to the letters in the word (case insensitive):

LETTERS TO CONVERT  CODE TO USE
B F P V b f p v  1
C G J K Q S X Z c g j k q s x z  2
D T d t  3
L l  4
M N m n  5
R r  6

3.  If two or more letters with the same code were adjacent in the original name (before step 2), including the
first and any subsequent letters (see "Lloyd” example below), omit all but the first.

4.  Drop all occurrences of A, E, H, I, O, U W, and Y in other positions.   Also drop any non-alphabetic
characters such as the dash (-) and apostrophe (').

5.  Convert to the form "letter, digit, digit, digit, digit" by adding trailing zeros (if there are less than four digits), or
by dropping rightmost digits (if there are more than four) where the "letter” comes from step 1.  Your “letter”
should be capitalized.

For example, the names Euler, Gauss, Hilbert, Knuth, Lloyd, and Lukasieicz have the respective codes E4600, G2000, H4163, K5300, L3000, L2220.  Notice especially how Lloyd is converted to L3000 and how Euler converted to E4600 and not E6000.
``````my \$temp = uc shift;    # gets the name passed to the function
#  chomp \$temp;        # and uppercases it

\$temp =~ s/^[^a-z]//; # remove first character if not letter

my (\$f, \$fc) = (\$temp =~ /(\w)(\w+)/); #save the 1st char

my \$xx = (\$fc =~ s/AEHIOUWY//);#remove letters

my \$temp2 = ( \$xx =~ tr/BFPVCGJKQSXZDTLMNR/111122222222334556/); #replace remaining letters with numbers

my \$voo = (\$temp2 =~ /^(.)/);  # Grab the numbers
my \$foo = (\$voo =~  s/^\$fc+//); #replace any leading spaces
my \$boo = (\$foo =~  tr///cs);  #tran non-alpha chars to spaces
my \$zoo = (\$boo =~  tr/0//d); #delete all zeros
my \$sdxV = (\$zoo =~ \$f . \$zoo . '000'); #concatinate the 1st letter with corresponding number and add 3 zeros if needed

my \$sdxValue =  (\$sdxV =~ s/^(.{4}).*/\$1/);
print "\$sdxValue\n";

print "\$f\n";
print "\$voo\n";
print "\$sdxValue\n";
``````
Who is Participating?

Commented:
I made several changes.  You were trying to get rid out doubled letters too late.

Plus there is no regex (that I could find) that would match doubled letters anywhere in the string.  The \1 hack only gets rid of double letters at the start of the string, I believe.

``````my \$temp = uc shift;    # gets the name passed to the function

\$temp =~ s/^[^A-Z]//;   # remove first character if not letter

# Eliminate doubled-letters

@x = split(//,\$temp);
\$temp = '';
\$lastx = '';

foreach \$x ( @x)
{
if ( ( \$lastx eq '' ) || ( \$x ne \$lastx ) )
{
\$temp .= \$x;
\$lastx = \$x;
}
}
print STDERR "\$temp\n";

# Split into first letter and the rest

my (\$f, \$fc) = (\$temp =~ /(\w)(\w+)/);
print "\$f\n";

# soundex weighting

\$fc =~ tr/AEHIOUWYBFPVCGJKQSXZDTLMNR/00000000111122222222334556/;
\$fc =~  s/0+//g; #gets rid of extra 0's
print "\$fc\n";

# just the first four

if (\$fc =~ /^(.{4}).*/g)
{
print "\$fc\n";
}
else
{
\$fc = \$fc."0"x(4 - length(\$fc));
print "\$fc\n";
}
``````
0

Commented:
This may not be the only bug, but you have alrady upcased \$temp on input, and so you need to either use a /i on the search or use upper-case A-Z, like this:

`````` \$temp =~ s/^[^A-Z]//; # remove first character if not letter
``````
0

Author Commented:
Yes you are correct i caught that early on.
Maybe you can help me with another problem.

I need to replace duplicate letters with only one for example Lloyd is returning 4300
where is thould return 3000. If you read above explaining the soundex conversion rules.
None of these work:

my \$temp2 = (\$fc =~ s/(\w)\1/\$1/g);
#\$fc =~ s/(\w)\1/\$1/;
#my \$temp2 = (\$fc =~ s/(.)(?=.*?\1)/\$1/g);
#my \$temp2 = (\$fc =~ tr/a-zA-Z/a-zA-Z/s);

``````my \$temp = uc shift;    # gets the name passed to the function

\$temp =~ s/^[^A-Z]//; # remove first character if not letter
my (\$f, \$fc) = (\$temp =~ /(\w)(\w+)/);
print "\$f\n";
#print "\$fc\n";
\$fc =~ tr/AEHIOUWYBFPVCGJKQSXZDTLMNR/00000000111122222222334556/;
\$fc =~  s/0+//g; #gets rid of extra 0's

#this section I am trying to eliminate duplicate letters
my \$temp2 = (\$fc =~ s/(\w)\1/\$1/g);
#\$fc =~ s/(\w)\1/\$1/;
#my \$temp2 = (\$fc =~ s/(.)(?=.*?\1)/\$1/g);
#my \$temp2 = (\$fc =~ tr/a-zA-Z/a-zA-Z/s);
#print "\$temp2\n";
print "\$fc\n";
if (\$fc =~ /^(.{4}).*/g)
{
print "\$fc\n";
}
else
{
\$fc = \$fc."0"x(4 - length(\$fc));
print "\$fc\n";
}
``````
0

Commented:
I think this:

``````\$fc = substr(\$fc.'000',0,4);    # first four digits, pad with zeroes on right if needed
print "\$fc\n";
``````

can replace this:

``````if (\$fc =~ /^(.{4}).*/g)
{
print "\$fc\n";
}
else
{
\$fc = \$fc."0"x(4 - length(\$fc));
print "\$fc\n";
}
``````
0

Author Commented:
Thank you that worked
0
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.