PHP Password Check for Dictionary Word

I have a simple PHP password application that checks for strength of a password before changing it.  In the link below I am trying to use the part that checks that the password is not a dictionary word.  The password I'm checking doesn't seem like it should be a dictionary word, but I keep getting that it is.  The link is: https://docstore.mik.ua/orelly/webprog/pcook/ch14_06.htm.  

Test password is MzRZ=2wdu;x?NLk

Code snippet I'm using is:

 $word_file= '/usr/share/dict/words';
   $lc_pass = strtolower($newPass);
   $denum_pass = strtr($lc_pass,'5301!','seoll');

   if (is_readable($word_file)) {
        if ($fh = fopen($word_file,'r')) {
            $found = false;
            while (! ($found || feof($fh))) {
                $word = preg_quote(trim(strtolower(fgets($fh,1024))),'/');
                if (preg_match("/$word/",$lc_pass) ||
                 preg_match("/$word/",$denum_pass)) {
                    $found = true;
                }
            }
            fclose($fh);
            if ($found) {
                $message[] = "Your new password is based on a dictionary word.";
                return false;
            }
        }
    }

Open in new window

credogAsked:
Who is Participating?

[Product update] Infrastructure Analysis Tool is now available with Business Accounts.Learn More

x
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

gr8gonzoConsultantCommented:
If you have a line break at the very end of your words file, then your very last "word" might be a blank value, which would match EVERYTHING.

Try checking for an empty word before using it:
...
  $word = preg_quote(trim(strtolower(fgets($fh,1024))),'/');
 if(empty($word)) { continue; }
if (preg_match("/$word/",$lc_pass) ||
....
0
gr8gonzoConsultantCommented:
On a side note, if your "words" file doesn't actually contain regular expression patterns, then don't use regular expressions for matching.

The regular expression engine has a lot of overhead to it, and if you're simply looking for substrings, then use strpos (or if you need case-insensitive searches, use stripos). The strpos() function is EXTREMELY fast. A TON faster than regex, especially when you repeat it many times.
0
credogAuthor Commented:
Inserted the empty line and still seems to think it is a dictionary work.  Thanks
0
Powerful Yet Easy-to-Use Network Monitoring

Identify excessive bandwidth utilization or unexpected application traffic with SolarWinds Bandwidth Analyzer Pack.

gr8gonzoConsultantCommented:
And as long as we're talking about general improvements, unless your word file is over 250k, then just read the whole thing into memory at once with file(), and then loop through that.

When you do the partial read-match-read-match-read-match pattern, you're keeping the file open for longer than it needs to be, which makes it more susceptible to problems if the file needs to be updated in the middle of a read, and also keeps a handle open in the filesystem for longer, too.

Reading a file into memory all at once should be really fast compared to a lot of small reads, and you can even split it up into lines automatically with the file() command (and you can pass it an option to remove blank lines for you). Something like this:

$word_file= '/usr/share/dict/words';
$lc_pass = strtolower($newPass);
$denum_pass = strtr($lc_pass,'5301!','seoll');

if (is_readable($word_file))
{
  $words = file($word_file, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
  $found = false;
  foreach($words as $word)
  {
    $word = trim(strtolower($word));
    $found = ( (stripos($lc_pass,$word) !== false) || (stripos($denum_pass,$word) !== false) );
    if($found)
    {
      $foundMatch = $word;
      break;
    }
  }
}

if ($found)
{
  $message[] = "Your new password is based on a dictionary word: [{$foundMatch}]";
  return false;
}

Open in new window

1
credogAuthor Commented:
Thank you.  The file looks bigger than 250K.  Should I still try your suggestions?

Just for clarification, the /usr/share/dict/words word list is the standard one in a Redhat 6.9 system and looks to be about 4.8 M

$ ls -lh  /usr/share/dict/words
lrwxrwxrwx. 1 root root 11 Dec  3  2013 /usr/share/dict/words -> linux.words

# ls -lh  /usr/share/dict/linux.words
-rw-r--r--. 1 root root 4.8M Jan  7  2010 /usr/share/dict/linux.words
0
gr8gonzoConsultantCommented:
Hmmm, if it's that big, then it's debatable. I suppose it depends on how much memory you have overall and how much PHP processes are allowed to allocate. I think the default memory limit for PHP processes is either 8 or 16 megabytes (which is pretty low in my opinion). So an array containing 4.8 megabytes of strings would likely eat up most of an 8 megabyte limit. However, if you have lots of memory in your system and PHP can allocate 16 or more megs, then it's probably not an issue.
0
gr8gonzoConsultantCommented:
If it were me, I might have a separate process read the words file at night and then split it into 1 megabyte-sized files  that you loop through. All depends on your end goal.
0
credogAuthor Commented:
I guess my first goal is to just try to get this routine to not think that random password is a dictionary word.  So, how should I proceed since the if(empty($word)) { continue; } didn't seem to help the issue.  Should I just try your code on a test basis and if it works and  think about how to read the file (i.e. split it up, memory, etc. ) once it works.   I don't do this a lot so I'm I little confused why it's thinks that it is a dictionary word.
0
gr8gonzoConsultantCommented:
Well, my latest code also records which specific word was "found" in your password, so try what I gave you and see what the output is.
0
credogAuthor Commented:
I see what was causing the issue.  This word list has single characters in it for some reason, so if the password has a "f" in it it for example it will get caught as a dictionary word.  I took the word list and made a new one out of it that has 3 or more characters and it seems to be working better now.

I ran the original code and it took 5.6 seconds against the 3 character or more word list.  Your (gr8gonzo) code took .6 seconds on the same word list.  Pretty impressive.   I'm testing this on virtual machine at home, but will test on the real system tomorrow.

Instead of parsing out the word list and creating a new one, can your code be modified to only consider words in the list that are at least  2/3 characters (still trying to decide on how many) . That way I can keep the system as is without modifying the default word list.  

Also, by the way, your original suggestion of adding if(empty($word)) { continue; } to the original code ended up needing to be included in the original code.   The reason it appeared to not have an effect originally was that the program was hitting the single character before it got to the end of the file where it was needed.
0
gr8gonzoConsultantCommented:
Sure, instead of checking to see if it's empty, just check the length:

BEFORE:
if(empty($word)) { continue; }

AFTER:
if(strlen($word) < 3) { continue; }

That will skip any word that's shorter than 3 characters.

The 0.6 seconds is still a pretty long time to run. You might also consider skipping any words longer than the number of alpha characters in the password itself.

So before you read in the file, do:
$alphaCharsInPassword1 = strlen(preg_replace("@[^a-z]@","",$lc_pass));
$alphaCharsInPassword2 = strlen(preg_replace("@[^a-z]@","",$denum_pass));
$maxWordLength = max($alphaCharsInPassword1, $alphaCharsInPassword2);

$words = file($word_file, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);

Then after checking the minimum word length, add...
if(strlen($word) < 3) { continue; }
if(strlen($word) > $maxWordLength) { continue; }

That might shave off a few more milliseconds. Personally, I wouldn't bother with anything less than 4 characters. It would be a pretty common scenario to find a 3-character sequence in a password that doesn't actually have a word.

Finally, there's a chance that a part of the 0.6 seconds is from just reading/loading the file itself, too, and you might not be able to get rid of that without doing something a little more extravagant (e.g. moving the file to a ramdisk or keeping the file in memory via some always-running daemon or something).
0

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
Julian HansenCommented:
I replicated the issue using Gr8gonzo's suggestion that there might be a space at the end of the file. Removing the space solved the issue.

This is caused by using the preg_match function - which in this case is not a good idea. You are looking for an exact match - a regular expression match is subject to triggering a false positive on a partial match.

Here is code I used that is simpler and not susceptible to that problem
PHP
$message = array();
//$newPass="MzRZ=2wdu;x?NLk";
$newPass="umbrella";
$found = false;
$word_file= 't3047.words';
$lc_pass = strtolower($newPass);
if (is_readable($word_file)) {
  $words = file($word_file,FILE_IGNORE_NEW_LINES);
  foreach($words as $w) {
    $w = strtolower(trim($w));
    $found =  ($w == $newPass || $w == $denum_pass);
  }
}

if ($found) {
  $message[] = "Your new password is based on a dictionary word.";
  echo "Debug: Found<br>";
  return false;
}

Open in new window

Word file
aardvark
bandalog
charlie
devil
elephant
fox
goat
hare
india
jump
kilometer
lima
mountain
nose
octopus
parrot
queen
ray
sugar
tango
umbrella
violin
wax
xylophone
yellow
zebra

Open in new window

Tested with your password and umbrella - both triggered as expected.
Blank line at the end of the file did not result in a false positive.

In this version we are using the file() function which reads a file into an array.

We are then iterating over this array to see if the word is in the file.

This can be further simplified if you make sure your file contains only lower case and you lower case your password before checking - you can then simply do this
$message = array();
// $newPass="MzRZ=2wdu;x?NLk";
$newPass="umbrella";
$found = false;
$word_file= 't3047.words';
$lc_pass = strtolower($newPass);
if (is_readable($word_file)) {
  $words = file($word_file, FILE_IGNORE_NEW_LINES);
  $found = in_array($newPass, $words);
}
if ($found) {
  $message[] = "Your new password is based on a dictionary word.";
  echo "Debug: Found<br>";
  return false;
}

Open in new window

Note the use of the FILE_IGNORE_NEW_LINES parameter in the file() call - this strips the newlines out of the data read from the file.
0
credogAuthor Commented:
gr8gonzo - I believe I correctly implemented your suggestions.  If you could verify I'd appreciate it.  The changes got it down to about .5 seconds and it appears to be working correctly.
 
if (is_readable($word_file)) {
        $lc_pass = strtolower($newPassword);
        $denum_pass = strtr($lc_pass,'5301!','seoll');

        $alphaCharsInPassword1 = strlen(preg_replace("@[^a-z]@","",$lc_pass));
        $alphaCharsInPassword2 = strlen(preg_replace("@[^a-z]@","",$denum_pass));
        $maxWordLength = max($alphaCharsInPassword1, $alphaCharsInPassword2);

        $words = file($word_file, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
        $found = false;
        foreach($words as $word) {
                $word = trim(strtolower($word));
                if(strlen($word) < 4) { continue; }
                if(strlen($word) > $maxWordLength) { continue; }
                $found = ( (stripos($lc_pass,$word) !== false) || (stripos($denum_pass,$word) !== false) );
                        if($found) {
                                $foundMatch = $word;
                                break;
                        }
        }

        if ($found) {
               return "Your new password is based on a dictionary word: [{$foundMatch}]";
                 return false;
         }
 }

Open in new window

0
gr8gonzoConsultantCommented:
Looks good
0
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Programming

From novice to tech pro — start learning today.