• Status: Solved
  • Priority: Medium
  • Security: Private
  • Views: 82
  • Last Modified:

PHP Password Check for Dictionary Word

I have a simple PHP password application that checks for strength of a password before changing it.  In the link below I am trying to use the part that checks that the password is not a dictionary word.  The password I'm checking doesn't seem like it should be a dictionary word, but I keep getting that it is.  The link is: https://docstore.mik.ua/orelly/webprog/pcook/ch14_06.htm.  

Test password is MzRZ=2wdu;x?NLk

Code snippet I'm using is:

 $word_file= '/usr/share/dict/words';
   $lc_pass = strtolower($newPass);
   $denum_pass = strtr($lc_pass,'5301!','seoll');

   if (is_readable($word_file)) {
        if ($fh = fopen($word_file,'r')) {
            $found = false;
            while (! ($found || feof($fh))) {
                $word = preg_quote(trim(strtolower(fgets($fh,1024))),'/');
                if (preg_match("/$word/",$lc_pass) ||
                 preg_match("/$word/",$denum_pass)) {
                    $found = true;
                }
            }
            fclose($fh);
            if ($found) {
                $message[] = "Your new password is based on a dictionary word.";
                return false;
            }
        }
    }

Open in new window

0
credog
Asked:
credog
  • 8
  • 5
5 Solutions
 
gr8gonzoConsultantCommented:
If you have a line break at the very end of your words file, then your very last "word" might be a blank value, which would match EVERYTHING.

Try checking for an empty word before using it:
...
  $word = preg_quote(trim(strtolower(fgets($fh,1024))),'/');
 if(empty($word)) { continue; }
if (preg_match("/$word/",$lc_pass) ||
....
0
 
gr8gonzoConsultantCommented:
On a side note, if your "words" file doesn't actually contain regular expression patterns, then don't use regular expressions for matching.

The regular expression engine has a lot of overhead to it, and if you're simply looking for substrings, then use strpos (or if you need case-insensitive searches, use stripos). The strpos() function is EXTREMELY fast. A TON faster than regex, especially when you repeat it many times.
0
 
credogAuthor Commented:
Inserted the empty line and still seems to think it is a dictionary work.  Thanks
0
Managing Security & Risk at the Speed of Business

Gartner Research VP, Neil McDonald & AlgoSec CTO, Prof. Avishai Wool, discuss the business-driven approach to automated security policy management, its benefits and how to align security policy management with business processes to address today's security challenges.

 
gr8gonzoConsultantCommented:
And as long as we're talking about general improvements, unless your word file is over 250k, then just read the whole thing into memory at once with file(), and then loop through that.

When you do the partial read-match-read-match-read-match pattern, you're keeping the file open for longer than it needs to be, which makes it more susceptible to problems if the file needs to be updated in the middle of a read, and also keeps a handle open in the filesystem for longer, too.

Reading a file into memory all at once should be really fast compared to a lot of small reads, and you can even split it up into lines automatically with the file() command (and you can pass it an option to remove blank lines for you). Something like this:

$word_file= '/usr/share/dict/words';
$lc_pass = strtolower($newPass);
$denum_pass = strtr($lc_pass,'5301!','seoll');

if (is_readable($word_file))
{
  $words = file($word_file, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
  $found = false;
  foreach($words as $word)
  {
    $word = trim(strtolower($word));
    $found = ( (stripos($lc_pass,$word) !== false) || (stripos($denum_pass,$word) !== false) );
    if($found)
    {
      $foundMatch = $word;
      break;
    }
  }
}

if ($found)
{
  $message[] = "Your new password is based on a dictionary word: [{$foundMatch}]";
  return false;
}

Open in new window

1
 
credogAuthor Commented:
Thank you.  The file looks bigger than 250K.  Should I still try your suggestions?

Just for clarification, the /usr/share/dict/words word list is the standard one in a Redhat 6.9 system and looks to be about 4.8 M

$ ls -lh  /usr/share/dict/words
lrwxrwxrwx. 1 root root 11 Dec  3  2013 /usr/share/dict/words -> linux.words

# ls -lh  /usr/share/dict/linux.words
-rw-r--r--. 1 root root 4.8M Jan  7  2010 /usr/share/dict/linux.words
0
 
gr8gonzoConsultantCommented:
Hmmm, if it's that big, then it's debatable. I suppose it depends on how much memory you have overall and how much PHP processes are allowed to allocate. I think the default memory limit for PHP processes is either 8 or 16 megabytes (which is pretty low in my opinion). So an array containing 4.8 megabytes of strings would likely eat up most of an 8 megabyte limit. However, if you have lots of memory in your system and PHP can allocate 16 or more megs, then it's probably not an issue.
0
 
gr8gonzoConsultantCommented:
If it were me, I might have a separate process read the words file at night and then split it into 1 megabyte-sized files  that you loop through. All depends on your end goal.
0
 
credogAuthor Commented:
I guess my first goal is to just try to get this routine to not think that random password is a dictionary word.  So, how should I proceed since the if(empty($word)) { continue; } didn't seem to help the issue.  Should I just try your code on a test basis and if it works and  think about how to read the file (i.e. split it up, memory, etc. ) once it works.   I don't do this a lot so I'm I little confused why it's thinks that it is a dictionary word.
0
 
gr8gonzoConsultantCommented:
Well, my latest code also records which specific word was "found" in your password, so try what I gave you and see what the output is.
0
 
credogAuthor Commented:
I see what was causing the issue.  This word list has single characters in it for some reason, so if the password has a "f" in it it for example it will get caught as a dictionary word.  I took the word list and made a new one out of it that has 3 or more characters and it seems to be working better now.

I ran the original code and it took 5.6 seconds against the 3 character or more word list.  Your (gr8gonzo) code took .6 seconds on the same word list.  Pretty impressive.   I'm testing this on virtual machine at home, but will test on the real system tomorrow.

Instead of parsing out the word list and creating a new one, can your code be modified to only consider words in the list that are at least  2/3 characters (still trying to decide on how many) . That way I can keep the system as is without modifying the default word list.  

Also, by the way, your original suggestion of adding if(empty($word)) { continue; } to the original code ended up needing to be included in the original code.   The reason it appeared to not have an effect originally was that the program was hitting the single character before it got to the end of the file where it was needed.
0
 
gr8gonzoConsultantCommented:
Sure, instead of checking to see if it's empty, just check the length:

BEFORE:
if(empty($word)) { continue; }

AFTER:
if(strlen($word) < 3) { continue; }

That will skip any word that's shorter than 3 characters.

The 0.6 seconds is still a pretty long time to run. You might also consider skipping any words longer than the number of alpha characters in the password itself.

So before you read in the file, do:
$alphaCharsInPassword1 = strlen(preg_replace("@[^a-z]@","",$lc_pass));
$alphaCharsInPassword2 = strlen(preg_replace("@[^a-z]@","",$denum_pass));
$maxWordLength = max($alphaCharsInPassword1, $alphaCharsInPassword2);

$words = file($word_file, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);

Then after checking the minimum word length, add...
if(strlen($word) < 3) { continue; }
if(strlen($word) > $maxWordLength) { continue; }

That might shave off a few more milliseconds. Personally, I wouldn't bother with anything less than 4 characters. It would be a pretty common scenario to find a 3-character sequence in a password that doesn't actually have a word.

Finally, there's a chance that a part of the 0.6 seconds is from just reading/loading the file itself, too, and you might not be able to get rid of that without doing something a little more extravagant (e.g. moving the file to a ramdisk or keeping the file in memory via some always-running daemon or something).
0
 
Julian HansenCommented:
I replicated the issue using Gr8gonzo's suggestion that there might be a space at the end of the file. Removing the space solved the issue.

This is caused by using the preg_match function - which in this case is not a good idea. You are looking for an exact match - a regular expression match is subject to triggering a false positive on a partial match.

Here is code I used that is simpler and not susceptible to that problem
PHP
$message = array();
//$newPass="MzRZ=2wdu;x?NLk";
$newPass="umbrella";
$found = false;
$word_file= 't3047.words';
$lc_pass = strtolower($newPass);
if (is_readable($word_file)) {
  $words = file($word_file,FILE_IGNORE_NEW_LINES);
  foreach($words as $w) {
    $w = strtolower(trim($w));
    $found =  ($w == $newPass || $w == $denum_pass);
  }
}

if ($found) {
  $message[] = "Your new password is based on a dictionary word.";
  echo "Debug: Found<br>";
  return false;
}

Open in new window

Word file
aardvark
bandalog
charlie
devil
elephant
fox
goat
hare
india
jump
kilometer
lima
mountain
nose
octopus
parrot
queen
ray
sugar
tango
umbrella
violin
wax
xylophone
yellow
zebra

Open in new window

Tested with your password and umbrella - both triggered as expected.
Blank line at the end of the file did not result in a false positive.

In this version we are using the file() function which reads a file into an array.

We are then iterating over this array to see if the word is in the file.

This can be further simplified if you make sure your file contains only lower case and you lower case your password before checking - you can then simply do this
$message = array();
// $newPass="MzRZ=2wdu;x?NLk";
$newPass="umbrella";
$found = false;
$word_file= 't3047.words';
$lc_pass = strtolower($newPass);
if (is_readable($word_file)) {
  $words = file($word_file, FILE_IGNORE_NEW_LINES);
  $found = in_array($newPass, $words);
}
if ($found) {
  $message[] = "Your new password is based on a dictionary word.";
  echo "Debug: Found<br>";
  return false;
}

Open in new window

Note the use of the FILE_IGNORE_NEW_LINES parameter in the file() call - this strips the newlines out of the data read from the file.
0
 
credogAuthor Commented:
gr8gonzo - I believe I correctly implemented your suggestions.  If you could verify I'd appreciate it.  The changes got it down to about .5 seconds and it appears to be working correctly.
 
if (is_readable($word_file)) {
        $lc_pass = strtolower($newPassword);
        $denum_pass = strtr($lc_pass,'5301!','seoll');

        $alphaCharsInPassword1 = strlen(preg_replace("@[^a-z]@","",$lc_pass));
        $alphaCharsInPassword2 = strlen(preg_replace("@[^a-z]@","",$denum_pass));
        $maxWordLength = max($alphaCharsInPassword1, $alphaCharsInPassword2);

        $words = file($word_file, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
        $found = false;
        foreach($words as $word) {
                $word = trim(strtolower($word));
                if(strlen($word) < 4) { continue; }
                if(strlen($word) > $maxWordLength) { continue; }
                $found = ( (stripos($lc_pass,$word) !== false) || (stripos($denum_pass,$word) !== false) );
                        if($found) {
                                $foundMatch = $word;
                                break;
                        }
        }

        if ($found) {
               return "Your new password is based on a dictionary word: [{$foundMatch}]";
                 return false;
         }
 }

Open in new window

0
 
gr8gonzoConsultantCommented:
Looks good
0
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

Join & Write a Comment

Featured Post

Improve Your Query Performance Tuning

In this FREE six-day email course, you'll learn from Janis Griffin, Database Performance Evangelist. She'll teach 12 steps that you can use to optimize your queries as much as possible and see measurable results in your work. Get started today!

  • 8
  • 5
Tackle projects and never again get stuck behind a technical roadblock.
Join Now