preg_match question

OK, I am doing a security upgrade for my site and I have a few questions.

What I want to do is limit users to usernames containing only letters (A-Z, a-z) and digits (0-9). Is this a good idea, or should I allow more, such as -, _ and whitespace in between (using trim to strip it from the beginning and end)? I was thinking about using preg_match, but I don't know how to write regular expressions.

Also, which characters should be illegal in passwords (if any)? Right now I'm just stripping ', but I will use md5 or custom encryption, so any character could be allowed there... let me know what would be most logical. Also, should I really use md5 (no way to re-send passwords, only reset them), or should I rather use some other custom encryption?

And lastly, is there any quick way to validate an email pattern (universal for all email types) using preg_match?

Thanks

GVNPublic123

ASKER

ok found expression /[\w ]/ that does all illegal chars.

GVNPublic123

ASKER

OK, for emails I found this on a JS tutorial site:
var emailPattern = /^[a-zA-Z0-9._-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}$/;

Will that pattern work properly in ALL email cases? Even if the email is fake I don't care; I just want a correct pattern (I know further validation can be done, but I don't care because emails are manually confirmed by users via an activation link).

GVNPublic123

ASKER

or this one:
if(!preg_match("/^([a-zA-Z0-9])+([a-zA-Z0-9\._-])*@([a-zA-Z0-9_-])+([a-zA-Z0-9\._-]+)+$/", $email))

gr8gonzo

First things first - learning regular expressions will benefit you in more than just this project. They're not very hard to learn, either. There's a lot you CAN learn, but most people use just the basics, and those are easy to master.

http://www.regular-expressions.info/tutorial.html

Regarding the security side of things, I would probably allow all alphanumeric characters, as well as @, underscores _, dashes -, and periods ., since some people like to use their email addresses as their usernames. A regular expression for this would be:

[a-zA-Z0-9@_\.\-]

That expression would successfully match against any lower or upper case letter, any number, plus the aforementioned special characters. The brackets [ ] surrounding it simply indicate the possible range of characters.

When your form submits, you can use preg_replace to strip out any character that does not fall in that range. The syntax is basically:

$new_value = preg_replace("/REGULAR EXPRESSION HERE/","REPLACE WITH THIS",$original_value);

So to strip out all but the wanted characters in your username field:
$username = $_POST["username"];
$username = preg_replace("/[a-zA-Z0-9@_\.\-]/","",$username);

Regarding passwords, if you're doing one-way hashing with MD5, there's no need to restrict any characters. I would definitely avoid custom encryption if you're asking the question. The only people that should do custom encryption are those that are comfortable enough with that specific type of coding to not ask the question, in my opinion.

You may want to consider a different hash, though, like SHA-1 or something. While MD5 is mostly safe, there ARE some ways of brute-forcing it faster than other hashes. If this isn't super-secret stuff, then I wouldn't worry about it, but if there's any chance that you'll undergo an extremely strict audit or be under the supervision of anyone paranoid, then you might as well use SHA-1.

Regarding email matching, the pattern I gave you should be valid for any major email character, so:

$isValidEmail = preg_match("/[a-zA-Z0-9@_\.\-]+/",$emailAddressToBeChecked);

That should "validate" about 90% of your addresses. It doesn't validate the structure, so it would also pass addresses like .@@@abc1, which is clearly not valid. You can get a little more traditional with something like:

$isValidEmail = preg_match("/[a-zA-Z0-9_\.\-]+@[a-zA-Z0-9_\.\-]+/",$emailAddressToBeChecked);

That will ensure that there's an @ symbol between characters, but it'll still work for email addresses like bob@bob (which is technically valid).

That said, if someone is going to go out of their way to type a phony address in, there's not much you can do in terms of validation. You could always try to do MX lookups to check the actual DNS records and such, but that, too, will only get you so far. I would go with the second piece of code and simply acknowledge that you will have people that won't want to give their real address and will provide a phony one that matches the validation expression.

GVNPublic123

ASKER

Well, once I choose a crypt method it's over: 650 people will have their passwords irreversibly encrypted, and there would be no way to ever restore them, so I have to go with a reliable system.

I've read that both MD5 and SHA1 are not so good as bruteforce attacks can be done fast on them. Any suggestions?

Terry Woods

Brute force attacks can only be done quickly if you allow them to be. For example, you could allow only 3 login attempts every 5 minutes from each IP address, or for each login id (preferably both). You could also set up a warning notification where you get emailed if there's an unusually high number of failed logins within a particular time.... etc.
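
As a rough illustration of the "3 attempts every 5 minutes" idea, here is a minimal sketch. The function name, the constants, and the idea of passing in a list of failed-attempt timestamps are all assumptions made for the example; a real site would load those timestamps from a database table or cache keyed by IP address and by login id.

```php
<?php
// Hypothetical limits taken from the example in the post above.
const MAX_ATTEMPTS = 3;      // failed logins allowed per window
const WINDOW_SECONDS = 300;  // 5 minutes

// $failedAttemptTimes: Unix timestamps of earlier failed logins for one key
// (an IP address or a login id). $now: the current Unix timestamp.
function isLoginAllowed(array $failedAttemptTimes, int $now): bool
{
    // Keep only the failures that fall inside the current window.
    $recent = array_filter($failedAttemptTimes, function ($t) use ($now) {
        return $t > $now - WINDOW_SECONDS;
    });

    return count($recent) < MAX_ATTEMPTS;
}
```

To cover both cases mentioned above, the check could be run once with the failures recorded for the IP address and once with those recorded for the login id, refusing the attempt if either call returns false.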

GVNPublic123

ASKER

Yeah, but what about the case where someone hacks the MySQL server and gets the database?

Terry Woods

A correction to gr8gonzo's code:

$username = $_POST["username"];
$username = preg_replace("/[^a-zA-Z0-9@_\.\-]/","",$username);

Prior to adding the ^ character, it strips out all the wanted characters, not the unwanted ones.
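
To make the difference concrete, here is a small sketch of two ways to use the corrected character class: stripping unwanted characters, or rejecting the input outright. The function names are made up for illustration, and the D modifier is an addition so that $ does not also match just before a trailing newline.

```php
<?php
// Strip every character that is NOT in the allowed set.
// The ^ right after the opening bracket negates the class.
function sanitizeUsername(string $raw): string
{
    return preg_replace('/[^a-zA-Z0-9@_.\-]/', '', trim($raw));
}

// Alternative: accept the username only if every character is allowed.
// ^ and $ anchor the match to the whole string; + requires at least one character.
function isValidUsername(string $raw): bool
{
    return preg_match('/^[a-zA-Z0-9@_.\-]+$/D', $raw) === 1;
}
```

Stripping silently changes what the user typed ("bob smith!" becomes "bobsmith"), so rejecting invalid input and asking the user to pick another name may be the less surprising choice for a registration form.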

Terry Woods

A 2nd correction:

$isValidEmail = preg_match("/[a-zA-Z0-9@_\.\-]+/",$emailAddressToBeChecked);

Needs to be:

$isValidEmail = preg_match("/^[a-zA-Z0-9@_\.\-]+$/",$emailAddressToBeChecked);

Without the ^ and $ characters, a match will be found if just one valid character is found (and there could be any number of invalid ones).
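
A quick way to see the effect of the anchors is to run both patterns against an input that is mostly invalid. The sample input is made up for the demonstration.

```php
<?php
$unanchored = '/[a-zA-Z0-9@_.\-]+/';
$anchored   = '/^[a-zA-Z0-9@_.\-]+$/';

$input = 'bad input!! <script>';

// Matches, because "bad" alone satisfies the pattern somewhere in the string.
$unanchoredResult = preg_match($unanchored, $input);

// Does not match, because the space, "!" and "<" are outside the allowed set.
$anchoredResult = preg_match($anchored, $input);

var_dump($unanchoredResult); // int(1)
var_dump($anchoredResult);   // int(0)
```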

Terry Woods

So this also:
$isValidEmail = preg_match("/[a-zA-Z0-9_\.\-]+@[a-zA-Z0-9_\.\-]+/",$emailAddressToBeChecked);

Should be:
$isValidEmail = preg_match("/^[a-zA-Z0-9_\.\-]+@[a-zA-Z0-9_\.\-]+$/",$emailAddressToBeChecked);

GVNPublic123

ASKER

OK, is using hash('sha256', $saltedpass) any better? Is it better than MD5 or SHA1 in terms of speed vs. security, or does it just not matter? If SHA256 slows down brute-force attempts significantly, then it wouldn't be worth brute-forcing in the first place, as my site is no big deal.
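
For reference, a salted hash('sha256', ...) scheme could look roughly like the sketch below. The function names and the salt-then-password ordering are assumptions for the example, not something specified in the thread. Note that SHA-256 is still designed to be fast, so on its own it does not slow down an offline brute-force attempt very much; the salt mainly prevents the use of precomputed tables.

```php
<?php
// Create a random per-user salt and the salted hash. Both values get stored.
function hashPassword(string $password): array
{
    $salt = bin2hex(random_bytes(16)); // 32 hex characters, requires PHP 7+
    return [
        'salt' => $salt,
        'hash' => hash('sha256', $salt . $password), // 64 hex characters
    ];
}

// Recompute the hash with the stored salt and compare in constant time.
function verifyPassword(string $password, string $salt, string $storedHash): bool
{
    return hash_equals($storedHash, hash('sha256', $salt . $password));
}
```

If the PHP version in use provides them, the built-in password_hash() and password_verify() functions handle salting and use a deliberately slow algorithm, which may be worth considering instead of a hand-rolled scheme.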

gr8gonzo

@GVN - the speed difference between MD5 and SHA1 doesn't really matter. The brute-force bottleneck will be the network. You just can't try thousands and thousands of combinations every second over a web server connection. Again, the internal one-way hashing is more about protecting your data from internal threats than anything, IMHO.

GVNPublic123

ASKER

So what should I do to encrypt passwords? What do you suggest? MD5, SHA1 and SHA256 are what I've been looking at...

Also, hash vs hash_hmac? Or does it not matter at all, and I can do a simple salted md5 and be happy? It's certainly better than the poor encryption I have now.

gr8gonzo

If you're protecting any sort of financial data, be paranoid about everything. Use the strong security option when you can - a form of SHA, and use hash_hmac if it's available to you. Also make sure you encrypt anything sensitive like credit card numbers. Always pretend or assume that you WILL have an employee that will try to use your data to his/her advantage, so encrypt whatever pieces of information they could abuse. If you need to be able to decrypt the data, then use an SSL certificate to encrypt the data before storing it (you don't need to buy an expensive certificate for internal use - you can generate and use your own free certificate, referred to as a "self-signed certificate").

You won't be able to completely eliminate all potential avenues of attack from an internal source, but the point is to slow them down as much as possible.
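
A minimal sketch of the hash_hmac variant, assuming a server-side secret key that is kept outside the database (the key value, the function name and the salt handling are all hypothetical):

```php
<?php
// Hypothetical secret; in practice it would be long, random, and loaded
// from a config file or environment variable rather than hard-coded.
$serverKey = 'replace-with-a-long-random-secret';

// Keyed hash of the salted password using HMAC-SHA256.
function hmacPassword(string $password, string $salt, string $key): string
{
    return hash_hmac('sha256', $salt . $password, $key);
}

$stored = hmacPassword('s3cret', 'per-user-salt', $serverKey);
```

The design idea is that someone who obtains only the database still lacks the key needed to test password guesses against the stored values.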

GVNPublic123

ASKER

No, I only have a basic members website, and the importance of user accounts is not very high, as hacking them cannot do much damage.

gr8gonzo

Then md5 should probably be fine for your purposes. Keep in mind that most hackers are after the administrator account, not the other people's accounts, so just think about what a successful break-in to the administrator could mean in terms of any ripple effects. Would they be able to get into other systems or have any special privileges that could lead to a different, more damaging break-in? You don't have to answer these questions here - it's just for your own contemplation.

GVNPublic123

ASKER

I made up my mind: I will go with salted md5 because it's only a 32-char hash and won't take much space in the database, and besides, it's just a members site; no one gives a damn...

GVNPublic123

ASKER

Besides, my registration/login queries now use real_escape_string and so on, so SQL injection isn't likely, I guess.

innotionent

To be secure, most government entities require passwords of at least 8 characters.
Those characters must include at least one upper case letter, one number, and one character that is neither a letter nor a number.
For example: Myp4ssw*
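
A rule set like this can be checked with a few small preg_match calls. The sketch below is only an illustration: the function name is made up, and the minimum length is a parameter defaulting to 8 (an assumption chosen so that the example password above passes).

```php
<?php
function meetsPasswordPolicy(string $password, int $minLength = 8): bool
{
    return strlen($password) >= $minLength
        && preg_match('/[A-Z]/', $password) === 1          // an upper case letter
        && preg_match('/[0-9]/', $password) === 1          // a number
        && preg_match('/[^a-zA-Z0-9]/', $password) === 1;  // a non-alphanumeric character
}
```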

For extra security, use a salt with your md5, so that even if attackers get hold of the hashes, they can't simply look them up in a precomputed table of common passwords.