PHP: validate a URL

Hi Experts,
I need to validate a URL in PHP.
I searched the Internet for a solution and didn't find any script that works perfectly.
The best one I found was this code:
<?php
$url = "http://www.example.com/index2.php";
if (!preg_match("#^http://www\.[a-z0-9-_.]+\.[a-z]{2,4}$#i", $url)) {
    echo "wrong url";
} else {
    echo "ok";
}
?>


The code above works only partially for domain names: http://www.example.com validates, but http://example.com (without www) does not!
It also fails for URLs like these:
http://www.example.com/index.php
http://www.example.com/friendlyurl/

Any ideas for improving the regular expression, or another way to validate the URL?

Best regards, JC
Asked by Pedro Chagas (Webmaster)
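PHP also ships a built-in validator, filter_var() with FILTER_VALIDATE_URL, which avoids hand-writing the regular expression. Below is a minimal sketch, assuming only http and https URLs should be accepted; is_valid_http_url() is a hypothetical helper name. The filter checks syntax only: it accepts a single-label host such as http://localhost, and it does not check that the domain extension actually exists.

```php
<?php
// Hypothetical helper: syntax check with PHP's built-in URL filter,
// plus an extra rule (an assumption) that only http/https schemes count.
function is_valid_http_url(string $url): bool
{
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return false; // not a syntactically valid URL at all
    }
    $scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));
    return $scheme === 'http' || $scheme === 'https';
}

$urls = [
    'http://www.example.com',
    'http://example.com',
    'http://www.example.com/index.php',
    'http://www.example.com/friendlyurl/',
    'http://example.com/some.php?hhh=10&dddd=20',
    'www.example.com',        // no scheme, so the filter rejects it
    'ftp://example.com/file', // valid URL, but not http(s)
];
foreach ($urls as $url) {
    echo $url, ': ', is_valid_http_url($url) ? 'ok' : 'wrong url', "\n";
}
```

All five http URLs from the question pass; the last two are rejected.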

Ray Paseur commented:
Why not just try to read from the URL?  If you don't get a 200 OK response, it's not a valid URL.

If you want to see how professionals would approach this problem, please read this article.  It shows how to deconstruct the problem, think about the solutions and create a test plan that will allow the greatest chance of rapid and dependable success.
http://www.experts-exchange.com/Programming/Languages/Scripting/PHP/A_7830-A-Quick-Tour-of-Test-Driven-Development.html
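Ray's "just try to read it" suggestion can be sketched like this, assuming that "valid" means "the server's final answer is 200 OK". The names status_code(), final_status() and url_responds_ok() are hypothetical. get_headers() follows redirects by default, so the returned array can contain several status lines (for example a 301 followed by a 200); the sketch uses the last one. Note that this also rejects URLs whose server happens to be down at the moment of the check.

```php
<?php
// Extract the numeric status code from a line such as "HTTP/1.1 200 OK" or "HTTP/2 404".
function status_code(string $statusLine): ?int
{
    if (preg_match('#^HTTP/\S+\s+(\d{3})#', $statusLine, $m) === 1) {
        return (int) $m[1];
    }
    return null; // an ordinary header line such as "Content-Type: text/html"
}

// Return the status code of the last response in the header list, i.e. the
// final response after any redirects get_headers() followed.
function final_status(array $headers): ?int
{
    $code = null;
    foreach ($headers as $line) {
        if (is_string($line) && ($parsed = status_code($line)) !== null) {
            $code = $parsed;
        }
    }
    return $code;
}

// Network check: false if the request fails (DNS error, refused connection, ...)
// or if the final response is anything other than 200.
function url_responds_ok(string $url): bool
{
    $headers = @get_headers($url);
    return $headers !== false && final_status($headers) === 200;
}
```

Keeping the header parsing separate from the network call makes it testable without making any HTTP requests.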
Gary commented:
This seems to work, though I didn't do much testing:

<?php

function checkurl($url){

return(preg_match("#^(https?://)?[^/.]+(\.[^/.]+)+/?$#i",$url));
}

echo "http://www.example.com: " . checkurl("http://www.example.com")."<br>";
echo "http://www.example.com/index.php: " . checkurl("http://www.example.com/index.php")."<br>";
echo "http://www.example.com/friendlyurl/: " . checkurl("http://www.example.com/friendlyurl/")."<br>";
echo "http://example.com: " . checkurl("http://example.com")."<br>";

Pedro Chagas (Webmaster, author) commented:
Hi @Gary,
I added this line to your code:
echo "http://example: " . checkurl("http://example.com")."<br>";
and the output is:
http://www.example.com: 1
http://www.example.com/index.php: 0
http://www.example.com/friendlyurl/: 0
http://example.com: 1
http://example: 1
Result 5 should be "0", and results 2 and 3 should be "1".
Can you improve the regular expression?

Hi @Ray: I will read the article!

~JC

Gary commented:
Yes, but the URL you are passing has .com; if you remove that, it doesn't pass.
Gary commented:
Maybe I misunderstood; I thought you only wanted the domain and nothing else.
Gary commented:
I changed the pattern; my first attempt allowed bad characters. Who'd have thought a URL would be so hard to validate?

return(preg_match("#^(https?://)?([\da-zA-Z\.-]+)\.([a-z\.]{2,6})([\da-zA-Z/\.-]*)*/?$#i",$url));
Pedro Chagas (Webmaster, author) commented:
Hi @Gary,
Is it possible to improve your solution so it also accepts GET variables in the URL, like this one?
http://example.com/some.php?hhh=10&dddd=20: 0
The line above returns "0", so it does not validate.

Thanks.

~JC
Gary commented:
return(preg_match("#^(https?://)?([\da-zA-Z_\.-]+)\.([a-z\.])([\da-zA-Z/\.-\\?]*)*/?$#i",$url));


There are a couple of provisos, after double-checking what is and isn't allowed:
Underscores are allowed in the host name; this isn't accounted for, and isn't likely to be a problem anyway, since I've yet to see anyone use one in a hostname.
The domain extension is a b*tch to validate, as there are so many variations that making sure it is correct would make the regex pretty complex (if it's even possible).

Also, where I have {2,6}, it is wrong: I completely forgot about all the new extensions like .photography until just now. I think it's probably better to remove it.
Gary commented:
Check this:

return(preg_match("#^(https?://)?([\da-zA-Z_\.-]+)\.([a-z\.])([\d\w/\.=\\?]*)*/?$#i",$url));
Terry Woods (IT Guru) commented:
Some minor changes to Gary's latest pattern: there's no need for the /? at the end, and the . characters between the [] brackets don't need escaping. The * after the last group is also redundant.
 return(preg_match("#^(https?://)?([\da-zA-Z_.-]+)\.([a-z.])([\d\w/.=\\?]*)$#i",$url)); 

No points, thanks.
Pedro Chagas (Webmaster, author) commented:
I forgot that domains like that exist; nowadays domains can be lots of things.
For example, I tested this URL based on new or future domains:
http://example.games/jjj.php?uu=kjkjk
and it validated fine.
But take this one:
http://example.c/jjj.php?uu=kjkjk

Here the domain extension is ".c", and the script validates it; I don't know whether that is good or bad. Are there domains with only one character?
If not, can you please improve the RE so it does not accept one-character domain extensions?

~JC
Gary commented:
Add it back in, but use {2}.
There are no extensions shorter than 2 characters.
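The "at least 2 characters" rule can also be checked outside the regular expression, by pulling the host out with parse_url() and inspecting its last label. A sketch under stated assumptions: has_plausible_tld() is a hypothetical name, and it assumes an extension is letters only, which would wrongly reject punycode extensions such as xn--p1ai.

```php
<?php
// Hypothetical helper: true if the URL has a host whose last label
// (the domain extension) is at least 2 letters long.
function has_plausible_tld(string $url): bool
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return false; // no host could be parsed, e.g. the scheme is missing
    }
    $labels = explode('.', rtrim($host, '.')); // ignore a trailing root dot
    if (count($labels) < 2) {
        return false; // single-label host such as "example"
    }
    $tld = $labels[count($labels) - 1];
    // Assumption: letters only, minimum length 2
    return preg_match('/^[a-z]{2,}$/i', $tld) === 1;
}

foreach (['http://example.games/jjj.php?uu=kjkjk', 'http://example.c/jjj.php?uu=kjkjk'] as $url) {
    echo $url, ': ', has_plausible_tld($url) ? 'ok' : 'wrong url', "\n";
}
```

This keeps the URL-syntax check and the domain-extension rule separate, so either can change without touching the other.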
Gary commented:
So, using Terry's corrections:

 return(preg_match("#^(https?://)?([\da-zA-Z_.-]+)\.([a-z.]{2})([\d\w/.=\\?]*)$#i",$url));
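In the spirit of the test-plan approach Ray mentioned, here is a small table-driven check of this final pattern against the URLs discussed in the thread. The pattern is copied unchanged; the expected values are taken from what Pedro asked for. Running it reports one failure: the GET-variable example http://example.com/some.php?hhh=10&dddd=20 is still rejected, because & is not in the last character class.

```php
<?php
// Gary's final pattern from this thread, unchanged.
function checkurl(string $url): bool
{
    return preg_match("#^(https?://)?([\da-zA-Z_.-]+)\.([a-z.]{2})([\d\w/.=\\?]*)$#i", $url) === 1;
}

// Expected results gathered from the URLs discussed in the thread.
$cases = [
    'http://www.example.com'                     => true,
    'http://example.com'                         => true,
    'http://www.example.com/index.php'           => true,
    'http://www.example.com/friendlyurl/'        => true,
    'http://example.com/some.php?hhh=10&dddd=20' => true,
    'http://example.games/jjj.php?uu=kjkjk'      => true,
    'http://example'                             => false,
    'http://example.c/jjj.php?uu=kjkjk'          => false,
];

$failures = 0;
foreach ($cases as $url => $expected) {
    $actual = checkurl($url);
    if ($actual !== $expected) {
        $failures++;
    }
    printf("%s  %s (expected %s)\n", $actual === $expected ? 'PASS' : 'FAIL', $url, $expected ? 'valid' : 'invalid');
}
printf("%d failure(s)\n", $failures);
```

Adding a new row to $cases is all it takes to check any future change to the pattern.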
Gary commented:
I think if you go any more complicated than this, you are just blowing in the wind: there are too many variables in play to be certain of 100% validation, plus I'm heading to the bar!
Ray Paseur commented:
... plus I'm heading to the bar!
Yes. Whenever the problem can only be solved with REGEX, the truth is to be found here: http://xkcd.com/1171/
Pedro Chagas (Webmaster, author) commented:
Based on @Ray's idea, here is another way to check a URL:
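<?php
$file = 'http://stackoverflow.com/questions/2280394/how-can-i-check-if-a-url-exists-via-php';
// get_headers() returns false when the request itself fails (DNS error, refused connection, ...)
$file_headers = @get_headers($file);
// $file_headers[0] is the status line of the first response, e.g. "HTTP/1.1 404 Not Found";
// matching on the code instead of the whole line also handles HTTP/1.0 and HTTP/2
if ($file_headers === false || preg_match('#^HTTP/\S+\s+404\b#', $file_headers[0])) {
    echo "not found";
} else {
    echo "found";
}
?>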

Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.