Solved

REGEX Help for Domain Name

Posted on 2014-03-06
384 Views
Last Modified: 2014-03-07
Hello,

I am sifting through a page of text and want to use preg_match on it to find any Twitter links and capture them.

What regex would find "http://twitter.com/anyuser", where "anyuser" can be any string? It should also match "https".



Thank you!
Question by:EffinGood
3 Comments
 
LVL 6

Assisted Solution

by:Tony O'Byrne
Tony O'Byrne earned 400 total points
ID: 39911469
For the most part, the regex should be relatively straightforward...

I'll break it up into two parts - the protocol and domain (http://twitter.com/ or https://www.twitter.com/), and the "anyuser" part...

https?://(?:www\.)?twitter\.com/

That matches the following:
http://twitter.com/
https://twitter.com/
http://www.twitter.com/
https://www.twitter.com/
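A quick way to check that list is to run the prefix through preg_match. This is a minimal sketch: it uses '#' as the delimiter (an arbitrary choice, so the slashes need no escaping), and the last two URLs are made-up negative cases.

```php
<?php
// The protocol-and-domain part of the pattern, delimited with '#'.
$prefix = '#https?://(?:www\.)?twitter\.com/#';

$candidates = [
    'http://twitter.com/',
    'https://twitter.com/',
    'http://www.twitter.com/',
    'https://www.twitter.com/',
    'ftp://twitter.com/',          // wrong scheme: should not match
    'https://mobile.twitter.com/', // other subdomain: should not match
];

foreach ($candidates as $url) {
    // preg_match returns 1 on a match, 0 on no match (false on error).
    $ok = preg_match($prefix, $url) === 1;
    echo str_pad($url, 30), $ok ? 'match' : 'no match', PHP_EOL;
}
```

Running it should print "match" for the first four URLs and "no match" for the last two.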

If you use '/' as the delimiter in preg_match (as in '/pattern/'), escape each forward slash inside the pattern with a backslash:
https?:\/\/(?:www\.)?twitter\.com\/
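In PHP the pattern string always needs delimiters. Here is a small sketch comparing the escaped '/' version with the same pattern using '~' as the delimiter. PHP accepts any non-alphanumeric, non-backslash, non-whitespace character as a delimiter; '~' is just a convenient choice that rarely appears in URLs.

```php
<?php
// With '/' as the delimiter, every literal slash inside the pattern must be
// escaped; otherwise PHP would take the first unescaped '/' as the end.
$slashDelimited = '/https?:\/\/(?:www\.)?twitter\.com\//';

// Picking a different delimiter avoids the escaping entirely.
$tildeDelimited = '~https?://(?:www\.)?twitter\.com/~';

$url = 'https://www.twitter.com/';
var_dump(preg_match($slashDelimited, $url)); // int(1)
var_dump(preg_match($tildeDelimited, $url)); // int(1)
```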

A note on the (?:www\.) part -
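(?:) is a "non-capturing group". Ordinary parentheses capture whatever they match, so you can refer to it later as a backreference (or read it from the $matches array that preg_match fills in). (?:) groups the pattern without capturing, which is handy when you don't need that match. If you don't care about captures at all, you can remove the "?:" part and leave the parentheses; just remember that this turns it into a capturing group and shifts the numbers of any groups that follow it.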
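To see that numbering shift, here is a sketch with the username part wrapped in its own capture group (the username class is the one built up below; 'someuser' is a placeholder):

```php
<?php
$url = 'https://www.twitter.com/someuser';

// Non-capturing (?:www\.): the username is capture group 1.
preg_match('~https?://(?:www\.)?twitter\.com/([a-zA-Z0-9_-]+)~', $url, $m1);

// Capturing (www\.): that group becomes 1 and the username moves to group 2.
preg_match('~https?://(www\.)?twitter\.com/([a-zA-Z0-9_-]+)~', $url, $m2);

echo $m1[1], PHP_EOL; // someuser
echo $m2[1], PHP_EOL; // www.
echo $m2[2], PHP_EOL; // someuser
```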

On the "anyuser" part...

I'm treating this separately because I'm not entirely sure what characters are allowed in a twitter username.

A good place to start is:
[a-zA-Z]+

This matches a run of one or more letters. However, Twitter probably allows numbers, too:
[a-zA-Z0-9]+

... and underscores?
[a-zA-Z0-9_]+

... and hyphens?
[a-zA-Z0-9_-]+
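It helps to see what each of those classes does with the same input. A minimal sketch; 'jack_2014' is a made-up username and firstMatch() is a hypothetical helper (PHP 7.1+ for the nullable return type):

```php
<?php
// Each candidate class from the steps above.
$classes = ['[a-zA-Z]+', '[a-zA-Z0-9]+', '[a-zA-Z0-9_]+', '[a-zA-Z0-9_-]+'];
$sample  = 'jack_2014';

// Return the leftmost match of $class in $text, or null if there is none.
function firstMatch(string $class, string $text): ?string
{
    return preg_match('~' . $class . '~', $text, $m) === 1 ? $m[0] : null;
}

foreach ($classes as $class) {
    // A too-narrow class does not fail; it stops at the first character it
    // does not allow and returns a truncated name.
    printf("%-16s %s\n", $class, firstMatch($class, $sample));
}
```

The first two classes return "jack" and the last two return "jack_2014", which is why a missing character in the class is easy to overlook: you still get a match, just a shorter one.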

So the entire regex so far is:
https?://(?:www\.)?twitter\.com/[a-zA-Z0-9_-]+
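To actually capture the links from a page of text, the username part needs its own parentheses, and preg_match_all() finds every occurrence rather than just the first. A sketch assuming PHP 7+; findTwitterLinks() and the sample text are hypothetical:

```php
<?php
// The pattern above, with the username wrapped in capture group 1.
// '~' is used as the delimiter so the slashes need no escaping.
const TWITTER_LINK = '~https?://(?:www\.)?twitter\.com/([a-zA-Z0-9_-]+)~';

/**
 * Return every Twitter link found in $text as [fullLink => username].
 */
function findTwitterLinks(string $text): array
{
    $links = [];
    // PREG_SET_ORDER gives one array per match: [0] = whole link, [1] = username.
    if (preg_match_all(TWITTER_LINK, $text, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $match) {
            $links[$match[0]] = $match[1];
        }
    }
    return $links;
}

$page = 'Follow http://twitter.com/alice and https://www.twitter.com/bob_99 '
      . 'but not https://example.com/carol.';
print_r(findTwitterLinks($page));
```

Keying the result by the full link means a link that appears twice on the page is reported once; collect into a plain list instead if you need every occurrence.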

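This is a pretty good starting point. If you find other characters that Twitter allows in usernames, add them before the ']'. Be careful, though: '?' starts the query string of a URL and '#' starts the fragment, so neither can be part of a username, and because they are not in the character class the match simply stops when it reaches one. (Twitter's own rules allow only letters, digits and underscores in usernames, up to 15 characters, so the hyphen is not strictly needed.)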
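A quick check that the character class stops at those characters. usernameFrom() is a hypothetical helper (PHP 7.1+) and the URLs are invented examples:

```php
<?php
// The full pattern, with the username part wrapped in a capture group.
$pattern = '~https?://(?:www\.)?twitter\.com/([a-zA-Z0-9_-]+)~';

// Return the captured username, or null if the text has no matching link.
function usernameFrom(string $pattern, string $url): ?string
{
    return preg_match($pattern, $url, $m) === 1 ? $m[1] : null;
}

$urls = [
    'https://twitter.com/someuser?lang=en',   // '?' starts the query string
    'https://twitter.com/someuser#top',       // '#' starts the fragment
    'https://twitter.com/someuser/followers', // '/' starts a sub-path
];

foreach ($urls as $url) {
    // '?', '#' and '/' are not in the class, so matching stops right there.
    echo usernameFrom($pattern, $url), PHP_EOL;
}
```

One caveat: the pattern cannot tell a profile from other twitter.com paths, so a URL such as https://twitter.com/search?q=php yields "search". If that matters, filter such names out after matching.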

Hope this helps!

All the best,
Tony.
LVL 108

Accepted Solution

by:Ray Paseur
Ray Paseur earned 100 total points
ID: 39912285
Please see this article.  It shows the thought process used to develop the REGEX to find a domain name.
http://www.experts-exchange.com/Web_Development/Web_Languages-Standards/PHP/A_7830-A-Quick-Tour-of-Test-Driven-Development.html
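In the test-first spirit of that article (this sketch is not taken from it), the expected behaviour can be written down as a table of inputs and outputs and checked every time the pattern changes. The pattern is the one from the answer above with a capture group added, and the cases are made up; add a row whenever you find a URL the pattern gets wrong.

```php
<?php
$pattern = '~https?://(?:www\.)?twitter\.com/([a-zA-Z0-9_-]+)~';

// input => expected username (null means "should not match at all")
$cases = [
    'http://twitter.com/anyuser'       => 'anyuser',
    'https://www.twitter.com/any_user' => 'any_user',
    'https://twitter.com/'             => null,
    'http://example.com/anyuser'       => null,
];

$failures = 0;
foreach ($cases as $input => $expected) {
    $actual = preg_match($pattern, $input, $m) === 1 ? $m[1] : null;
    $status = $actual === $expected ? 'PASS' : 'FAIL';
    if ($status === 'FAIL') {
        $failures++;
    }
    echo "$status  $input", PHP_EOL;
}
echo $failures === 0 ? 'All cases pass' : "$failures case(s) failed", PHP_EOL;
```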

A Google search for "PHP regular expressions" will turn up a lot of good learning resources. This one is particularly on point:
http://www.php.net/manual/en/reference.pcre.pattern.syntax.php

Also, plan on giving yourself plenty of time to learn and experiment (as shown in the Test-Driven-Development article).  You're wading into an amazingly complex backwater of computer science, where the entire language is written in punctuation!
https://xkcd.com/208/
http://xkcd.com/1171/

If you decide you want a simpler approach, please post an example of the source document and I'll be glad to show you how to parse it with simple PHP statements.
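For comparison, one possible regex-free approach (a sketch under stated assumptions, not necessarily what Ray had in mind) is to split the text into words and let parse_url() decide which ones are twitter.com links. twitterUsersWithoutRegex() is a hypothetical name; it assumes links are separated from the surrounding text by whitespace, and trailing punctuation such as "alice." would be kept as part of the name.

```php
<?php
/**
 * Return the username from every http(s) twitter.com link in $text.
 */
function twitterUsersWithoutRegex(string $text): array
{
    $users  = [];
    $delims = " \t\r\n";

    // strtok() walks the whitespace-separated words one at a time.
    for ($word = strtok($text, $delims); $word !== false; $word = strtok($delims)) {
        $parts = parse_url($word);
        if (!is_array($parts) || !isset($parts['scheme'], $parts['host'], $parts['path'])) {
            continue; // not an absolute URL with a path
        }
        $scheme = strtolower($parts['scheme']);
        $host   = strtolower($parts['host']);
        if (!in_array($scheme, ['http', 'https'], true)
            || !in_array($host, ['twitter.com', 'www.twitter.com'], true)) {
            continue;
        }
        // The first path segment is the username: "/anyuser/..." -> "anyuser".
        $first = explode('/', trim($parts['path'], '/'))[0];
        if ($first !== '') {
            $users[] = $first;
        }
    }
    return $users;
}

print_r(twitterUsersWithoutRegex('Follow http://twitter.com/alice and https://www.twitter.com/bob_99 today'));
```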

Author Closing Comment

by:EffinGood
ID: 39913295
Thanks guys!

Tony's worked, and Ray, your article is awesome.