Nathan Riley asked:

Search for URL in string php

I'm trying to clean some data when users input it, and I need to be able to look for URLs and grab them.

So for example:

This is an awesome post!  Check it out: http://google.com and or https://google.com.


OK, so I have that post, say in a PHP variable.  How do I look for http:// or https:// and then grab the full URL in PHP?

James Bilous

You'll want to use a regular expression with preg_match (or preg_match_all, if the string can contain more than one URL) to extract the desired substring from the string in your variable. See:

http://www.regexr.com/3bqqh
http://php.net/manual/en/function.preg-match.php
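
As a minimal sketch of that approach, the snippet below uses preg_match_all to collect every http:// or https:// URL from the sample post. The helper name extract_http_urls() is hypothetical, and the rule "a URL runs until the next whitespace, minus trailing punctuation" is an assumption made for illustration; it will mis-trim URLs that legitimately end in one of those characters.

```php
<?php
// Hypothetical helper (not a PHP built-in): returns every http:// or
// https:// URL found in $text, in order of appearance.
function extract_http_urls(string $text): array
{
    // Scheme, "://", then any run of non-whitespace characters.
    preg_match_all('#\bhttps?://\S+#i', $text, $matches);

    // ASSUMPTION: trailing sentence punctuation is not part of the URL.
    return array_map(function ($url) {
        return rtrim($url, '.,;:!?)');
    }, $matches[0]);
}

$post = 'This is an awesome post!  Check it out: http://google.com and or https://google.com.';
print_r(extract_http_urls($post));
```

With the sample post this should yield the two URLs without the final period.
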
This is an interesting question and has been with us for many years, if not decades.  Enormous volumes have been written about this question.  I even used it as an example in an E-E article to illustrate the process of test-driven development, back in the day before automated testing "grew up."

The quality of the results in an application like this is highly dependent on the detailed problem definition, and the quality of your test data.  String parsing with regular expressions can be dicey!  The sort of questions we need to consider include "Must the protocol always be HTTP or HTTPS?"  Or "Can we include FTP, too?"  Or "What if it says 'www' but has no leading protocol?"  Or "What TLDs, besides '.com', must I locate?"  In practice you will probably come up with more questions than answers!  Eventually you will get to a regular expression that is "good enough" but that may not cover 100% of the edge and corner cases.
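
A question such as "Must the protocol always be HTTP or HTTPS?" can be made explicit by turning the answer into a parameter. The following is only a sketch: build_url_pattern() is a hypothetical helper, and the rule for where a URL ends (whitespace, a quote, or an angle bracket) is an assumption, not a complete definition of a URL.

```php
<?php
// Hypothetical helper: builds a regex from the list of schemes that the
// problem definition allows.
function build_url_pattern(array $schemes): string
{
    // Escape each scheme so it is treated literally inside the regex.
    $quoted = array_map(function ($scheme) {
        return preg_quote($scheme, '#');
    }, $schemes);

    // Allowed scheme, "://", then everything up to whitespace,
    // a double quote, or an angle bracket (ASSUMED end-of-URL rule).
    return '#\b(?:' . implode('|', $quoted) . ')://[^\s<>"]+#i';
}

$text = 'Docs at https://example.com/manual and files at ftp://example.org/pub';

preg_match_all(build_url_pattern(['http', 'https']), $text, $web);
preg_match_all(build_url_pattern(['http', 'https', 'ftp']), $text, $all);

print_r($web[0]); // web links only
print_r($all[0]); // web links plus the FTP link
```

Changing the scheme list changes what counts as a match, which is exactly the kind of decision the problem definition has to settle first.
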

Here's an article that describes the thought process and the way we write the programming:
https://www.experts-exchange.com/articles/7830/A-Quick-Tour-of-Test-Driven-Development.html

Here's an example that uses your test data.  It contains comments to explain how the regular expression works:
https://iconoun.com/demo/temp_nathan_riley.php
<?php // demo/temp_nathan_riley.php
/**
 * https://www.experts-exchange.com/questions/28959518/Search-for-URL-in-string-php.html
 *
 * https://www.experts-exchange.com/articles/7830/A-Quick-Tour-of-Test-Driven-Development.html
 */
error_reporting(E_ALL);


// TEST DATA FROM THE POST AT E-E
$str = 'This is an awesome post!  Check it out: http://google.com and or https://google.com.';

// A REGEX THAT FINDS URLS AND DOMAIN SUBSTRINGS (SCHEME AND HOST ONLY, NO PATH OR QUERY STRING)
$rgx
= '#'         // REGEX DELIMITER

. '\b'        // ON WORD BOUNDARY

. '('         // START GROUP
. 'https?'    // HTTP OR HTTPS
. '|'         // OR
. 'ftps?'     // FTP OR FTPS
. ')'         // END GROUP
. '??'        // ZERO OR ONE OF THIS GROUP, UNGREEDY

. '('         // START GROUP
. '://'       // COLON, SLASH, SLASH
. ')'         // END GROUP
. '??'        // ZERO OR ONE OF THIS GROUP, UNGREEDY

. '('         // START GROUP
. '[A-Z0-9]'  // A SUBDOMAIN
. '+?'        // ONE OR MORE, UNGREEDY
. '\.'        // A DOT (ESCAPED)
. ')'         // END GROUP
. '??'        // ZERO OR ONE OF THIS GROUP, UNGREEDY

. '('         // START GROUP
. '[A-Z0-9]'  // CHARACTER CLASS ALPHANUMERIC
. '+?'        // ONE OR MORE, UNGREEDY
. ')'         // END GROUP

. '('         // START GROUP
. '[.]'       // THE DOT (BEFORE THE TLD)
. '{1}'       // LENGTH IS EXACTLY ONE
. ')'         // END GROUP

. '('         // START GROUP
. '[A-Z]'     // CHARACTER CLASS ALPHA
. '{2,7}'     // LENGTH IS TWO TO SEVEN
. ')'         // END GROUP

. '\b'        // ON WORD BOUNDARY

. '#'         // REGEX DELIMITER
. 'i'         // CASE-INSENSITIVE
;

// LOCATE THE URLS
preg_match_all($rgx, $str, $mat);

// SHOW THE WORK PRODUCT
print_r($mat[0]);

// ACTIVATE THIS TO SEE ALL OF THE URL PIECES
// print_r($mat);
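
In the test-driven spirit described earlier, the same pattern can be run against a small table of cases. This is a sketch only: the pattern is the one from the listing collapsed into a single string, while run_url_cases() and the test sentences are hypothetical additions. Tracing the pattern by hand, the third case appears to come back truncated as http://www.google, because the ungreedy optional subdomain group is skipped as soon as a shorter match succeeds; that is the sort of edge case such a table is meant to expose.

```php
<?php
// The pattern from the listing, collapsed into one string.
$rgx = '#\b(https?|ftps?)??(://)??([A-Z0-9]+?\.)??([A-Z0-9]+?)([.]{1})([A-Z]{2,7})\b#i';

// Hypothetical test cases: input text => matches we would like to get.
$cases = [
    'See http://google.com today' => ['http://google.com'],
    'Mirror at ftp://example.org' => ['ftp://example.org'],
    'Visit http://www.google.com' => ['http://www.google.com'],
    'No links in this sentence'   => [],
];

// Hypothetical helper: runs the pattern on every case and records
// the expected matches, the actual matches, and whether they agree.
function run_url_cases(string $rgx, array $cases): array
{
    $results = [];
    foreach ($cases as $text => $expected) {
        preg_match_all($rgx, $text, $mat);
        $results[$text] = [
            'expected' => $expected,
            'actual'   => $mat[0],
            'pass'     => ($mat[0] === $expected),
        ];
    }
    return $results;
}

foreach (run_url_cases($rgx, $cases) as $text => $r) {
    echo ($r['pass'] ? 'PASS' : 'FAIL'), ': ', $text,
         ' => ', implode(', ', $r['actual']), PHP_EOL;
}
```

Each FAIL line is a prompt either to refine the pattern or to narrow the problem definition.
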


ASKER CERTIFIED SOLUTION
Dave Baldwin