• Status: Solved
  • Priority: Medium
  • Security: Public

Regex to match linked image as first element

I need to match content that starts with a linked image, but am having some trouble with the pattern.  I also need to extract the pieces of the content.

Here's what I have:
/(\s*)(<a(.*)><img (.*)<\/a>)(\s*)(.*?)/sU

This works, but it does not match ONLY when the content starts with a linked image, which is what I need.  It will match when ANYTHING comes after <a, as long as there is a > ANYWHERE after it in the content being matched.  I only want a match if the content starts with a linked image.

Example:

1.  <a href="someurl"><img src="someurl"></a> is the coolest image. TRUE
2.  The coolest image is <a href="someurl"><img src="someurl"></a>  FALSE
3.  <a href="someurl">This is the cooleset image</a><img src="someurl"> FALSE

Any help with the regex pattern is appreciated.  Thanks.
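
To make the reported behaviour concrete, here is a minimal sketch that runs the pattern from the question against the three examples (the TRUE/FALSE labels are left out and the variable names are illustrative only). Because the pattern has no start anchor, preg_match() is free to begin the match anywhere in the subject, so example 2 is accepted as well.

```php
<?php
// The pattern from the question, unchanged. Note there is no ^ or \A anchor.
$pattern = '/(\s*)(<a(.*)><img (.*)<\/a>)(\s*)(.*?)/sU';

// The three examples from the question (hypothetical variable names).
$examples = array(
    '<a href="someurl"><img src="someurl"></a> is the coolest image.',
    'The coolest image is <a href="someurl"><img src="someurl"></a>',
    '<a href="someurl">This is the coolest image</a><img src="someurl">',
);

$results = array();
foreach ($examples as $str) {
    // preg_match() returns 1 on a match and 0 otherwise.
    $results[] = preg_match($pattern, $str);
}

// Example 2 matches even though the linked image is not first.
echo implode(', ', $results) . PHP_EOL; // prints "1, 1, 0"
```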
Asked by: golfDoctor
1 Solution
 
Ray Paseur Commented:
<?php // RAY_temp_golfdoctor.php
error_reporting(E_ALL);
echo '<pre>';

/* FROM THE POST AT EE
1.  <a href="someurl"><img src="someurl"></a> is the coolest image. TRUE
2.  The coolest image is <a href="someurl"><img src="someurl"></a>  FALSE
3.  <a href="someurl">This is the cooleset image</a><img src="someurl"> FALSE
*/ // END OF EXAMPLES

// THE TEST DATA
$arr = array
( '<a href="someurl"><img src="someurl"></a> is the coolest image. TRUE'
, 'The coolest image is <a href="someurl"><img src="someurl"></a>  FALSE'
, '<a href="someurl">This is the cooleset image</a><img src="someurl"> FALSE'
)
;


// TEST THE FUNCTION
foreach ($arr as $str)
{
    if (starting_image_link($str)) echo PHP_EOL . htmlentities($str);
}


// STARTS WITH IMAGE LINK
function starting_image_link($str)
{
    // A REGULAR EXPRESSION TO FIND THE STARTING ANCHOR
    $rx0
    = '#'      // REGEX DELIMITER
    . '^'      // AT START OF STRING
    . '\<'     // ESCAPED WICKET
    . 'a '     // ANCHOR TAG
    . '#'      // REGEX DELIMITER
    . 'i'      // CASE-INSENSITIVE
    ;

    // A REGULAR EXPRESSION TO FIND THE STARTING IMAGE
    $rx1
    = '#'      // REGEX DELIMITER
    . '^'      // AT START OF STRING
    . '\<'     // ESCAPED WICKET
    . 'img '   // IMAGE TAG
    . '#'      // REGEX DELIMITER
    . 'i'      // CASE-INSENSITIVE
    ;

    // IF IT STARTS WITH ANCHOR TAG
    if (preg_match($rx0, $str))
    {
        // REMOVE ALL BUT IMAGE
        $new = strip_tags($str, '<img>');

        // IF THE NEW STRING STARTS WITH IMAGE
        if (preg_match($rx1, $new)) return TRUE;
    }

    // OTHERWISE THE CONTENT DOES NOT START WITH A LINKED IMAGE
    return FALSE;
}

See: http://www.laprbass.com/RAY_temp_golfdoctor.php

golfDoctor (Author) Commented:
Perhaps I wasn't clear.  I need to match if the first content is a linked image, as well as extract information from the content if so ... NOT just return true or false.  See the initial pattern I posted, for the areas to extract.  I would think there's a simple pattern to do this, I just don't know it.

Ray Paseur Commented:
Extract what information from the content?  Do you want the image URL?  Let me know what you want and I'm sure we can extract it.
 
Julian Hansen Commented:
I am not sure if your examples cover all eventualities, but this works for the examples given:
^<a href="(.*)"[.]*><img (.*)></a>(.*)$

Sample
<?php 
$x1 = '<a href="someurl" target="_blank"><img src="someurl"></a> is the coolest image.';
$x2 = 'The coolest image is <a href="someurl"><img src="someurl"></a>';
$x3 = '<a href="someurl">This is the cooleset image</a><img src="someurl">';

$pattern = '/^\<a href\="(.*)"[.]*\>\<img (.*)\>\<\/a\>(.*)$/';

echo preg_match($pattern, $x1) . "<br/>";
echo preg_match($pattern, $x2) . "<br/>";
echo preg_match($pattern, $x3) . "<br/>";
?>
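
A side note on the [.]* in the pattern above: inside a character class the dot is a literal period, so [.]* means "zero or more periods" rather than "any characters". A sample with an extra attribute after href still matches, but only because the greedy (.*) after href=" backtracks far enough to absorb that attribute. A small sketch of both points (variable names are illustrative):

```php
<?php
// Inside a character class, "." matches only a literal period.
$literalDot = preg_match('/^[.]+$/', '...');   // 1: the subject is all periods
$notAnyChar = preg_match('/^[.]+$/', 'abc');   // 0: letters are not periods

// The posted pattern applied to a sample with an extra attribute.
$pattern = '/^\<a href\="(.*)"[.]*\>\<img (.*)\>\<\/a\>(.*)$/';
$subject = '<a href="someurl" target="_blank"><img src="someurl"></a> is the coolest image.';
preg_match($pattern, $subject, $m);

// Group 1 has swallowed the second attribute; [.]* matched nothing.
echo $m[1] . PHP_EOL; // prints: someurl" target="_blank
```

If only the URL is wanted in group 1, a negated class such as ([^"]*) is a common alternative to (.*).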

golfDoctor (Author) Commented:
Testing that pattern, ^<a href="(.*)"[.]*><img (.*)></a>(.*)$,  here results in error:
http://www.spaweditor.com/scripts/regex/index.php
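
One possible cause of that error, offered as an assumption since the tester's internals are not known: if the tool wraps the pattern in / delimiters, the unescaped / in </a> ends the pattern early and the remainder is read as modifiers. The same thing happens in PHP with preg_match(). The sketch below suppresses the resulting warning with @ and shows that a different delimiter (here ~, an arbitrary choice) avoids the problem.

```php
<?php
$subject = '<a href="someurl"><img src="someurl"></a> is the coolest image.';

// With "/" as delimiter, the "/" inside "</a>" terminates the pattern,
// so compilation fails and preg_match() returns false.
$broken = @preg_match('/^<a href="(.*)"[.]*><img (.*)></a>(.*)$/', $subject);

// With "~" as delimiter the same expression needs no escaping.
$working = preg_match('~^<a href="(.*)"[.]*><img (.*)></a>(.*)$~', $subject);

var_dump($broken);  // bool(false)
var_dump($working); // int(1)
```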

golfDoctor (Author) Commented:
Sorry guys, I figured it out with a small change to my original pattern
/(\s*)(<a(.*)>\s*<img (.*)<\/a>)(\s*)(.*?)/sU
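
A hedged observation on the revised pattern: it now tolerates whitespace between the two tags, but it still has no start anchor, so content where the linked image comes later is accepted too. The sketch below adds ^ in front as one possible adjustment (a suggestion, not something confirmed in the thread). Note that the U modifier inverts greediness, so (.*) is lazy and the final (.*?) is greedy here.

```php
<?php
$revised  = '/(\s*)(<a(.*)>\s*<img (.*)<\/a>)(\s*)(.*?)/sU';
// Same pattern with a leading ^ (an assumption, not from the thread).
$anchored = '/^(\s*)(<a(.*)>\s*<img (.*)<\/a>)(\s*)(.*?)/sU';

$first = "<a href=\"someurl\">\n  <img src=\"someurl\"></a> is the coolest image.";
$later = 'The coolest image is <a href="someurl"><img src="someurl"></a>';

$r1 = preg_match($revised, $later);       // 1: matches in the middle of the string
$r2 = preg_match($anchored, $later);      // 0: rejected by the anchor
$r3 = preg_match($anchored, $first, $m);  // 1: whitespace between tags is fine

echo "$r1 $r2 $r3" . PHP_EOL;   // prints "1 0 1"
echo trim($m[6]) . PHP_EOL;     // prints "is the coolest image."
```

Even with the anchor, the lazy (.*) can still cross tag boundaries when needed, so a leading text link followed later by a linked image may also match. Negated classes such as [^>]* are a common way to tighten that.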

Julian Hansen Commented:
I believe #a38332435 answered the question.

Your claim that it yielded errors was not correct. I tested it at http://www.spaweditor.com/scripts/regex/index.php with the exact strings from my post, and it returned the required result.

Ray Paseur Commented:
I agree with julianH.  I tested his solution and it identifies the string correctly.  However, you never told us what you wanted to get out of the string.  "I also need to extract the pieces of the content" is not very illuminating to someone who does not know your application needs.

golfDoctor (Author) Commented:
As shown in my original pattern, I need everything after the opening <a and <img tags, and everything after the closing </a> of the linked image.  The linked image must be the first thing in the content, or the match would be false.

julianH - your pattern requires <a href, but that format is not guaranteed.  It also doesn't accommodate white space or line breaks within the first linked image.

Example:
1.  <a href="someurl">  <img src="someurl"></a> is the coolest image.
With your pattern this would be false, but it should be true and extract the areas mentioned.

Open to your input.
Thanks.

golfDoctor (Author) Commented:
The pattern needs to work for paragraphs of content, not just the simple examples I provided.  The paragraphs may have text links and image links throughout, as well as line breaks and spaces.  But the pattern should only match if the linked image is the first part of the content, and there may be line breaks or spaces between the <a> and the <img>, as well as between the end of the <img> and the closing </a>.

I don't have control over the content, so I can't ensure the spaces are removed from within the links.
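
The requirements described above can be sketched as a single anchored pattern. This is a minimal illustration under stated assumptions, not a full HTML parser: the function name match_leading_linked_image and the returned array keys are hypothetical, and attribute values are assumed not to contain a literal >. It uses [^>]* instead of .* so that each capture stays inside its own tag, and \s* wherever spaces or line breaks may appear.

```php
<?php
/**
 * Returns the pieces of $content if it begins with a linked image,
 * or null otherwise. (Hypothetical helper, for illustration only.)
 */
function match_leading_linked_image($content)
{
    $pattern = '~^\s*'               // optional whitespace at the very start
             . '<a\b([^>]*)>\s*'     // 1: everything after "<a" up to its ">"
             . '<img\b([^>]*)>\s*'   // 2: everything after "<img" up to its ">"
             . '</a>\s*'             // closing anchor tag
             . '(.*)$~is';           // 3: the rest of the content
    if (preg_match($pattern, $content, $m) !== 1) {
        return null;
    }
    return array('anchor' => trim($m[1]), 'image' => trim($m[2]), 'rest' => $m[3]);
}

$yes = match_leading_linked_image(
    "<a href=\"someurl\">\n  <img src=\"someurl\">\n</a> is the coolest image."
);
$no1 = match_leading_linked_image('The coolest image is <a href="someurl"><img src="someurl"></a>');
$no2 = match_leading_linked_image('<a href="someurl">This is the coolest image</a><img src="someurl">');

print_r($yes);        // anchor, image and rest pieces
var_dump($no1, $no2); // both NULL
```

The i modifier accepts upper-case tags, and the s modifier lets the final group run across line breaks, which matters for paragraph-length content.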

Julian Hansen Commented:
The pattern works for the samples posted, but I agree that the samples were too restrictive.

golfDoctor (Author) Commented:
Open to additional input, as my pattern is still falling short.  Examples are just that, examples.  The solution needs to be more robust, to handle any content.

golfDoctor (Author) Commented:
I think I may have worked out the spaces and breaks.  Stay tuned.

Ray Paseur Commented:
Suggest you post a new question with all the details about the output you want, and a complete set of test data.  We can probably help with regular expressions, but we can't create your test data.  Only you would know what you have for inputs to the process.  This article may help to explain why we would need that information from you.
http://www.experts-exchange.com/Web_Development/Web_Languages-Standards/PHP/A_7830-A-Quick-Tour-of-Test-Driven-Development.html

golfDoctor (Author) Commented:
I think what I have posted is pretty clear, and the patterns are pretty close.  

'The pattern needs to work for paragraphs of content, not just the simple examples I provided.  The paragraphs may have text links and image links throughout, as well as line breaks and spaces.  But the pattern should only match if the linked image is the first part of the content, and there may be line breaks or spaces between the <a> and the <img>, as well as between the end of the <img> and the closing </a>.'

'As shown in my original pattern, I need everything after the opening <a and <img tags, and everything after the closing </a> of the linked image.  The linked image must be the first thing in the content, or the match would be false.'

The pattern needs to be pretty flexible with characters between the tags mentioned, as I have no control over the content.  Posting new examples won't help, as the data changes all the time.  The format is what matters, which I've outlined above.

Julian Hansen Commented:
OK, how about this:
^<a.*href="(.*)">\s*<img.*src="(.*)">[^<]*</a>(.*)

It seems to work for these inputs:
<a href="someurl"><img src="someurl"></a> is the coolest image.
<a href="someurl"> <img src="someurl"></a> is the coolest image.
<a href="someurl">   	<img id="id" src="someurl"></a> is the coolest image.
<a id="fred" href="someurl"><img src="someurl"></a> is the coolest image.
<a id="fred" href="someurl"> <img src="someurl"></a> is the coolest image. <a href="someurl"> <img src="image"></a>
<a id="fred" href="someurl">  <img id="anotherid" src="someurl"></a> is the coolest image.
The coolest image is <a href="someurl"><img src="someurl"></a> 
<a href="someurl">This is the cooleset image</a><img src="someurl"> 

Julian Hansen Commented:
Here is some code that tests it:
<?php 
$input = array();

$input[] = '<a href="someurl"><img src="someurl"></a> is the coolest image.';
$input[] = '<a href="someurl"> <img src="someurl"></a> is the coolest image.';
$input[] = '<a href="someurl">   	<img id="id" src="someurl"></a> is the coolest image.';
$input[] = '<a id="fred" href="someurl"><img src="someurl"></a> is the coolest image.';
$input[] = '<a id="fred" href="someurl"> <img src="someurl"></a> is the coolest image. <a href="someurl"> <img src="image"></a>';
$input[] = '<a id="fred" href="someurl">  <img id="anotherid" src="someurl"></a> is the coolest image.';
$input[] = 'The coolest image is <a href="someurl"><img src="someurl"></a>';
$input[] = '<a href="someurl">This is the cooleset image</a><img src="someurl">';

$pattern = '/^\<a.*href="(.*)">\s*<img.*src="(.*)">[^\<]*<\/a\>\s*(.*)$/';
foreach($input as $i) {
	echo $i . "<br/>";
	$result = preg_match($pattern, $i, $matches);
	echo "\t\tResult: [$result]<br/>";
	echo "<pre>";
	print_r($matches);
	echo "</pre>";
}

?>
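
One caveat with the greedy .* in this pattern, shown as a hedged sketch: when the content contains a second linked image later on, the leading .* runs ahead to the last href=", so the captures can come from the later link rather than the first one. The URLs first and second below are made up for the demonstration, and the tightened pattern using [^>]* and [^"]* is one possible alternative, assuming double-quoted attributes.

```php
<?php
$subject = '<a href="first"><img src="first.png"></a> caption '
         . '<a href="second"><img src="second.png"></a>';

// Pattern from the post above: greedy .* can skip over the first link.
$greedy = '/^\<a.*href="(.*)">\s*<img.*src="(.*)">[^\<]*<\/a\>\s*(.*)$/';
preg_match($greedy, $subject, $g);

// Possible alternative: negated classes keep each capture inside its own tag.
$tight = '~^<a\b[^>]*\bhref="([^"]*)"[^>]*>\s*'
       . '<img\b[^>]*\bsrc="([^"]*)"[^>]*>\s*</a>\s*(.*)$~is';
preg_match($tight, $subject, $t);

echo $g[1] . ' | ' . $g[2] . PHP_EOL; // prints "second | second.png"
echo $t[1] . ' | ' . $t[2] . PHP_EOL; // prints "first | first.png"
echo $t[3] . PHP_EOL;                 // the caption plus the second linked image
```

The tightened version still requires href and src attributes, which the asker noted are not guaranteed, so it is best read as an illustration of the greediness issue rather than a complete answer.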
