Best way to extract href value and anchor text from link?

Hi,

What would be the best method to extract the href value and the anchor text from a link?

The href attribute can be written in quite different ways, for example:
href = " "
href=" "
href = ' '
href=' '
HREF = " "
href=
HREF = http://...

The anchor text could also contain images, for example.

So what is the best way to extract both? If images are present, they should be included in the extracted anchor content.

Thanks!
Asked by: peps03
Gary commented:
Some example strings would be helpful. If the string contains just a single anchor element, SimpleXMLElement is enough:

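<?php
$string='<a href="somelink.php">some link</a>';
// SimpleXMLElement needs well-formed XML, so this only suits clean, single-element input.
$anchor=new SimpleXMLElement($string);

echo $anchor['href']."<br>"; // the href attribute
echo $anchor[0];             // the anchor text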

Gary commented:
If there is other content in the string, or multiple links, use DOMDocument:

<?php
$string='sdfsdfsd<a href="somelink.php">some link</a>sdfsdfsdf<a href="anotherlink.php">another link</a>';

$dom = new DOMDocument;
$dom->loadHTML($string);
foreach ($dom->getElementsByTagName('a') as $node)
{
  echo $node->nodeValue.'<br>'.$node->getAttribute("href")."<br>";
}
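
Markup like the uppercase or unquoted variants from the question is rarely valid HTML, and loadHTML() may emit parser warnings for it. The following is a minimal sketch, assuming those warnings are not of interest: libxml_use_internal_errors() keeps them out of the output, and trim() removes stray whitespace inside the attribute value. The sample markup and the $links array are made up for illustration.

```php
<?php
// Illustrative input: uppercase tag, unquoted href, spaces around "=".
$html = '<A Target="_blank" HREF=http://www.abc.com>Test</A> text '
      . '<a HREF = "http://www.def.com ">Other</a>';

// Collect libxml parse warnings internally instead of printing them.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument;
$dom->loadHTML($html);

// Discard the collected warnings and restore the previous setting.
libxml_clear_errors();
libxml_use_internal_errors($previous);

$links = [];
foreach ($dom->getElementsByTagName('a') as $node) {
    $links[] = [
        'href' => trim($node->getAttribute('href')),
        'text' => trim($node->textContent),
    ];
}

print_r($links);
```

The HTML parser lowercases tag and attribute names, which is why looking up 'a' and 'href' also finds the uppercase forms.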

peps03 (author) commented:
Thanks for your reply.
The links look like this:

<a HREF = "http://www.abc.com " onclick="return photoCaptionClick(this);">  <img src="http://www.abc.com/pic/1.jpeg"> Test test test   </a>

or

<a HREF = http://www.abc.com>  <img src="http://www.abc.com/pic/1.jpeg"> Test test test   </a>

or

<A Target="_blank" HREF=http://www.abc.com>    <IMG SRC="http://www.abc.com/kuit.jpg" width="200" height="132" alt="  bla bla bla bla bla   " align="left">  </A>
peps03 (author) commented:
@gr8gonzo How would I get the href value and the anchor text separately?
peps03 (author) commented:
@gr8gonzo Great library, by the way!

How can I check whether a link contains an image?

And how can I get the image's alt text if a link contains one?
Gary commented:
<?php
$string='<a HREF = "http://www.abc.com " onclick="return photoCaptionClick(this);">  <img src="http://www.abc.com/pic/1.jpeg"> Test test test   </a>';

$dom = new DOMDocument;
$dom->loadHTML($string); 

foreach ($dom->getElementsByTagName('a') as $node)
{ 
	$innertags = ""; 
	$children  = $node->childNodes;

	foreach ($children as $child) 
	{ 
		$innertags .= $node->ownerDocument->saveHTML($child);
	}
	echo $innertags;
	echo $node->getAttribute("href");
}
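
The same DOMDocument approach can also tell whether a link contains an image and what its alt text is. This is a sketch only; the sample markup and the $report array are chosen here for illustration.

```php
<?php
// Illustrative input: one text-only link and one link wrapping an image.
$html = '<a href="http://www.abc.com">Test one</a>'
      . '<a href="http://www.jkl.com"><img src="http://www.abc.com/kuit.jpg" alt="  bla bla  "></a>';

$dom = new DOMDocument;
$dom->loadHTML($html);

$report = [];
foreach ($dom->getElementsByTagName('a') as $link) {
    $alts = [];
    // Called on the link element, getElementsByTagName() only searches its descendants.
    foreach ($link->getElementsByTagName('img') as $img) {
        $alts[] = trim($img->getAttribute('alt'));
    }
    $report[] = [
        'href'     => trim($link->getAttribute('href')),
        'hasImage' => count($alts) > 0,
        'alts'     => $alts,
    ];
}

print_r($report);
```

getAttribute() returns an empty string when an image has no alt attribute, so a missing alt and an empty alt look the same in this sketch.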

gr8gonzo (consultant) commented:
To get the text of the link, access the plaintext property of the link object.

To see if there's an image inside, you could check the innertext property (the link's inner HTML) and look for "<img", or you could run another find on the link for img tags. The code below should produce the following results:

Results:
HREF=http://www.abc.com, TEXT=Test one, No images
HREF=http://www.def.com, TEXT=Test two, IMG #1 SRC=http://www.abc.com/pic/1.jpeg, ALT=
HREF=http://www.ghi.com, TEXT=Test three, IMG #1 SRC=http://www.abc.com/pic/1.jpeg, ALT=
HREF=http://www.jkl.com, TEXT=, IMG #1 SRC=http://www.abc.com/kuit.jpg, ALT=  bla bla bla bla bla


Code:
<?php

$string_html = <<<EOSTRING
<a HREF = "http://www.abc.com " onclick="return photoCaptionClick(this);">  Test one   </a>

or

<a HREF = "http://www.def.com " onclick="return photoCaptionClick(this);">  <img src="http://www.abc.com/pic/1.jpeg"> Test two   </a>

or 

<a HREF = http://www.ghi.com>  <img src="http://www.abc.com/pic/1.jpeg"> Test three   </a>

or 

<A Target="_blank" HREF=http://www.jkl.com>    <IMG SRC="http://www.abc.com/kuit.jpg" width="200" height="132" alt="  bla bla bla bla bla   " align="left">  </A>
EOSTRING;

require("simple_html_dom.php");
$dom = str_get_html($string_html);

$links = $dom->find("a");
foreach($links as $link)
{
   $href = trim($link->href);
   $innertext = trim($link->plaintext);
   
   echo "HREF=" . $href . ", TEXT=" . $innertext;
   
    // See if the link contains an image or text
    $linkimages = $link->find("img");
    if(count($linkimages))
    {
    	// Has at least one image inside the link
    	foreach($linkimages as $idx => $linkimage)
    	{
    		echo ", IMG #".($idx+1)." SRC=".$linkimage->src.", ALT=".$linkimage->alt;
    	}
    }
    else
    {
    	echo ", No images";
    }
    echo "\n";
}

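Insoftservice commented:
For the href, a regular expression also works:

$html = '<a href="http://www.lovetomarry.com/page.html" class="myclass" rel="myrel">URL</a>';
// preg_match() returns 1 on a match; the captured URL is in $match[1].
// \s* and the i flag also accept forms like HREF = "..."
if (preg_match('/href\s*=\s*["\']?([^"\'>]+)["\']?/i', $html, $match)) {
    $info = parse_url(trim($match[1]));
}

For the img src: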
<?php

$html = '<img src="http://path.to/img" id="randomid" />';

if (preg_match('/<img.+?src(?: )*=(?: )*[\'"](.*?)[\'"]/si', $html, $arrResult)) {
    echo $arrResult[1];  // Should display http://path.to/img
} else {
    echo "No match found";
}

?>
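
If a regular expression is preferred over a parser, the pattern has to cover the quoting styles listed in the question: double quotes, single quotes, no quotes, optional spaces around the equals sign, and any letter case. Below is a sketch under the assumption that only the first link in the string matters; extractHref() is a hypothetical helper name, not part of any library. A pattern like this stays more brittle than a real HTML parser, for example when an attribute value before the href contains a ">".

```php
<?php
// Hypothetical helper: returns the href of the first <a> tag in $html, or null if none is found.
function extractHref(string $html): ?string
{
    // (?| ... ) is a branch-reset group: whichever alternative matches,
    // the value is captured as group 1.
    $pattern = '/<a\b[^>]*?\shref\s*=\s*(?|"([^"]*)"|\'([^\']*)\'|([^\s>]+))/i';

    if (!preg_match($pattern, $html, $m)) {
        return null;
    }

    // Values such as "http://www.abc.com " carry stray whitespace.
    return trim($m[1]);
}

var_dump(extractHref('<a HREF = "http://www.abc.com " onclick="x()">Test</a>'));
var_dump(extractHref("<a href='page.html'>Test</a>"));
var_dump(extractHref('<A Target="_blank" HREF=http://www.abc.com>Test</A>'));
```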
peps03 (author) commented:
Thanks! The simplehtmldom library works like a charm!
Question has a verified solution.
