Solved

Best way to extract href value and anchor text from link?

Posted on 2014-07-21
Last Modified: 2014-07-23
Hi,

What would be the best method to extract the href value and anchor text from link?

Link hrefs can be written in quite a few different ways, for example:
href = " "
href=" "
href = ' '
href=' '
HREF = " "
href=
HREF = http://...

Anchor texts could also contain images, for example.

So what is the best way to extract both? Images should be included in the anchor text if present.

Thanks!
Question by:peps03
 
LVL 58

Expert Comment

by:Gary
ID: 40209343
Some example strings would be helpful. If the string contains nothing but a single anchor, SimpleXMLElement is enough:

<?php
$string='<a href="somelink.php">some link</a>';
$anchor=new SimpleXMLElement($string);

echo $anchor['href']."<br>";
echo $anchor[0];

 
LVL 34

Accepted Solution

by:
gr8gonzo earned 500 total points
ID: 40209352
 
LVL 58

Expert Comment

by:Gary
ID: 40209365
If the string contains other content or multiple links, use DOMDocument instead:

<?php
$string='sdfsdfsd<a href="somelink.php">some link</a>sdfsdfsdf<a href="anotherlink.php">another link</a>';

$dom = new DOMDocument;
$dom->loadHTML($string);
foreach ($dom->getElementsByTagName('a') as $node)
{
  echo $node->nodeValue.'<br>'.$node->getAttribute("href")."<br>";
}
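
A possible variation on the snippet above, offered as a sketch rather than as part of the original reply: real-world markup is often invalid, so it can help to silence libxml's parser warnings and return the results as an array instead of echoing them. The function name extract_links is hypothetical, chosen only for illustration.

```php
<?php
// Hypothetical helper (not from the thread): collect the href and text of every link.
function extract_links(string $html): array
{
    $dom = new DOMDocument;

    // Collect parser warnings internally instead of printing them for messy markup
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($dom->getElementsByTagName('a') as $node) {
        $links[] = [
            'href' => trim($node->getAttribute('href')),
            'text' => trim($node->textContent),
        ];
    }
    return $links;
}

$string = 'sdfsdfsd<a href="somelink.php">some link</a>sdfsdfsdf<a href="anotherlink.php">another link</a>';
print_r(extract_links($string));
```

Returning an array keeps the extraction separate from the output, so the same function can feed a template, a database insert, or a test.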

 

Author Comment

by:peps03
ID: 40209373
Thanks for your reply.
The links look like this:

<a HREF = "http://www.abc.com " onclick="return photoCaptionClick(this);">  <img src="http://www.abc.com/pic/1.jpeg"> Test test test   </a>

or

<a HREF = http://www.abc.com>  <img src="http://www.abc.com/pic/1.jpeg"> Test test test   </a>

or

<A Target="_blank" HREF=http://www.abc.com>    <IMG SRC="http://www.abc.com/kuit.jpg" width="200" height="132" alt="  bla bla bla bla bla   " align="left">  </A>
 

Author Comment

by:peps03
ID: 40209411
@gr8gonzo How would I get the href value and anchor text separately?

Author Comment

by:peps03
ID: 40209457
@gr8gonzo Great library btw!

How can I check if a link contains an image?

And how could I get the image alt text if a link contains one?
 
LVL 58

Expert Comment

by:Gary
ID: 40209486
<?php
$string='<a HREF = "http://www.abc.com " onclick="return photoCaptionClick(this);">  <img src="http://www.abc.com/pic/1.jpeg"> Test test test   </a>';

$dom = new DOMDocument;
$dom->loadHTML($string); 

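foreach ($dom->getElementsByTagName('a') as $node)
{ 
	// Rebuild the anchor's inner HTML so nested tags such as <img> are kept
	$innertags = ""; 
	$children  = $node->childNodes;

	foreach ($children as $child) 
	{ 
		$innertags .= $node->ownerDocument->saveHTML($child);
	}
	// Separate the two values so they do not run together in the output
	echo trim($innertags)."<br>";
	echo trim($node->getAttribute("href"))."<br>";
}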
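
For the question about detecting images and reading their alt text, here is a minimal DOMDocument-only sketch, assuming no third-party library is wanted. The function name describe_links and the array keys are hypothetical and not from the thread.

```php
<?php
// Hypothetical helper: describe each link with its href, visible text, and any images inside it.
function describe_links(string $html): array
{
    $dom = new DOMDocument;

    // Collect parser warnings internally instead of printing them for messy markup
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $result = [];
    foreach ($dom->getElementsByTagName('a') as $a) {
        $images = [];
        // Called on an element, getElementsByTagName only searches that element's descendants
        foreach ($a->getElementsByTagName('img') as $img) {
            $images[] = [
                'src' => $img->getAttribute('src'),
                'alt' => trim($img->getAttribute('alt')),
            ];
        }
        $result[] = [
            'href'   => trim($a->getAttribute('href')),
            'text'   => trim($a->textContent),
            'images' => $images, // empty array means the link contains no image
        ];
    }
    return $result;
}

$string = '<A Target="_blank" HREF=http://www.abc.com>    <IMG SRC="http://www.abc.com/kuit.jpg" width="200" height="132" alt="  bla bla bla bla bla   " align="left">  </A>';
foreach (describe_links($string) as $link) {
    echo $link['href'], ' has ', count($link['images']), " image(s)\n";
}
```

Checking whether 'images' is empty answers "does the link contain an image?", and each entry carries the alt text (an empty string when the attribute is missing).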

 
LVL 34

Assisted Solution

by:gr8gonzo
gr8gonzo earned 500 total points
ID: 40209528
To get the text of the link, read the plaintext property of the link object (in Simple HTML DOM, innertext returns the inner HTML, tags included).

To see if there's an image inside, you could check the innertext property and look for "<img", or you could do another find on the link for img tags. The code below should produce the following results:

Results:
HREF=http://www.abc.com, TEXT=Test one, No images
HREF=http://www.def.com, TEXT=Test two, IMG #1 SRC=http://www.abc.com/pic/1.jpeg, ALT=
HREF=http://www.ghi.com, TEXT=Test three, IMG #1 SRC=http://www.abc.com/pic/1.jpeg, ALT=
HREF=http://www.jkl.com, TEXT=, IMG #1 SRC=http://www.abc.com/kuit.jpg, ALT=  bla bla bla bla bla



Code:
<?php

$string_html = <<<EOSTRING
<a HREF = "http://www.abc.com " onclick="return photoCaptionClick(this);">  Test one   </a>

or

<a HREF = "http://www.def.com " onclick="return photoCaptionClick(this);">  <img src="http://www.abc.com/pic/1.jpeg"> Test two   </a>

or 

<a HREF = http://www.ghi.com>  <img src="http://www.abc.com/pic/1.jpeg"> Test three   </a>

or 

<A Target="_blank" HREF=http://www.jkl.com>    <IMG SRC="http://www.abc.com/kuit.jpg" width="200" height="132" alt="  bla bla bla bla bla   " align="left">  </A>
EOSTRING;

require("simple_html_dom.php");
$dom = str_get_html($string_html);

$links = $dom->find("a");
foreach($links as $link)
{
    $href = trim($link->href);
    $innertext = trim($link->plaintext);

    echo "HREF=" . $href . ", TEXT=" . $innertext;

    // See if the link contains any images
    $linkimages = $link->find("img");
    if(count($linkimages))
    {
        // Has at least one image inside the link
        foreach($linkimages as $idx => $linkimage)
        {
            echo ", IMG #".($idx+1)." SRC=".$linkimage->src.", ALT=".$linkimage->alt;
        }
    }
    else
    {
        echo ", No images";
    }
    echo "\n";
}

 
LVL 15

Expert Comment

by:Insoftservice
ID: 40214070
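You could also use a regular expression. For the href:
<?php

$html = '<a href="http://www.lovetomarry.com/page.html" class="myclass" rel="myrel">URL</a>';

// \s* tolerates spaces around "=", and the i flag matches HREF as well as href
if (preg_match('/href\s*=\s*["\']?([^"\'>\s]+)/i', $html, $match)) {
    $info = parse_url($match[1]);  // $match[1] holds the URL
}

?>

For an image: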
<?php

$html = '<img src="http://path.to/img" id="randomid" />';

if (preg_match('/<img.+?src(?: )*=(?: )*[\'"](.*?)[\'"]/si', $html, $arrResult)) {
    echo $arrResult[1];  // Should display http://path.to/img
} else {
    echo "No match found";
}

?>
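
If a regex-only approach is preferred, the pattern can be extended to cover the quoting styles listed in the question (double-quoted, single-quoted, unquoted, any letter case, spaces around the equals sign) and to capture the anchor's inner HTML at the same time. This is a rough sketch under those assumptions; regex_links is a hypothetical name, and a regex will still trip over nested anchors, a ">" inside an attribute value, or attributes like data-href, so a DOM parser remains the safer choice.

```php
<?php
// Hypothetical helper: regex-only extraction of href and inner HTML for each <a> tag.
function regex_links(string $html): array
{
    // Group 1: double-quoted value, group 2: single-quoted, group 3: unquoted, group 4: inner HTML
    $pattern = '~<a\b[^>]*?\bhref\s*=\s*(?:"([^"]*)"|\'([^\']*)\'|([^\s>]+))[^>]*>(.*?)</a>~is';
    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

    $links = [];
    foreach ($matches as $m) {
        // At most one of groups 1-3 is non-empty, depending on the quoting style
        $href = $m[1] !== '' ? $m[1] : ($m[2] !== '' ? $m[2] : $m[3]);
        $links[] = ['href' => trim($href), 'anchor' => trim($m[4])];
    }
    return $links;
}

$html = '<a HREF = http://www.abc.com>  <img src="http://www.abc.com/pic/1.jpeg"> Test test test   </a>';
print_r(regex_links($html));
```

Because the anchor is captured as raw inner HTML, any img tag inside the link is kept, which matches the requirement that images be included in the anchor.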
 

Author Closing Comment

by:peps03
ID: 40214265
Thanks! The simplehtmldom library works like a charm!