Solved

Best way to extract href value and anchor text from link?

Posted on 2014-07-21
Last Modified: 2014-07-23
Hi,

What would be the best method to extract the href value and anchor text from a link?

The href attributes can be coded quite differently, for example:
href = " "
href=" "
href = ' '
href=' '
HREF = " "
href=
HREF = http://...

The anchor text can also contain images, for example.

What is the best way to extract both? Images should be included in the anchor if present.

Thanks!
Question by:peps03

Expert Comment

by:Gary
ID: 40209343
Some example strings would be helpful. If the string contains just a single anchor link, then:

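<?php
$string='<a href="somelink.php">some link</a>';
// SimpleXMLElement needs well-formed XML: quoted attributes, closed tags, exact case
$anchor=new SimpleXMLElement($string);

echo $anchor['href']."<br>"; // attribute value: somelink.php
echo $anchor[0];             // element text: some link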
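SimpleXMLElement parses XML, not HTML, so it is worth checking how it copes with the looser markup from the question. A minimal sketch, assuming PHP 7.1 or later; the wrapper name tryParseAnchor is hypothetical. It returns the href and text when the string is well-formed XML and null when parsing fails:

```php
<?php
// Hypothetical helper: returns [href, text], or null if $string is not well-formed XML.
function tryParseAnchor(string $string): ?array
{
    // Collect libxml errors internally instead of emitting E_WARNINGs.
    $previous = libxml_use_internal_errors(true);
    try {
        $anchor = new SimpleXMLElement($string);
        // XML is case-sensitive: this finds href="..." but not HREF="...".
        return [(string) $anchor['href'], (string) $anchor];
    } catch (Exception $e) {
        // Unquoted attribute values or unclosed tags such as <img ...> make the parse fail.
        return null;
    } finally {
        libxml_clear_errors();
        libxml_use_internal_errors($previous);
    }
}

var_dump(tryParseAnchor('<a href="somelink.php">some link</a>'));
var_dump(tryParseAnchor('<a HREF = http://www.abc.com>  <img src="http://www.abc.com/pic/1.jpeg"> Test test test   </a>'));
```

With the question's examples, the unquoted HREF values and unclosed img tags make the parse fail, and even a quoted upper-case HREF is not found by $anchor['href'], because XML attribute names are case-sensitive.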


Accepted Solution

by:gr8gonzo
ID: 40209352

Expert Comment

by:Gary
ID: 40209365
If the string contains other content or multiple links, use DOMDocument:

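<?php
$string='sdfsdfsd<a href="somelink.php">some link</a>sdfsdfsdf<a href="anotherlink.php">another link</a>';

$dom = new DOMDocument;
// Suppress the warnings loadHTML() raises for sloppy or fragmentary HTML
libxml_use_internal_errors(true);
$dom->loadHTML($string);
foreach ($dom->getElementsByTagName('a') as $node)
{
  // nodeValue is the link text; getAttribute() returns the href value
  echo $node->nodeValue.'<br>'.$node->getAttribute("href")."<br>";
}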
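As a sketch of how DOMDocument copes with the different href styles listed in the question (spaces around =, single or double quotes, unquoted values, upper-case tags), the loop can be wrapped in a function that returns the trimmed href and text of each link. The helper name extractLinks is hypothetical, and the code assumes PHP 7 or later:

```php
<?php
// Hypothetical helper: collect the href value and the visible text of every <a>.
function extractLinks(string $html): array
{
    if (trim($html) === '') {
        return []; // loadHTML() rejects empty input
    }
    $dom = new DOMDocument();
    // libxml's HTML parser tolerates unquoted attributes and mixed-case tags,
    // but reports some markup problems as warnings; collect those quietly.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($dom->getElementsByTagName('a') as $node) {
        $links[] = [
            // The HTML parser lower-cases attribute names, so HREF is found too.
            'href' => trim($node->getAttribute('href')),
            // textContent drops tags such as <img>, leaving only the text.
            'text' => trim($node->textContent),
        ];
    }
    return $links;
}

$html = '<a HREF = "http://www.abc.com " onclick="return photoCaptionClick(this);">  <img src="http://www.abc.com/pic/1.jpeg"> Test test test   </a>'
      . "<a href='single.php'>Single</a>"
      . '<A Target="_blank" HREF=http://www.jkl.com>  <IMG SRC="http://www.abc.com/kuit.jpg">  </A>';

foreach (extractLinks($html) as $link) {
    echo $link['href'], ' | ', $link['text'], "\n";
}
```

Run against these three sample links, it should print the trimmed href and text of each one, with empty text for the image-only link.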


Author Comment

by:peps03
ID: 40209373
Thanks for your reply.
The links look like this:

<a HREF = "http://www.abc.com " onclick="return photoCaptionClick(this);">  <img src="http://www.abc.com/pic/1.jpeg"> Test test test   </a>

or

<a HREF = http://www.abc.com>  <img src="http://www.abc.com/pic/1.jpeg"> Test test test   </a>

or

<A Target="_blank" HREF=http://www.abc.com>    <IMG SRC="http://www.abc.com/kuit.jpg" width="200" height="132" alt="  bla bla bla bla bla   " align="left">  </A>

Author Comment

by:peps03
ID: 40209411
@gr8gonzo How would I get the href value and the anchor text separately?

Author Comment

by:peps03
ID: 40209457
@gr8gonzo Great library, by the way!

How can I check whether a link contains an image?

And how could I get the image's alt text if a link contains one?

Expert Comment

by:Gary
ID: 40209486
<?php
$string='<a HREF = "http://www.abc.com " onclick="return photoCaptionClick(this);">  <img src="http://www.abc.com/pic/1.jpeg"> Test test test   </a>';

$dom = new DOMDocument;
$dom->loadHTML($string);

foreach ($dom->getElementsByTagName('a') as $node)
{
	// Rebuild the link's inner HTML (text plus any <img> tags) child by child
	$innertags = "";
	foreach ($node->childNodes as $child)
	{
		$innertags .= $node->ownerDocument->saveHTML($child);
	}
	echo $innertags . "<br>";
	// trim() removes the stray space inside the quoted href value
	echo trim($node->getAttribute("href")) . "<br>";
}
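For the follow-up questions (does a link contain an image, and what is its alt text?), the same DOMDocument approach can search each link for img elements. A minimal sketch under the same assumptions (PHP 7 or later); describeLinks and the array keys it returns are illustrative names, not part of any library:

```php
<?php
// Hypothetical helper: for each <a>, report the href, the text and any <img> inside it.
function describeLinks(string $html): array
{
    if (trim($html) === '') {
        return []; // loadHTML() rejects empty input
    }
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // silence warnings about sloppy HTML
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $result = [];
    foreach ($dom->getElementsByTagName('a') as $link) {
        $images = [];
        // Searching from the link element only finds images nested inside this link.
        foreach ($link->getElementsByTagName('img') as $img) {
            $images[] = [
                'src' => $img->getAttribute('src'),
                // getAttribute() returns '' when there is no alt attribute.
                'alt' => trim($img->getAttribute('alt')),
            ];
        }
        $result[] = [
            'href'     => trim($link->getAttribute('href')),
            'text'     => trim($link->textContent),
            'hasImage' => count($images) > 0,
            'images'   => $images,
        ];
    }
    return $result;
}

$html = '<a HREF = http://www.abc.com>  <img src="http://www.abc.com/pic/1.jpeg"> Test test test   </a>'
      . '<A Target="_blank" HREF=http://www.abc.com>    <IMG SRC="http://www.abc.com/kuit.jpg" width="200" height="132" alt="  bla bla bla bla bla   " align="left">  </A>';

print_r(describeLinks($html));
```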


Assisted Solution

by:gr8gonzo
ID: 40209528
To get the text of the link, read the link object's plaintext property (innertext would return the inner HTML, including any <img> tags).

To see whether there is an image inside, you could check the innertext property for "<img", or run another find() on the link for img tags. The code below should produce the following results:

Results:
HREF=http://www.abc.com, TEXT=Test one, No images
HREF=http://www.def.com, TEXT=Test two, IMG #1 SRC=http://www.abc.com/pic/1.jpeg, ALT=
HREF=http://www.ghi.com, TEXT=Test three, IMG #1 SRC=http://www.abc.com/pic/1.jpeg, ALT=
HREF=http://www.jkl.com, TEXT=, IMG #1 SRC=http://www.abc.com/kuit.jpg, ALT=  bla bla bla bla bla



Code:
<?php

$string_html = <<<EOSTRING
<a HREF = "http://www.abc.com " onclick="return photoCaptionClick(this);">  Test one   </a>

or

<a HREF = "http://www.def.com " onclick="return photoCaptionClick(this);">  <img src="http://www.abc.com/pic/1.jpeg"> Test two   </a>

or 

<a HREF = http://www.ghi.com>  <img src="http://www.abc.com/pic/1.jpeg"> Test three   </a>

or 

<A Target="_blank" HREF=http://www.jkl.com>    <IMG SRC="http://www.abc.com/kuit.jpg" width="200" height="132" alt="  bla bla bla bla bla   " align="left">  </A>
EOSTRING;

require("simple_html_dom.php");
$dom = str_get_html($string_html);

$links = $dom->find("a");
foreach($links as $link)
{
   $href = trim($link->href);
   $innertext = trim($link->plaintext);

   echo "HREF=" . $href . ", TEXT=" . $innertext;

   // See whether the link contains any images
   $linkimages = $link->find("img");
   if(count($linkimages))
   {
      // Has at least one image inside the link
      foreach($linkimages as $idx => $linkimage)
      {
         echo ", IMG #".($idx+1)." SRC=".$linkimage->src.", ALT=".$linkimage->alt;
      }
   }
   else
   {
      echo ", No images";
   }
   echo "\n";
}


Expert Comment

by:Insoftservice
ID: 40214070
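<?php
$html = '<a href="http://www.lovetomarry.com/page.html" class="myclass" rel="myrel">URL</a>';
// preg_match() returns 1 on a match and stores the captured href value in $match[1]
if (preg_match('/href=["\']?([^"\'>]+)["\']?/', $html, $match)) {
    $info = parse_url($match[1]); // split the URL into scheme, host, path, ...
}

For the img src: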
<?php

$html = '<img src="http://path.to/img" id="randomid" />';

if (preg_match('/<img.+?src(?: )*=(?: )*[\'"](.*?)[\'"]/si', $html, $arrResult)) {
    echo $arrResult[1];  // Should display http://path.to/img
} else {
    echo "No match found";
}

?>
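To handle every quoting style listed in the question in one regular expression (double quotes, single quotes, unquoted values, spaces around =, upper-case HREF), one option is a pattern with one alternative per style. A minimal sketch, assuming PHP 7.2 or later for PREG_UNMATCHED_AS_NULL; findHrefs is a hypothetical name, and a regex like this is still more fragile than a real HTML parser:

```php
<?php
// Hypothetical helper: returns every href value found in $html.
// Handles href="...", href='...' and unquoted href=..., with optional spaces
// around "=" and any letter case (the /i modifier). It is not tag-aware, so it
// also picks up href attributes on <link>, <base>, etc.
function findHrefs(string $html): array
{
    $pattern = '/href\s*=\s*(?:"([^"]*)"|\'([^\']*)\'|([^\s>]+))/i';
    // PREG_UNMATCHED_AS_NULL reports the alternatives that did not match as null.
    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER | PREG_UNMATCHED_AS_NULL);

    $hrefs = [];
    foreach ($matches as $m) {
        // Exactly one of groups 1-3 took part in each match.
        $hrefs[] = trim($m[1] ?? $m[2] ?? $m[3]);
    }
    return $hrefs;
}

$samples = [
    '<a href = "one.php">1</a>',
    '<a href="two.php">2</a>',
    "<a href = 'three.php'>3</a>",
    "<a href='four.php'>4</a>",
    '<a HREF = "http://www.abc.com ">5</a>',
    '<a HREF=http://www.abc.com>6</a>',
    '<a HREF = http://www.def.com onclick="x()">7</a>',
];
print_r(findHrefs(implode("\n", $samples)));
```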

Author Closing Comment

by:peps03
ID: 40214265
Thanks! The simplehtmldom library works like a charm!