Solved

Best way to extract the href value and anchor text from a link?

Posted on 2014-07-21
Last Modified: 2014-07-23
Hi,

What would be the best method to extract the href value and anchor text from a link?

Link hrefs can be coded in quite different ways, for example:
href = " "
href=" "
href = ' '
href=' '
HREF = " "
href=
HREF = http://...

Anchor texts could also contain images, for example.

So what is the best way to extract both? Images should be included in the anchor text if present.

Thanks!
Question by:peps03
10 Comments
 
LVL 58

Expert Comment

by:Gary
ID: 40209343
Some example strings would be helpful. If it is a string containing just a single anchor link, then:

<?php
$string='<a href="somelink.php">some link</a>';
$anchor=new SimpleXMLElement($string);

echo $anchor['href']."<br>";
echo $anchor[0];

 
LVL 34

Accepted Solution

by:gr8gonzo
gr8gonzo earned 500 total points
ID: 40209352
 
LVL 58

Expert Comment

by:Gary
ID: 40209365
If there is other content in the string, or multiple links, then use DOMDocument:

<?php
$string='sdfsdfsd<a href="somelink.php">some link</a>sdfsdfsdf<a href="anotherlink.php">another link</a>';

$dom = new DOMDocument;
$dom->loadHTML($string);
foreach ($dom->getElementsByTagName('a') as $node)
{
  echo $node->nodeValue.'<br>'.$node->getAttribute("href")."<br>";
}
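
To tie this back to the href variants listed in the question, here is a minimal sketch of the same DOMDocument approach wrapped in a function. The name extractLinks() is only illustrative and not from this thread. It assumes the HTML parser normalizes tag/attribute case and accepts unquoted attribute values, and it uses libxml_use_internal_errors() to keep warnings about sloppy markup out of the output.

```php
<?php
// Hypothetical helper (name chosen for illustration only).
function extractLinks(string $html): array
{
    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $dom->loadHTML($html);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($dom->getElementsByTagName('a') as $node) {
        $links[] = [
            // trim() removes stray spaces such as in href="http://www.abc.com "
            'href' => trim($node->getAttribute('href')),
            // textContent is the text only; any <img> tags are not included
            'text' => trim($node->textContent),
        ];
    }
    return $links;
}

$html = '<a HREF = "http://www.abc.com " onclick="x">Test one</a>'
      . '<A Target="_blank" HREF=http://www.jkl.com>Test two</A>';

print_r(extractLinks($html));
```

The parser does the work of coping with quoting and case differences, so no per-variant handling is needed in your own code.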

 

Author Comment

by:peps03
ID: 40209373
Thanks for your reply.
The links look like:

<a HREF = "http://www.abc.com " onclick="return photoCaptionClick(this);">  <img src="http://www.abc.com/pic/1.jpeg"> Test test test   </a>

or

<a HREF = http://www.abc.com>  <img src="http://www.abc.com/pic/1.jpeg"> Test test test   </a>

or

<A Target="_blank" HREF=http://www.abc.com>    <IMG SRC="http://www.abc.com/kuit.jpg" width="200" height="132" alt="  bla bla bla bla bla   " align="left">  </A>
 

Author Comment

by:peps03
ID: 40209411
@gr8gonzo How would I get the href value and anchor text separately?

Author Comment

by:peps03
ID: 40209457
@gr8gonzo Great library btw!

How can I check if a link contains an image?

And how could I get the image alt text if a link contains one?
 
LVL 58

Expert Comment

by:Gary
ID: 40209486
<?php
$string='<a HREF = "http://www.abc.com " onclick="return photoCaptionClick(this);">  <img src="http://www.abc.com/pic/1.jpeg"> Test test test   </a>';

$dom = new DOMDocument;
$dom->loadHTML($string); 

foreach ($dom->getElementsByTagName('a') as $node)
{ 
	$innertags = ""; 
	$children  = $node->childNodes;

	foreach ($children as $child) 
	{ 
		$innertags .= $node->ownerDocument->saveHTML($child);
	}
	echo $innertags;
	echo $node->getAttribute("href");
}
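
For the follow-up question about detecting images and reading their alt text, a rough DOMDocument-only sketch could look like the following. The function name describeLinkImages() and the array keys are made up for this example; the idea is simply to call getElementsByTagName('img') on each anchor element rather than on the whole document.

```php
<?php
// Hypothetical helper (name and array layout are assumptions for illustration).
function describeLinkImages(string $html): array
{
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $result = [];
    foreach ($dom->getElementsByTagName('a') as $a) {
        $images = [];
        // Search only inside this link for <img> elements.
        foreach ($a->getElementsByTagName('img') as $img) {
            $images[] = [
                'src' => trim($img->getAttribute('src')),
                // getAttribute() returns an empty string when alt is missing
                'alt' => trim($img->getAttribute('alt')),
            ];
        }
        $result[] = [
            'href'   => trim($a->getAttribute('href')),
            'images' => $images, // empty array means the link has no image
        ];
    }
    return $result;
}

$html = '<A Target="_blank" HREF=http://www.abc.com>'
      . '<IMG SRC="http://www.abc.com/kuit.jpg" alt="  bla bla  " align="left"> </A>';

print_r(describeLinkImages($html));
```

An empty 'images' array answers "does the link contain an image?", and each entry carries the alt text if one was set.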

 
LVL 34

Assisted Solution

by:gr8gonzo
gr8gonzo earned 500 total points
ID: 40209528
To get the text of the link, you'd access the innerText property of the link object.

To see if there's an image inside, you could check the innerHTML property and look for "<img", or you could do another find on the link for img tags. The code below should produce the following results:

Results:
HREF=http://www.abc.com, TEXT=Test one, No images
HREF=http://www.def.com, TEXT=Test two, IMG #1 SRC=http://www.abc.com/pic/1.jpeg, ALT=
HREF=http://www.ghi.com, TEXT=Test three, IMG #1 SRC=http://www.abc.com/pic/1.jpeg, ALT=
HREF=http://www.jkl.com, TEXT=, IMG #1 SRC=http://www.abc.com/kuit.jpg, ALT=  bla bla bla bla bla



Code:
<?php

$string_html = <<<EOSTRING
<a HREF = "http://www.abc.com " onclick="return photoCaptionClick(this);">  Test one   </a>

or

<a HREF = "http://www.def.com " onclick="return photoCaptionClick(this);">  <img src="http://www.abc.com/pic/1.jpeg"> Test two   </a>

or 

<a HREF = http://www.ghi.com>  <img src="http://www.abc.com/pic/1.jpeg"> Test three   </a>

or 

<A Target="_blank" HREF=http://www.jkl.com>    <IMG SRC="http://www.abc.com/kuit.jpg" width="200" height="132" alt="  bla bla bla bla bla   " align="left">  </A>
EOSTRING;

require("simple_html_dom.php");
$dom = str_get_html($string_html);

$links = $dom->find("a");
foreach($links as $link)
{
    $href = trim($link->href);
    $innertext = trim($link->plaintext);

    echo "HREF=" . $href . ", TEXT=" . $innertext;

    // See if the link contains any images
    $linkimages = $link->find("img");
    if(count($linkimages))
    {
        // Has at least one image inside the link
        foreach($linkimages as $idx => $linkimage)
        {
            echo ", IMG #".($idx+1)." SRC=".$linkimage->src.", ALT=".$linkimage->alt;
        }
    }
    else
    {
        echo ", No images";
    }
    echo "\n";
}

 
LVL 15

Expert Comment

by:Insoftservice
ID: 40214070
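<?php
$html = '<a href="http://www.lovetomarry.com/page.html" class="myclass" rel="myrel">URL</a>';
// preg_match() returns 1 on a match; the URL itself is captured in $match[1].
// \s* and the i flag also accept variants such as HREF = "..."
if (preg_match('/href\s*=\s*["\']?([^"\'>]+)["\']?/i', $html, $match)) {
    $info = parse_url(trim($match[1]));
}
?>

For img: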
<?php

$html = '<img src="http://path.to/img" id="randomid" />';

if (preg_match('/<img.+?src(?: )*=(?: )*[\'"](.*?)[\'"]/si', $html, $arrResult)) {
    echo $arrResult[1];  // Should display http://path.to/img
} else {
    echo "No match found";
}

?>
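
If a regex really has to be used, here is a hedged sketch that tries to cover the quoting variants from the question (double quotes, single quotes, no quotes, upper case, spaces around the equals sign) and returns the inner HTML of the link, images included. The function name extractLinksWithRegex() is invented for this example. Regexes are fragile on real-world HTML (this one would also match an attribute like data-href, and it does not handle nested or commented-out markup), so a parser is usually the safer choice.

```php
<?php
// Hypothetical helper (name chosen for illustration only).
function extractLinksWithRegex(string $html): array
{
    // (?| ... ) is a branch-reset group: whichever quoting style matches,
    // the href value ends up in capture group 1.
    // Group 2 is everything between the opening and closing tag.
    $pattern = '~<a\b[^>]*?\bhref\s*=\s*(?|"([^"]*)"|\'([^\']*)\'|([^\s>]+))[^>]*>(.*?)</a>~is';

    $links = [];
    if (preg_match_all($pattern, $html, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $m) {
            $links[] = [
                'href'   => trim($m[1]),
                'anchor' => trim($m[2]), // inner HTML, so <img> tags are kept
            ];
        }
    }
    return $links;
}

$html = '<a HREF = "http://www.abc.com " onclick="return photoCaptionClick(this);">'
      . '  <img src="http://www.abc.com/pic/1.jpeg"> Test test test   </a>'
      . '<A Target="_blank" HREF=http://www.jkl.com>Plain text</A>';

print_r(extractLinksWithRegex($html));
```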
 

Author Closing Comment

by:peps03
ID: 40214265
Thanks! The simplehtmldom library works like a charm!