Solved

Best way to extract href value and anchor text from link?

Posted on 2014-07-21
10
2,745 Views
Last Modified: 2014-07-23
Hi,

What would be the best method to extract the href value and anchor text from link?

Link hrefs could be coded quite differently, like:
href = " "
href=" "
href = ' '
href=' '
HREF = " "
href=
HREF = http://...

And anchor texts could contain images for example.

So what is the best way to extract both? Images should be included in the anchor text if present.

Thanks!
Question by:peps03
10 Comments
 
LVL 58

Expert Comment

by:Gary
ID: 40209343
Some example strings would be helpful. If the string contains just a single anchor link, then:

<?php
$string='<a href="somelink.php">some link</a>';
$anchor=new SimpleXMLElement($string);

echo $anchor['href']."<br>";
echo $anchor[0];

 
LVL 35

Accepted Solution

by:
gr8gonzo earned 500 total points
ID: 40209352
 
LVL 58

Expert Comment

by:Gary
ID: 40209365
If there is other stuff in the string, or multiple links, then use DOMDocument:

<?php
$string='sdfsdfsd<a href="somelink.php">some link</a>sdfsdfsdf<a href="anotherlink.php">another link</a>';

$dom = new DOMDocument;
$dom->loadHTML($string);
foreach ($dom->getElementsByTagName('a') as $node)
{
  echo $node->nodeValue.'<br>'.$node->getAttribute("href")."<br>";
}
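
The same DOMDocument loop can be wrapped in a small function that returns the values instead of echoing them. This is only a sketch: the function name extract_links is made up for illustration and is not from the thread, and the libxml_use_internal_errors() calls are an added assumption to keep parser warnings about sloppy markup out of the output.

```php
<?php
// Hypothetical helper (not from the thread): return href and text for every link.
function extract_links(string $html): array
{
    // Assumption: collect libxml warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument;
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($dom->getElementsByTagName('a') as $node) {
        $links[] = [
            'href' => trim($node->getAttribute('href')),
            'text' => trim($node->nodeValue),
        ];
    }
    return $links;
}

$links = extract_links('sdfsdfsd<a href="somelink.php">some link</a>sdfsdfsdf<a href="anotherlink.php">another link</a>');
print_r($links);
```

Returning an array keeps the extraction separate from the output, so the caller can decide how to display or store each href and text pair.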

 

Author Comment

by:peps03
ID: 40209373
Thanks for your reply.
The links look like:

<a HREF = "http://www.abc.com " onclick="return photoCaptionClick(this);">  <img src="http://www.abc.com/pic/1.jpeg"> Test test test   </a>

or

<a HREF = http://www.abc.com>  <img src="http://www.abc.com/pic/1.jpeg"> Test test test   </a>

or

<A Target="_blank" HREF=http://www.abc.com>    <IMG SRC="http://www.abc.com/kuit.jpg" width="200" height="132" alt="  bla bla bla bla bla   " align="left">  </A>
 

Author Comment

by:peps03
ID: 40209411
@gr8gonzo how would I get the href value and anchor text separately?
 

Author Comment

by:peps03
ID: 40209457
@gr8gonzo Great library btw!

How can I check if a link contains an image?

And how could I get the image alt text if a link contains one?
 
LVL 58

Expert Comment

by:Gary
ID: 40209486
<?php
$string='<a HREF = "http://www.abc.com " onclick="return photoCaptionClick(this);">  <img src="http://www.abc.com/pic/1.jpeg"> Test test test   </a>';

$dom = new DOMDocument;
$dom->loadHTML($string);

foreach ($dom->getElementsByTagName('a') as $node)
{
	// Rebuild the anchor's inner HTML (text plus any <img> tags) from its child nodes
	$innertags = "";
	foreach ($node->childNodes as $child)
	{
		$innertags .= $node->ownerDocument->saveHTML($child);
	}
	echo $innertags."<br>";
	// trim() drops the stray space inside the quoted href value
	echo trim($node->getAttribute("href"))."<br>";
}
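
The questions about detecting an image and reading its alt text can also be answered with DOMDocument alone, by searching for img elements inside each anchor node. The following is a sketch rather than a tested recipe for every kind of markup: the alt attribute on the sample image is added here for illustration (the original example link has none), and the array keys are arbitrary names.

```php
<?php
// Sample link from the thread, with an alt attribute added for illustration.
$string = '<a HREF = "http://www.abc.com " onclick="return photoCaptionClick(this);">  <img src="http://www.abc.com/pic/1.jpeg" alt=" A photo "> Test test test   </a>';

libxml_use_internal_errors(true); // assumption: keep parser warnings quiet
$dom = new DOMDocument;
$dom->loadHTML($string);
libxml_clear_errors();

$results = [];
foreach ($dom->getElementsByTagName('a') as $node) {
    // Collect the alt text of every image nested inside this link.
    $alts = [];
    foreach ($node->getElementsByTagName('img') as $img) {
        $alts[] = trim($img->getAttribute('alt')); // '' when alt is missing
    }
    $results[] = [
        'href'      => trim($node->getAttribute('href')),
        'text'      => trim($node->textContent),
        'has_image' => count($alts) > 0,
        'alts'      => $alts,
    ];
}
print_r($results);
```

Because getAttribute() returns an empty string for a missing attribute, an image without alt text still counts as an image and simply contributes an empty entry to the alts list.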

 
LVL 35

Assisted Solution

by:gr8gonzo
gr8gonzo earned 500 total points
ID: 40209528
To get the text of the link, read the link object's plaintext property (innertext returns the inner HTML, tags included).

To see if there's an image inside, you could check the innertext property and look for "<img", or you could do another find on the link for img tags. The code below should produce the following results:

Results:
HREF=http://www.abc.com, TEXT=Test one, No images
HREF=http://www.def.com, TEXT=Test two, IMG #1 SRC=http://www.abc.com/pic/1.jpeg, ALT=
HREF=http://www.ghi.com, TEXT=Test three, IMG #1 SRC=http://www.abc.com/pic/1.jpeg, ALT=
HREF=http://www.jkl.com, TEXT=, IMG #1 SRC=http://www.abc.com/kuit.jpg, ALT=  bla bla bla bla bla


Code:
<?php

$string_html = <<<EOSTRING
<a HREF = "http://www.abc.com " onclick="return photoCaptionClick(this);">  Test one   </a>

or

<a HREF = "http://www.def.com " onclick="return photoCaptionClick(this);">  <img src="http://www.abc.com/pic/1.jpeg"> Test two   </a>

or 

<a HREF = http://www.ghi.com>  <img src="http://www.abc.com/pic/1.jpeg"> Test three   </a>

or 

<A Target="_blank" HREF=http://www.jkl.com>    <IMG SRC="http://www.abc.com/kuit.jpg" width="200" height="132" alt="  bla bla bla bla bla   " align="left">  </A>
EOSTRING;

require("simple_html_dom.php");
$dom = str_get_html($string_html);

$links = $dom->find("a");
foreach($links as $link)
{
   $href = trim($link->href);
   $innertext = trim($link->plaintext);
   
   echo "HREF=" . $href . ", TEXT=" . $innertext;
   
    // See if the link contains an image or text
    $linkimages = $link->find("img");
    if(count($linkimages))
    {
    	// Has at least one image inside the link
    	foreach($linkimages as $idx => $linkimage)
    	{
    		echo ", IMG #".($idx+1)." SRC=".$linkimage->src.", ALT=".$linkimage->alt;
    	}
    }
    else
    {
    	echo ", No images";
    }
    echo "\n";
}

 
LVL 15

Expert Comment

by:Insoftservice
ID: 40214070
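$html = '<a href="http://www.lovetomarry.com/page.html" class="myclass" rel="myrel">URL</a>';
// Case-insensitive; allows spaces around "=" and optional quotes around the value
if (preg_match('/href\s*=\s*["\']?([^"\'>\s]+)/i', $html, $match)) {
    $info = parse_url($match[1]);  // array with scheme, host, path, ...
}

For an img tag: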
<?php

$html = '<img src="http://path.to/img" id="randomid" />';

if (preg_match('/<img.+?src(?: )*=(?: )*[\'"](.*?)[\'"]/si', $html, $arrResult)) {
    echo $arrResult[1];  // Should display http://path.to/img
} else {
    echo "No match found";
}

?>
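
If a regular expression is preferred over a parser, the href variants listed in the question (mixed case, spaces around the equals sign, double quotes, single quotes, or no quotes) can be covered by one pattern. The sketch below is an illustration only: the function name href_from_tag is hypothetical, and it assumes the input is a single opening tag rather than a whole page, where a parser is the safer choice.

```php
<?php
// Hypothetical helper (not from the thread): pull the href value out of one <a ...> tag.
function href_from_tag(string $tag): ?string
{
    // (?| ... ) is a PCRE branch reset group, so whichever alternative matches
    // (double-quoted, single-quoted, or bare value) ends up in $m[1].
    // The lookbehind keeps attributes such as data-href from matching.
    $pattern = '/(?<![\w-])href\s*=\s*(?|"([^"]*)"|\'([^\']*)\'|([^\s>]+))/i';
    if (!preg_match($pattern, $tag, $m)) {
        return null; // no href, or href with no value
    }
    return trim($m[1]);
}

$samples = [
    '<a href = " http://a.example ">',
    "<a href='page.php'>",
    '<a HREF = http://www.abc.com>',
    '<A Target="_blank" HREF=http://www.abc.com>',
    '<a name="top">',
];
foreach ($samples as $sample) {
    var_dump(href_from_tag($sample));
}
```

The trim() call handles values such as "http://www.abc.com " where the space sits inside the quotes, which the pattern itself cannot strip.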
 

Author Closing Comment

by:peps03
ID: 40214265
Thanks! The simplehtmldom library works like a charm!


Question has a verified solution.
