Solved

Regular expressions again

Posted on 2001-06-05
Last Modified: 2010-08-05
Hi,

I want to parse an HTML file and extract the following:

1. All anchor tags that contain text only, for example:

<a href="...">Welcome</a> --- Yes

<a href="..."><img src=".."></a> --- No

2. all image tags: <img ..... >

Any ideas?

Regards,
Question by:Zuhair070699
 
LVL 8

Expert Comment

by:us111
ID: 6156204
Pretty nice solution from LexZEUS:

1.
<?php
$html = '<HTML>
             foo text0
             <a href = "http://www.1.com" ><img src="lklki"></a>foo text1
             <a href =
                       "http://www.2.com"
             >texta</a>
             foo text1
             <a   href  = http://www.3.com >texta</a>foo text1
             <a   href     =  \'http://www.4.com\'
             >texta</a>foo text1

             <a href="http://linka">texta</a>foo text1
             <a href="http://linkb">textb</a>foo text2
             foo text2
             </HTML>
             ';

     preg_match_all("/<a[[:space:]]+href[[:space:]]*=[[:space:]]*"."[\"']{0,1}([^\"'> ]+)/i",$html,$arr,PREG_SET_ORDER);
     preg_match_all("|>([^<]*)</a|i",$html,$arr_text,PREG_SET_ORDER);

     for ($i=0;$i<count($arr);$i++)
         print $arr[$i][1]."  ".$arr_text[$i][1]."\n";
?>

2. still to come
 
LVL 8

Expert Comment

by:us111
ID: 6156310
Would you like to get the whole tag, <img src="pictures/fsdfsf.gif">, or just the image link, pictures/fsdfsf.gif?
 

Author Comment

by:Zuhair070699
ID: 6158531
Hi,

The first one is 100% OK.

As for the image tags, could you please explain the two methods?

Thanks
 

Author Comment

by:Zuhair070699
ID: 6158650
Hi,
Regarding the first script, I tried the following example and there is a problem:

<?php
$html = '<p><a href="http://www.link1.com">Link1</a></p>
<p><a href="http://www.link2.com">Link2</a></p>
<p><a class=z href="http://www.link3.com" >Link3</a></p>
<p><a href="http://www.link4.com"><img src="" alt=""></a></p>
<p><a href="http://www.link5.com">Link5</a></p>
<p><a href="http://www.link6.com">Link6</a></p>
<p><a href="http://www.link7.com">Link7</a></p>
<p><a href="http://www.link8.com">Link8</a></p>
<p><a href="http://www.link9.com">Link9</a></p>
<p><a href="http://www.link10.com">Link10</a></p>   ';

    preg_match_all("/<a[[:space:]]+href[[:space:]]*=[[:space:]]*"."[\"']{0,1}([^\"'> ]+)/i",$html,$arr,PREG_SET_ORDER);
    preg_match_all("|>([^<]*)</a|i",$html,$arr_text,PREG_SET_ORDER);

    for ($i=0;$i<count($arr);$i++)
        print "<a href=\"".$arr[$i][1]."\">".$arr_text[$i][1]."</a> + \n";
?>

===================

The output looks OK, but the links are wrong starting from Link3, which should point to http://www.link3.com rather than http://www.link4.com.

Any idea?

Regards,
 
LVL 8

Expert Comment

by:us111
ID: 6159590
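Oh sh... ;)) It's because you have <a class=z ...>. The first pattern expects href to come immediately after <a, so that link is skipped and the URLs and texts get out of step.
I need to modify the regular expression.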
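
A cut-down reproduction of the mismatch may help here. It runs the same two patterns from the first script on a three-link sample; the sample HTML and the variable names $urls and $texts are made up for illustration.

```php
<?php
// Three anchors; the second one has an attribute before href.
$html = '<a href="http://www.link1.com">Link1</a>
<a class=z href="http://www.link3.com" >Link3</a>
<a href="http://www.link5.com">Link5</a>';

// Pattern 1 requires href right after "<a", so it skips the class=z anchor.
preg_match_all("/<a[[:space:]]+href[[:space:]]*=[[:space:]]*[\"']{0,1}([^\"'> ]+)/i", $html, $urls, PREG_SET_ORDER);
// Pattern 2 only looks at ">text</a", so it still finds all three texts.
preg_match_all("|>([^<]*)</a|i", $html, $texts, PREG_SET_ORDER);

print count($urls) . " URLs, " . count($texts) . " texts\n";   // 2 URLs, 3 texts
print $urls[1][1] . " is paired with " . $texts[1][1] . "\n";  // http://www.link5.com is paired with Link3
```

Because the two result arrays are joined only by index, one skipped URL shifts every later pair.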
 

Author Comment

by:Zuhair070699
ID: 6172735
waiting .....
 
LVL 8

Expert Comment

by:us111
ID: 6173755
in progress.... :)
 

Author Comment

by:Zuhair070699
ID: 6200941
waiting  :(
 
LVL 8

Accepted Solution

by:
us111 earned 50 total points
ID: 6201269
// For url
preg_match_all("/href[[:space:]]*=[[:space:]]*"."[\"']{0,1}([^\"'> ]+)/i",$html,$arr,PREG_SET_ORDER);
preg_match_all("|>([^<]*)</a|i",$html,$arr_text,PREG_SET_ORDER);

for ($i=0;$i<count($arr);$i++)
     print "<a href=\"".$arr[$i][1]."\">".$arr_text[$i][1]."</a> + \n";
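
Pairing two separate preg_match_all() results by index still depends on both patterns finding the same number of matches, and an anchor that wraps an image comes back with empty text instead of being skipped. One possible alternative, sketched here under assumptions rather than taken from the thread, is to capture the URL and the text with a single pattern. extract_text_links is a hypothetical helper name, and the pattern assumes that no > appears inside an attribute value.

```php
<?php
// Hypothetical helper (not from the thread): returns url/text pairs for
// anchors whose body is plain text, skipping anchors that wrap other tags.
function extract_text_links(string $html): array
{
    // One pattern captures both pieces, so a URL and its text always come
    // from the same <a> tag. (?| ... ) is a branch reset: the href value is
    // group 1 whether it is double-quoted, single-quoted or unquoted.
    $pattern = '~<a\b[^>]*?\shref\s*=\s*(?|"([^"]*)"|\'([^\']*)\'|([^\s>]+))'
             . '[^>]*>([^<]+)</a\s*>~i';

    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

    $links = [];
    foreach ($matches as $m) {
        $links[] = ['url' => $m[1], 'text' => trim($m[2])];
    }
    return $links;
}

$html = '<p><a href="http://www.link1.com">Link1</a></p>
<p><a class=z href="http://www.link3.com" >Link3</a></p>
<p><a href="http://www.link4.com"><img src="" alt=""></a></p>';

foreach (extract_text_links($html) as $link) {
    print $link['url'] . "  " . $link['text'] . "\n";
}
// prints:
// http://www.link1.com  Link1
// http://www.link3.com  Link3
```

The body group ([^<]+) requires at least one character and stops at the first <, which is what excludes the anchor around the image. For messy real-world pages a proper HTML parser is likely to be more robust than any regular expression.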
 
LVL 8

Expert Comment

by:us111
ID: 6201312
preg_match_all("/href[[:space:]]*=[[:space:]]*"."[\"']{0,1}([^\"'> ]+)/i",$html,$arr,PREG_SET_ORDER);
preg_match_all("|>(<[^<]*)</a|i",$html,$arr_text,PREG_SET_ORDER);
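
The thread never returns to the second part of the question (all image tags). A minimal sketch, assuming plain regex matching is good enough for the input, that returns both things asked about earlier: the whole <img ...> tag and just its src value. extract_images is a hypothetical helper name, and the pattern assumes that no > appears inside an attribute value.

```php
<?php
// Hypothetical helper (not from the thread): collects every <img> tag
// together with the value of its src attribute, or null if it has none.
function extract_images(string $html): array
{
    // Step 1: grab each complete <img ...> tag.
    preg_match_all('~<img\b[^>]*>~i', $html, $tags);

    $images = [];
    foreach ($tags[0] as $tag) {
        // Step 2: pull src out of the tag. (?| ... ) is a branch reset, so the
        // value is always group 1 whether it was double-, single- or unquoted.
        $found = preg_match(
            '~\ssrc\s*=\s*(?|"([^"]*)"|\'([^\']*)\'|([^\s>]+))~i',
            $tag,
            $m
        );
        $images[] = ['tag' => $tag, 'src' => $found ? $m[1] : null];
    }
    return $images;
}

$html = '<p><img src="pictures/fsdfsf.gif" alt="x"></p><a href="#"><IMG SRC=\'b.png\'></a>';

foreach (extract_images($html) as $img) {
    print $img['src'] . "\n";
}
// prints:
// pictures/fsdfsf.gif
// b.png
```

Use the 'tag' entry for the full tag and the 'src' entry for just the image link.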