Solved

Regular expressions again

Posted on 2001-06-05
Last Modified: 2010-08-05
Hi,

I want to parse an HTML file and extract the following:

1. All anchor tags, but only those that contain plain text, for example:

<a href="...">Welcome</a> --- Yes

<a href="..."><img src=".."></a> --- No

2. All image tags: <img ..... >

Any ideas?

Regards,
Question by:Zuhair070699
 
LVL 8

Expert Comment

by:us111
ID: 6156204
Pretty nice solution from LexZEUS:

1.
<?php
$html = '<HTML>
             foo text0
             <a href = "http://www.1.com" ><img src="lklki"></a>foo text1
             <a href =
                       "http://www.2.com"
             >texta</a>
             foo text1
             <a   href  = http://www.3.com >texta</a>foo text1
             <a   href     =  \'http://www.4.com\'
             >texta</a>foo text1

             <a href="http://linka">texta</a>foo text1
             <a href="http://linkb">textb</a>foo text2
             foo text2
             </HTML>
             ';

     preg_match_all("/<a[[:space:]]+href[[:space:]]*=[[:space:]]*"."[\"']{0,1}([^\"'> ]+)/i",$html,$arr,PREG_SET_ORDER);
     preg_match_all("|>([^<]*)</a|i",$html,$arr_text,PREG_SET_ORDER);

     for ($i=0;$i<count($arr);$i++)
         print $arr[$i][1]."  ".$arr_text[$i][1]."\n";
?>

2. still to come
 
LVL 8

Expert Comment

by:us111
ID: 6156310
Would you like to get the whole tag, e.g. <img src="pictures/fsdfsf.gif">, or just the image link pictures/fsdfsf.gif?
 

Author Comment

by:Zuhair070699
ID: 6158531
Hi,

The first one is 100% OK.

As for the image tags, could you please explain the two methods?

Thanks

 

Author Comment

by:Zuhair070699
ID: 6158650
Hi,
Regarding the first script, I tried the following example and there is a problem:

<?php
$html = '<p><a href="http://www.link1.com">Link1</a></p>
<p><a href="http://www.link2.com">Link2</a></p>
<p><a class=z href="http://www.link3.com" >Link3</a></p>
<p><a href="http://www.link4.com"><img src="" alt=""></a></p>
<p><a href="http://www.link5.com">Link5</a></p>
<p><a href="http://www.link6.com">Link6</a></p>
<p><a href="http://www.link7.com">Link7</a></p>
<p><a href="http://www.link8.com">Link8</a></p>
<p><a href="http://www.link9.com">Link9</a></p>
<p><a href="http://www.link10.com">Link10</a></p>   ';

    preg_match_all("/<a[[:space:]]+href[[:space:]]*=[[:space:]]*"."[\"']{0,1}([^\"'> ]+)/i",$html,$arr,PREG_SET_ORDER);
    preg_match_all("|>([^<]*)</a|i",$html,$arr_text,PREG_SET_ORDER);

    for ($i=0;$i<count($arr);$i++)
        print "<a href=\"".$arr[$i][1]."\">".$arr_text[$i][1]."</a> + \n";
?>

===================

The output looks OK, but the links are wrong starting from Link3, which should point to http://www.link3.com rather than http://www.link4.com.

Any ideas?

Regards,
 
LVL 8

Expert Comment

by:us111
ID: 6159590
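Oh sh... ;)) It's because you have <a class=z ...: the first pattern expects href to come right after <a, so that link is skipped and every later URL shifts up by one.
I need to modify the regular expression.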
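To see the mismatch directly, the sketch below runs the same two patterns over a shortened, hypothetical three-link sample (not the exact HTML from the post) and compares what each one finds. The variable names $urls and $texts are illustrative only.

```php
<?php
// Hypothetical sample: a plain link, a link with an extra attribute
// before href, and a link that wraps an image.
$html = '<a href="http://www.link1.com">Link1</a>
<a class=z href="http://www.link3.com" >Link3</a>
<a href="http://www.link4.com"><img src="" alt=""></a>';

// The same two patterns as in the script above.
preg_match_all("/<a[[:space:]]+href[[:space:]]*=[[:space:]]*[\"']{0,1}([^\"'> ]+)/i", $html, $arr, PREG_SET_ORDER);
preg_match_all("|>([^<]*)</a|i", $html, $arr_text, PREG_SET_ORDER);

$urls  = array_column($arr, 1);      // first capture group of each URL match
$texts = array_column($arr_text, 1); // first capture group of each text match

// $urls has 2 entries (link3 is skipped), $texts has 3 (the image link
// yields an empty string), so pairing them by index mixes them up.
print_r($urls);
print_r($texts);
```

Because the two lists are built independently, any anchor that one pattern skips and the other does not will shift every pair after it.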
 

Author Comment

by:Zuhair070699
ID: 6172735
waiting .....
 
LVL 8

Expert Comment

by:us111
ID: 6173755
in progress.... :)
 

Author Comment

by:Zuhair070699
ID: 6200941
waiting  :(
 
LVL 8

Accepted Solution

by:
us111 earned 50 total points
ID: 6201269
// For url
preg_match_all("/href[[:space:]]*=[[:space:]]*"."[\"']{0,1}([^\"'> ]+)/i",$html,$arr,PREG_SET_ORDER);
preg_match_all("|>([^<]*)</a|i",$html,$arr_text,PREG_SET_ORDER);

for ($i=0;$i<count($arr);$i++)
     print "<a href=\"".$arr[$i][1]."\">".$arr_text[$i][1]."</a> + \n";
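An alternative that avoids pairing two separate result lists by index is to capture the URL and the link text in one pattern, so they always come from the same anchor. The following is only a sketch under assumptions: the helper name extract_text_links is made up for illustration, the sample HTML is hypothetical, and the pattern assumes no ">" appears inside attribute values. It also drops anchors whose content is not plain text, which is what item 1 of the question asked for.

```php
<?php
// Hypothetical helper, not from the thread: returns [url, text] pairs
// for anchors whose content is plain text only.
function extract_text_links(string $html): array
{
    $pattern = '~<a\s+(?:[^>]*?\s)?href\s*=\s*'            // <a, optional other attributes, href=
             . '(?|"([^"]*)"|\'([^\']*)\'|([^\s"\'>]+))'   // group 1: quoted or unquoted URL
             . '[^>]*>([^<]+)</a\s*>~i';                   // group 2: text with no nested tags
    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

    $links = [];
    foreach ($matches as $m) {
        $text = trim($m[2]);
        if ($text !== '') {          // skip whitespace-only anchors
            $links[] = [$m[1], $text];
        }
    }
    return $links;
}

// Hypothetical sample in the style of the test page above.
$html = '<p><a href="http://www.link1.com">Link1</a></p>
<p><a class=z href="http://www.link3.com" >Link3</a></p>
<p><a href="http://www.link4.com"><img src="" alt=""></a></p>
<p><a href=\'http://www.link5.com\'>Link5</a></p>';

foreach (extract_text_links($html) as [$url, $text]) {
    print "<a href=\"$url\">$text</a>\n";
}
```

Regular expressions remain fragile on real-world HTML; for anything beyond simple pages, PHP's DOMDocument class is the more robust choice.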
 
LVL 8

Expert Comment

by:us111
ID: 6201312
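preg_match_all("/href[[:space:]]*=[[:space:]]*"."[\"']{0,1}([^\"'> ]+)/i",$html,$arr,PREG_SET_ORDER);
// Variant: this second pattern captures the markup of anchors whose content is a single tag, such as <img ...>, instead of plain text
preg_match_all("|>(<[^<]*)</a|i",$html,$arr_text,PREG_SET_ORDER);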
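For item 2 of the question (all image tags), the two methods mentioned earlier in the thread, whole tag versus just the image link, could look roughly like this. It is a minimal sketch, not a posted solution: the sample HTML and the variable names $tags and $sources are hypothetical, and the patterns assume no ">" appears inside attribute values.

```php
<?php
// Hypothetical sample markup, not taken from the thread.
$html = '<p><img src="pictures/a.gif" alt="A">
<a href="http://www.link4.com"><IMG alt=b SRC=pictures/b.gif></a></p>';

// Method 1: capture each whole <img ...> tag.
preg_match_all('~<img\b[^>]*>~i', $html, $m);
$tags = $m[0];

// Method 2: capture only the src value (double-quoted, single-quoted, or unquoted).
preg_match_all('~<img\s+(?:[^>]*?\s)?src\s*=\s*(?|"([^"]*)"|\'([^\']*)\'|([^\s"\'>]+))~i', $html, $m);
$sources = $m[1];

print_r($tags);
print_r($sources);
```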
