Solved

Regular expressions again

Posted on 2001-06-05
Last Modified: 2010-08-05
Hi,

I want to parse an HTML file and extract the following:

1. All anchor tags that contain text only, for example:

<a href="...">Welcome</a> --- Yes

<a href="..."><img src=".."></a> --- No

2. All image tags: <img ..... >

Any ideas?

Regards,
Question by:Zuhair070699
 
LVL 8

Expert Comment

by:us111
ID: 6156204
Pretty nice solution from LexZEUS:

1.
<?php
$html = '<HTML>
             foo text0
             <a href = "http://www.1.com" ><img src="lklki"></a>foo text1
             <a href =
                       "http://www.2.com"
             >texta</a>
             foo text1
             <a   href  = http://www.3.com >texta</a>foo text1
             <a   href     =  \'http://www.4.com\'
             >texta</a>foo text1

             <a href="http://linka">texta</a>foo text1
             <a href="http://linkb">textb</a>foo text2
             foo text2
             </HTML>
             ';

     preg_match_all("/<a[[:space:]]+href[[:space:]]*=[[:space:]]*"."[\"']{0,1}([^\"'> ]+)/i",$html,$arr,PREG_SET_ORDER);
     preg_match_all("|>([^<]*)</a|i",$html,$arr_text,PREG_SET_ORDER);

     for ($i=0;$i<count($arr);$i++)
         print $arr[$i][1]."  ".$arr_text[$i][1]."\n";
?>

2. still to come
 
LVL 8

Expert Comment

by:us111
ID: 6156310
Would you like to get the whole tag, e.g. <img src="pictures/fsdfsf.gif">, or just the image link pictures/fsdfsf.gif?
 

Author Comment

by:Zuhair070699
ID: 6158531
Hi,

The first one is 100% OK.

As for the image tags, could you please explain the two methods?

Thanks
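
A minimal sketch of the two options raised above (the whole <img> tag versus only its src value), assuming PHP with the PCRE functions. The sample HTML and file names are made up for illustration, and the patterns are not from the thread. The src pattern follows the same quoting approach as the href pattern in the first script, so it will miss empty src values and URLs that contain spaces.

```php
<?php
// Hypothetical sample input; the file names are made up for illustration.
$html = '<p><img src="pictures/a.gif" alt="A"></p>
<p><a href="http://www.link4.com"><IMG
   SRC = \'pictures/b.gif\'></a></p>
<p><img alt="C" src=pictures/c.gif></p>';

// Method 1: capture each complete <img ...> tag.
preg_match_all('/<img\b[^>]*>/i', $html, $tagMatches);
$tags = $tagMatches[0];

// Method 2: capture only the src value, whether it is
// double-quoted, single-quoted, or unquoted.
preg_match_all('/<img\b[^>]*?\bsrc\s*=\s*["\']?([^"\'>\s]+)/i', $html, $srcMatches);
$sources = $srcMatches[1];

foreach ($tags as $tag) {
    print $tag . "\n";
}
foreach ($sources as $src) {
    print $src . "\n";
}
```

Method 1 is the one to use if the tags are to be re-emitted unchanged; method 2 is enough if only the image locations are needed.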
 

Author Comment

by:Zuhair070699
ID: 6158650
Hi,
Regarding the first script, I tried the following example and there is a problem:

<?php
$html = '<p><a href="http://www.link1.com">Link1</a></p>
<p><a href="http://www.link2.com">Link2</a></p>
<p><a class=z href="http://www.link3.com" >Link3</a></p>
<p><a href="http://www.link4.com"><img src="" alt=""></a></p>
<p><a href="http://www.link5.com">Link5</a></p>
<p><a href="http://www.link6.com">Link6</a></p>
<p><a href="http://www.link7.com">Link7</a></p>
<p><a href="http://www.link8.com">Link8</a></p>
<p><a href="http://www.link9.com">Link9</a></p>
<p><a href="http://www.link10.com">Link10</a></p>   ';

    preg_match_all("/<a[[:space:]]+href[[:space:]]*=[[:space:]]*"."[\"']{0,1}([^\"'> ]+)/i",$html,$arr,PREG_SET_ORDER);
    preg_match_all("|>([^<]*)</a|i",$html,$arr_text,PREG_SET_ORDER);

    for ($i=0;$i<count($arr);$i++)
         
        print "<a href=\"".$arr[$i][1]."\">".$arr_text[$i][1]."</a> + \n";
                   
                   
?>

===================

The output looks OK, but the links are wrong starting from Link3, which should point to http://www.link3.com rather than http://www.link4.com.

Any ideas?

Regards,
 
LVL 8

Expert Comment

by:us111
ID: 6159590
Oh sh... ;)) It's because you have <a class=z ...: the pattern expects href to come right after <a, so that link is skipped.
I need to modify the regular expression.
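
A small sketch of why the pairs drift, using the two patterns from the first script on a shortened version of the sample above (the shortened input is an assumption made for illustration). The URL pattern skips the <a class=z ...> link, while the text pattern still finds its text, so the two lists end up with different lengths and every later pair is shifted by one.

```php
<?php
// Shortened version of the asker's sample.
$html = '<p><a href="http://www.link2.com">Link2</a></p>
<p><a class=z href="http://www.link3.com" >Link3</a></p>
<p><a href="http://www.link4.com"><img src="" alt=""></a></p>';

// The two patterns from the first script, unchanged.
$urlPattern  = "/<a[[:space:]]+href[[:space:]]*=[[:space:]]*[\"']{0,1}([^\"'> ]+)/i";
$textPattern = "|>([^<]*)</a|i";

$urlCount  = preg_match_all($urlPattern, $html, $arr, PREG_SET_ORDER);
$textCount = preg_match_all($textPattern, $html, $arr_text, PREG_SET_ORDER);

// Prints: URLs found: 2, texts found: 3
print "URLs found: $urlCount, texts found: $textCount\n";

// The second URL is link4, but the second text is Link3.
print $arr[1][1] . "  " . $arr_text[1][1] . "\n";
```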
 

Author Comment

by:Zuhair070699
ID: 6172735
waiting .....
 
LVL 8

Expert Comment

by:us111
ID: 6173755
in progress.... :)
 

Author Comment

by:Zuhair070699
ID: 6200941
waiting  :(
 
LVL 8

Accepted Solution

by:
us111 earned 50 total points
ID: 6201269
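// For the URLs: match href wherever it appears, without requiring it to
// follow <a directly, so attributes such as class=z no longer break the match.
preg_match_all("/href[[:space:]]*=[[:space:]]*"."[\"']{0,1}([^\"'> ]+)/i",$html,$arr,PREG_SET_ORDER);
// For the link text: whatever sits between ">" and "</a" and contains no "<".
preg_match_all("|>([^<]*)</a|i",$html,$arr_text,PREG_SET_ORDER);

// The two result lists are paired by position.
for ($i=0;$i<count($arr);$i++)
     print "<a href=\"".$arr[$i][1]."\">".$arr_text[$i][1]."</a> + \n";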
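
Because the two lists above are matched separately and paired by index, any href without a matching </a> (for example a <link href=...> tag) would shift the pairs again. A possible alternative, sketched here as an assumption rather than as part of the accepted answer, is to capture the URL and the text in a single pattern. Requiring at least one non-"<" character between the tags also leaves out anchors that only wrap an image. The sample input is shortened from the asker's example.

```php
<?php
// Shortened version of the asker's sample.
$html = '<p><a href="http://www.link2.com">Link2</a></p>
<p><a class=z href="http://www.link3.com" >Link3</a></p>
<p><a href="http://www.link4.com"><img src="" alt=""></a></p>
<p><a href="http://www.link5.com">Link5</a></p>';

// Group 1: the href value (quoted or unquoted).
// Group 2: the link text, which must be non-empty and contain no tags.
$pattern = '/<a\b[^>]*?\bhref\s*=\s*["\']?([^"\'>\s]+)[^>]*>([^<]+)<\/a>/i';
preg_match_all($pattern, $html, $links, PREG_SET_ORDER);

foreach ($links as $link) {
    print '<a href="' . $link[1] . '">' . trim($link[2]) . "</a>\n";
}
```

With this input the loop prints the links for link2, link3 and link5, and the image-only link4 is left out.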
 
LVL 8

Expert Comment

by:us111
ID: 6201312
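// Variant: the second pattern captures anchor content that starts with a tag,
// e.g. the <img ...> in <a ...><img ...></a>.
preg_match_all("/href[[:space:]]*=[[:space:]]*"."[\"']{0,1}([^\"'> ]+)/i",$html,$arr,PREG_SET_ORDER);
preg_match_all("|>(<[^<]*)</a|i",$html,$arr_text,PREG_SET_ORDER);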
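
For comparison, both requests from the question (text-only anchors and image tags) can also be handled with an HTML parser instead of regular expressions. This is only a sketch, and it assumes PHP's DOM extension (DOMDocument) is enabled; nobody in the thread used it. The sample input and the file name pictures/x.gif are made up for illustration.

```php
<?php
// Hypothetical sample input, modelled on the asker's example.
$html = '<p><a href="http://www.link1.com">Link1</a></p>
<p><a class=z href="http://www.link3.com" >Link3</a></p>
<p><a href="http://www.link4.com"><img src="pictures/x.gif" alt=""></a></p>';

// Parse the markup; parser warnings about loose HTML are collected, not printed.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

// 1. Anchors that contain text only (no child elements such as <img>).
$textLinks = [];
foreach ($doc->getElementsByTagName('a') as $a) {
    $text = trim($a->textContent);
    if ($a->getElementsByTagName('*')->length === 0 && $text !== '') {
        $textLinks[$a->getAttribute('href')] = $text;
    }
}

// 2. All image tags, reduced here to their src values.
$images = [];
foreach ($doc->getElementsByTagName('img') as $img) {
    $images[] = $img->getAttribute('src');
}

print_r($textLinks);
print_r($images);
```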