Solved

Regular expressions again

Posted on 2001-06-05
Last Modified: 2010-08-05
Hi,

I want to parse an HTML file and extract the following:

1. All anchor tags that contain text only, for example:

<a href="...">Welcome</a> --- Yes

<a href="..."><img src=".."></a> --- No

2. All image tags: <img ..... >

Any ideas?

Regards,
Question by:Zuhair070699
10 Comments
 
LVL 8

Expert Comment

by:us111
ID: 6156204
Pretty nice solution from LexZEUS:

1.
<?php
$html = '<HTML>
             foo text0
             <a href = "http://www.1.com" ><img src="lklki"></a>foo text1
             <a href =
                       "http://www.2.com"
             >texta</a>
             foo text1
             <a   href  = http://www.3.com >texta</a>foo text1
             <a   href     =  \'http://www.4.com\'
             >texta</a>foo text1

             <a href="http://linka">texta</a>foo text1
             <a href="http://linkb">textb</a>foo text2
             foo text2
             </HTML>
             ';

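     // Collect every href value where href directly follows <a (double, single, or no quotes)
     preg_match_all("/<a[[:space:]]+href[[:space:]]*=[[:space:]]*"."[\"']{0,1}([^\"'> ]+)/i",$html,$arr,PREG_SET_ORDER);
     // Collect the text that sits immediately before each </a
     preg_match_all("|>([^<]*)</a|i",$html,$arr_text,PREG_SET_ORDER);

     // Print each URL next to the text found at the same index
     for ($i=0;$i<count($arr);$i++)
         print $arr[$i][1]."  ".$arr_text[$i][1]."\n";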
?>

2. still to come
 
LVL 8

Expert Comment

by:us111
ID: 6156310
Would you like to get the whole tag, e.g. <img src="pictures/fsdfsf.gif">, or just the image link pictures/fsdfsf.gif?
 

Author Comment

by:Zuhair070699
ID: 6158531
Hi,

The first one is 100% OK.

As for the image tags, could you please explain both methods?

Thanks

Author Comment

by:Zuhair070699
ID: 6158650
Hi,
Regarding the first script, I tried the following example and ran into a problem:

<?php
$html = '<p><a href="http://www.link1.com">Link1</a></p>
<p><a href="http://www.link2.com">Link2</a></p>
<p><a class=z href="http://www.link3.com" >Link3</a></p>
<p><a href="http://www.link4.com"><img src="" alt=""></a></p>
<p><a href="http://www.link5.com">Link5</a></p>
<p><a href="http://www.link6.com">Link6</a></p>
<p><a href="http://www.link7.com">Link7</a></p>
<p><a href="http://www.link8.com">Link8</a></p>
<p><a href="http://www.link9.com">Link9</a></p>
<p><a href="http://www.link10.com">Link10</a></p>   ';

    preg_match_all("/<a[[:space:]]+href[[:space:]]*=[[:space:]]*"."[\"']{0,1}([^\"'> ]+)/i",$html,$arr,PREG_SET_ORDER);
    preg_match_all("|>([^<]*)</a|i",$html,$arr_text,PREG_SET_ORDER);

    for ($i=0;$i<count($arr);$i++)
        print "<a href=\"".$arr[$i][1]."\">".$arr_text[$i][1]."</a> + \n";
?>

===================

The output looks OK, but the links are wrong starting from Link3, which should point to http://www.link3.com rather than http://www.link4.com.

Any ideas?

Regards,
 
LVL 8

Expert Comment

by:us111
ID: 6159590
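Oops ;) It's because you have <a class=z.... The URL pattern only matches when href comes directly after <a, so that link is skipped while its text is still captured, and the two lists get out of step.
I need to modify the regular expression.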
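
To make the mismatch visible, here is a minimal sketch that runs the two original patterns over a two-link sample and compares the match counts. The sample markup is made up for illustration; only the patterns come from the script above.

```php
<?php
// Hypothetical sample: the second anchor has an attribute before href.
$html = '<a href="http://www.link1.com">Link1</a>
<a class=z href="http://www.link2.com">Link2</a>';

// Original URL pattern: href must be the first attribute after <a.
preg_match_all("/<a[[:space:]]+href[[:space:]]*=[[:space:]]*[\"']{0,1}([^\"'> ]+)/i", $html, $arr, PREG_SET_ORDER);
// Original text pattern: any text directly before </a.
preg_match_all("|>([^<]*)</a|i", $html, $arr_text, PREG_SET_ORDER);

// The lists differ in length, so index $i no longer refers to the
// same anchor in both of them.
print count($arr) . " URLs, " . count($arr_text) . " texts\n";
```

With this sample the URL list has one entry and the text list has two, which is the same shift seen from Link3 onward in the example above.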
 

Author Comment

by:Zuhair070699
ID: 6172735
waiting .....
 
LVL 8

Expert Comment

by:us111
ID: 6173755
in progress.... :)
 

Author Comment

by:Zuhair070699
ID: 6200941
waiting  :(
 
LVL 8

Accepted Solution

by:
us111 earned 200 total points
ID: 6201269
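// For url: match href anywhere, so attributes placed before it (e.g. class=z) no longer break the match
preg_match_all("/href[[:space:]]*=[[:space:]]*"."[\"']{0,1}([^\"'> ]+)/i",$html,$arr,PREG_SET_ORDER);
// For link text: whatever sits immediately before each </a
preg_match_all("|>([^<]*)</a|i",$html,$arr_text,PREG_SET_ORDER);

// Print each URL with the text found at the same index
for ($i=0;$i<count($arr);$i++)
     print "<a href=\"".$arr[$i][1]."\">".$arr_text[$i][1]."</a> + \n";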
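
As a possible alternative (not the accepted answer's method), the URL and the text can be captured by one pattern, so the two values can never get out of step, and links that wrap only an image are skipped as the question asked. This is a rough sketch: the sample markup and the names $pattern and $links are assumptions for illustration.

```php
<?php
// Hypothetical sample markup, modelled on the cases discussed in this thread.
$html = '<p><a href="http://www.link1.com">Link1</a></p>
<p><a class=z href="http://www.link3.com" >Link3</a></p>
<p><a href="http://www.link4.com"><img src="" alt=""></a></p>';

// One pattern, two capture groups: group 1 is the href value, group 2 the text.
// [^<]+ needs at least one non-tag character right after the opening tag,
// so an anchor whose content starts with <img ...> does not match.
$pattern = '~<a\b[^>]*?\bhref\s*=\s*["\']?([^"\'>\s]+)[^>]*>([^<]+)</a>~i';
preg_match_all($pattern, $html, $links, PREG_SET_ORDER);

foreach ($links as $m) {
    print $m[1] . "  " . $m[2] . "\n";
}
```

On this sample only the Link1 and Link3 anchors are printed. Like any regex approach it assumes reasonably regular markup; for example, a > inside an attribute value would confuse it.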
 
LVL 8

Expert Comment

by:us111
ID: 6201312
preg_match_all("/href[[:space:]]*=[[:space:]]*"."[\"']{0,1}([^\"'> ]+)/i",$html,$arr,PREG_SET_ORDER);
preg_match_all("|>(<[^<]*)</a|i",$html,$arr_text,PREG_SET_ORDER);
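
For the image tags (part 2 of the question), the two options mentioned earlier, the whole tag or just the image link, might look roughly like this. It is a sketch under assumptions: the sample markup and the variable names $tags and $srcs are made up, and the patterns are not from the answers above.

```php
<?php
// Hypothetical sample markup with two differently written image tags.
$html = '<p><img src="pictures/a.gif" alt="A"></p>
<a href="http://www.link4.com"><IMG alt="B" src=\'pictures/b.gif\'></a>';

// Method 1: capture each complete <img ...> tag.
preg_match_all('~<img\b[^>]*>~i', $html, $tags);

// Method 2: capture only the value of the src attribute
// (double-quoted, single-quoted, or unquoted).
preg_match_all('~<img\b[^>]*?\bsrc\s*=\s*["\']?([^"\'>\s]+)~i', $html, $srcs);

print_r($tags[0]); // whole tags
print_r($srcs[1]); // image paths only
```

Here $tags[0] holds the full matches and $srcs[1] holds the first capture group, since preg_match_all defaults to PREG_PATTERN_ORDER when no flag is given.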

Question has a verified solution.
