Stripping out href links in an HTML page
Posted on 2004-09-21
A question if you please...
I have an HTML page held in a variable, $result_html, which is loaded from a static HTML page. The page has 3 main sections: a header, a body and a footer. Within the body section there are many hyperlinks, but one type of link is always preceded by one specific comment and followed by another, e.g.
eg1.1) <!--system_link_start--><a href="system.php"<!--system_link_end-->
The header also ends with a comment, <!--header_text_end-->, and the footer starts with <!--footer_text_start-->.
What I need to do is take the $result_html page and keep only the body content, i.e. remove everything up to <!--header_text_end--> and everything after <!--footer_text_start-->. This should leave me with a variable containing just the core section of the original HTML page. I then want to pull out (in order) all of the 'special' hyperlinks such as eg1.1 listed above, remembering that all of these links are surrounded by the <!--system_link_start--> and <!--system_link_end--> comments.
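To make the first step concrete, here is a rough sketch of the body extraction in PHP. It is only an illustration under assumptions: extract_body is a made-up name, the sample string merely stands in for the real $result_html, and each marker comment is assumed to appear exactly once and to be spelled exactly as shown.

```php
<?php
// Hypothetical helper (name made up for illustration): returns the text
// between <!--header_text_end--> and <!--footer_text_start-->, or null if
// either marker comment is missing.
function extract_body($html)
{
    $start_marker = '<!--header_text_end-->';
    $end_marker   = '<!--footer_text_start-->';

    // Find the end of the header marker; the body starts right after it.
    $start = strpos($html, $start_marker);
    if ($start === false) {
        return null;
    }
    $start += strlen($start_marker);

    // Find the footer marker, searching only from the body start onwards.
    $end = strpos($html, $end_marker, $start);
    if ($end === false) {
        return null;
    }

    return substr($html, $start, $end - $start);
}

// Made-up sample standing in for the real $result_html.
$result_html = 'HEAD<!--header_text_end--><p>body</p><!--footer_text_start-->FOOT';
$body = extract_body($result_html);
```

Plain strpos/substr is used here rather than a regular expression because the markers are fixed strings; whether that is the best approach is part of what I am asking.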
Once I have exported all of these links I should have a variable containing something like
<a href="domain.ext/page1.php">link text 1</a>
<a href="domain.ext/page2.php">link text 2</a>
<a href="domain.ext/page3.php">link text 3</a>
<a href="domain.ext/page4.php">link text 4</a>
<a href="domain.ext/page5.php">link text 5</a>
<a href="domain.ext/page6.php">link text 6</a>
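A sketch of how a list like the one above might be collected, assuming the body has already been isolated and that every special link sits between a <!--system_link_start--> and a <!--system_link_end--> comment. The function name extract_system_links and the sample $body are made up for illustration; preg_match_all is the only non-trivial call used.

```php
<?php
// Hypothetical helper (name made up for illustration): returns, in document
// order, whatever sits between each system_link_start / system_link_end pair.
function extract_system_links($body)
{
    // Non-greedy match so each start/end pair is captured separately;
    // the "s" modifier lets "." span line breaks inside a link.
    $pattern = '/<!--system_link_start-->(.*?)<!--system_link_end-->/s';
    preg_match_all($pattern, $body, $matches);

    // $matches[1] holds the captured text of every pair, in order.
    return array_map('trim', $matches[1]);
}

// Made-up sample body: two special links with an ordinary link in between.
$body = '<p>intro</p>'
      . '<!--system_link_start--><a href="domain.ext/page1.php">link text 1</a><!--system_link_end-->'
      . '<a href="other.php">ordinary link</a>'
      . '<!--system_link_start--><a href="domain.ext/page2.php">link text 2</a><!--system_link_end-->';

$links     = extract_system_links($body);  // array, one entry per special link
$link_list = implode("\n", $links);        // single variable, one link per line
```

Ordinary links outside the comment pairs are ignored, which is the behaviour I am after.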
I then need to query this list to see which item number a particular link is at, i.e. if I was profiling domain.ext/page4.php the result would be 4.
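The lookup itself might be sketched like this, assuming the links are held as an array of anchor strings in their original order and that the href is written with double quotes as in my example list. link_position is a hypothetical name, and returning 0 for "not found" is just my own convention.

```php
<?php
// Hypothetical helper (name made up for illustration): 1-based position of
// the first link whose href equals $url, or 0 if no link matches.
function link_position(array $links, $url)
{
    foreach (array_values($links) as $i => $link) {
        // Pull the href value out of the anchor and compare it exactly.
        if (preg_match('/href="([^"]*)"/i', $link, $m) && $m[1] === $url) {
            return $i + 1;
        }
    }
    return 0;
}

// The example list from above, built as an array of six anchors.
$links = array();
for ($n = 1; $n <= 6; $n++) {
    $links[] = '<a href="domain.ext/page' . $n . '.php">link text ' . $n . '</a>';
}

$position = link_position($links, 'domain.ext/page4.php');  // 4
```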
Cheers in advance, sorry but max 500 points as pre-set level by EE