PHP Parse, Extract and print content from another website

Hi Experts,

I need to extract content from this website: http://www.astrolook.com/dnevni.shtml
I need to extract and print the text between every <!-pocetak--> and <!-kraj--> tag pair. I also need the text between
<font class="htext"></font>

Example:
1) Ovan
Some description between pocetak/kraj for "Ovan"

2) Bik
Some description between pocetak/kraj for "Bik"

3)....

Thank you in advance,
Marko Miljus

Zvonko

1) Ovan: Some "what you read next" ;-)

Zvonko

This is how it would work in JavaScript (I have no PHP to test with):

<script>
window.onload = function(){
var theText = document.getElementsByTagName("table")[4].innerHTML;
theText = theText.replace(/<\/font><BR>/gi,": ");
theText = theText.replace(/<[^>]+>/g,"");
alert(theText)
}
</script>

prevarant

ASKER

Greetings, Zvonko!
This won't work because I need to parse and retrieve content from a page on another server and load it into my own website's page.

I tried this PHP code, but then I get almost all of the content:
--------------------------------------------------------------
<?php

$page = "http://www.astrolook.com/dnevni.shtml";

// tags

$start = '<!-pocetak-->';
$end = '<!-kraj-->';

// open the file
$fp = fopen( $page, 'r' );

$cont = "";

// read the contents
while( !feof( $fp ) ) {
$buf = trim( fgets( $fp, 4096 ) );
$cont .= $buf;
}

// get tag contents
preg_match( "/$start(.*)$end/s", $cont, $match );

// tag contents
$contents = $match[ 1 ];
echo $match[ 1 ];

?>
-------------------------------------------------------------
I think that I need to put "break;" somewhere.

ASKER CERTIFIED SOLUTION

basiclife

ebosscher

Actually, your regex will match from the very first start tag on the page to the very last end tag on the page. You need to use a non-greedy quantifier on your .*, so make it .*? so that the match stops at the first end tag rather than the last.
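
A minimal sketch of the difference, run against a made-up sample string rather than the real page (the contents of $sample are an assumption for illustration only):

```php
<?php
// Hypothetical sample with two marker pairs; the real page's markup may differ.
$sample = 'a<!-pocetak-->first<!-kraj-->b<!-pocetak-->second<!-kraj-->c';

// Escape the markers so they are safe to embed in a pattern.
$start = preg_quote('<!-pocetak-->', '/');
$end   = preg_quote('<!-kraj-->', '/');

// Greedy: (.*) runs on to the LAST end marker in the string.
preg_match("/$start(.*)$end/s", $sample, $greedy);

// Non-greedy: (.*?) stops at the FIRST end marker.
preg_match("/$start(.*?)$end/s", $sample, $lazy);

echo $greedy[1], "\n"; // first<!-kraj-->b<!-pocetak-->second
echo $lazy[1], "\n";   // first
```

Note that preg_match only ever returns the first match, so this alone still yields a single block.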

prevarant

ASKER

Ebosscher, the .*? works, but I only get the content of the first <!-pocetak--><!-kraj--> pair.
How do I get the content of all 12 pairs?
Thanks
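
One possible way to collect every pair with a regex is preg_match_all, which keeps matching after the first hit. This is only a sketch on a made-up sample string; $sample stands in for the downloaded page and is an assumption:

```php
<?php
// Hypothetical stand-in for the downloaded page content.
$sample = '<!-pocetak-->one<!-kraj--> x <!-pocetak-->two<!-kraj-->';

$start = preg_quote('<!-pocetak-->', '/');
$end   = preg_quote('<!-kraj-->', '/');

// preg_match_all collects every non-overlapping match;
// $matches[1] holds the captured text of each pair.
preg_match_all("/$start(.*?)$end/s", $sample, $matches);

foreach ($matches[1] as $i => $block) {
    // strip_tags removes any markup left inside the block.
    echo ($i + 1), ') ', trim(strip_tags($block)), "\n";
}
```

On the real page, $sample would be replaced by the string read from the URL.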

basiclife

Just to push my answer again - the script will do exactly what you want, just change the URL... :)

prevarant

ASKER

Thank you, Basiclife. I didn't try your code until now... and it works. A simple solution does the job!

<?php
$url = "http://www.astrolook.com/dnevni.shtml";
$contents = file_get_contents($url);
$open = "<!-pocetak-->";
$close = "<!-kraj-->";
$start = 0;
$end = 0;
$finished = false;
while ($finished == false && $start < strlen($contents)) {
    // Find the next opening marker, searching from the previous closing marker
    $start = strpos($contents, $open, $end);
    if ($start === false) { $finished = true; }
    // Find the closing marker that follows it
    $end = strpos($contents, $close, $start);
    if ($end === false) { $finished = true; }
    if ($start !== false && $end !== false) {
        // Print only the text between the two markers
        print substr($contents, $start + strlen($open), $end - $start - strlen($open)) . "<BR/><BR/>";
    }
}
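
The loop above prints the descriptions only. To also get each sign name from <font class="htext">, something like the following might work. It is a sketch under a guessed page structure: it assumes every name appears in a <font class="htext"> element shortly before its own <!-pocetak-->...<!-kraj--> block, which should be checked against the real page source. $sample and its texts are made up.

```php
<?php
// Hypothetical markup: the name/description layout here is an assumption.
$sample = '<font class="htext">Ovan</font><BR><!-pocetak-->Text for Ovan<!-kraj-->'
        . '<font class="htext">Bik</font><BR><!-pocetak-->Text for Bik<!-kraj-->';

// Group 1: sign name, group 2: description. All quantifiers are non-greedy.
$pattern = '/<font class="htext">(.*?)<\/font>.*?<!-pocetak-->(.*?)<!-kraj-->/si';

// PREG_SET_ORDER returns one array per name/description pair.
preg_match_all($pattern, $sample, $pairs, PREG_SET_ORDER);

$lines = [];
foreach ($pairs as $i => $pair) {
    $name = trim(strip_tags($pair[1]));
    $text = trim(strip_tags($pair[2]));
    $lines[] = ($i + 1) . ") $name: $text";
}
echo implode("\n", $lines), "\n";
```

If a name on the real page has no description block of its own, this pattern would pair it with the next block it finds, so the output is worth checking by eye.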

basiclife

Excellent, glad I could help :)