prevarant
asked on
PHP Parse, Extract and print content from another website
Hi Experts,
I need to extract content from this website: http://www.astrolook.com/dnevni.shtml
I need to extract text and print it for every <!-pocetak--><!-kraj--> Tag. I also need text between
<font class="htext"></font>
Example:
1) Ovan
Some description between pocetak/kraj for "Ovan"
2) Bik
Some description between pocetak/kraj for "Bik"
3)....
Thank You in advance
Marko Miljus
1) Ovan: Some text, whatever you read further on ;-)
This is how it would work in JavaScript (I have no PHP to test with):
<script>
window.onload = function(){
var theText = document.getElementsByTagName("table")[4].innerHTML;
theText = theText.replace(/<\/font><BR>/gi,": ");
theText = theText.replace(/<[^>]+>/g,"");
alert(theText)
}
</script>
ASKER
Greetings, Zvonko!
This won't work because I need to parse and retrieve content from a page on another server and load it into my website's page.
I tried this PHP code, but then I get almost all of the content:
--------------------------
<?php
$page = "http://www.astrolook.com/dnevni.shtml";
// tags
$start = '<!-pocetak-->';
$end = '<!-kraj-->';
// open the file
$fp = fopen( $page, 'r' );
$cont = "";
// read the contents
while( !feof( $fp ) ) {
$buf = trim( fgets( $fp, 4096 ) );
$cont .= $buf;
}
// get tag contents
preg_match( "/$start(.*)$end/s", $cont, $match );
// tag contents
$contents = $match[ 1 ];
echo $match[ 1 ];
?>
--------------------------
I think that I need to put "break;" somewhere.
ASKER CERTIFIED SOLUTION
Actually, your regex will match from the very first start tag on the page to the very last end tag on the page. You need to use a non-greedy quantifier on your .*, so make it .*? so that it stops at the first end tag rather than the last.
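To illustrate the difference, here is a minimal sketch run on a made-up sample string rather than the real page (the contents of `$html` are an assumption for illustration only). It also passes the markers through `preg_quote()` so that their characters are treated literally inside the pattern.

```php
<?php
// Hypothetical sample input standing in for the fetched page.
$html = 'a<!-pocetak-->first<!-kraj-->b<!-pocetak-->second<!-kraj-->c';

// Escape the markers so they are matched literally inside the regex.
$start = preg_quote('<!-pocetak-->', '/');
$end   = preg_quote('<!-kraj-->', '/');

// Greedy: .* runs on to the LAST end marker in the string.
preg_match("/$start(.*)$end/s", $html, $greedy);

// Non-greedy: .*? stops at the FIRST end marker after the start marker.
preg_match("/$start(.*?)$end/s", $html, $lazy);

echo $greedy[1], "\n"; // first<!-kraj-->b<!-pocetak-->second
echo $lazy[1], "\n";   // first
```

Note that `preg_match()` still returns only the first match, even with the non-greedy quantifier.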
ASKER
Ebosscher, this .*? works, but I only get the content for the first pair of tags <!-pocetak--><!-kraj-->.
How do I get it for all 12 pairs?
Thanks
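One possible way to collect every pair with a regex is `preg_match_all()`, which returns all non-overlapping matches instead of only the first. The sketch below is not taken from the thread's answers. It runs on a made-up `$html` string whose layout (a `<font class="htext">` name followed by a pocetak/kraj block) is an assumption about the page, and it assumes PHP 7 or later for the `??` operator.

```php
<?php
// Hypothetical sample input; the real page would be fetched first.
$html = '<font class="htext">Ovan</font><BR><!-pocetak-->Text for Ovan<!-kraj-->'
      . '<font class="htext">Bik</font><BR><!-pocetak-->Text for Bik<!-kraj-->';

$start = preg_quote('<!-pocetak-->', '/');
$end   = preg_quote('<!-kraj-->', '/');

// Collect the text of every pocetak/kraj pair (non-greedy, one match per pair).
preg_match_all("/$start(.*?)$end/s", $html, $descMatches);
$descriptions = $descMatches[1];

// Collect the text of every <font class="htext"> element.
preg_match_all('/<font class="htext">(.*?)<\/font>/s', $html, $nameMatches);
$names = $nameMatches[1];

// Print the results as a numbered list, pairing names and descriptions by position.
foreach ($descriptions as $i => $text) {
    $name = $names[$i] ?? '';
    echo ($i + 1) . ') ' . $name . "\n" . trim($text) . "\n";
}
```

Pairing by position only holds if the names and descriptions appear in the same order and in equal numbers, which would need checking against the actual page.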
Just to push my answer again - the script will do exactly what you want, just change the URL... :)
ASKER
Thank you, Basiclife. I hadn't tried your code until now, and it works. A simple solution does the job!
<?php
$url="http://www.astrolook.com/dnevni.shtml";
$contents=file_get_contents($url);
$open="<!-pocetak-->";
$close="<!-kraj-->";
$start=0;
$end=0;
$finished=false;
while($finished==false && $start<strlen($contents)) {
$start = strpos($contents, $open, $end);
if($start === false) {$finished=true;}
$end = strpos($contents, $close, $start);
if($end === false) {$finished=true;}
if($start !== false && $end !== false) {
print substr($contents, $start+strlen($open), $end-$start-strlen($open)) . "<BR/><BR/>";
}
}
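The same `strpos()` idea can also be wrapped in a function that returns the pieces instead of printing them. This is only a sketch: `extract_between` is a hypothetical helper name, and the `$sample` string is made up so the example runs without fetching anything.

```php
<?php
/**
 * Hypothetical helper: returns every substring found between $open and $close.
 */
function extract_between(string $contents, string $open, string $close): array
{
    $results = [];
    $offset = 0;
    // Find each opening marker, starting after the previous closing marker.
    while (($start = strpos($contents, $open, $offset)) !== false) {
        $start += strlen($open);              // first character after the opening marker
        $end = strpos($contents, $close, $start);
        if ($end === false) {
            break;                            // opening marker without a closing one
        }
        $results[] = substr($contents, $start, $end - $start);
        $offset = $end + strlen($close);      // continue after this closing marker
    }
    return $results;
}

// Made-up sample string, including one unclosed marker at the end.
$sample = 'x<!-pocetak-->one<!-kraj-->y<!-pocetak-->two<!-kraj-->z<!-pocetak-->dangling';
$parts = extract_between($sample, '<!-pocetak-->', '<!-kraj-->');
print_r($parts);
```

With the real page, the first argument would be the string returned by `file_get_contents($url)`, which needs `allow_url_fopen` enabled to read remote URLs.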
Excellent, glad I could help :)