prevarant

PHP Parse, Extract and print content from another website

Hi Experts,

I need to extract content from this website: http://www.astrolook.com/dnevni.shtml
I need to extract and print the text between each pair of <!-pocetak--><!-kraj--> tags. I also need the text between
<font class="htext"></font>

Example:
1) Ovan
Some description between pocetak/kraj for "Ovan"

2) Bik
Some description between pocetak/kraj for "Bik"

3)....

Thank you in advance
Marko Miljus
Zvonko

1) Ovan: Some (whatever text you read next) ;-)
This is how it would work in JavaScript (I have no PHP available to test with):

<script>
window.onload = function(){
  var theText = document.getElementsByTagName("table")[4].innerHTML;
  theText = theText.replace(/<\/font><BR>/gi,":  ");
  theText = theText.replace(/<[^>]+>/g,"");
  alert(theText)
}
</script>



prevarant

ASKER

Greetings, Zvonko!
This won't work because I need to parse and retrieve content from a page on another server and load it into my website's page.

I tried this PHP code, but then I get almost all of the content:
--------------------------------------------------------------
<?php

$page = "http://www.astrolook.com/dnevni.shtml";

    // tags

    $start = '<!-pocetak-->';
    $end = '<!-kraj-->';

    // open the file
    $fp = fopen( $page, 'r' );

    $cont = "";

    // read the contents
    while( !feof( $fp ) ) {
        $buf = trim( fgets( $fp, 4096 ) );
        $cont .= $buf;
    }
   
    // get tag contents
    preg_match( "/$start(.*)$end/s", $cont, $match );

    // tag contents
    $contents = $match[ 1 ];
      echo $match[ 1 ];

?>
-------------------------------------------------------------
I think that I need to put "break;" somewhere.
ASKER CERTIFIED SOLUTION
basiclife

Actually, your regex will match from the very first start tag on the page to the very last end tag on the page. You need a non-greedy quantifier on your .*, so make it .*? so that the match stops at the first end tag rather than the last.
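
To illustrate the difference between the greedy and non-greedy quantifier, here is a minimal sketch. The `$html` string is a made-up sample standing in for the fetched page, and the use of `preg_quote` to escape the markers is an addition for safety, not part of the asker's code.

```php
<?php
// Hypothetical sample standing in for the fetched page (assumption).
$html = 'A<!-pocetak-->first<!-kraj-->B<!-pocetak-->second<!-kraj-->C';

// Escape the markers so they are matched literally inside the pattern.
$start = preg_quote('<!-pocetak-->', '/');
$end   = preg_quote('<!-kraj-->', '/');

// Greedy: (.*) runs on to the LAST end marker in the string.
preg_match("/$start(.*)$end/s", $html, $greedy);

// Non-greedy: (.*?) stops at the FIRST end marker.
preg_match("/$start(.*?)$end/s", $html, $lazy);

echo $greedy[1], "\n"; // first<!-kraj-->B<!-pocetak-->second
echo $lazy[1], "\n";   // first
```

The greedy version swallows everything between the first start marker and the last end marker, which matches the "almost all content" symptom described above.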
Ebosscher, the .*? works, but I only get the content for the first pair of <!-pocetak--><!-kraj--> tags.
How do I get the content for all 12 pairs?
Thanks
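
One possible regex-based way to collect every pair is `preg_match_all`, which returns all non-overlapping matches, whereas `preg_match` stops after the first. This is only a sketch: `$html` is a made-up sample, and on the real page it would be replaced by the fetched contents.

```php
<?php
// Hypothetical sample standing in for the fetched page (assumption).
$html = 'x<!-pocetak-->first<!-kraj-->y<!-pocetak-->second<!-kraj-->z';

// Escape the markers so they are matched literally inside the pattern.
$start = preg_quote('<!-pocetak-->', '/');
$end   = preg_quote('<!-kraj-->', '/');

// $matches[1] receives the captured text of every start/end pair.
$count = preg_match_all("/$start(.*?)$end/s", $html, $matches);

foreach ($matches[1] as $i => $text) {
    echo ($i + 1) . ') ' . trim($text) . "\n";
}
// Prints:
// 1) first
// 2) second
```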
Just to push my answer again: the script will do exactly what you want, just change the URL... :)
Thank you, Basiclife. I hadn't tried your code until now, and it works. A simple solution does the job!

<?php
$url = "http://www.astrolook.com/dnevni.shtml";
$contents = file_get_contents($url);
$open = "<!-pocetak-->";
$close = "<!-kraj-->";
$end = 0;
// Walk through the page and print the text between each open/close pair.
while (($start = strpos($contents, $open, $end)) !== false) {
      // Find the closing marker that follows this opening marker.
      $end = strpos($contents, $close, $start);
      if ($end === false) { break; } // unmatched opening marker: stop
      print substr($contents, $start + strlen($open), $end - $start - strlen($open)) . "<BR/><BR/>";
}
Excellent, glad I could help :)
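
The original question also asked for the sign name inside <font class="htext"></font> to be printed next to each description. A rough sketch of that pairing follows. It assumes each name appears shortly before its <!-pocetak--> marker, as the `</font><BR>` replacement in the JavaScript answer suggests. The `$html` sample is invented, and the real page markup may differ.

```php
<?php
// Hypothetical markup (assumption): name in font.htext, then the marked description.
$html = '<font class="htext">Ovan</font><BR><!-pocetak-->Text for Ovan<!-kraj-->'
      . '<font class="htext">Bik</font><BR><!-pocetak-->Text for Bik<!-kraj-->';

// Group 1 = sign name, group 2 = description; ~ is the delimiter, so / needs no escaping.
$pattern = '~<font class="htext">(.*?)</font>.*?<!-pocetak-->(.*?)<!-kraj-->~s';

// PREG_SET_ORDER yields one array per match: [full match, name, description].
preg_match_all($pattern, $html, $rows, PREG_SET_ORDER);

foreach ($rows as $i => $row) {
    printf("%d) %s\n%s\n\n", $i + 1, trim(strip_tags($row[1])), trim(strip_tags($row[2])));
}
```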