Reading content from other websites?

I searched the web for tutorials and examples of extracting data from other sites and displaying it on your own website, but I couldn't find anything.

Does anyone have any idea how this can be achieved?
Asked by sulTaN
 
KC_Speedball commented:
What data do you want?
Isn't it enough to link to the other pages?
Anything else isn't easy.
 
sulTaN (Author) commented:
Well, there isn't any specific piece of data that I want at the moment. I want my page to dynamically change as changes are made to the page from which I'm getting the data.
 
VGR commented:
Very easy; I answered this very recently.

Look at this and adapt it to your needs:

<html>
<head>
<title>CSComps.com</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<META NAME="keywords" CONTENT="3g, cscomps, cs, mobile, phones, cell phones, wap, content, free, ringers, screensavers, pictures, wallpaper, wallpapers, pics, pic, picture, sanyo, samsung, a500, 5300, 4900, scp, games, j2me, java, browser, links, news, movies, vision, sprint, sprintpcs, pcs, vision, nokia">
<link rel="stylesheet" type="text/css" href="style.css" />
</head>
<body text="black">
<?php
// initialisations
// outer ones
$filename='http://www.pricewatch.com/menus/m37.htm';
// inner ones
$products=array(); // explicit
$contents=array(); // lines read from the remote page (stays empty if nothing is read)
$k=0; // number of products found
// URI access
$fd = @fopen ($filename, "r");
if ($fd) { // page found
 while (($ligne = fgets($fd, 4096)) !== false) { // fgets() returns false at end of data, so no bogus last entry is stored
   // here you can make some parsing on-the-fly, for example to stop on "RADEON 9700 PRO"
   $contents []=$ligne;
 } // while (blocking read)
 //  $contents = fread ($fd, filesize ($filename)); doesn't work here: filesize() cannot stat an http:// URL
 fclose ($fd);
 // here call parsing for products.
 Analyse($contents,$products,$k);
} else { // page not found
 // product info emptied
 $k=0;
 // here logging or email alert for invalid page access
}
// you now have your $k products: proceed with displaying them or searching for $products[1..$k]["name"]=="RADEON 9700 PRO"

echo "update begins.<BR>";
// CHECK THOSE VALUES !! (placeholders: put your own database credentials here)
$dbLogin = 'your_db_user';
$dbPassword = 'your_db_password';
$dbName = 'cscomps_com';
$dbHost = 'localhost';
//EoCheckToDo

// the old mysql_* functions were removed in PHP 7, so this uses mysqli (note the link now comes first)
$linkID=mysqli_connect($dbHost,$dbLogin,$dbPassword) or die ("bad connect ".mysqli_connect_error());
mysqli_select_db($linkID,$dbName) or die ("bad select DB ".mysqli_error($linkID));
$query="select * from interest;";
$result=mysqli_query($linkID,$query) or die ("bad query $query. ".mysqli_error($linkID));
while ($res=mysqli_fetch_array($result)) { // search for that product in the memory array
  $locName=$res["prodname"]; // char(40)
  $locPrice=$res["price"]; // float(5,1)
  $locID=$res["id"]; // index in table
  // now search in memory for update
  $i=1;
  $notfound=TRUE;
  while (($i<=$k)AND($notfound)) { // search in memory
    $notfound=($products[$i]["name"]<>$locName);
    $i++;
  } // while not found in memory
  // warning: on a successful find, we exit with $i one higher than required ;-)
  if ($notfound) { // no luck
    echo "strange: product '$locName' in the database was no longer found on the HTML site...<BR>";
  } else { // normal case
    $i--; // correction
    if ($products[$i]["price"]<>$locPrice) { // price needs updating (beware: this is a float comparison, subject to precision errors)
      echo "product '$locName' in database needs price update from $locPrice to ".$products[$i]["price"]."<BR>";
      // the casts keep scraped text from being injected into the SQL
      $query2="UPDATE interest SET price=".(float)$products[$i]["price"]." WHERE id=".(int)$locID.";";
      $result2=mysqli_query($linkID,$query2) or die ("bad UPDATE query $query2. ".mysqli_error($linkID));
      echo "product database updated successfully for product '$locName' (ID=$locID).<BR>";
    } else {
      echo "product '$locName' in database doesn't need a price update (still $locPrice)<BR>";
    } // if price needs update
  } // if found in memory or not
} // end of search in memory
echo "update finished.<BR>";

// define this function somewhere (PHP lets you call a top-level function before its definition in the same file):
function Analyse($contents,&$products,&$k) {
 // here GLOBALS if need be
 $i=0; // index of the current line in $contents[]
 $j=count($contents);
 $yy=0; // position in line (remember HTML lines may be loooong)
 while ($i<$j) { // while not finished
   while (($i<$j) and (strpos($contents[$i],'<tr><td>')===false)) $i++; // bound checked first, so $contents[$j] is never read
   if ($i<>$j) { // found a data block, else finished
     // filling in
     $deb='<tr><td>'; // start marker, kept in a variable for extensibility
     $fin='</td>'; // end marker, likewise
     $ligne=substr($contents[$i],$yy); // rest of line after previous processing
     while (($i<$j) and (($m=strpos($ligne,$deb))===false)) { $i++; $yy=0; $ligne=($i<$j) ? $contents[$i] : ''; } // skip lines until block found or end of data
     if ($i<$j) { // found a product, and its position is in $m
       $k++; // increments #products found
       $m=$m+strlen($deb);
       $n=$m;
       $locRes='';
       $l=strlen($fin);
       while (($n<strlen($ligne))and((substr($ligne,$n,$l))<>$fin)) $n++; // linear search for end marker
       if ($n==strlen($ligne)) { $locRes=substr($ligne,$m); $i++; $yy=0; $ligne=($i<$j) ? $contents[$i] : ''; $n=strpos($ligne,$fin); $m=0; } // end marker is on the next line (if any)
       if (!($n===false)) {
         $locRes.=substr($ligne,$m,$n-$m);
         $yy=$yy+$n+1;
       } else { $locRes.=''; $yy=0; }
       $products[$k]["price"]=substr($locRes,1); // get rid of dollar symbol
       $deb='">'; $fin='</A'; // note case
       $locRes="";
       $ligne=($i<$j) ? substr($contents[$i],$yy) : ''; // rest of line after previous processing
       while (($i<$j) and (($m=strpos($ligne,'ID='))===false)) { $i++; $yy=0; $ligne=($i<$j) ? $contents[$i] : ''; } // look for the product link (ID=...)
       if (($i<$j) and !($m===false)) { // found a product name, and its position is in $m
         $l=strlen($deb);
         while (($m<strlen($ligne)) and (substr($ligne,$m,$l)<>$deb)) $m++; // bounded, so a missing '">' cannot loop forever
         $m=$m+strlen($deb);
         $n=$m;
         $l=strlen($fin);
         $locRes='';
         while (($n<strlen($ligne))and((substr($ligne,$n,$l))<>$fin)) $n++; // linear search for end marker
         if ($n==strlen($ligne)) { $locRes=substr($ligne,$m); $i++; $yy=0; $ligne=($i<$j) ? $contents[$i] : ''; $n=strpos($ligne,$fin); $m=0; } // end marker is on the next line (if any)
         if (!($n===false)) {
           $locRes.=substr($ligne,$m,$n-$m);
           $yy=$yy+$n+1;
         } else { $locRes.=''; $yy=0; }
       } // if found second tagged data
       $products[$k]["name"]=$locRes;
     } // if found product
   } // if found a new data block (product line) or end-of-data marker
   // else finished
 } // while not finished entirely
} // Analyse Procedure
// Nota bene: all this ugly parsing is usually handled by a more "intelligent" function of mine called GetChunk ;-)
?>
</body>
</html>
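Analyse() above walks the HTML with hand-tracked offsets, which breaks as soon as the remote page changes its markup slightly. Below is a minimal sketch of the same extraction using PHP's DOM extension (DOMDocument and DOMXPath) instead. It assumes the same layout Analyse() expects: a table row whose first cell holds a "$" price and which contains a link with "ID=" in its href, whose text is the product name. parseProducts() and priceChanged() are hypothetical helper names, and the 0.01 price tolerance is an assumption.

```php
<?php
// Sketch: extract product names and prices with PHP's DOM extension instead of
// manual string scanning. ASSUMPTION: same layout Analyse() expects, i.e. a
// table row whose first cell is a "$" price and which contains a link whose
// href holds "ID=" and whose text is the product name.

// Hypothetical helper: returns [product name => price as float].
function parseProducts(string $html): array
{
    if (trim($html) === '') {
        return []; // loadHTML() rejects empty input
    }
    $doc = new DOMDocument();
    // Scraped HTML is rarely valid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $products = [];
    foreach ($xpath->query('//tr[td]') as $row) {
        $link = $xpath->query('.//a[contains(@href, "ID=")]', $row)->item(0);
        if ($link === null) {
            continue; // not a product row
        }
        $firstCell = $xpath->query('td', $row)->item(0);
        $price = ltrim(trim($firstCell->textContent), '$');
        if (!is_numeric($price)) {
            continue; // first cell is not a price
        }
        $products[trim($link->textContent)] = (float) $price;
    }
    return $products;
}

// Hypothetical helper: compare prices with a tolerance rather than <>, since the
// original loop warns that float comparison is imprecise. 0.01 is an assumed default.
function priceChanged(float $old, float $new, float $epsilon = 0.01): bool
{
    return abs($old - $new) >= $epsilon;
}
```

Because the result is keyed by product name, the database loop could test isset($products[$locName]) directly instead of searching $products[1..$k] linearly, and call priceChanged((float)$locPrice, $products[$locName]) instead of using <>.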
 
Jonza commented:
Get another site's page into a variable like this (file() returns an array with one element per line):

$source = file("http://www.somedomain.com");
or, as one single string instead of an array of lines:
$source = implode('', file("http://www.somedomain.com"));
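A minimal sketch of the same idea with file_get_contents(), which returns the page as one string in a single call (the same result as implode('', file(...))). It checks for failure (file_get_contents() returns false) and sets a stream-context timeout so a slow remote site can't hang your page. fetchPage() is a hypothetical helper name and the 10-second timeout is an assumption; reading http:// URLs this way needs allow_url_fopen enabled.

```php
<?php
// Hypothetical helper: fetch a whole page (URL or local path) as one string.
// Returns null instead of false when the page cannot be read, so callers can
// tell a failed fetch apart from a page that really is empty.
function fetchPage(string $url, int $timeoutSeconds = 10): ?string
{
    // The timeout (an assumed default) only applies to the http:// wrapper;
    // for local files the context option is simply ignored.
    $context = stream_context_create([
        'http' => ['timeout' => $timeoutSeconds],
    ]);
    // @ silences the warning; failure is reported through the return value.
    $html = @file_get_contents($url, false, $context);
    return $html === false ? null : $html;
}
```

Usage: $source = fetchPage("http://www.somedomain.com"); then check for null before parsing.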
 
VGR commented:
Yes, that works too, BUT (again)

$fd = @fopen ($filename, "r");
if ($fd) { // page found
  while (($ligne = fgets($fd, 4096)) !== false) { // stops cleanly at end of data
    // here you can make some parsing on-the-fly, for example to stop on "RADEON 9700 PRO"
    $contents []=$ligne;
  } // while blocking read
  //  $contents = fread ($fd, filesize ($filename)); doesn't work here: filesize() cannot stat an http:// URL
  fclose ($fd);
} // if page found


This lets me parse line by line, thus "saving" some processing and reading if I find what I'm searching for partway through.
It also gives me the ability to check for an error when accessing the URI.
As far as I know, file() builds the whole array in memory first, so you can't stop early.
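The point here is that reading with fgets() lets you stop as soon as you've found what you need. Below is a sketch of that early exit, assuming you only want the first line containing some text such as "RADEON 9700 PRO"; findLine() is a hypothetical helper name. It keeps the 4096-byte fgets() limit from the snippet above, so a longer line is read in pieces and a match split across two pieces would be missed.

```php
<?php
/**
 * Hypothetical helper: returns the first line of $uri containing $needle,
 * '' if the page was read to the end without a match, or null if the page
 * could not be opened at all.
 */
function findLine(string $uri, string $needle): ?string
{
    $fd = @fopen($uri, 'r');
    if ($fd === false) {
        return null; // page not found: log or send an alert here
    }
    $found = '';
    // fgets() returns false at end of data; lines keep their trailing "\n".
    while (($line = fgets($fd, 4096)) !== false) {
        if (strpos($line, $needle) !== false) {
            $found = $line;
            break; // no need to read the rest of the page
        }
    }
    fclose($fd);
    return $found;
}
```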
 
ianmate commented:

You also may want to contact a lawyer and have them explain the phrase "Passing off" to you...
 
VGR commented:
Excuse me?
What does this mean?
I don't understand "passing off"; could you explain it in simple English?
 
ianmate commented:

If you're simply ripping off other people's content without their prior agreement and displaying it on your site, then you can be sued for plain copyright infringement.

"Passing off" is where you offer a service or goods of some kind that are so similar to another company's that it would be reasonable to infer some kind of association.

So I take it back: you're more likely to be committing copyright infringement.
 
VGR commented:
Oh, I see.
First, this has already been discussed.
Second, I already use this on some of my own sites.
Third, the PHP script is in exactly the same position as any client's browser: it only gets the HTML sent by the remote server. It doesn't vacuum up (mirror) the remote website or download (potentially protected) documents, images, etc.

I don't see how any copyright infringement is possible ;-)
 
ianmate commented:

It depends on how you present the data: the difference between passing off and copyright infringement is a little grey, which is why I recommended you see a lawyer to explain it.

If you're presenting someone else's data as if it were your own, then there is a clear copyright infringement; the data doesn't have to be "secret" in any way.
 
VGR commented:
Of course, but WHO suggested that the data would be presented as not belonging to its owners? :D

I understand your concern, but it has nothing to do with this technique (or at least doesn't always apply to it).

One more word: it's the MAN who misuses a technique, device or weapon who's guilty, NOT the technique, device or weapon itself...
 
VGR commented:
Oh, by the way, did I answer your question?
I'm a bit surprised that YOU raise the copyright issue, given that it's YOU who asked how to "pump" another site's data ;-)

Sorry if I'm being too frank :D