Reading content from other websites?

Posted on 2003-02-25
Medium Priority
Last Modified: 2006-11-17
I tried searching the web for tutorials and examples of extracting data from other sites and displaying it on my own website, but I was unable to find anything.

Does anyone have any idea how this can be achieved?
Question by: sulTaN

Expert Comment

ID: 8022009
What data do you want?
Isn't it enough to link to the other pages?
Anything else isn't easy.

Author Comment

ID: 8022029
Well, there isn't any specific piece of data that I want at the moment. I want my page to dynamically change as changes are made to the page from which I'm getting the data.

Accepted Solution

VGR earned 400 total points
ID: 8023251
Very easy; I answered this very recently.

Look at this and adapt it to your needs:

<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<link rel="stylesheet" type="text/css" href="style.css" />
<body text="black">
<?php
// initialisations
$filename = "http://www.example.com/products.html"; // hypothetical: URL of the distant page (needs allow_url_fopen)
$products = array(); // explicit
$k = 0; // number of products found
$contents = array(); // lines of the distant page
// URI access
$fd = @fopen($filename, "r");
if ($fd) { // page found
  while (!feof($fd)) {
    $ligne = fgets($fd, 4096);
    if ($ligne === false) break; // end of stream or read error
    // here you can do some parsing on the fly, for example to stop on "RADEON 9700 PRO"
    $contents[] = $ligne;
  } // while (blocking read)
  // $contents = fread($fd, filesize($filename)); does not work for a URL: filesize() cannot stat a remote file
  fclose($fd);
  Analyse($contents, $products, $k); // parse the products out of the page
} else { // page not found
  // clear the product data here
  // and log or send an e-mail alert about the failed page access
}
// you now have your $k products: display them, or search $products[1..$k]["name"] for "RADEON 9700 PRO"

echo "update begins.<BR>";
$dbLogin = 'db_user';        // your database login
$dbPassword = 'db_password'; // your database password
$dbName = 'cscomps_com';
$dbHost = 'localhost';

// mysqli_* functions: the old mysql_* ones were removed in PHP 7
$linkID = mysqli_connect($dbHost, $dbLogin, $dbPassword) or die("bad connect: ".mysqli_connect_error());
mysqli_select_db($linkID, $dbName) or die("bad select DB: ".mysqli_error($linkID));
$query = "SELECT * FROM interest";
$result = mysqli_query($linkID, $query) or die("bad query $query: ".mysqli_error($linkID));
while ($res = mysqli_fetch_assoc($result)) { // for each product in the database
  $locName = $res["prodname"]; // char(40)
  $locPrice = $res["price"]; // float(5,1)
  $locID = $res["id"]; // index in table
  // now search the memory array for this product
  $i = 1;
  $notfound = true;
  while (($i <= $k) AND $notfound) { // search in memory
    $notfound = ($products[$i]["name"] != $locName);
    $i++;
  } // while not found in memory
  // warning: on a successful find we exit with $i one higher than required ;-)
  if ($notfound) { // no luck
    echo "strange: product '$locName' in the database was not found any more on the HTML site...<BR>";
  } else { // normal case
    $i--; // correction
    $newPrice = (float) $products[$i]["price"];
    if (abs($newPrice - $locPrice) >= 0.05) { // price needs updating; floats are compared with a tolerance (prices have one decimal)
      echo "product '$locName' in the database needs a price update from $locPrice to $newPrice<BR>";
      $query2 = "UPDATE interest SET price=$newPrice WHERE id=".(int) $locID; // numeric casts keep scraped text out of the SQL
      mysqli_query($linkID, $query2) or die("bad UPDATE query $query2: ".mysqli_error($linkID));
      echo "product database updated successfully for product '$locName'<BR>";
    } else {
      echo "product '$locName' in the database doesn't need a price update (still $locPrice)<BR>";
    } // if price needs update
  } // if found in memory or not
} // end of loop over the database rows
echo "update finished.<BR>";

// define this function somewhere:
// it assumes product rows shaped roughly like
//   <tr><td>$123.4</td><td><a href="...?ID=42">RADEON 9700 PRO</A></td></tr>
function Analyse($contents, &$products, &$k) {
  $html = implode('', $contents); // join the lines: HTML lines may be loooong and a block may span several of them
  $pos = 0; // current position in $html
  while (($m = strpos($html, '<tr><td>', $pos)) !== false) { // next data block (product line)
    // price: between '<tr><td>' and '</td>'
    $start = $m + strlen('<tr><td>');
    $n = strpos($html, '</td>', $start);
    if ($n === false) break; // truncated page: finished
    $price = trim(substr($html, $start, $n - $start));
    // name: after 'ID=', between '">' and '</A' (note case)
    $id = strpos($html, 'ID=', $n);
    if ($id === false) break;
    $start = strpos($html, '">', $id);
    if ($start === false) break;
    $start += strlen('">');
    $end = strpos($html, '</A', $start);
    if ($end === false) break;
    $k++; // one more product found
    $products[$k]["price"] = (float) ltrim($price, '$'); // get rid of the dollar symbol
    $products[$k]["name"] = trim(substr($html, $start, $end - $start));
    $pos = $end; // carry on after this product
  } // while not finished entirely
} // Analyse procedure
// Nota bene: I usually handle all this ugly parsing with a more "intelligent" function called GetChunk ;-)
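For comparison, here is a minimal sketch of the same extraction done with PHP's DOM extension (DOMDocument and DOMXPath) instead of strpos() scanning. This is a swapped-in technique, not VGR's method; the row layout, the function name extractProducts and the sample HTML are assumptions for illustration.

```php
<?php
// Sketch: extract (price, name) pairs from product rows with DOMDocument/DOMXPath.
// Assumed row layout (hypothetical):
//   <tr><td>$123.4</td><td><a href="...?ID=42">RADEON 9700 PRO</a></td></tr>
function extractProducts(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid: collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $products = [];
    // Every table row whose second cell contains a link.
    foreach ($xpath->query('//tr[td[1] and td[2]/a]') as $row) {
        $priceCell = $xpath->query('td[1]', $row)->item(0);
        $link = $xpath->query('td[2]/a', $row)->item(0);
        $products[] = [
            'price' => (float) ltrim(trim($priceCell->textContent), '$'), // drop the dollar sign
            'name'  => trim($link->textContent),
        ];
    }
    return $products;
}

// Single quotes: '$399.0' is literal text, not a variable.
$html = '<table>'
      . '<tr><td>$399.0</td><td><a href="p.php?ID=1">RADEON 9700 PRO</a></td></tr>'
      . '<tr><td>$129.5</td><td><a href="p.php?ID=2">GeForce4 Ti 4200</a></td></tr>'
      . '</table>';
print_r(extractProducts($html));
```

Fixed marker strings such as '<tr><td>' stop matching as soon as the site adds an attribute or a space inside the tag; an XPath query is not affected by that. The trade-off is that the whole page is parsed in memory before any row is read.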

Expert Comment

ID: 8023296
Get the other site's page into a variable like this:

$source = file("http://www.somedomain.com");
or, as one string instead of an array of lines:
$source = implode('', file("http://www.somedomain.com"));
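A small sketch of what these two calls return, assuming a local temporary file in place of www.somedomain.com so it runs offline (the same calls accept an http:// URL when allow_url_fopen is enabled). file() keeps each line's newline, so the implode() version is one string that still contains the line breaks; file_get_contents() returns the same string in a single call.

```php
<?php
// Sketch: read a page as an array of lines or as one string.
// A local temporary file stands in for the distant page here.
$source = tempnam(sys_get_temp_dir(), 'page');
file_put_contents($source, "<html>\n<body>Hello</body>\n</html>\n");

$lines = @file($source);               // array of lines, each keeping its "\n"
if ($lines === false) {                // file() returns false on failure
    die("could not read $source\n");
}
$joined = implode('', $lines);         // one string, line breaks still inside it
$whole  = file_get_contents($source);  // same string in one call (PHP 4.3+)

echo count($lines), " lines, ", strlen($whole), " bytes\n";
unlink($source);
```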

Expert Comment

ID: 8023433
Yes, that's true too, BUT (again):

$fd = @fopen($filename, "r");
if ($fd) { // page found
  while (!feof($fd)) {
    $ligne = fgets($fd, 4096);
    if ($ligne === false) break; // end of stream or read error
    // here you can do some parsing on the fly, for example to stop on "RADEON 9700 PRO"
    $contents[] = $ligne;
  } // while (blocking read)
  // $contents = fread($fd, filesize($filename)); does not work for a URL: filesize() cannot stat a remote file
  fclose($fd);
}

This lets me parse line by line, thus "saving" some processing and reading if I find what I'm searching for at some point.
It also gives me the ability to check for an error when accessing the URI.
I guess file() builds the whole array in memory first, but I may be wrong.
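The early-exit argument can be sketched as a small helper. The function name findFirstLine is hypothetical, and a local temporary file stands in for the distant page: the helper stops reading at the first line containing the marker, and it reports an unreadable URI (exception) separately from a missing marker (null).

```php
<?php
// Sketch: read line by line and stop as soon as the marker is found,
// so the rest of the page is never read.
function findFirstLine(string $uri, string $needle): ?string
{
    $fd = @fopen($uri, 'r');
    if ($fd === false) {
        throw new RuntimeException("cannot open $uri"); // URI access error
    }
    try {
        while (($line = fgets($fd, 4096)) !== false) {
            if (strpos($line, $needle) !== false) {
                return $line; // found: stop reading here
            }
        }
        return null; // marker not on the page
    } finally {
        fclose($fd); // runs on every return path
    }
}

$page = tempnam(sys_get_temp_dir(), 'page');
file_put_contents($page, "<tr><td>\$129.5</td>\n<tr><td>\$399.0</td><td>RADEON 9700 PRO</td>\n<tr><td>\$99.0</td>\n");
$hit = findFirstLine($page, 'RADEON 9700 PRO');
unlink($page);
echo $hit;
```

fgets($fd, 4096) returns at most 4095 bytes per call, so on a very long line a marker that straddles two chunks would be missed; a larger length, or a small overlap buffer, avoids that.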

Expert Comment

ID: 8024981

You also may want to contact a lawyer and have them explain the phrase "Passing off" to you...

Expert Comment

ID: 8025206
Excuse me?
What does this mean?
I don't understand "passing off"; could you explain it in simple English words?

Expert Comment

ID: 8025300

If you're simply ripping off other people's content without their prior agreement and displaying it on your site, then you can be done for simple copyright infringement.

"Passing off" is where you offer a service or goods of some kind that are so similar to another company's that it would be reasonable to infer some kind of association.

So I take it back: you're more likely to be committing copyright infringement.

Expert Comment

ID: 8025333
Oh, I see.
First, this has already been discussed.
Second, I already use this on some sites of mine.
Third, the PHP script is in exactly the same position as any client's browser: it only gets the HTML sent by the distant server. It doesn't "vacuum up" (mirror) the distant website, and it doesn't download (potentially protected) documents, images, etc.

I don't see any "copyright infringement" possible ;-)

Expert Comment

ID: 8025410

It depends on how you present the data. The difference between passing off and copyright infringement is a little grey, which is why I recommended you see a lawyer to explain it.

If you're presenting someone else's data as if it was your own then there is a clear copyright infringement - the data doesn't have to be "secret" in any way.

Expert Comment

ID: 8025496
Of course, but WHO said the data would be presented as if it did not belong to its owners? :D

I understand your concern, but it has nothing to do with this technique (or at least it does not always apply to it).

One more word: it's the MAN who misuses a technique, device or weapon who is guilty, NOT the technique, device or weapon itself...

Expert Comment

ID: 8025506
Oh, by the way, did I answer your question?
I'm a bit surprised that YOU raise the copyright issue, given that it's YOU who asked how to "pump" another site's data ;-)

Sorry if I'm perhaps too frank :D
