Getting product block with simple html dom parser

I am trying to scrape some information from the web, but I am stuck on getting the product block. You can find the page source [here](/www.topocentras.lt/Telefonai-Navigacijos/Ismanieji-telefonai-GSM/).

My code looks like this:

    <h2>Telefonai topocentras</h2>
     </br>
    <?php
     include_once('simple_html_dom.php');
     $url = "http://www.topocentras.lt/Telefonai-Navigacijos/Ismanieji-telefonai-GSM/";
     // Start from the main page
      $nextLink = $url;
     // Loop on each next link as long as it exists
     while ($nextLink) {
     echo "<hr>nextLink: $nextLink<br>";
     //Create a DOM object
     $html = new simple_html_dom();
     // Load HTML from a url
     $html->load_file($nextLink);
     //Try to find phone block
     $phones = $html->find('li#product-picture img[src]');
      
      foreach($phones as $phone) {
      //Try to find phone cost
        $cost = $phone->find('strong[class=price]', 0)->plaintext;
         //Try to find phone link
        $link = $phone->href;
         //Try to find phone name
        $name=$phone->find('li[a=title]',0)->plaintext;
         //Try to find phone photo source
        $photo= $phone->find('img[src]',0);
        echo $name, " #----# ", $cost, " #----# ", $link, " #----# ", $photo, "<br>";
        }
      $nextLink = ( ($temp = $html->find('a.href span[=""]', 0)) ?    "https://www.topocentras.lt".$temp->href : NULL );
    // Clear DOM object
    $html->clear();
    unset($html);
     }
     ?>

So the problem is getting the phone block. I need each phone's name, price, and link. I have tried many variations, but nothing works. Can someone tell me how to get the phone block correctly?
Asked:
Nekasas
1 Solution
 
Ray Paseur Commented:
Please see http://iconoun.com/demo/temp_nekasas.php

Simple_HTML_DOM is only one way to scrape the document.  I find it easier to use conventional PHP instructions, rather than the object.

I am going to recommend that you do not do this at all.  If the publisher wants to share this information with you in a programmatic manner, they will expose an API that will give you structured data.  If you depend on scraping a web page to get the data you will find that your application is very brittle - it will break as soon as the publisher changes the format or tags inside the document.  And it follows that if the publisher discovers that you're using automation to capture their data, and the publisher does not want you to have this information from web scraping, it will be very easy for them to break your script without notice.

    <?php // demo/temp_nekasas.php
    error_reporting(E_ALL);

    // SEE http://www.experts-exchange.com/Programming/Languages/Scripting/PHP/Q_28422236.html

    // THE HTML DOCUMENT
    $htm = file_get_contents('http://www.topocentras.lt/Telefonai-Navigacijos/Ismanieji-telefonai-GSM/');

    // FIRST SIGNAL STRING
    $ssa = <<<EOD
    <div class="pages">
    EOD;

    // LAST SIGNAL STRING
    $ssz = <<<EOD
    <div class="pages bottom-locator">
    EOD;

    // PRODUCT SEPARATORS
    $ssp = <<<EOD
    <div class="product-links clear">
    EOD;

    // MINIMIZE WHITESPACE
    $htm = preg_replace('/\s\s+/', ' ', $htm);

    // USE SIGNAL STRINGS TO DISCARD UNWANTED PARTS OF THE HTML DOCUMENT
    $arr = explode($ssa, $htm);
    $arr = explode($ssz, $arr[1]);

    // LOCATE THE PRODUCTS DETAIL
    $arr = explode($ssp, $arr[0]);
    unset($arr[0]);

    echo '<pre>';

    foreach ($arr as $str)
    {
        // GET TITLE
        $xyz = explode('title="', $str);
        $xyz = explode('">', $xyz[1]);
        $title = $xyz[0];

        // GET PRICE
        $xyz = explode('<strong class="price">', $str);
        $xyz = explode('<sup', $xyz[1]);
        $price = $xyz[0];

        // GET LINK
        $xyz = explode('<div class="list-ref">', $str);

        // AT END OF FILE
        if (empty($xyz[1])) break;

        $xyz = explode('</a>', $xyz[1]);
        $link  = trim($xyz[0]) . '</a>';

        // SHOW THE EXTRACTED DATA ELEMENTS
        echo PHP_EOL . 'TITLE: ' . $title;
        echo PHP_EOL . 'PRICE: ' . $price;
        echo PHP_EOL . 'AHREF: ' . htmlentities($link);
        echo PHP_EOL;
    }
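
For comparison, the same three fields can also be pulled with PHP's bundled DOM extension instead of explode(). What follows is only a minimal sketch: the sample HTML is hypothetical, loosely modeled on the fragments the script above searches for (a title attribute on the link, a `<strong class="price">` with a `<sup>` inside), and the `product` class name and the `extract_products()` function are assumptions made for illustration. The real page may be structured differently.

```php
<?php
// Hypothetical sample markup; NOT copied from the real site.
$sample = <<<EOD
<ul>
  <li class="product">
    <a href="/phone-a.html" title="Phone A">Phone A</a>
    <strong class="price">499<sup>00</sup></strong>
  </li>
  <li class="product">
    <a href="/phone-b.html" title="Phone B">Phone B</a>
    <strong class="price">299<sup>99</sup></strong>
  </li>
</ul>
EOD;

// Return one array (title, price, link) per product block found in $html.
function extract_products($html)
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings instead of printing them
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $products = array();

    // One <li class="product"> per product block (assumed class name)
    foreach ($xpath->query('//li[@class="product"]') as $block) {
        $anchor = $xpath->query('.//a[@title]', $block)->item(0);
        // First text node of the price element, i.e. the part before <sup>
        $price  = $xpath->query('.//strong[@class="price"]/text()', $block)->item(0);
        if ($anchor === null || $price === null) {
            continue; // skip blocks that do not match the assumed shape
        }
        $products[] = array(
            'title' => $anchor->getAttribute('title'),
            'price' => trim($price->nodeValue),
            'link'  => $anchor->getAttribute('href'),
        );
    }
    return $products;
}

print_r(extract_products($sample));
```

A parser-based version tolerates changes in whitespace and attribute order, but it still breaks when the publisher renames classes or restructures the page, so the brittleness warning above applies to it as well.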

 
Nekasas (Author) Commented:
I am very thankful, and it would be wonderful if you could show how to get the other pages. I mean the same website, but I want to get the phone info from every page. I need pagination too.
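
One common pattern for pagination is to look for a "next page" link on each page and stop when there is none, which is what the loop in the original question attempts. The sketch below assumes the next link is an `<a class="next">` element. That class name, the `find_next_link()` helper, and the in-memory `$pages` array are all hypothetical stand-ins, because the real pagination markup is not shown in this thread.

```php
<?php
// Hypothetical stand-in for the site: URL => HTML. A real script would fetch
// each URL with file_get_contents() or cURL instead.
$pages = array(
    '/phones?page=1' => '<p>page 1</p><a class="next" href="/phones?page=2">Next</a>',
    '/phones?page=2' => '<p>page 2</p><a class="next" href="/phones?page=3">Next</a>',
    '/phones?page=3' => '<p>page 3</p>',
);

// Return the href of the "next page" link, or null on the last page.
// The class name "next" is an assumption, not taken from the real site.
function find_next_link($html)
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $node  = $xpath->query('//a[@class="next"]')->item(0);
    return $node === null ? null : $node->getAttribute('href');
}

$visited = array();
$url = '/phones?page=1';
while ($url !== null && !isset($visited[$url]) && isset($pages[$url])) {
    $visited[$url] = true;   // guard against a cycle of links
    $html = $pages[$url];
    // ... extract the products from $html here ...
    $url = find_next_link($html);
}

print_r(array_keys($visited));
```

Against a live site, pausing between requests (for example with sleep(1)) keeps the load on the publisher's server low.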
 
Ray Paseur Commented:
"... wonderful if you could show how to get the other pages."
That seems like a separate question, but in reality it's not a "question" so much as a requirement for application development.  You should consider hiring a professional application developer if you want to pursue this line of business.  A better approach would be to contact the web publisher and ask them to expose an API that gives you the data.  That will lead to a better web application and it will also keep you out of legal trouble!

Thanks for the points and thanks for using EE, ~Ray
 
Nekasas (Author) Commented:
I am a student and I need this for my bachelor's thesis, so the professional application developer has to be me, or sites like this one or Stack Overflow.
 
Nekasas (Author) Commented:
Sorry for the duplicate comment, but I want to ask about your code. How can I cut the link so that I get only the link, without the title? You are using this

    $xyz = explode('<sup', $xyz[1]);

to show where to start, but how do I show where to stop?
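
In the script above, each field is cut out with two explode() calls. The first splits on the start marker and keeps element [1] (everything after it); the second splits on the stop marker and keeps element [0] (everything before it). So the quoted `<sup` line is actually the stop marker for the price. The same idea can be wrapped in a small helper, sketched here under assumptions: `between()` is a hypothetical name, not a PHP built-in, and the sample anchor is invented to resemble the AHREF value the script prints.

```php
<?php
// Return the text between $start and the next $stop, or null if either is missing.
// between() is a hypothetical helper name, not part of PHP.
function between($str, $start, $stop)
{
    $parts = explode($start, $str, 2);     // [before start, after start]
    if (count($parts) < 2) {
        return null;                       // start marker not found
    }
    $parts = explode($stop, $parts[1], 2); // [wanted text, rest]
    if (count($parts) < 2) {
        return null;                       // stop marker not found
    }
    return $parts[0];
}

// Sample anchor (assumed markup)
$link = '<a href="/phone-a.html" title="Phone A">Phone A</a>';

echo between($link, 'href="', '"'), PHP_EOL;   // the URL only
echo between($link, 'title="', '"'), PHP_EOL;  // the title only
```

Returning null when a marker is missing avoids the undefined-index notices that plain `$xyz[1]` raises once the page layout changes.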
Question has a verified solution.
