Solved

Getting product block with simple html dom parser

Posted on 2014-04-29
Last Modified: 2014-05-01
I am trying to scrape some information from a website, but I am stuck on getting the product block. You can find the page source [here](http://www.topocentras.lt/Telefonai-Navigacijos/Ismanieji-telefonai-GSM/).

My code looks like this:

    <h2>Telefonai topocentras</h2>
     </br>
    <?php
     include_once('simple_html_dom.php');
     $url = "http://www.topocentras.lt/Telefonai-Navigacijos/Ismanieji-telefonai-GSM/";
     // Start from the main page
      $nextLink = $url;
     // Loop on each next link as long as it exists
     while ($nextLink) {
     echo "<hr>nextLink: $nextLink<br>";
     //Create a DOM object
     $html = new simple_html_dom();
     // Load HTML from a url
     $html->load_file($nextLink);
     //Try to find phone block
     $phones = $html->find('li#product-picture img[src]');
      
      foreach($phones as $phone) {
      //Try to find phone cost
        $cost = $phone->find('strong[class=price]', 0)->plaintext;
         //Try to find phone link
        $link = $phone->href;
         //Try to find phone name
        $name=$phone->find('li[a=title]',0)->plaintext;
         //Try to find phone photo source
        $photo= $phone->find('img[src]',0);
        echo $name, " #----# ", $cost, " #----# ", $link, " #----# ", $photo, "<br>";
        }
      $nextLink = ( ($temp = $html->find('a.href span[=""]', 0)) ?    "https://www.topocentras.lt".$temp->href : NULL );
    // Clear DOM object
    $html->clear();
    unset($html);
     }
     ?>

So the problem is getting the phone block. I need every phone's name, price, and link. I tried many variations, but nothing works. Can someone tell me how to get the phone block correctly?
Question by:Nekasas

Accepted Solution

by:
Ray Paseur earned 2000 total points
ID: 40029958
Please see http://iconoun.com/demo/temp_nekasas.php

Simple_HTML_DOM is only one way to scrape the document.  I find it easier to use conventional PHP instructions, rather than the object.

I am going to recommend that you do not do this at all.  If the publisher wants to share this information with you in a programmatic manner, they will expose an API that will give you structured data.  If you depend on scraping a web page to get the data you will find that your application is very brittle - it will break as soon as the publisher changes the format or tags inside the document.  And it follows that if the publisher discovers that you're using automation to capture their data, and the publisher does not want you to have this information from web scraping, it will be very easy for them to break your script without notice.

    <?php // demo/temp_nekasas.php
    error_reporting(E_ALL);

    // SEE http://www.experts-exchange.com/Programming/Languages/Scripting/PHP/Q_28422236.html

    // THE HTML DOCUMENT
    $htm = file_get_contents('http://www.topocentras.lt/Telefonai-Navigacijos/Ismanieji-telefonai-GSM/');

    // FIRST SIGNAL STRING
    $ssa = <<<EOD
    <div class="pages">
    EOD;

    // LAST SIGNAL STRING
    $ssz = <<<EOD
    <div class="pages bottom-locator">
    EOD;

    // PRODUCT SEPARATORS
    $ssp = <<<EOD
    <div class="product-links clear">
    EOD;

    // MINIMIZE WHITESPACE
    $htm = preg_replace('/\s\s+/', ' ', $htm);

    // USE SIGNAL STRINGS TO DISCARD UNWANTED PARTS OF THE HTML DOCUMENT
    $arr = explode($ssa, $htm);
    $arr = explode($ssz, $arr[1]);

    // LOCATE THE PRODUCTS DETAIL
    $arr = explode($ssp, $arr[0]);
    unset($arr[0]);

    echo '<pre>';

    foreach ($arr as $str)
    {
        // GET TITLE
        $xyz = explode('title="', $str);
        $xyz = explode('">', $xyz[1]);
        $title = $xyz[0];

        // GET PRICE
        $xyz = explode('<strong class="price">', $str);
        $xyz = explode('<sup', $xyz[1]);
        $price = $xyz[0];

        // GET LINK
        $xyz = explode('<div class="list-ref">', $str);

        // AT END OF FILE
        if (empty($xyz[1])) break;

        $xyz = explode('</a>', $xyz[1]);
        $link  = trim($xyz[0]) . '</a>';

        // SHOW THE EXTRACTED DATA ELEMENTS
        echo PHP_EOL . 'TITLE: ' . $title;
        echo PHP_EOL . 'PRICE: ' . $price;
        echo PHP_EOL . 'AHREF: ' . htmlentities($link);
        echo PHP_EOL;
    }
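
The script above indexes `$arr[1]` without checking that the signal strings were actually found, so a change in the page markup would surface only as notices or empty output. As a minimal sketch of a guard, assuming a hypothetical helper named `split_products()` (it is not part of the accepted script), the three signal strings could be checked before the extraction loop runs:

```php
<?php
// Hypothetical helper, not part of the original script: isolates the product
// chunks and fails loudly when an expected signal string is missing.
function split_products($htm, $first, $last, $separator)
{
    // Keep everything after the first occurrence of the first signal string
    $arr = explode($first, $htm, 2);
    if (count($arr) < 2) {
        throw new RuntimeException("First signal string not found: $first");
    }

    // Keep everything before the next occurrence of the last signal string
    $arr = explode($last, $arr[1], 2);
    if (count($arr) < 2) {
        throw new RuntimeException("Last signal string not found: $last");
    }

    // Split what remains on the product separator and drop the leading fragment
    $chunks = explode($separator, $arr[0]);
    array_shift($chunks);
    if (empty($chunks)) {
        throw new RuntimeException("No product separators found: $separator");
    }
    return $chunks;
}
```

With this in place, `foreach (split_products($htm, $ssa, $ssz, $ssp) as $str)` would run the same title, price, and link extraction, and an exception naming the missing marker would point at what changed on the page.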



Author Comment

by:Nekasas
ID: 40034134
I am very thankful, and it would be wonderful if you could show how to get the other pages. I mean the same website, but I want the phone info from every page, so I need pagination too.
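
For the pagination part, one possible sketch (not taken from the accepted solution) is to look for a "next page" link in each document and keep fetching until there is none. The markup is an assumption: the sketch supposes the pager writes that link as `<a class="next" href="...">`, which has not been checked against the real site. Both `next_page_url()` and `crawl_pages()` are hypothetical names.

```php
<?php
// Hypothetical helper: returns the absolute URL of the "next page" link, or null.
// ASSUMPTION: the pager writes that link as <a class="next" href="...">, with
// class before href; the real site's markup may differ and must be checked.
function next_page_url($htm, $base)
{
    if (!preg_match('/<a\s+class="next"\s+href="([^"]+)"/i', $htm, $m)) {
        return null;
    }
    $href = html_entity_decode($m[1]);
    // Leave absolute URLs alone; prefix site-relative ones with the base
    if (preg_match('#^https?://#i', $href)) {
        return $href;
    }
    return rtrim($base, '/') . '/' . ltrim($href, '/');
}

// Follows "next" links from $start and calls $handle on each page's HTML.
// $fetch is any callable that maps a URL to an HTML string, or false on failure.
// Returns the list of URLs that were requested.
function crawl_pages($start, $base, $fetch, $handle, $maxPages = 50)
{
    $seen = array();
    $url  = $start;
    // Stop on: no next link, a URL already visited, or the page cap
    while ($url !== null && !isset($seen[$url]) && count($seen) < $maxPages) {
        $seen[$url] = true;
        $htm = call_user_func($fetch, $url);
        if ($htm === false) {
            break; // fetch failed; stop rather than loop
        }
        call_user_func($handle, $htm, $url);
        $url = next_page_url($htm, $base);
    }
    return array_keys($seen);
}
```

A call might pass `'file_get_contents'` as `$fetch` and a function containing the per-page extraction loop as `$handle`. The `$seen` check and the `$maxPages` cap are there so that a pager linking back to an earlier page cannot cause an endless loop.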

Expert Comment

by:Ray Paseur
ID: 40034277
"... wonderfull if you can write to how to get another page."
That seems like a separate question, but in reality it's not a "question" so much as a requirement for application development.  You should consider hiring a professional application developer if you want to pursue this line of business.  A better approach would be to contact the web publisher and ask them to expose an API that gives you the data.  That will lead to a better web application and it will also keep you out of legal trouble!

Thanks for the points and thanks for using EE, ~Ray

Author Comment

by:Nekasas
ID: 40034291
I am a student and I need this for my bachelor's thesis, so the professional application developer has to be me, or sites like this one or Stack Overflow.

Author Comment

by:Nekasas
ID: 40034316
Sorry for the duplicate comment, but I want to ask about your code. How can I cut the link down so that I get only the link, without the title? You are using this

    $xyz = explode('<sup', $xyz[1]);

to show where to start, but how do I show where to stop?
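
In the accepted code, the start and stop are two separate `explode()` calls: `explode('<strong class="price">', $str)` followed by taking element `[1]` marks where to start, and `explode('<sup', $xyz[1])` followed by taking element `[0]` marks where to stop. The same pattern can be applied to the link. A minimal sketch, assuming the `href` value is double-quoted; `href_only()` is a hypothetical helper name and the sample anchor is invented for illustration:

```php
<?php
// Hypothetical helper: returns the href value of an <a> tag string, or null.
// ASSUMPTION: the attribute is written as href="..." with double quotes.
function href_only($anchor)
{
    // START: keep everything after the opening delimiter href="
    $xyz = explode('href="', $anchor, 2);
    if (count($xyz) < 2) {
        return null; // no href attribute found
    }
    // STOP: keep everything before the next double quote
    $xyz = explode('"', $xyz[1], 2);
    return $xyz[0];
}

// Sample anchor invented for illustration
$link = '<a href="/some/product.html" title="Some phone">Some phone</a>';
echo href_only($link), PHP_EOL; // prints /some/product.html
```

In the accepted script this would be called on `$link` inside the loop, after the `</a>` has been appended.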