Solved

How can I extract parts of a website with PHP?

Posted on 2015-01-21
Last Modified: 2015-01-22
Hi guys,

I need to get some data once or twice a week from several websites. I usually do this by hand, but these sites display far too much information, so I thought it would be easier to write a script, run by a cron job, that fetches exactly what I want and lists it for me on a PHP page. I did something similar about 10 years ago, but my coding skills are quite rusty, so I would appreciate a hand here.

One of the sites lists the information like this:

<table width="100%" border="1" cellpadding="0" cellspacing="0" bordercolor="#cccccc"> 
        <tr> 
          <td><table width="100%" border="0" cellpadding="1" cellspacing="2" bordercolor="0"> 
              <tr>  
                <td colspan="3">  
                    <p align="center">   
                    <img src="/images/classes/49.png" alt="Item 1 Number 49" width="40" height="40">  
                      
                    <img src="/images/classes/118.png" alt="Item 2 Number 118" width="40" height="40">  
                      
                    <img src="/images/classes/491.png" alt="Item 3 Number 491" width="40" height="40">  
                      
                    <img src="/images/classes/24.png" alt="Item 4 Number 24" width="40" height="40">  
                     
                    </p> 
                </td> 
              </tr>
......


What I would like to get is something like this:
Item 1 - 49 - 49.png
Item 2 - 118 - 118.png
Item 3 - 491 - 491.png
Item 4 - 24 - 24.png

or something similar, in an array or the like, so that I can even build a database from it. How can this be done?

Thanks in advance!
Question by:Cesar Aracena
Accepted Solution

by: Ray Paseur
It might be helpful for us to see this "in situ." What is the URL of the site?

Something like this should work.
http://iconoun.com/demo/temp_caracena.php

<?php // demo/temp_caracena.php

/**
 * See: http://www.experts-exchange.com/Programming/Languages/Scripting/PHP/Q_28601108.html
 *
 *  Item 1 - 49 - 49.png
 *
 *  Item 2 - 118 - 118.png
 *
 *  Item 3 - 491 - 491.png
 *
 *  Item 4 - 24 - 24.png
 *
 * or something alike in an array or something so I can even make a DB out from this. How can be done?
 *
 */

error_reporting(E_ALL);
echo '<pre>';

// THE TEST DATA
$htm = <<<EOD
<table width="100%" border="1" cellpadding="0" cellspacing="0" bordercolor="#cccccc">
        <tr>
          <td><table width="100%" border="0" cellpadding="1" cellspacing="2" bordercolor="0">
              <tr>
                <td colspan="3">
                    <p align="center">
                    <img src="/images/classes/49.png" alt="Item 1 Number 49" width="40" height="40">

                    <img src="/images/classes/118.png" alt="Item 2 Number 118" width="40" height="40">

                    <img src="/images/classes/491.png" alt="Item 3 Number 491" width="40" height="40">

                    <img src="/images/classes/24.png" alt="Item 4 Number 24" width="40" height="40">

                    </p>
                </td>
              </tr>
......
EOD;

// GET IMAGE NAMES
$rgx
= '#'         // REGEX DELIMITER
. 'ses/'      // LITERAL STRING: TAIL OF "/images/classes/"
. '('         // CAPTURE GROUP
. '.*?'       // ANYTHING OR NOTHING, AS FEW CHARACTERS AS POSSIBLE
. ')'         // END CAPTURE GROUP
. '"'         // LITERAL STRING
. '#'         // REGEX DELIMITER
;
preg_match_all($rgx, $htm, $images);

// GET ALT TEXT STRINGS
$rgx
= '#'         // REGEX DELIMITER
. 'alt="'     // LITERAL STRING
. '('         // CAPTURE GROUP
. '.*?'       // ANYTHING OR NOTHING
. ')'         // END CAPTURE GROUP
. '"'         // LITERAL STRING
. '#'         // REGEX DELIMITER
;
preg_match_all($rgx, $htm, $alts);

// PREPARE THE OUTPUT
$out = array();
foreach ($alts[1] as $key => $val)
{
    // "Item 1 Number 49" BECOMES "Item 1 - 49"
    $val = str_replace('Number', '-', $val);

    $out[$key] = $val . ' - ' . $images[1][$key];
}

// SHOW THE WORK PRODUCT
print_r($out);
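
The regex version above pairs two separate match lists by index, which can drift out of step if any image lacks an alt attribute. As a hedged alternative, not part of the original answer, here is a minimal sketch that reads both attributes from the same img element with PHP's bundled DOM extension. It assumes PHP 7 or later with ext-dom enabled, and that the alt text always follows the "Item 1 Number 49" pattern shown in the question. The function name extractItems() and the array keys label, number, and file are made up for illustration.

```php
<?php
// A sketch only: extractItems() and its field names are hypothetical,
// not part of the original answer.

/**
 * Return one record per <img> whose alt text looks like "Item 1 Number 49".
 *
 * @param string $html Raw HTML, possibly malformed or truncated.
 * @return array List of ['label' => ..., 'number' => ..., 'file' => ...] rows.
 */
function extractItems(string $html): array
{
    // loadHTML() rejects an empty string, so bail out early.
    if (trim($html) === '') {
        return [];
    }

    // Collect parser warnings internally instead of printing them;
    // scraped HTML is rarely well-formed.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $items = [];
    foreach ($doc->getElementsByTagName('img') as $img) {
        $src = $img->getAttribute('src');
        $alt = trim($img->getAttribute('alt'));

        // ASSUMPTION: alt text has the form "<label> Number <digits>".
        // Images that do not match are skipped, not guessed at.
        if ($src === '' || !preg_match('/^(.+?)\s+Number\s+(\d+)$/', $alt, $m)) {
            continue;
        }

        $items[] = [
            'label'  => $m[1],          // e.g. "Item 1"
            'number' => $m[2],          // e.g. "49"
            'file'   => basename($src), // e.g. "49.png"
        ];
    }

    return $items;
}

// Shortened copy of the markup from the question.
$sample = <<<EOD
<p align="center">
<img src="/images/classes/49.png" alt="Item 1 Number 49" width="40" height="40">
<img src="/images/classes/118.png" alt="Item 2 Number 118" width="40" height="40">
</p>
EOD;

foreach (extractItems($sample) as $item) {
    echo $item['label'], ' - ', $item['number'], ' - ', $item['file'], PHP_EOL;
}
```

For the live pages, the sample string could be replaced by the fetched HTML, for example from file_get_contents() (if allow_url_fopen is enabled) or cURL. Because each row is a keyed array, it should map fairly directly onto database columns.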

Author Comment

by: Cesar Aracena
Hi Ray, thanks for the answer. It works perfectly! Unfortunately, the sites I fetch the data from are very private university (research) sites.