How can I extract parts of a website with PHP?

Hi guys,

I need to get some data once or twice a week from different websites. I usually do this by hand, but these websites display far too much information, so I thought it would be easier to write a script, run by a cron job, that fetches exactly what I want and lists it on a PHP page. I did something similar about 10 years ago, but my coding skills are quite rusty, so I would appreciate a hand here.

One of the sites lists the information like this:

<table width="100%" border="1" cellpadding="0" cellspacing="0" bordercolor="#cccccc"> 
        <tr> 
          <td><table width="100%" border="0" cellpadding="1" cellspacing="2" bordercolor="0"> 
              <tr>  
                <td colspan="3">  
                    <p align="center">   
                    <img src="/images/classes/49.png" alt="Item 1 Number 49" width="40" height="40">  
                      
                    <img src="/images/classes/118.png" alt="Item 2 Number 118" width="40" height="40">  
                      
                    <img src="/images/classes/491.png" alt="Item 3 Number 491" width="40" height="40">  
                      
                    <img src="/images/classes/24.png" alt="Item 4 Number 24" width="40" height="40">  
                     
                    </p> 
                </td> 
              </tr>
......



What I would like to get is something like this:
Item 1 - 49 - 49.png
Item 2 - 118 - 118.png
Item 3 - 491 - 491.png
Item 4 - 24 - 24.png

or something similar in an array, so I can even build a DB from it. How can this be done?

Thanks in advance!
Asked by: Cesar Aracena (PHP Enthusiast)
 
Ray Paseur commented:
Might be helpful for us to see this "in situ."  What is the URL of the site?

Something like this should work.
http://iconoun.com/demo/temp_caracena.php

<?php // demo/temp_caracena.php

/**
 * See: http://www.experts-exchange.com/Programming/Languages/Scripting/PHP/Q_28601108.html
 *
 *  Item 1 - 49 - 49.png
 *
 *  Item 2 - 118 - 118.png
 *
 *  Item 3 - 491 - 491.png
 *
 *  Item 4 - 24 - 24.png
 *
 * or something alike in an array or something so I can even make a DB out from this. How can be done?
 *
 */

error_reporting(E_ALL);
echo '<pre>';

// THE TEST DATA
$htm = <<<EOD
<table width="100%" border="1" cellpadding="0" cellspacing="0" bordercolor="#cccccc">
        <tr>
          <td><table width="100%" border="0" cellpadding="1" cellspacing="2" bordercolor="0">
              <tr>
                <td colspan="3">
                    <p align="center">
                    <img src="/images/classes/49.png" alt="Item 1 Number 49" width="40" height="40">

                    <img src="/images/classes/118.png" alt="Item 2 Number 118" width="40" height="40">

                    <img src="/images/classes/491.png" alt="Item 3 Number 491" width="40" height="40">

                    <img src="/images/classes/24.png" alt="Item 4 Number 24" width="40" height="40">

                    </p>
                </td>
              </tr>
......
EOD;

// GET IMAGE NAMES
$rgx
= '#'         // REGEX DELIMITER
. 'ses/'      // LITERAL STRING
. '('         // CAPTURE GROUP
. '.*?'       // ANYTHING OR NOTHING
. ')'         // END CAPTURE GROUP
. '"'         // LITERAL STRING
. '#'         // REGEX DELIMITER
;
preg_match_all($rgx, $htm, $images);

// GET ALT TEXT STRINGS
$rgx
= '#'         // REGEX DELIMITER
. 'alt="'     // LITERAL STRING
. '('         // CAPTURE GROUP
. '.*?'       // ANYTHING OR NOTHING
. ')'         // END CAPTURE GROUP
. '"'         // LITERAL STRING
. '#'         // REGEX DELIMITER
;
preg_match_all($rgx, $htm, $alts);

// PREPARE THE OUTPUT
$out = array();
foreach ($alts[1] as $key => $val)
{
    $val = str_replace('Number', '-', $val);

    $out[$key] = $val . ' - ' . $images[1][$key];
}

// SHOW THE WORK PRODUCT
print_r($out);
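
If the markup on the real pages varies (attributes in a different order, an image with no alt text), the two separate regex passes above can drift out of step. A possible alternative, sketched here only as an assumption about what might suit the live sites, is to walk the img tags with PHP's built-in DOMDocument and keep each item, number, and file name together in one row, which is also a convenient shape for a DB insert. The function name extract_items and the row keys are hypothetical, and the sample HTML is shortened from the question.

```php
<?php
// Sketch: parse <img> tags with DOMDocument instead of regular expressions.
// Assumes the DOM extension is available (it is enabled in most PHP builds).

/**
 * Hypothetical helper: returns one row per <img> whose alt text
 * looks like "Item 1 Number 49".
 */
function extract_items(string $html): array
{
    $doc = new DOMDocument();

    // Real-world HTML is rarely valid; collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $rows = [];
    foreach ($doc->getElementsByTagName('img') as $img) {
        $alt = $img->getAttribute('alt');

        // Skip images whose alt text does not match the expected shape.
        if (!preg_match('/^(.+?)\s+Number\s+(\d+)$/', $alt, $m)) {
            continue;
        }

        $rows[] = [
            'item'   => $m[1],
            'number' => (int) $m[2],
            'file'   => basename($img->getAttribute('src')),
        ];
    }
    return $rows;
}

// Shortened sample of the markup from the question.
$html = '<p><img src="/images/classes/49.png" alt="Item 1 Number 49">'
      . '<img src="/images/classes/118.png" alt="Item 2 Number 118"></p>';

foreach (extract_items($html) as $row) {
    echo "{$row['item']} - {$row['number']} - {$row['file']}\n";
}
```

With the sample above this prints "Item 1 - 49 - 49.png" and "Item 2 - 118 - 118.png". Because the alt text and src are read from the same element, an image without a matching alt is simply skipped rather than shifting the remaining rows.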

 
Cesar Aracena (author) commented:
Hi Ray, thanks for the answer. It works perfectly! Unfortunately, the sites I fetch the data from are very private university (research) sites.
Question has a verified solution.
