How can I extract parts of a website with PHP?

Posted on 2015-01-21
Last Modified: 2015-01-22
Hi guys,

I need to get some data once or twice a week from different websites. I usually do this by hand, but these websites display far too much information, so I thought it would be easier to write a script that runs as a cron job, fetches exactly what I want, and lists it for me on a PHP page. I did something similar about 10 years ago, but my coding skills are quite rusty, so I would appreciate a hand here.

One of the sites lists the information like this:

<table width="100%" border="1" cellpadding="0" cellspacing="0" bordercolor="#cccccc"> 
        <tr> 
          <td><table width="100%" border="0" cellpadding="1" cellspacing="2" bordercolor="0"> 
              <tr>  
                <td colspan="3">  
                    <p align="center">   
                    <img src="/images/classes/49.png" alt="Item 1 Number 49" width="40" height="40">  
                      
                    <img src="/images/classes/118.png" alt="Item 2 Number 118" width="40" height="40">  
                      
                    <img src="/images/classes/491.png" alt="Item 3 Number 491" width="40" height="40">  
                      
                    <img src="/images/classes/24.png" alt="Item 4 Number 24" width="40" height="40">  
                     
                    </p> 
                </td> 
              </tr>
......


What I would like to get is something like this:
Item 1 - 49 - 49.png
Item 2 - 118 - 118.png
Item 3 - 491 - 491.png
Item 4 - 24 - 24.png

or something similar in an array, so I can even build a DB from this. How can this be done?

Thanks in advance!
Question by: Cesar Aracena
2 Comments
 
Accepted Solution

by: Ray Paseur (earned 2000 total points)
ID: 40563356
It might be helpful for us to see this "in situ." What is the URL of the site?

Something like this should work.
http://iconoun.com/demo/temp_caracena.php

<?php // demo/temp_caracena.php

/**
 * See: http://www.experts-exchange.com/Programming/Languages/Scripting/PHP/Q_28601108.html
 *
 *  Item 1 - 49 - 49.png
 *  Item 2 - 118 - 118.png
 *  Item 3 - 491 - 491.png
 *  Item 4 - 24 - 24.png
 *
 * or something similar in an array, so I can even build a DB from this. How can this be done?
 *
 */

error_reporting(E_ALL);
echo '<pre>';

// THE TEST DATA
$htm = <<<EOD
<table width="100%" border="1" cellpadding="0" cellspacing="0" bordercolor="#cccccc">
        <tr>
          <td><table width="100%" border="0" cellpadding="1" cellspacing="2" bordercolor="0">
              <tr>
                <td colspan="3">
                    <p align="center">
                    <img src="/images/classes/49.png" alt="Item 1 Number 49" width="40" height="40">

                    <img src="/images/classes/118.png" alt="Item 2 Number 118" width="40" height="40">

                    <img src="/images/classes/491.png" alt="Item 3 Number 491" width="40" height="40">

                    <img src="/images/classes/24.png" alt="Item 4 Number 24" width="40" height="40">

                    </p>
                </td>
              </tr>
......
EOD;

// GET IMAGE NAMES
$rgx
= '#'         // REGEX DELIMITER
. 'ses/'      // LITERAL STRING
. '('         // CAPTURE GROUP
. '.*?'       // ANYTHING OR NOTHING
. ')'         // END CAPTURE GROUP
. '"'         // LITERAL STRING
. '#'         // REGEX DELIMITER
;
preg_match_all($rgx, $htm, $images);

// GET ALT TEXT STRINGS
$rgx
= '#'         // REGEX DELIMITER
. 'alt="'     // LITERAL STRING
. '('         // CAPTURE GROUP
. '.*?'       // ANYTHING OR NOTHING
. ')'         // END CAPTURE GROUP
. '"'         // LITERAL STRING
. '#'         // REGEX DELIMITER
;
preg_match_all($rgx, $htm, $alts);

// PREPARE THE OUTPUT
$out = array();
foreach ($alts[1] as $key => $val)
{
    $val = str_replace('Number', '-', $val);

    $out[$key] = $val . ' - ' . $images[1][$key];
}

// SHOW THE WORK PRODUCT
print_r($out);
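
The two regular expressions above are matched independently, so the image names and alt texts only line up as long as every img tag carries both attributes. As a possible alternative, here is a minimal sketch that reads src and alt from the same element using PHP's DOMDocument. The function name extract_items, the array keys, and the assumption that every alt text has the form "Item N Number M" are illustrative and not taken from the original sites.

```php
<?php
// Hedged sketch: parse the markup with DOMDocument instead of two separate regexes.
// ASSUMPTION: the images of interest have alt text shaped like "Item N Number M".
// extract_items() and its array keys are hypothetical names chosen for this example.

function extract_items($html)
{
    $dom = new DOMDocument();

    // Scraped HTML is often invalid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $out = array();
    foreach ($dom->getElementsByTagName('img') as $img) {
        $alt = $img->getAttribute('alt');
        $src = $img->getAttribute('src');

        // Keep only images whose alt text matches the assumed pattern.
        if (preg_match('/^(Item \d+) Number (\d+)$/', $alt, $m)) {
            $out[] = array(
                'item'   => $m[1],          // e.g. "Item 1"
                'number' => $m[2],          // e.g. "49"
                'file'   => basename($src), // e.g. "49.png"
            );
        }
    }
    return $out;
}

// Small sample in the same shape as the test data above.
$sample = '<p align="center">'
        . '<img src="/images/classes/49.png" alt="Item 1 Number 49" width="40" height="40">'
        . '<img src="/images/classes/118.png" alt="Item 2 Number 118" width="40" height="40">'
        . '</p>';

foreach (extract_items($sample) as $row) {
    echo $row['item'] . ' - ' . $row['number'] . ' - ' . $row['file'] . PHP_EOL;
}
```

Because each row is already split into item, number, and file, the resulting array could be looped over to fill a database table. Images without a matching alt text are skipped rather than shifting the pairing.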

Author Comment

by: Cesar Aracena
ID: 40564108
Hi Ray, thanks for the answer. It works perfectly! Unfortunately, the sites I fetch the data from are very private university (research) sites.