Solved

How can I extract parts of a website with PHP?

Posted on 2015-01-21
Last Modified: 2015-01-22
Hi guys,

I need to pull some data from several websites once or twice a week. I usually do this by hand, but these sites display far too much information, so I thought it would be easier to write a script, run by a cron job, that fetches exactly what I want and lists it on a PHP page. I did something similar about 10 years ago, but my coding skills are quite rusty, so I would appreciate a hand here.

One of the sites lists the information like this:

<table width="100%" border="1" cellpadding="0" cellspacing="0" bordercolor="#cccccc"> 
        <tr> 
          <td><table width="100%" border="0" cellpadding="1" cellspacing="2" bordercolor="0"> 
              <tr>  
                <td colspan="3">  
                    <p align="center">   
                    <img src="/images/classes/49.png" alt="Item 1 Number 49" width="40" height="40">  
                      
                    <img src="/images/classes/118.png" alt="Item 2 Number 118" width="40" height="40">  
                      
                    <img src="/images/classes/491.png" alt="Item 3 Number 491" width="40" height="40">  
                      
                    <img src="/images/classes/24.png" alt="Item 4 Number 24" width="40" height="40">  
                     
                    </p> 
                </td> 
              </tr>
......


What I would like to get is something like this:
Item 1 - 49 - 49.png
Item 2 - 118 - 118.png
Item 3 - 491 - 491.png
Item 4 - 24 - 24.png

or something similar in an array, so I can even build a DB from it. How can this be done?

Thanks in advance!
Question by:Cesar Aracena

Accepted Solution

by: Ray Paseur (earned 500 total points)
ID: 40563356
Might be helpful for us to see this "in situ."  What is the URL of the site?

Something like this should work.
http://iconoun.com/demo/temp_caracena.php

<?php // demo/temp_caracena.php

/**
 * See: http://www.experts-exchange.com/Programming/Languages/Scripting/PHP/Q_28601108.html
 *
 *  Item 1 - 49 - 49.png
 *
 *  Item 2 - 118 - 118.png
 *
 *  Item 3 - 491 - 491.png
 *
 *  Item 4 - 24 - 24.png
 *
 * or something alike in an array or something so I can even make a DB out from this. How can be done?
 *
 */

error_reporting(E_ALL);
echo '<pre>';

// THE TEST DATA
$htm = <<<EOD
<table width="100%" border="1" cellpadding="0" cellspacing="0" bordercolor="#cccccc">
        <tr>
          <td><table width="100%" border="0" cellpadding="1" cellspacing="2" bordercolor="0">
              <tr>
                <td colspan="3">
                    <p align="center">
                    <img src="/images/classes/49.png" alt="Item 1 Number 49" width="40" height="40">

                    <img src="/images/classes/118.png" alt="Item 2 Number 118" width="40" height="40">

                    <img src="/images/classes/491.png" alt="Item 3 Number 491" width="40" height="40">

                    <img src="/images/classes/24.png" alt="Item 4 Number 24" width="40" height="40">

                    </p>
                </td>
              </tr>
......
EOD;

// GET IMAGE NAMES
$rgx
= '#'         // REGEX DELIMITER
. 'ses/'      // LITERAL STRING
. '('         // CAPTURE GROUP
. '.*?'       // ANYTHING OR NOTHING
. ')'         // END CAPTURE GROUP
. '"'         // LITERAL STRING
. '#'         // REGEX DELIMITER
;
preg_match_all($rgx, $htm, $images);

// GET ALT TEXT STRINGS
$rgx
= '#'         // REGEX DELIMITER
. 'alt="'     // LITERAL STRING
. '('         // CAPTURE GROUP
. '.*?'       // ANYTHING OR NOTHING
. ')'         // END CAPTURE GROUP
. '"'         // LITERAL STRING
. '#'         // REGEX DELIMITER
;
preg_match_all($rgx, $htm, $alts);

// PREPARE THE OUTPUT
$out = array();
foreach ($alts[1] as $key => $val)
{
    $val = str_replace('Number', '-', $val);

    $out[$key] = $val . ' - ' . $images[1][$key];
}

// SHOW THE WORK PRODUCT
print_r($out);
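
The two regular expressions above depend on the exact layout of the sample markup. As an alternative, here is a minimal sketch that reads the same `<img>` tags with PHP's DOM extension and returns one associative array per image, which may be easier to insert into a DB. The function name `extract_items()`, the `$sample` string and the array keys are made up for illustration, and the code assumes every `alt` attribute looks like "Item 1 Number 49", as in the sample.

```php
<?php
// Hypothetical helper (not part of the original answer): collect item name,
// number and image file name from every <img> tag in an HTML string.
function extract_items($html)
{
    $doc = new DOMDocument();

    // Real-world markup is often invalid; keep libxml warnings out of the output.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $rows = array();
    foreach ($doc->getElementsByTagName('img') as $img) {
        $alt = $img->getAttribute('alt');
        $src = $img->getAttribute('src');

        // ASSUMPTION: alt text has the form "<item name> Number <digits>".
        // Images whose alt text does not match are skipped.
        if (preg_match('/^(.+?)\s+Number\s+(\d+)$/', $alt, $m)) {
            $rows[] = array(
                'item'   => $m[1],
                'number' => (int) $m[2],
                'file'   => basename($src),
            );
        }
    }
    return $rows;
}

// Shortened copy of the sample markup from the question.
$sample = '<p align="center">'
    . '<img src="/images/classes/49.png" alt="Item 1 Number 49" width="40" height="40">'
    . '<img src="/images/classes/118.png" alt="Item 2 Number 118" width="40" height="40">'
    . '</p>';

foreach (extract_items($sample) as $row) {
    echo $row['item'] . ' - ' . $row['number'] . ' - ' . $row['file'] . PHP_EOL;
}
```

In a cron script, the HTML string would come from the remote page rather than from `$sample`, for example via `file_get_contents()` or cURL, depending on what the server allows. Because each row is already split into named fields, it can be passed straight to a prepared INSERT statement.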


Author Comment

by:Cesar Aracena
ID: 40564108
Hi Ray, thanks for the answer. It works perfectly! Unfortunately, the sites I fetch the data from are very private university (research) sites.