PHP DOM HTML parser: div

<div class="listing thu_programs" >
<ul class="gmusic" style="background-position:left -1544px;">
<li class="name">¿General Music4</li>
<li class="time">12:00 pm - 6:00 am</li>
</ul>
<ul class="gmusic" style="background-position:left -1544px;">
<li class="name">General Music3</li>
<li class="time">12:00 pm - 6:00 am</li>
</ul>
</div>
<div class="listing thu_programs" >
<ul class="gmusic" style="background-position:left -1544px;">
<li class="name"> General Music2</li>
<li class="time">12:00 pm - 6:00 am</li>
</ul>
<ul class="gmusic" style="background-position:left -1544px;">
<li class="name">General Music</li>
<li class="time">12:00 pm - 6:00 am</li>
<li class="podcast"><a target="_blank" href="http://www.localhost.com/podcast/bar.xml">podcast</a></li>

</ul>
</div>



I want to parse this to get each div's class and, inside each div, the details of every ul. For example:

program: listing thu_programs

name: General Music
time: 12:00 pm - 6:00 am
podcast: url

name: General Music
time: 12:00 pm - 6:00 am

How do I do this for each div?

And if a ul has an li with class "podcast", I want to get its URL as well.


<?php

// header('Content-Type: text/html; charset=utf-8');

            $str = '<html><meta http-equiv="Content-Type" content="text/html; charset=utf-8"><div class="listing thu_programs" >
<ul class="gmusic" style="background-position:left -1544px;">
<li class="name">¿¿¿¿¿¿ ¿¿¿¿¿¿ General Music</li>
<li class="time">12:00 pm - 6:00 am</li>
</ul>
 
<ul style="background-position:left -1157px;">
<li class="name">¿¿ ¿ ¿¿¿ </li>
<li class="time">2:00 pm - 12:00 pm</li>
<li class="podcast"><a target="_blank" href="http://www.mixfm-sa.com/podcast/bar.xml">podcast</a></li>
</ul>
<ul style="background-position:left -882px;">
<li class="name">¿¿¿¿ ¿¿¿¿</li>
<li class="time">3:00 pm - 2:00 pm</li>
<li class="podcast"><a target="_blank" href="http://www.mixfm-sa.com/podcast/tweet.xml">podcast</a></li>
</ul>
<ul class="gmusic" style="background-position:left -1544px;">
<li class="name">¿¿¿¿¿¿ ¿¿¿¿¿¿ General Music</li>
<li class="time">4:00 pm - 3:00 pm</li>
</ul>
 
<ul style="background-position:left -1432px;">
<li class="name">Mixtation</li>
<li class="time">5:00 pm - 4:00 pm</li>
<li class="podcast"><a target="_blank" href="http://www.mixfm-sa.com/podcast/mixtation.xml">podcast</a></li>
</ul>
<ul class="gmusic" style="background-position:left -1544px;">
<li class="name">¿¿¿¿¿¿ ¿¿¿¿¿¿ General Music</li>
<li class="time">6:00 pm - 5:00 pm</li>
</ul>
<ul style="background-position:left -550px;">
<li class="name">¿¿¿ ¿¿¿</li>
<li class="time">7:00 pm - 6:00 pm</li>
<li class="podcast"><a target="_blank" href="http://www.mixfm-sa.com/podcast/tech.xml">podcast</a></li>
</ul>
<ul style="background-position:left -1046px;">
<li class="name">¿¿¿ ¿¿¿¿</li>
<li class="time">8:00 pm - 7:00 pm</li>
<li class="podcast"><a target="_blank" href="http://www.mixfm-sa.com/podcast/cars.xml">podcast</a></li>
</ul>
<ul style="background-position:left -716px;">
<li class="name">¿¿¿ ¿¿¿¿</li>
<li class="time">9:00 pm - 8:00 pm</li>
<li class="podcast"><a target="_blank" href="http://www.mixfm-sa.com/podcast/east.xml">podcast</a></li>
</ul>
<ul style="background-position:left -936px;">
<li class="name">¿¿¿ ¿¿¿ ¿¿¿</li>
<li class="time">10:30 pm - 9:00 pm</li>
<li class="podcast"><a target="_blank" href="http://www.mixfm-sa.com/podcast/hiphop.xml">podcast</a></li>
</ul>
<ul class="gmusic" style="background-position:left -1544px;">
<li class="name">¿¿¿¿¿¿ ¿¿¿¿¿¿ General Music</li>
<li class="time">12:00 am - 10:30 pm</li>
</ul>
<ul style="background-position:left -1266px;">
<li class="name">Mega Mix</li>
<li class="time">2:00 am - 12:00 am</li>
</ul>
<ul class="gmusic" style="background-position:left -1544px;">
<li class="name">¿¿¿¿¿¿ ¿¿¿¿¿¿ General Music</li>
<li class="time">4:00 am - 2:00 am</li>
</ul>
<ul style="background-position:left -1601px;">
<li class="name">Religious</li>
<li class="time">4:30 am - 4:00 am</li>
</ul>
<ul class="gmusic" style="background-position:left -1544px;">
<li class="name">¿¿¿¿¿¿ ¿¿¿¿¿¿ General Music</li>
<li class="time">6:00 am - 4:30 am</li>
</ul>
</div>
 
<div class="listing fri_programs" >
<ul class="gmusic" style="background-position:left -1544px;">
<li class="name">¿¿¿¿¿¿ ¿¿¿¿¿¿ General Music</li>
<li class="time">11:00 am - 6:00 am</li>
</ul>
<ul class="gmusic" style="background-position:left -1601px;">
<li class="name">¿¿¿¿¿ ¿¿¿¿ Religious Content</li>
<li class="time">1:00 pm - 11:00 am</li>
</ul>
<ul style="background-position:left -1100px;">
<li class="name">¿¿¿ ¿¿¿</li>
<li class="time" style="direction:rtl;"><span>¿¿¿ ¿¿¿¿ ¿¿¿¿¿¿¿ </span>2:00pm - 1:00pm</li>
<li class="podcast"><a target="_blank" href="http://www.mixfm-sa.com/podcast/yaddak.xml">podcast</a></li>
</ul>
<ul class="gmusic" style="background-position:left -1544px;">
<li class="name">¿¿¿¿¿¿ ¿¿¿¿¿¿ General Music</li>
<li class="time">5:00 pm - 2:00 pm</li>
</ul>
<ul style="background-position:left -1376px;">
<li class="name">Heba Show</li>
<li class="time">7:00 pm - 5:00 pm</li>
<li class="podcast"><a target="_blank" href="http://www.mixfm-sa.com/podcast/heba.xml">podcast</a></li>
</ul>
<ul style="background-position:left -770px;">
<li class="name">¿¿¿¿ ¿¿¿¿</li>
<li class="time">8:00 pm - 7:00 pm</li>
<li class="podcast"><a target="_blank" href="http://www.mixfm-sa.com/podcast/hindi.xml">podcast</a></li>
</ul>
<ul class="gmusic" style="background-position:left -1544px;">
<li class="name">¿¿¿¿¿¿ ¿¿¿¿¿¿ General Music</li>
<li class="time">10:00 pm - 8:00 pm</li>
</ul>
<ul style="background-position:left -441px;">
<li class="name">Room 11</li>
<li class="time">10:30 pm - 10:00 pm</li>
<li class="podcast"><a target="_blank" href="http://www.mixfm-sa.com/podcast/room11.xml">podcast</a></li>
</ul>
<ul style="background-position:left -606px;">
<li class="name">¿¿¿¿¿ ¿¿¿ ¿¿¿¿¿¿</li>
<li class="time">12:00 am - 10:30 pm</li>
<li class="podcast"><a target="_blank" href="http://www.mixfm-sa.com/podcast/saltana.xml">podcast</a></li>
</ul>
<ul class="gmusic" style="background-position:left -1544px;">
<li class="name">¿¿¿¿¿¿ ¿¿¿¿¿¿ General Music</li>
<li class="time">4:00 am - 12:00 am</li>
</ul>
<ul style="background-position:left -1601px;">
<li class="name">Religious</li>
<li class="time">4:30 am - 4:00 am</li>
</ul>
<ul class="gmusic" style="background-position:left -1544px;">
<li class="name">¿¿¿¿¿¿ ¿¿¿¿¿¿ General Music</li>
<li class="time">6:00 am - 4:30 am</li>
</ul>
</div>
</html>
';

$DOM = new DOMDocument;
$DOM->loadHTML($str);

// get all DIV elements
$items = $DOM->getElementsByTagName('div');

for ($i = 0; $i < $items->length; $i++) {
    // print the class attribute of this div
    $linkthumb = $items->item($i)->getAttribute('class');
    echo "<ul>" . $linkthumb . "</ul>";
    // var_dump($items->item($i));

    // print the text of every child node of this div
    // (childNodes includes whitespace text nodes as well as the <ul> elements)
    $children = $items->item($i)->childNodes;
    for ($j = 0; $j < $children->length; $j++) {
        echo "<ul>" . $children->item($j)->nodeValue . "</ul>";
    }
}


Asked by: afifosh
 
Ray Paseur Commented:
I tried to run this script: http://www.laprbass.com/RAY_temp_afifosh.php

Please post a link to the original document that you want to parse. I think we will need valid UTF-8 if we are going to make progress here.  Thanks, ~Ray
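
The UTF-8 concern above can be checked before any parsing is attempted. What follows is only a minimal sketch, assuming the mbstring extension is loaded. The helper name `ensure_utf8` and the fallback encoding `ISO-8859-6` are assumptions made for illustration; nothing in this thread establishes what encoding the original page actually uses.

```php
<?php
// Hypothetical helper: return $html as valid UTF-8.
// $fallback is an ASSUMED source encoding, used only when the input
// is not already valid UTF-8.
function ensure_utf8(string $html, string $fallback = 'ISO-8859-6'): string
{
    if (mb_check_encoding($html, 'UTF-8')) {
        return $html; // already valid, leave it untouched
    }
    return mb_convert_encoding($html, 'UTF-8', $fallback);
}

$valid   = "General Music \u{0645}"; // ends with a UTF-8 encoded Arabic letter
$invalid = "General Music \xE3";     // a lone high byte is not valid UTF-8

var_dump(mb_check_encoding($valid, 'UTF-8'));
var_dump(mb_check_encoding($invalid, 'UTF-8'));
var_dump(mb_check_encoding(ensure_utf8($invalid), 'UTF-8'));
```

If the check fails on the real document, the question marks in the pasted snippet were probably introduced before the text reached PHP, and no parser setting will bring the original characters back.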

afifosh (Author) Commented:

Ray Paseur Commented:
Is that the entire original document?  It looks like just a list of UL tags.  I was expecting to see something more like the variable $str from the original code snippet.

 
afifosh (Author) Commented:
I have many divs. Each div refers to a weekday and has many child ul elements,
and each ul contains the details of one program, such as its name and time.
I want to extract these so that I can insert them into a database.

Ray Paseur Commented:
Let me try this again:
Please post a link to the original document that you want to parse.
Thanks, ~Ray

afifosh (Author) Commented:
http://www.mixfm-sa.com/v08/schedule.html

I want to parse all the programs so that I can insert them into a database.

Ray Paseur Commented:
It may be easier to get the data if we step away from the PHP DOM class, and instead scrape the HTML document.  I'll try to show you how we might do that, but it will take a little while to test the code.
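
For readers who would rather stay with the DOM classes, that route is also workable. Here is a minimal sketch, assuming the page keeps the structure shown in the question: one div with class "listing" per day, one ul per program, and li elements with class "name", "time" and optionally "podcast". The shortened sample markup, the `$schedule` array and its keys are made up for illustration, and this is not the scraping approach described in this comment.

```php
<?php
// Shortened sample markup modelled on the snippet in the question.
$html = <<<HTML
<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body>
<div class="listing thu_programs">
<ul class="gmusic"><li class="name">General Music</li><li class="time">12:00 pm - 6:00 am</li></ul>
<ul><li class="name">Mixtation</li><li class="time">5:00 pm - 4:00 pm</li>
<li class="podcast"><a target="_blank" href="http://www.mixfm-sa.com/podcast/mixtation.xml">podcast</a></li></ul>
</div>
<div class="listing fri_programs">
<ul><li class="name">Heba Show</li><li class="time">7:00 pm - 5:00 pm</li>
<li class="podcast"><a target="_blank" href="http://www.mixfm-sa.com/podcast/heba.xml">podcast</a></li></ul>
</div>
</body></html>
HTML;

libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$schedule = [];

// Every <div> whose class attribute contains the word "listing"
$divQuery = '//div[contains(concat(" ", normalize-space(@class), " "), " listing ")]';
foreach ($xpath->query($divQuery) as $div) {
    $day = $div->getAttribute('class'); // e.g. "listing thu_programs"

    // Each direct <ul> child of the div is one program
    foreach ($xpath->query('./ul', $div) as $ul) {
        // string(...) yields '' when nothing matches, so the podcast stays optional
        $name    = trim($xpath->evaluate('string(./li[@class="name"])', $ul));
        $time    = trim($xpath->evaluate('string(./li[@class="time"])', $ul));
        $podcast = $xpath->evaluate('string(./li[@class="podcast"]/a/@href)', $ul);

        $schedule[$day][] = [
            'name'    => $name,
            'time'    => $time,
            'podcast' => $podcast !== '' ? $podcast : null,
        ];
    }
}

print_r($schedule);
```

Querying relative to each div and each ul avoids walking `childNodes` by hand, which also returns the whitespace text nodes between the tags.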

Ray Paseur Commented:
Have a look at this. It creates an array of simple, data-only objects containing the contents of each of the UL list items. You can iterate over the array and use OOP notation to access each object. One INSERT query per object, and your database is built!
http://www.laprbass.com/RAY_temp_afifosh.php
<?php // RAY_temp_afifosh.php
error_reporting(E_ALL);

// A SIMPLE CLASS TO HOLD THE PROGRAMMING INFORMATION
Class Program
{
    public $day, $name, $time, $podcast;
    public function __construct($day, $name, $time, $podcast)
    {
        $this->day     = $day;
        $this->name    = $name;
        $this->time    = $time;
        $this->podcast = $podcast;
    }
}

// A FUNCTION FOR DATA EXTRACTION
function pluck($str, $tag, $end='</li>')
{
    $str = trim($str);
    $poz = strpos($str, $tag);
    if ($poz === FALSE) return NULL;
    $poz = $poz + strlen($tag);
    $str = substr($str, $poz);
    $str = explode($end, $str);
    return $str[0];
}

// THE SIGNAL STRINGS
$n = '<li class="name">';
$t = '<li class="time">';
$p = '<li class="podcast">';

// AN ARRAY TO HOLD THE PROGRAM OBJECTS
$programs = array();

// HELPERS TO MAKE THE OUTPUT EASIER TO READ
echo '<meta charset="utf-8" />';
echo '<pre>';

// ACQUIRE THE DOCUMENT
$url = 'http://www.mixfm-sa.com/v08/schedule.html';
$str = file_get_contents($url);

// DISCARD THE UNWANTED FOOTER
$arr = explode('</article>', $str);
$str = $arr[0];

// BREAK THE HTML ON THIS DIV
$arr = explode('<div class="listing', $str);

// DISCARD THE UNWANTED HEADER
unset($arr[0]);

// THIS IS AN ARRAY OF 7 ELEMENTS, ONE FOR EACH DAY OF THE WEEK
foreach ($arr as $pgm)
{
    $pgm = trim($pgm);
    $day = substr($pgm,0,3);
    $pgm = strip_tags($pgm, '<ul><li><a>');
    $pgm = substr($pgm, strpos($pgm, '>')+1, strlen($pgm));

    // THIS IS AN ARRAY OF 'N' ELEMENTS, ONE FOR EACH PROGRAM
    $new = explode('</ul>', $pgm);
    foreach ($new as $set)
    {
        // SKIP THE EMPTY ELEMENTS
        $set = trim($set);
        if (empty($set)) continue;

        // ADD A NEW PROGRAM OBJECT TO THE ARRAY
        $name = pluck($set, $n);
        $time = pluck($set, $t);
        $cast = pluck($set, $p);
        $programs[] = new Program($day, $name, $time, $cast);
    }
}
// SHOW THE ARRAY OF PROGRAM OBJECTS
print_r($programs);
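
The "one INSERT query per object" step could be sketched roughly as follows. This is only an illustration under stated assumptions: it uses PDO with an in-memory SQLite database (so the pdo_sqlite extension must be available), the `programs` table and its columns are made up, and `podcast_url` is a hypothetical helper. The helper is there because `pluck()` returns the whole anchor markup for the podcast item, not just the URL. The `Program` class is repeated only so the sketch runs on its own.

```php
<?php
// Stand-in for the Program class above, plus a small slice of the $programs array.
class Program
{
    public $day, $name, $time, $podcast;
    public function __construct($day, $name, $time, $podcast)
    {
        $this->day     = $day;
        $this->name    = $name;
        $this->time    = $time;
        $this->podcast = $podcast;
    }
}

$programs = [
    new Program('thu', 'General Music', '12:00 pm - 6:00 am', null),
    new Program('thu', 'Mixtation', '5:00 pm - 4:00 pm',
        '<a target="_blank" href="http://www.mixfm-sa.com/podcast/mixtation.xml">podcast</a>'),
];

// Hypothetical helper: pull the href out of the anchor markup, or return NULL.
function podcast_url($anchor)
{
    if ($anchor === null) {
        return null;
    }
    return preg_match('/href="([^"]+)"/', $anchor, $m) ? $m[1] : null;
}

// ASSUMPTION: an in-memory SQLite database and a made-up "programs" table.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE programs (day TEXT, name TEXT, time TEXT, podcast TEXT)');

// Prepare the statement once, then execute it once per object.
$stmt = $pdo->prepare('INSERT INTO programs (day, name, time, podcast) VALUES (?, ?, ?, ?)');
foreach ($programs as $program) {
    $stmt->execute([
        $program->day,
        $program->name,
        $program->time,
        podcast_url($program->podcast),
    ]);
}

// Read the rows back to confirm what was stored.
$rows = $pdo->query('SELECT day, name, time, podcast FROM programs')->fetchAll(PDO::FETCH_ASSOC);
print_r($rows);
```

A prepared statement with bound values keeps quotes or other special characters in the program names from breaking the query.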


HTH, ~Ray

afifosh (Author) Commented:
Mr. Ray! Thank you a lot :D

Ray Paseur Commented:
Thanks for the points, and thanks for using EE, ~Ray