PHP DOM HTML parser for divs

Posted on 2012-12-26
Last Modified: 2012-12-26
<div class="listing thu_programs" >
<ul class="gmusic" style="background-position:left -1544px;">
<li class="name">General Music4</li>
<li class="time">12:00 pm - 6:00 am</li>
</ul>
<ul class="gmusic" style="background-position:left -1544px;">
<li class="name">General Music3</li>
<li class="time">12:00 pm - 6:00 am</li>
</ul>
</div>
<div class="listing thu_programs" >
<ul class="gmusic" style="background-position:left -1544px;">
<li class="name"> General Music2</li>
<li class="time">12:00 pm - 6:00 am</li>
</ul>
<ul class="gmusic" style="background-position:left -1544px;">
<li class="name">General Music</li>
<li class="time">12:00 pm - 6:00 am</li>
<li class="podcast"><a target="_blank" href="http://www.localhost.com/podcast/bar.xml">podcast</a></li>

</ul>
</div>


I want to parse this to get the div's class name and, for each div, the details of every program, for example:
program : listing thu_programs

name : General Music
time : 12:00 pm - 6:00 am
podcast : url

name : General Music
time : 12:00 pm - 6:00 am

...and so on for each div.

Also, if a ul has an li with class "podcast", I want to get that too.


<?php

// header('Content-Type: text/html; charset=utf-8');

            $str = '<html><meta http-equiv="Content-Type" content="text/html; charset=utf-8"><div class="listing thu_programs" >
<ul class="gmusic" style="background-position:left -1544px;">
<li class="name">¿¿¿¿¿¿ ¿¿¿¿¿¿ General Music</li>
<li class="time">12:00 pm - 6:00 am</li>
</ul>
 
<ul style="background-position:left -1157px;">
<li class="name">¿¿ ¿ ¿¿¿ </li>
<li class="time">2:00 pm - 12:00 pm</li>
<li class="podcast"><a target="_blank" href="http://www.mixfm-sa.com/podcast/bar.xml">podcast</a></li>
</ul>
<ul style="background-position:left -882px;">
<li class="name">¿¿¿¿ ¿¿¿¿</li>
<li class="time">3:00 pm - 2:00 pm</li>
<li class="podcast"><a target="_blank" href="http://www.mixfm-sa.com/podcast/tweet.xml">podcast</a></li>
</ul>
<ul class="gmusic" style="background-position:left -1544px;">
<li class="name">¿¿¿¿¿¿ ¿¿¿¿¿¿ General Music</li>
<li class="time">4:00 pm - 3:00 pm</li>
</ul>
 
<ul style="background-position:left -1432px;">
<li class="name">Mixtation</li>
<li class="time">5:00 pm - 4:00 pm</li>
<li class="podcast"><a target="_blank" href="http://www.mixfm-sa.com/podcast/mixtation.xml">podcast</a></li>
</ul>
<ul class="gmusic" style="background-position:left -1544px;">
<li class="name">¿¿¿¿¿¿ ¿¿¿¿¿¿ General Music</li>
<li class="time">6:00 pm - 5:00 pm</li>
</ul>
<ul style="background-position:left -550px;">
<li class="name">¿¿¿ ¿¿¿</li>
<li class="time">7:00 pm - 6:00 pm</li>
<li class="podcast"><a target="_blank" href="http://www.mixfm-sa.com/podcast/tech.xml">podcast</a></li>
</ul>
<ul style="background-position:left -1046px;">
<li class="name">¿¿¿ ¿¿¿¿</li>
<li class="time">8:00 pm - 7:00 pm</li>
<li class="podcast"><a target="_blank" href="http://www.mixfm-sa.com/podcast/cars.xml">podcast</a></li>
</ul>
<ul style="background-position:left -716px;">
<li class="name">¿¿¿ ¿¿¿¿</li>
<li class="time">9:00 pm - 8:00 pm</li>
<li class="podcast"><a target="_blank" href="http://www.mixfm-sa.com/podcast/east.xml">podcast</a></li>
</ul>
<ul style="background-position:left -936px;">
<li class="name">¿¿¿ ¿¿¿ ¿¿¿</li>
<li class="time">10:30 pm - 9:00 pm</li>
<li class="podcast"><a target="_blank" href="http://www.mixfm-sa.com/podcast/hiphop.xml">podcast</a></li>
</ul>
<ul class="gmusic" style="background-position:left -1544px;">
<li class="name">¿¿¿¿¿¿ ¿¿¿¿¿¿ General Music</li>
<li class="time">12:00 am - 10:30 pm</li>
</ul>
<ul style="background-position:left -1266px;">
<li class="name">Mega Mix</li>
<li class="time">2:00 am - 12:00 am</li>
</ul>
<ul class="gmusic" style="background-position:left -1544px;">
<li class="name">¿¿¿¿¿¿ ¿¿¿¿¿¿ General Music</li>
<li class="time">4:00 am - 2:00 am</li>
</ul>
<ul style="background-position:left -1601px;">
<li class="name">Religious</li>
<li class="time">4:30 am - 4:00 am</li>
</ul>
<ul class="gmusic" style="background-position:left -1544px;">
<li class="name">¿¿¿¿¿¿ ¿¿¿¿¿¿ General Music</li>
<li class="time">6:00 am - 4:30 am</li>
</ul>
</div>
 
<div class="listing fri_programs" >
<ul class="gmusic" style="background-position:left -1544px;">
<li class="name">¿¿¿¿¿¿ ¿¿¿¿¿¿ General Music</li>
<li class="time">11:00 am - 6:00 am</li>
</ul>
<ul class="gmusic" style="background-position:left -1601px;">
<li class="name">¿¿¿¿¿ ¿¿¿¿ Religious Content</li>
<li class="time">1:00 pm - 11:00 am</li>
</ul>
<ul style="background-position:left -1100px;">
<li class="name">¿¿¿ ¿¿¿</li>
<li class="time" style="direction:rtl;"><span>¿¿¿ ¿¿¿¿ ¿¿¿¿¿¿¿ </span>2:00pm - 1:00pm</li>
<li class="podcast"><a target="_blank" href="http://www.mixfm-sa.com/podcast/yaddak.xml">podcast</a></li>
</ul>
<ul class="gmusic" style="background-position:left -1544px;">
<li class="name">¿¿¿¿¿¿ ¿¿¿¿¿¿ General Music</li>
<li class="time">5:00 pm - 2:00 pm</li>
</ul>
<ul style="background-position:left -1376px;">
<li class="name">Heba Show</li>
<li class="time">7:00 pm - 5:00 pm</li>
<li class="podcast"><a target="_blank" href="http://www.mixfm-sa.com/podcast/heba.xml">podcast</a></li>
</ul>
<ul style="background-position:left -770px;">
<li class="name">¿¿¿¿ ¿¿¿¿</li>
<li class="time">8:00 pm - 7:00 pm</li>
<li class="podcast"><a target="_blank" href="http://www.mixfm-sa.com/podcast/hindi.xml">podcast</a></li>
</ul>
<ul class="gmusic" style="background-position:left -1544px;">
<li class="name">¿¿¿¿¿¿ ¿¿¿¿¿¿ General Music</li>
<li class="time">10:00 pm - 8:00 pm</li>
</ul>
<ul style="background-position:left -441px;">
<li class="name">Room 11</li>
<li class="time">10:30 pm - 10:00 pm</li>
<li class="podcast"><a target="_blank" href="http://www.mixfm-sa.com/podcast/room11.xml">podcast</a></li>
</ul>
<ul style="background-position:left -606px;">
<li class="name">¿¿¿¿¿ ¿¿¿ ¿¿¿¿¿¿</li>
<li class="time">12:00 am - 10:30 pm</li>
<li class="podcast"><a target="_blank" href="http://www.mixfm-sa.com/podcast/saltana.xml">podcast</a></li>
</ul>
<ul class="gmusic" style="background-position:left -1544px;">
<li class="name">¿¿¿¿¿¿ ¿¿¿¿¿¿ General Music</li>
<li class="time">4:00 am - 12:00 am</li>
</ul>
<ul style="background-position:left -1601px;">
<li class="name">Religious</li>
<li class="time">4:30 am - 4:00 am</li>
</ul>
<ul class="gmusic" style="background-position:left -1544px;">
<li class="name">¿¿¿¿¿¿ ¿¿¿¿¿¿ General Music</li>
<li class="time">6:00 am - 4:30 am</li>
</ul>
</div>
</html>
';

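$DOM = new DOMDocument;
$DOM->loadHTML($str);

// get all DIV elements (the per-day containers)
$items = $DOM->getElementsByTagName('div');

for ($i = 0; $i < $items->length; $i++) {
    // print the div's class, e.g. "listing thu_programs"
    $linkthumb = $items->item($i)->getAttribute('class');
    echo "<ul>" . $linkthumb . "</ul>";
    // var_dump($items->item($i));

    // print the text of every child node: the <ul> elements, but also
    // the whitespace text nodes between them
    $children = $items->item($i)->childNodes;
    for ($j = 0; $j < $children->length; $j++) {
        echo "<ul>" . $children->item($j)->nodeValue . "</ul>";
    }
}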
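For comparison, the DOM route the question started with can also work. A minimal sketch, assuming markup shaped like the sample above (`div.listing` containing `ul` elements with `li.name`, `li.time` and an optional `li.podcast a`): instead of dumping every child node, it uses `DOMXPath` to select each list item by class and reads the podcast link's `href`. The function name `parseSchedule` and the array keys are illustrative choices, not part of the original code.

```php
<?php
// Sketch: walk each <div class="listing ..."> and read the <li> items of
// every <ul> with DOMXPath. parseSchedule() and the array keys are
// hypothetical; the sample HTML is trimmed from the question.

function parseSchedule(string $html): array
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);   // real-world pages often trigger parser warnings
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $schedule = [];

    // every day container whose class list contains the token "listing"
    $days = $xpath->query('//div[contains(concat(" ", normalize-space(@class), " "), " listing ")]');
    foreach ($days as $div) {
        $day = $div->getAttribute('class');          // e.g. "listing thu_programs"
        foreach ($xpath->query('./ul', $div) as $ul) {
            $name = $xpath->query('./li[@class="name"]', $ul)->item(0);
            $time = $xpath->query('./li[@class="time"]', $ul)->item(0);
            $link = $xpath->query('./li[@class="podcast"]/a', $ul)->item(0);

            $schedule[] = [
                'program' => $day,
                'name'    => $name ? trim($name->textContent) : null,
                'time'    => $time ? trim($time->textContent) : null,
                'podcast' => $link ? $link->getAttribute('href') : null, // null when there is no podcast li
            ];
        }
    }
    return $schedule;
}

$html = '<div class="listing thu_programs">
<ul class="gmusic"><li class="name">General Music</li><li class="time">12:00 pm - 6:00 am</li></ul>
<ul><li class="name">Mixtation</li><li class="time">5:00 pm - 4:00 pm</li>
<li class="podcast"><a href="http://www.mixfm-sa.com/podcast/mixtation.xml">podcast</a></li></ul>
</div>';

print_r(parseSchedule($html));
```

Because `@class="time"` compares only the class attribute, this also picks up entries such as `<li class="time" style="direction:rtl;">`.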

Question by:afifosh

Expert Comment

by:Ray Paseur
ID: 38721120
I tried to run this script: http://www.laprbass.com/RAY_temp_afifosh.php

Please post a link to the original document that you want to parse. I think we will need valid UTF-8 if we are going to make progress here.  Thanks, ~Ray
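On the UTF-8 point: `DOMDocument::loadHTML()` has traditionally assumed ISO-8859-1 when the markup carries no charset declaration, which is one way Arabic program names end up garbled. One workaround, sketched here under the assumption that the input should be UTF-8 and that the mbstring extension is installed, is to reject malformed input and then encode every non-ASCII character as a numeric character reference before parsing. `loadUtf8Html` is a hypothetical helper name, and the Arabic sample string is illustrative.

```php
<?php
// Sketch: validate UTF-8, then give DOMDocument an ASCII-only string so the
// parser's default charset no longer matters. Requires mbstring.
// loadUtf8Html() is a hypothetical helper, not part of PHP.

function loadUtf8Html(string $html): DOMDocument
{
    // preg_match() with the /u modifier fails (returns false) on malformed UTF-8
    if (preg_match('//u', $html) !== 1) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    // U+0080..U+10FFFF become &#NNNN; references; ASCII (including < and >) is untouched
    $ascii = mb_encode_numericentity($html, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8');

    $dom = new DOMDocument();
    libxml_use_internal_errors(true);
    $dom->loadHTML($ascii);
    libxml_clear_errors();
    return $dom;
}

$dom = loadUtf8Html('<ul><li class="name">مرحبا General Music</li></ul>');
$li = $dom->getElementsByTagName('li')->item(0);
echo $li->textContent, "\n";   // the DOM API hands text back as UTF-8
```

Since the DOM API returns UTF-8 strings, the extracted text can go straight into a UTF-8 database column.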

Author Comment

by:afifosh
ID: 38721137

Expert Comment

by:Ray Paseur
ID: 38721172
Is that the entire original document?  It looks like just a list of UL tags.  I was expecting to see something more like the variable $str from the original code snippet.

Author Comment

by:afifosh
ID: 38721217
I have many divs. Each div refers to a weekday, and each div has many child ul elements; each ul contains the program details, like name and time.
I want to be able to insert them into a database.
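Since the end goal is one database row per program, a common pattern is a single prepared statement executed once per row inside a transaction. A minimal PDO sketch, assuming a hypothetical `programs` table with `day`, `name`, `time` and `podcast` columns, and using an in-memory SQLite database (pdo_sqlite) only so the example runs on its own:

```php
<?php
// Sketch: store parsed programs with one prepared INSERT per row.
// The table, its columns and the SQLite DSN are hypothetical; swap in
// your own MySQL/PostgreSQL connection details.

function savePrograms(PDO $db, array $programs): int
{
    $stmt = $db->prepare(
        'INSERT INTO programs (day, name, time, podcast) VALUES (:day, :name, :time, :podcast)'
    );
    $db->beginTransaction();              // one transaction for the whole batch
    try {
        foreach ($programs as $p) {
            $stmt->execute([
                ':day'     => $p['day'],
                ':name'    => $p['name'],
                ':time'    => $p['time'],
                ':podcast' => $p['podcast'], // may be null, e.g. for "General Music" slots
            ]);
        }
        $db->commit();
    } catch (Throwable $e) {
        $db->rollBack();                  // undo the partial batch on any failure
        throw $e;
    }
    return count($programs);
}

$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE programs (day TEXT, name TEXT, time TEXT, podcast TEXT)');

$rows = [
    ['day' => 'thu', 'name' => 'Mixtation', 'time' => '5:00 pm - 4:00 pm',
     'podcast' => 'http://www.mixfm-sa.com/podcast/mixtation.xml'],
    ['day' => 'thu', 'name' => 'Mega Mix', 'time' => '2:00 am - 12:00 am', 'podcast' => null],
];
savePrograms($db, $rows);
echo $db->query('SELECT COUNT(*) FROM programs')->fetchColumn(), "\n";
```

Bound parameters also handle quoting, so a show name containing an apostrophe does not break the query.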

Expert Comment

by:Ray Paseur
ID: 38721281
Let me try this again:
Please post a link to the original document that you want to parse.
Thanks, ~Ray

Author Comment

by:afifosh
ID: 38721288
http://www.mixfm-sa.com/v08/schedule.html

I want to parse all the programs so I can insert them into a database.

Expert Comment

by:Ray Paseur
ID: 38721363
It may be easier to get the data if we step away from the PHP DOM class, and instead scrape the HTML document.  I'll try to show you how we might do that, but it will take a little while to test the code.

Accepted Solution

by:Ray Paseur
ID: 38721601
Have a look at this.  It creates an array of simple, data-only objects containing the contents of each of the UL list items.  You can iterate over the array and use OOP notation to access each object.  One INSERT query per object, and your database is built!
http://www.laprbass.com/RAY_temp_afifosh.php
<?php // RAY_temp_afifosh.php
error_reporting(E_ALL);

// A SIMPLE CLASS TO HOLD THE PROGRAMMING INFORMATION
class Program
{
    public $day, $name, $time, $podcast;
    public function __construct($day, $name, $time, $podcast)
    {
        $this->day     = $day;
        $this->name    = $name;
        $this->time    = $time;
        $this->podcast = $podcast;
    }
}

// A FUNCTION FOR DATA EXTRACTION
function pluck($str, $tag, $end='</li>')
{
    $str = trim($str);
    $poz = strpos($str, $tag);
    if ($poz === FALSE) return NULL;
    $poz = $poz + strlen($tag);
    $str = substr($str, $poz);
    $str = explode($end, $str);
    return $str[0];
}

// THE SIGNAL STRINGS
$n = '<li class="name">';
$t = '<li class="time">';
$p = '<li class="podcast">';

// AN ARRAY TO HOLD THE PROGRAM OBJECTS
$programs = array();

// HELPERS TO MAKE THE OUTPUT EASIER TO READ
echo '<meta charset="utf-8" />';
echo '<pre>';

// ACQUIRE THE DOCUMENT
$url = 'http://www.mixfm-sa.com/v08/schedule.html';
$str = file_get_contents($url);

// DISCARD THE UNWANTED FOOTER
$arr = explode('</article>', $str);
$str = $arr[0];

// BREAK THE HTML ON THIS DIV
$arr = explode('<div class="listing', $str);

// DISCARD THE UNWANTED HEADER
unset($arr[0]);

// THIS IS AN ARRAY OF 7 ELEMENTS, ONE FOR EACH DAY OF THE WEEK
foreach ($arr as $pgm)
{
    $pgm = trim($pgm);
    $day = substr($pgm,0,3);
    $pgm = strip_tags($pgm, '<ul><li><a>');
    $pgm = substr($pgm, strpos($pgm, '>')+1, strlen($pgm));

    // THIS IS AN ARRAY OF 'N' ELEMENTS, ONE FOR EACH PROGRAM
    $new = explode('</ul>', $pgm);
    foreach ($new as $set)
    {
        // SKIP THE EMPTY ELEMENTS
        $set = trim($set);
        if (empty($set)) continue;

        // ADD A NEW PROGRAM OBJECT TO THE ARRAY
        $name = pluck($set, $n);
        $time = pluck($set, $t);
        $cast = pluck($set, $p);
        $programs[] = new Program($day, $name, $time, $cast);
    }
}
// SHOW THE ARRAY OF PROGRAM OBJECTS
print_r($programs);

HTH, ~Ray
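Two details of `pluck()` are worth knowing before the database step. It returns the inner HTML of the podcast item, i.e. the whole `<a ...>podcast</a>` tag rather than the URL the question asked for. And it searches for the literal `<li class="time">`, so in the sample markup the Friday entry written as `<li class="time" style="direction:rtl;">` gets a NULL time. Below is a sketch of two hypothetical helpers, `liContents()` and `podcastUrl()`, that address both, assuming double-quoted attributes as in the sample HTML; the `$set` string imitates one `<ul>` block after the `strip_tags()` call, with a placeholder show name.

```php
<?php
// Sketch of two hypothetical helpers for each $set (one <ul> block).
// They assume double-quoted attributes, as in the sample HTML.

// Inner HTML of <li class="$class" ...>, tolerating extra attributes such as style="..."
function liContents(string $set, string $class): ?string
{
    $pattern = '~<li class="' . preg_quote($class, '~') . '"[^>]*>(.*?)</li>~s';
    return preg_match($pattern, $set, $m) ? $m[1] : null;
}

// Just the href of the link inside a podcast item, or null if there is none
function podcastUrl(?string $cast): ?string
{
    if ($cast === null) {
        return null;
    }
    return preg_match('~href="([^"]+)"~', $cast, $m) ? $m[1] : null;
}

// placeholder show name; markup shaped like the Friday entry in the question
$set = '<li class="name">Sample Show</li>
<li class="time" style="direction:rtl;">2:00pm - 1:00pm</li>
<li class="podcast"><a target="_blank" href="http://www.mixfm-sa.com/podcast/yaddak.xml">podcast</a></li>';

echo liContents($set, 'time'), "\n";
echo podcastUrl(liContents($set, 'podcast')), "\n";
```

In the loop above, `$time = liContents($set, 'time');` and `$cast = podcastUrl(pluck($set, $p));` could take the place of the corresponding `pluck()` calls.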

Author Closing Comment

by:afifosh
ID: 38721630
Mr. Ray! Thank you a lot :D

Expert Comment

by:Ray Paseur
ID: 38721855
Thanks for the points, and thanks for using EE, ~Ray