Solved

PHP DOM HTML parser: div

Posted on 2012-12-26
Last Modified: 2012-12-26
<div class="listing thu_programs" >
<ul class="gmusic" style="background-position:left -1544px;">
<li class="name">¿General Music4</li>
<li class="time">12:00 pm - 6:00 am</li>
</ul>
<ul class="gmusic" style="background-position:left -1544px;">
<li class="name">General Music3</li>
<li class="time">12:00 pm - 6:00 am</li>
</ul>
</div>
<div class="listing thu_programs" >
<ul class="gmusic" style="background-position:left -1544px;">
<li class="name"> General Music2</li>
<li class="time">12:00 pm - 6:00 am</li>
</ul>
<ul class="gmusic" style="background-position:left -1544px;">
<li class="name">General Music</li>
<li class="time">12:00 pm - 6:00 am</li>
<li class="podcast"><a target="_blank" href="http://www.localhost.com/podcast/bar.xml">podcast</a></li>

</ul>
</div>



I want to parse this markup and, for each div, get the div's class name plus the details of every program inside it. For example:

program: listing thu_programs

name: General Music
time: 12:00 pm - 6:00 am
podcast: url

name: General Music
time: 12:00 pm - 6:00 am

How do I do this for each div?

If a ul has an li with class "podcast", I want to get that too.


<?php

// header('Content-Type: text/html; charset=utf-8');

            $str = '<html><meta http-equiv="Content-Type" content="text/html; charset=utf-8"><div class="listing thu_programs" >
<ul class="gmusic" style="background-position:left -1544px;">
<li class="name">¿¿¿¿¿¿ ¿¿¿¿¿¿ General Music</li>
<li class="time">12:00 pm - 6:00 am</li>
</ul>
 
<ul style="background-position:left -1157px;">
<li class="name">¿¿ ¿ ¿¿¿ </li>
<li class="time">2:00 pm - 12:00 pm</li>
<li class="podcast"><a target="_blank" href="http://www.mixfm-sa.com/podcast/bar.xml">podcast</a></li>
</ul>
<ul style="background-position:left -882px;">
<li class="name">¿¿¿¿ ¿¿¿¿</li>
<li class="time">3:00 pm - 2:00 pm</li>
<li class="podcast"><a target="_blank" href="http://www.mixfm-sa.com/podcast/tweet.xml">podcast</a></li>
</ul>
<ul class="gmusic" style="background-position:left -1544px;">
<li class="name">¿¿¿¿¿¿ ¿¿¿¿¿¿ General Music</li>
<li class="time">4:00 pm - 3:00 pm</li>
</ul>
 
<ul style="background-position:left -1432px;">
<li class="name">Mixtation</li>
<li class="time">5:00 pm - 4:00 pm</li>
<li class="podcast"><a target="_blank" href="http://www.mixfm-sa.com/podcast/mixtation.xml">podcast</a></li>
</ul>
<ul class="gmusic" style="background-position:left -1544px;">
<li class="name">¿¿¿¿¿¿ ¿¿¿¿¿¿ General Music</li>
<li class="time">6:00 pm - 5:00 pm</li>
</ul>
<ul style="background-position:left -550px;">
<li class="name">¿¿¿ ¿¿¿</li>
<li class="time">7:00 pm - 6:00 pm</li>
<li class="podcast"><a target="_blank" href="http://www.mixfm-sa.com/podcast/tech.xml">podcast</a></li>
</ul>
<ul style="background-position:left -1046px;">
<li class="name">¿¿¿ ¿¿¿¿</li>
<li class="time">8:00 pm - 7:00 pm</li>
<li class="podcast"><a target="_blank" href="http://www.mixfm-sa.com/podcast/cars.xml">podcast</a></li>
</ul>
<ul style="background-position:left -716px;">
<li class="name">¿¿¿ ¿¿¿¿</li>
<li class="time">9:00 pm - 8:00 pm</li>
<li class="podcast"><a target="_blank" href="http://www.mixfm-sa.com/podcast/east.xml">podcast</a></li>
</ul>
<ul style="background-position:left -936px;">
<li class="name">¿¿¿ ¿¿¿ ¿¿¿</li>
<li class="time">10:30 pm - 9:00 pm</li>
<li class="podcast"><a target="_blank" href="http://www.mixfm-sa.com/podcast/hiphop.xml">podcast</a></li>
</ul>
<ul class="gmusic" style="background-position:left -1544px;">
<li class="name">¿¿¿¿¿¿ ¿¿¿¿¿¿ General Music</li>
<li class="time">12:00 am - 10:30 pm</li>
</ul>
<ul style="background-position:left -1266px;">
<li class="name">Mega Mix</li>
<li class="time">2:00 am - 12:00 am</li>
</ul>
<ul class="gmusic" style="background-position:left -1544px;">
<li class="name">¿¿¿¿¿¿ ¿¿¿¿¿¿ General Music</li>
<li class="time">4:00 am - 2:00 am</li>
</ul>
<ul style="background-position:left -1601px;">
<li class="name">Religious</li>
<li class="time">4:30 am - 4:00 am</li>
</ul>
<ul class="gmusic" style="background-position:left -1544px;">
<li class="name">¿¿¿¿¿¿ ¿¿¿¿¿¿ General Music</li>
<li class="time">6:00 am - 4:30 am</li>
</ul>
</div>
 
<div class="listing fri_programs" >
<ul class="gmusic" style="background-position:left -1544px;">
<li class="name">¿¿¿¿¿¿ ¿¿¿¿¿¿ General Music</li>
<li class="time">11:00 am - 6:00 am</li>
</ul>
<ul class="gmusic" style="background-position:left -1601px;">
<li class="name">¿¿¿¿¿ ¿¿¿¿ Religious Content</li>
<li class="time">1:00 pm - 11:00 am</li>
</ul>
<ul style="background-position:left -1100px;">
<li class="name">¿¿¿ ¿¿¿</li>
<li class="time" style="direction:rtl;"><span>¿¿¿ ¿¿¿¿ ¿¿¿¿¿¿¿ </span>2:00pm - 1:00pm</li>
<li class="podcast"><a target="_blank" href="http://www.mixfm-sa.com/podcast/yaddak.xml">podcast</a></li>
</ul>
<ul class="gmusic" style="background-position:left -1544px;">
<li class="name">¿¿¿¿¿¿ ¿¿¿¿¿¿ General Music</li>
<li class="time">5:00 pm - 2:00 pm</li>
</ul>
<ul style="background-position:left -1376px;">
<li class="name">Heba Show</li>
<li class="time">7:00 pm - 5:00 pm</li>
<li class="podcast"><a target="_blank" href="http://www.mixfm-sa.com/podcast/heba.xml">podcast</a></li>
</ul>
<ul style="background-position:left -770px;">
<li class="name">¿¿¿¿ ¿¿¿¿</li>
<li class="time">8:00 pm - 7:00 pm</li>
<li class="podcast"><a target="_blank" href="http://www.mixfm-sa.com/podcast/hindi.xml">podcast</a></li>
</ul>
<ul class="gmusic" style="background-position:left -1544px;">
<li class="name">¿¿¿¿¿¿ ¿¿¿¿¿¿ General Music</li>
<li class="time">10:00 pm - 8:00 pm</li>
</ul>
<ul style="background-position:left -441px;">
<li class="name">Room 11</li>
<li class="time">10:30 pm - 10:00 pm</li>
<li class="podcast"><a target="_blank" href="http://www.mixfm-sa.com/podcast/room11.xml">podcast</a></li>
</ul>
<ul style="background-position:left -606px;">
<li class="name">¿¿¿¿¿ ¿¿¿ ¿¿¿¿¿¿</li>
<li class="time">12:00 am - 10:30 pm</li>
<li class="podcast"><a target="_blank" href="http://www.mixfm-sa.com/podcast/saltana.xml">podcast</a></li>
</ul>
<ul class="gmusic" style="background-position:left -1544px;">
<li class="name">¿¿¿¿¿¿ ¿¿¿¿¿¿ General Music</li>
<li class="time">4:00 am - 12:00 am</li>
</ul>
<ul style="background-position:left -1601px;">
<li class="name">Religious</li>
<li class="time">4:30 am - 4:00 am</li>
</ul>
<ul class="gmusic" style="background-position:left -1544px;">
<li class="name">¿¿¿¿¿¿ ¿¿¿¿¿¿ General Music</li>
<li class="time">6:00 am - 4:30 am</li>
</ul>
</div>
</html>
';

$DOM = new DOMDocument;
$DOM->loadHTML($str);

// get all DIV elements
$items = $DOM->getElementsByTagName('div');

for ($i = 0; $i < $items->length; $i++) {
    $linkthumb = $items->item($i)->getAttribute('class');
    echo "<ul>".$linkthumb."</ul>";
    // var_dump($items->item($i));

    $children = $items->item($i)->childNodes;
    for ($j = 0; $j < $children->length; $j++) {
        echo "<ul>".$children->item($j)->nodeValue."</ul>";
    }
}

Question by:afifosh
 
LVL 110

Expert Comment

by:Ray Paseur
ID: 38721120
I tried to run this script: http://www.laprbass.com/RAY_temp_afifosh.php

Please post a link to the original document that you want to parse. I think we will need valid UTF-8 if we are going to make progress here.  Thanks, ~Ray
 
LVL 1

Author Comment

by:afifosh
ID: 38721137
 
LVL 110

Expert Comment

by:Ray Paseur
ID: 38721172
Is that the entire original document?  It looks like just a list of UL tags.  I was expecting to see something more like the variable $str from the original code snippet.

 
LVL 1

Author Comment

by:afifosh
ID: 38721217
I have many divs. Each div corresponds to a weekday and has many child ul elements,
and each ul contains a program's details, such as name and time.
I need to extract them so I can insert them into a database.
 
LVL 110

Expert Comment

by:Ray Paseur
ID: 38721281
Let me try this again:
Please post a link to the original document that you want to parse.
Thanks, ~Ray
 
LVL 1

Author Comment

by:afifosh
ID: 38721288
http://www.mixfm-sa.com/v08/schedule.html

I want to parse all the programs so I can insert them into a database.
 
LVL 110

Expert Comment

by:Ray Paseur
ID: 38721363
It may be easier to get the data if we step away from the PHP DOM class, and instead scrape the HTML document.  I'll try to show you how we might do that, but it will take a little while to test the code.
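
For comparison, here is a minimal sketch of the DOM route, in case staying with DOMDocument is preferred. The loop in the question prints confusing output because a div's childNodes include whitespace text nodes, and the nodeValue of each ul is the concatenated text of all its li children. Querying with DOMXPath avoids both problems. The function name parseSchedule(), the array keys and the shortened sample markup are assumptions made for illustration, not part of the original page or script.

```php
<?php
// Sketch: walk each "listing" div and read name/time/podcast from every <ul>.
// parseSchedule() and the result keys are illustrative names, not an existing API.

function parseSchedule(string $html): array
{
    $dom = new DOMDocument();
    // Real-world HTML is rarely valid; collect libxml warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $programs = [];

    // Match <div> elements whose class list contains the token "listing".
    $divs = $xpath->query('//div[contains(concat(" ", normalize-space(@class), " "), " listing ")]');
    foreach ($divs as $div) {
        $day = $div->getAttribute('class'); // e.g. "listing thu_programs"

        // Only direct <ul> children of this div, one per program.
        foreach ($xpath->query('./ul', $div) as $ul) {
            $name = $xpath->query('./li[@class="name"]', $ul)->item(0);
            $time = $xpath->query('./li[@class="time"]', $ul)->item(0);
            $link = $xpath->query('./li[@class="podcast"]/a', $ul)->item(0);

            $programs[] = [
                'day'     => $day,
                'name'    => $name ? trim($name->textContent) : null,
                'time'    => $time ? trim($time->textContent) : null,
                'podcast' => $link ? $link->getAttribute('href') : null, // null when no podcast
            ];
        }
    }
    return $programs;
}

// Shortened sample in the same shape as the markup from the question.
$html = '<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body>
<div class="listing thu_programs">
<ul class="gmusic"><li class="name">General Music</li><li class="time">12:00 pm - 6:00 am</li></ul>
<ul><li class="name">Mixtation</li><li class="time">5:00 pm - 4:00 pm</li>
<li class="podcast"><a target="_blank" href="http://www.mixfm-sa.com/podcast/mixtation.xml">podcast</a></li></ul>
</div>
</body></html>';

$programs = parseSchedule($html);
print_r($programs);
```

Matching on the class attribute alone means a time item that carries extra attributes, such as style="direction:rtl;", is still found, and the podcast value comes back as the plain URL.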
 
LVL 110

Accepted Solution

by:
Ray Paseur earned 500 total points
ID: 38721601
Have a look at this.  It creates an array of simple, data-only objects containing the contents of each of the UL list items.  You can iterate over the array and use OOP notation to access each object.  One INSERT query per object, and your database is built!
http://www.laprbass.com/RAY_temp_afifosh.php
<?php // RAY_temp_afifosh.php
error_reporting(E_ALL);

// A SIMPLE CLASS TO HOLD THE PROGRAMMING INFORMATION
Class Program
{
    public $day, $name, $time, $podcast;
    public function __construct($day, $name, $time, $podcast)
    {
        $this->day     = $day;
        $this->name    = $name;
        $this->time    = $time;
        $this->podcast = $podcast;
    }
}

// A FUNCTION FOR DATA EXTRACTION
function pluck($str, $tag, $end='</li>')
{
    $str = trim($str);
    $poz = strpos($str, $tag);
    if ($poz === FALSE) return NULL;
    $poz = $poz + strlen($tag);
    $str = substr($str, $poz);
    $str = explode($end, $str);
    return $str[0];
}

// THE SIGNAL STRINGS
$n = '<li class="name">';
$t = '<li class="time">';
$p = '<li class="podcast">';

// AN ARRAY TO HOLD THE PROGRAM OBJECTS
$programs = array();

// HELPERS TO MAKE THE OUTPUT EASIER TO READ
echo '<meta charset="utf-8" />';
echo '<pre>';

// ACQUIRE THE DOCUMENT
$url = 'http://www.mixfm-sa.com/v08/schedule.html';
$str = file_get_contents($url);

// DISCARD THE UNWANTED FOOTER
$arr = explode('</article>', $str);
$str = $arr[0];

// BREAK THE HTML ON THIS DIV
$arr = explode('<div class="listing', $str);

// DISCARD THE UNWANTED HEADER
unset($arr[0]);

// THIS IS AN ARRAY OF 7 ELEMENTS, ONE FOR EACH DAY OF THE WEEK
foreach ($arr as $pgm)
{
    $pgm = trim($pgm);
    $day = substr($pgm,0,3);
    $pgm = strip_tags($pgm, '<ul><li><a>');
    $pgm = substr($pgm, strpos($pgm, '>')+1, strlen($pgm));

    // THIS IS AN ARRAY OF 'N' ELEMENTS, ONE FOR EACH PROGRAM
    $new = explode('</ul>', $pgm);
    foreach ($new as $set)
    {
        // SKIP THE EMPTY ELEMENTS
        $set = trim($set);
        if (empty($set)) continue;

        // ADD A NEW PROGRAM OBJECT TO THE ARRAY
        $name = pluck($set, $n);
        $time = pluck($set, $t);
        $cast = pluck($set, $p);
        $programs[] = new Program($day, $name, $time, $cast);
    }
}
// SHOW THE ARRAY OF PROGRAM OBJECTS
print_r($programs);
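
The "one INSERT query per object" step could be sketched with PDO and a prepared statement, roughly as follows. The table name programs, its column names, the function name savePrograms() and the SQLite in-memory connection are all assumptions made for illustration; the thread does not say which database is in use. The Program class is repeated only so the sketch runs on its own.

```php
<?php
// Sketch only: table/column names and the SQLite in-memory DSN are assumptions.
// Swap in the real DSN and credentials for MySQL or another database.

// Same data-only class as in the script above, repeated to keep this runnable.
class Program
{
    public $day, $name, $time, $podcast;
    public function __construct($day, $name, $time, $podcast)
    {
        $this->day     = $day;
        $this->name    = $name;
        $this->time    = $time;
        $this->podcast = $podcast;
    }
}

// Insert every Program object with one prepared statement; returns the row count.
function savePrograms(PDO $pdo, array $programs): int
{
    $stmt = $pdo->prepare(
        'INSERT INTO programs (day_code, program_name, air_time, podcast)
         VALUES (:day, :name, :time, :podcast)'
    );
    $count = 0;
    foreach ($programs as $program) {
        // Bound parameters keep quotes in program names from breaking the query.
        $stmt->execute([
            ':day'     => $program->day,
            ':name'    => $program->name,
            ':time'    => $program->time,
            ':podcast' => $program->podcast, // NULL when there is no podcast
        ]);
        $count++;
    }
    return $count;
}

$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE programs (day_code TEXT, program_name TEXT, air_time TEXT, podcast TEXT)');

$programs = [
    new Program('thu', 'General Music', '12:00 pm - 6:00 am', null),
    new Program('thu', 'Mixtation', '5:00 pm - 4:00 pm', 'http://www.mixfm-sa.com/podcast/mixtation.xml'),
];
$inserted = savePrograms($pdo, $programs);
echo "Inserted $inserted rows\n";
```

Preparing the statement once and executing it per object keeps the loop short and leaves the escaping to the driver.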


HTH, ~Ray
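
One detail worth noting about the script above: for the podcast item, pluck() returns the whole anchor fragment (the a tag with its attributes and link text), not just the URL. If only the URL should be stored, a small helper along these lines could pull out the href. The name extractHref() is hypothetical, and the pattern assumes a double-quoted href as in the sample markup.

```php
<?php
// Hypothetical helper (not part of the original script): extract the href value
// from an HTML fragment such as the one pluck() returns for the podcast item.
function extractHref(?string $fragment): ?string
{
    if ($fragment === null) {
        return null; // pluck() returns NULL when the program has no podcast
    }
    // Assumes the attribute value is wrapped in double quotes.
    if (preg_match('/href\s*=\s*"([^"]*)"/i', $fragment, $m)) {
        return html_entity_decode($m[1], ENT_QUOTES | ENT_HTML5, 'UTF-8');
    }
    return null;
}

$fragment = '<a target="_blank" href="http://www.mixfm-sa.com/podcast/bar.xml">podcast</a>';
echo extractHref($fragment), "\n";
```

In the loop above this would be applied to the plucked podcast value before the Program object is created.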
 
LVL 1

Author Closing Comment

by:afifosh
ID: 38721630
Mr. Ray! Thank you a lot :D
 
LVL 110

Expert Comment

by:Ray Paseur
ID: 38721855
Thanks for the points, and thanks for using EE, ~Ray