Solved

PHP DOM HTML parser: divs

Posted on 2012-12-26
Medium Priority
474 Views
Last Modified: 2012-12-26
<div class="listing thu_programs" >
<ul class="gmusic" style="background-position:left -1544px;">
<li class="name">¿General Music4</li>
<li class="time">12:00 pm - 6:00 am</li>
</ul>
<ul class="gmusic" style="background-position:left -1544px;">
<li class="name">General Music3</li>
<li class="time">12:00 pm - 6:00 am</li>
</ul>
</div>
<div class="listing thu_programs" >
<ul class="gmusic" style="background-position:left -1544px;">
<li class="name"> General Music2</li>
<li class="time">12:00 pm - 6:00 am</li>
</ul>
<ul class="gmusic" style="background-position:left -1544px;">
<li class="name">General Music</li>
<li class="time">12:00 pm - 6:00 am</li>
<li class="podcast"><a target="_blank" href="http://www.localhost.com/podcast/bar.xml">podcast</a></li>

</ul>
</div>



I want to parse this to get the div's class and, for each program inside it, the name and time. For example:
program : listing thu_programs

name : General Music
time : 12:00 pm - 6:00 am
podcast : url

name : General Music
time : 12:00 pm - 6:00 am

How do I do this for each div?

And if a ul has an li with class podcast, I want to get that too.
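Since the question is about the PHP DOM extension, here is a minimal sketch (not from the thread) of how DOMXPath could produce the requested per-div output. It assumes ext-dom is available and that the markup follows the snippet above; `parsePrograms()` and the sample `$html` are hypothetical names. Unlike walking `childNodes`, the XPath queries `./ul` and `./li[@class="..."]` return only element nodes, so the whitespace text nodes between tags never show up.

```php
<?php
// Sketch only: parsePrograms() and $html are hypothetical names.
// Assumes ext-dom is loaded and the markup looks like the snippet in the question.
function parsePrograms(string $html): array
{
    $doc = new DOMDocument();
    // Collect libxml warnings about imperfect real-world HTML instead of printing them.
    $previous = libxml_use_internal_errors(true);
    // The meta tag tells libxml to treat the input as UTF-8.
    $doc->loadHTML('<meta http-equiv="Content-Type" content="text/html; charset=utf-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath  = new DOMXPath($doc);
    $result = [];

    // Every <div> whose class list contains "listing" holds one day's schedule.
    $divs = $xpath->query('//div[contains(concat(" ", normalize-space(@class), " "), " listing ")]');
    foreach ($divs as $div) {
        $programs = [];
        // Direct <ul> children only; each <ul> is one program.
        foreach ($xpath->query('./ul', $div) as $ul) {
            $name = $xpath->query('./li[@class="name"]', $ul)->item(0);
            $time = $xpath->query('./li[@class="time"]', $ul)->item(0);
            $link = $xpath->query('./li[@class="podcast"]/a', $ul)->item(0);
            $programs[] = [
                'name'    => $name ? trim($name->textContent) : null,
                'time'    => $time ? trim($time->textContent) : null,
                'podcast' => $link ? $link->getAttribute('href') : null, // null when there is no podcast <li>
            ];
        }
        $result[] = ['program' => $div->getAttribute('class'), 'items' => $programs];
    }
    return $result;
}

$html = <<<'HTML'
<div class="listing thu_programs">
  <ul class="gmusic">
    <li class="name">General Music</li>
    <li class="time">12:00 pm - 6:00 am</li>
  </ul>
</div>
<div class="listing fri_programs">
  <ul>
    <li class="name">Heba Show</li>
    <li class="time">7:00 pm - 5:00 pm</li>
    <li class="podcast"><a target="_blank" href="http://www.mixfm-sa.com/podcast/heba.xml">podcast</a></li>
  </ul>
</div>
HTML;

// Print in the format described in the question.
foreach (parsePrograms($html) as $day) {
    echo "program : {$day['program']}\n";
    foreach ($day['items'] as $item) {
        echo "name : {$item['name']}\n";
        echo "time : {$item['time']}\n";
        if ($item['podcast'] !== null) {
            echo "podcast : {$item['podcast']}\n";
        }
    }
    echo "\n";
}
```

Returning plain arrays keeps the result easy to feed into one INSERT per program; the class string (for example `listing thu_programs`) is kept whole so the weekday can be derived from it however suits the database schema.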


<?php

// header('Content-Type: text/html; charset=utf-8');

$str = '<html><meta http-equiv="Content-Type" content="text/html; charset=utf-8"><div class="listing thu_programs" >
<ul class="gmusic" style="background-position:left -1544px;">
<li class="name">¿¿¿¿¿¿ ¿¿¿¿¿¿ General Music</li>
<li class="time">12:00 pm - 6:00 am</li>
</ul>
 
<ul style="background-position:left -1157px;">
<li class="name">¿¿ ¿ ¿¿¿ </li>
<li class="time">2:00 pm - 12:00 pm</li>
<li class="podcast"><a target="_blank" href="http://www.mixfm-sa.com/podcast/bar.xml">podcast</a></li>
</ul>
<ul style="background-position:left -882px;">
<li class="name">¿¿¿¿ ¿¿¿¿</li>
<li class="time">3:00 pm - 2:00 pm</li>
<li class="podcast"><a target="_blank" href="http://www.mixfm-sa.com/podcast/tweet.xml">podcast</a></li>
</ul>
<ul class="gmusic" style="background-position:left -1544px;">
<li class="name">¿¿¿¿¿¿ ¿¿¿¿¿¿ General Music</li>
<li class="time">4:00 pm - 3:00 pm</li>
</ul>
 
<ul style="background-position:left -1432px;">
<li class="name">Mixtation</li>
<li class="time">5:00 pm - 4:00 pm</li>
<li class="podcast"><a target="_blank" href="http://www.mixfm-sa.com/podcast/mixtation.xml">podcast</a></li>
</ul>
<ul class="gmusic" style="background-position:left -1544px;">
<li class="name">¿¿¿¿¿¿ ¿¿¿¿¿¿ General Music</li>
<li class="time">6:00 pm - 5:00 pm</li>
</ul>
<ul style="background-position:left -550px;">
<li class="name">¿¿¿ ¿¿¿</li>
<li class="time">7:00 pm - 6:00 pm</li>
<li class="podcast"><a target="_blank" href="http://www.mixfm-sa.com/podcast/tech.xml">podcast</a></li>
</ul>
<ul style="background-position:left -1046px;">
<li class="name">¿¿¿ ¿¿¿¿</li>
<li class="time">8:00 pm - 7:00 pm</li>
<li class="podcast"><a target="_blank" href="http://www.mixfm-sa.com/podcast/cars.xml">podcast</a></li>
</ul>
<ul style="background-position:left -716px;">
<li class="name">¿¿¿ ¿¿¿¿</li>
<li class="time">9:00 pm - 8:00 pm</li>
<li class="podcast"><a target="_blank" href="http://www.mixfm-sa.com/podcast/east.xml">podcast</a></li>
</ul>
<ul style="background-position:left -936px;">
<li class="name">¿¿¿ ¿¿¿ ¿¿¿</li>
<li class="time">10:30 pm - 9:00 pm</li>
<li class="podcast"><a target="_blank" href="http://www.mixfm-sa.com/podcast/hiphop.xml">podcast</a></li>
</ul>
<ul class="gmusic" style="background-position:left -1544px;">
<li class="name">¿¿¿¿¿¿ ¿¿¿¿¿¿ General Music</li>
<li class="time">12:00 am - 10:30 pm</li>
</ul>
<ul style="background-position:left -1266px;">
<li class="name">Mega Mix</li>
<li class="time">2:00 am - 12:00 am</li>
</ul>
<ul class="gmusic" style="background-position:left -1544px;">
<li class="name">¿¿¿¿¿¿ ¿¿¿¿¿¿ General Music</li>
<li class="time">4:00 am - 2:00 am</li>
</ul>
<ul style="background-position:left -1601px;">
<li class="name">Religious</li>
<li class="time">4:30 am - 4:00 am</li>
</ul>
<ul class="gmusic" style="background-position:left -1544px;">
<li class="name">¿¿¿¿¿¿ ¿¿¿¿¿¿ General Music</li>
<li class="time">6:00 am - 4:30 am</li>
</ul>
</div>
 
<div class="listing fri_programs" >
<ul class="gmusic" style="background-position:left -1544px;">
<li class="name">¿¿¿¿¿¿ ¿¿¿¿¿¿ General Music</li>
<li class="time">11:00 am - 6:00 am</li>
</ul>
<ul class="gmusic" style="background-position:left -1601px;">
<li class="name">¿¿¿¿¿ ¿¿¿¿ Religious Content</li>
<li class="time">1:00 pm - 11:00 am</li>
</ul>
<ul style="background-position:left -1100px;">
<li class="name">¿¿¿ ¿¿¿</li>
<li class="time" style="direction:rtl;"><span>¿¿¿ ¿¿¿¿ ¿¿¿¿¿¿¿ </span>2:00pm - 1:00pm</li>
<li class="podcast"><a target="_blank" href="http://www.mixfm-sa.com/podcast/yaddak.xml">podcast</a></li>
</ul>
<ul class="gmusic" style="background-position:left -1544px;">
<li class="name">¿¿¿¿¿¿ ¿¿¿¿¿¿ General Music</li>
<li class="time">5:00 pm - 2:00 pm</li>
</ul>
<ul style="background-position:left -1376px;">
<li class="name">Heba Show</li>
<li class="time">7:00 pm - 5:00 pm</li>
<li class="podcast"><a target="_blank" href="http://www.mixfm-sa.com/podcast/heba.xml">podcast</a></li>
</ul>
<ul style="background-position:left -770px;">
<li class="name">¿¿¿¿ ¿¿¿¿</li>
<li class="time">8:00 pm - 7:00 pm</li>
<li class="podcast"><a target="_blank" href="http://www.mixfm-sa.com/podcast/hindi.xml">podcast</a></li>
</ul>
<ul class="gmusic" style="background-position:left -1544px;">
<li class="name">¿¿¿¿¿¿ ¿¿¿¿¿¿ General Music</li>
<li class="time">10:00 pm - 8:00 pm</li>
</ul>
<ul style="background-position:left -441px;">
<li class="name">Room 11</li>
<li class="time">10:30 pm - 10:00 pm</li>
<li class="podcast"><a target="_blank" href="http://www.mixfm-sa.com/podcast/room11.xml">podcast</a></li>
</ul>
<ul style="background-position:left -606px;">
<li class="name">¿¿¿¿¿ ¿¿¿ ¿¿¿¿¿¿</li>
<li class="time">12:00 am - 10:30 pm</li>
<li class="podcast"><a target="_blank" href="http://www.mixfm-sa.com/podcast/saltana.xml">podcast</a></li>
</ul>
<ul class="gmusic" style="background-position:left -1544px;">
<li class="name">¿¿¿¿¿¿ ¿¿¿¿¿¿ General Music</li>
<li class="time">4:00 am - 12:00 am</li>
</ul>
<ul style="background-position:left -1601px;">
<li class="name">Religious</li>
<li class="time">4:30 am - 4:00 am</li>
</ul>
<ul class="gmusic" style="background-position:left -1544px;">
<li class="name">¿¿¿¿¿¿ ¿¿¿¿¿¿ General Music</li>
<li class="time">6:00 am - 4:30 am</li>
</ul>
</div>
</html>
';

$DOM = new DOMDocument;
$DOM->loadHTML($str);

// get all DIV elements
$items = $DOM->getElementsByTagName('div');

for ($i = 0; $i < $items->length; $i++) {
    $linkthumb = $items->item($i)->getAttribute('class');
    echo "<ul>" . $linkthumb . "</ul>";
    // var_dump($items->item($i));

    // childNodes also includes the whitespace text nodes between the <ul> tags,
    // and nodeValue of a <ul> runs the text of all its <li> items together
    $children = $items->item($i)->childNodes;
    for ($j = 0; $j < $children->length; $j++) {
        echo "<ul>" . $children->item($j)->nodeValue . "</ul>";
    }
}


Question by:afifosh
10 Comments
 
LVL 111

Expert Comment

by:Ray Paseur
ID: 38721120
I tried to run this script: http://www.laprbass.com/RAY_temp_afifosh.php

Please post a link to the original document that you want to parse. I think we will need valid UTF-8 if we are going to make progress here.  Thanks, ~Ray
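Ray's point about valid UTF-8 can be checked before any parsing starts. A minimal sketch, assuming only core PHP: `preg_match()` with the `/u` modifier returns `false` when the subject is not well-formed UTF-8. `isValidUtf8()` and the two sample strings are hypothetical, not from the thread.

```php
<?php
// Sketch only: isValidUtf8() is a hypothetical helper.
// An empty pattern with the /u modifier matches any valid UTF-8 string (returns 1)
// and makes preg_match() return false for malformed UTF-8 input.
function isValidUtf8(string $s): bool
{
    return preg_match('//u', $s) === 1;
}

$good = "General Music \u{0645}\u{0648}\u{0633}\u{064A}\u{0642}\u{0649}"; // Arabic letters, valid UTF-8
$bad  = "General Music \xBF\xBF"; // stray continuation bytes, not valid UTF-8

var_dump(isValidUtf8($good)); // bool(true)
var_dump(isValidUtf8($bad));  // bool(false)
```

If the check fails, the text has to be converted from whatever encoding it was really saved in before DOMDocument (or string scraping) will give sensible names; which encoding that is cannot be told from the thread.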
 
LVL 1

Author Comment

by:afifosh
ID: 38721137
 
LVL 111

Expert Comment

by:Ray Paseur
ID: 38721172
Is that the entire original document?  It looks like just a list of UL tags.  I was expecting to see something more like the variable $str from the original code snippet.

 
LVL 1

Author Comment

by:afifosh
ID: 38721217
I have many divs. Each div refers to a weekday and has many child ul elements,
and each ul contains program details like name and time,
which I want to insert into a database.
 
LVL 111

Expert Comment

by:Ray Paseur
ID: 38721281
Let me try this again:
Please post a link to the original document that you want to parse.
Thanks, ~Ray
 
LVL 1

Author Comment

by:afifosh
ID: 38721288
http://www.mixfm-sa.com/v08/schedule.html

I want to parse all the programs so I can insert them into a database.
 
LVL 111

Expert Comment

by:Ray Paseur
ID: 38721363
It may be easier to get the data if we step away from the PHP DOM class, and instead scrape the HTML document.  I'll try to show you how we might do that, but it will take a little while to test the code.
 
LVL 111

Accepted Solution

by:
Ray Paseur earned 2000 total points
ID: 38721601
Have a look at this. It creates an array of simple, data-only objects containing the contents of each of the UL list items. You can iterate over the array and use OOP notation to access each object. One INSERT query per object, and your database is built!
http://www.laprbass.com/RAY_temp_afifosh.php
<?php // RAY_temp_afifosh.php
error_reporting(E_ALL);

// A SIMPLE CLASS TO HOLD THE PROGRAMMING INFORMATION
class Program
{
    public $day, $name, $time, $podcast;
    public function __construct($day, $name, $time, $podcast)
    {
        $this->day     = $day;
        $this->name    = $name;
        $this->time    = $time;
        $this->podcast = $podcast;
    }
}

// A FUNCTION FOR DATA EXTRACTION
function pluck($str, $tag, $end='</li>')
{
    $str = trim($str);
    $poz = strpos($str, $tag);
    if ($poz === FALSE) return NULL;
    $poz = $poz + strlen($tag);
    $str = substr($str, $poz);
    $str = explode($end, $str);
    return $str[0];
}

// THE SIGNAL STRINGS
$n = '<li class="name">';
$t = '<li class="time">';
$p = '<li class="podcast">';

// AN ARRAY TO HOLD THE PROGRAM OBJECTS
$programs = array();

// HELPERS TO MAKE THE OUTPUT EASIER TO READ
echo '<meta charset="utf-8" />';
echo '<pre>';

// ACQUIRE THE DOCUMENT
$url = 'http://www.mixfm-sa.com/v08/schedule.html';
$str = file_get_contents($url);

// DISCARD THE UNWANTED FOOTER
$arr = explode('</article>', $str);
$str = $arr[0];

// BREAK THE HTML ON THIS DIV
$arr = explode('<div class="listing', $str);

// DISCARD THE UNWANTED HEADER
unset($arr[0]);

// THIS IS AN ARRAY OF 7 ELEMENTS, ONE FOR EACH DAY OF THE WEEK
foreach ($arr as $pgm)
{
    $pgm = trim($pgm);
    $day = substr($pgm,0,3);
    $pgm = strip_tags($pgm, '<ul><li><a>');
    $pgm = substr($pgm, strpos($pgm, '>')+1, strlen($pgm));

    // THIS IS AN ARRAY OF 'N' ELEMENTS, ONE FOR EACH PROGRAM
    $new = explode('</ul>', $pgm);
    foreach ($new as $set)
    {
        // SKIP THE EMPTY ELEMENTS
        $set = trim($set);
        if (empty($set)) continue;

        // ADD A NEW PROGRAM OBJECT TO THE ARRAY
        $name = pluck($set, $n);
        $time = pluck($set, $t);
        $cast = pluck($set, $p);
        $programs[] = new Program($day, $name, $time, $cast);
    }
}
// SHOW THE ARRAY OF PROGRAM OBJECTS
print_r($programs);


HTH, ~Ray
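One detail the accepted answer leaves open: for the podcast item, `pluck($set, $p)` returns the whole `<a target="_blank" href="...">podcast</a>` fragment rather than just the URL the question asked for. A minimal sketch, assuming the `href` value is double-quoted as in the schedule markup; `podcastUrl()` is a hypothetical helper, not part of Ray's script.

```php
<?php
// Sketch only: podcastUrl() is a hypothetical helper, not part of the accepted answer.
// Assumes the href attribute is written with double quotes, as in the schedule page.
function podcastUrl(?string $fragment): ?string
{
    if ($fragment === null) {
        return null; // pluck() returns NULL when the program has no podcast <li>
    }
    if (preg_match('/href="([^"]*)"/i', $fragment, $m) === 1) {
        // Turn entities such as &amp; back into plain characters before storing.
        return html_entity_decode($m[1], ENT_QUOTES, 'UTF-8');
    }
    return null; // fragment without an href attribute
}

$cast = '<a target="_blank" href="http://www.mixfm-sa.com/podcast/bar.xml">podcast</a>';
echo podcastUrl($cast), "\n"; // http://www.mixfm-sa.com/podcast/bar.xml
var_dump(podcastUrl(null));   // NULL
```

Inside Ray's loop it could be applied as `$cast = podcastUrl(pluck($set, $p));`, so each Program object's `podcast` property holds a plain URL (or `NULL`) ready for the INSERT.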
 
LVL 1

Author Closing Comment

by:afifosh
ID: 38721630
Mr. Ray! Thank you a lot :D
 
LVL 111

Expert Comment

by:Ray Paseur
ID: 38721855
Thanks for the points, and thanks for using EE, ~Ray