
Getting content inside DIV with dynamic class name and ID.

The attached function gets the content inside the defined DIV, and it works perfectly.

The defined DIV here is:

$str = '<div class="post-333 post hentry category-libros" id="post-333">';

When I define the DIV, I want to ignore the "333 post hentry category-libros" id="post-333" part. In other words, I want to get the content inside the DIV whose opening tag starts with <div class="post- and ignore the rest of the class name and the id.

Please help.
Thank you

function get_content ($url) {
// FIND ALL OF THE DESIRED DIV
$htm = file_get_contents($url);
$str = '<div class="post-333 post hentry category-libros" id="post-333">';
$arr = explode($str, $htm);
$new = $arr[1];
$len = strlen($new);

// ACCUMULATE THE OUTPUT STRING HERE
$out = NULL;

// WE ARE INSIDE ONE DIV TAG
$cnt = 1;

// UNTIL THE END OF STRING OR UNTIL WE ARE OUT OF ALL DIV TAGS
while ($len)
{
    // COPY A CHARACTER
    $chr = substr($new,0,1);

    // IF THE DIV NESTING LEVEL INCREASES OR DECREASES
    if (substr($new,0,4) == '<div')  $cnt++;
    if (substr($new,0,5) == '</div') $cnt--;

    // ACTIVATE THIS TO FOLLOW THE COUNT OF NESTING LEVELS
    // echo " $cnt";

    // WHEN THE NESTING LEVEL GOES BACK TO ZERO
    if (!$cnt) break;

    // WHEN THE NESTING LEVEL IS STILL POSITIVE
    $len--;
    $out .= $chr;
    $new = substr($new,1);
}

return $out;
}


Ray Paseur

Sorry - I do not keep track of things from one question to the next. Please post the test data that you want us to use, thanks.

Fernanditos

ASKER

Thank you Ray, find the test data here: http://www.frostwave.com/data.html

I want my function to get all content inside DIV:

<div class="post-333 post hentry category-libros" id="post-333">


The value "333 post hentry category-libros" id="post-333" is dynamic and will always change, so I need to check only the first part of the DIV tag, starting with <div class="post- and ignore the rest of the class name and the id.

Thank you so much for your support.

StingRaY

If you use jQuery, this problem is easily solved with the following selector.

$('div[class^="post"]')

For example...

alert($('div[class^="post"]').html());

Fernanditos

ASKER

I have in mind something like:

$str = '<div class="post-(.*)" id="(.*)">';


ASKER CERTIFIED SOLUTION

StingRaY


Fernanditos

ASKER

@StingRaY that worked great, I will test once again. Thank you.

$str = '{<div class="post-[^"]+"[^>]+>}';
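A minimal sketch of how that pattern could replace the fixed string in the original function, assuming the page has already been fetched into a string. The function name get_content_from_html and the sample markup are hypothetical and used only for illustration. preg_split() takes the place of explode(), and the nesting count works as before.

```php
<?php
// Hypothetical helper: same nesting-count idea as get_content(), but the
// opening tag is located with the regular expression from the post above.
function get_content_from_html(string $htm): ?string
{
    $pattern = '{<div class="post-[^"]+"[^>]+>}';

    // Split once, at the first opening tag that matches the pattern.
    $arr = preg_split($pattern, $htm, 2);
    if ($arr === false || count($arr) < 2) {
        return null; // no matching DIV on this page
    }
    $new = $arr[1];
    $len = strlen($new);

    $out = '';
    $cnt = 1; // we start inside one DIV

    for ($pos = 0; $pos < $len; $pos++) {
        // Track the DIV nesting level at the current position.
        if (substr($new, $pos, 4) === '<div')  $cnt++;
        if (substr($new, $pos, 5) === '</div') $cnt--;

        // Back at level zero: the wanted DIV has been closed.
        if ($cnt === 0) break;

        $out .= $new[$pos];
    }
    return $out;
}

// Made-up sample markup, not the real test page.
$sample = '<body><div class="post-333 post hentry category-libros" id="post-333">'
        . '<p>Hello</p><div class="inner">nested</div></div><div>other</div></body>';

echo get_content_from_html($sample), PHP_EOL;
```

Returning null when nothing matches avoids reading a missing $arr[1], which the explode() version would do on a page without the expected DIV.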

Ray Paseur

@Fernanditos: I believe that solution can work as long as you are only looking for one <div> per page, and the attributes of the <div> tags are all on one line and in an exact order. Good test data, including edge cases, is fairly important when you're working with external input. Consider what your program will do with these, which are all valid and equivalent HTML tags.

<div class="post-333 post hentry category-libros" id="post-333">
<div id="post-333" class="post-333 post hentry category-libros">
<div class='post-333 post hentry category-libros' id="post-333">
<div
class="post-333
post hentry
category-libros"
id="post-333">
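As a quick check of the four variants above, the following sketch runs the pattern posted earlier in this thread against each one. The labels are made up for readability. Only the first variant matches, because the pattern expects a single space after <div, the class attribute first, and double quotes around its value.

```php
<?php
// The pattern posted earlier in the thread.
$pattern = '{<div class="post-[^"]+"[^>]+>}';

// The four equivalent opening tags, keyed by an illustrative label.
$variants = [
    'class first, one line'    => '<div class="post-333 post hentry category-libros" id="post-333">',
    'id before class'          => '<div id="post-333" class="post-333 post hentry category-libros">',
    'single-quoted class'      => '<div class=\'post-333 post hentry category-libros\' id="post-333">',
    'spread over several lines' => "<div\nclass=\"post-333\npost hentry\ncategory-libros\"\nid=\"post-333\">",
];

$results = [];
foreach ($variants as $label => $tag) {
    // preg_match() returns 1 when the pattern is found, 0 when it is not.
    $results[$label] = preg_match($pattern, $tag) === 1;
    printf("%-28s %s\n", $label, $results[$label] ? 'match' : 'no match');
}
```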

Executive summary: Using regular expressions to parse HTML is not a very professional approach. A state engine is more reliable.

If you are parsing HTML to try to get information from a web publisher you might want to consider asking the publishers if they expose an API. That way you would have a formal interface which is much more dependable than trying to scrape HTML. If the publisher wants you to have their information they will almost certainly want to expose an API that is versioned and dependable.

Anyway, good luck with your project. ~Ray

StingRaY

@Fernanditos: Ray is correct. The regular expression is not the best solution. A better approach would be to use an HTML parser, for example Simple HTML DOM Parser (http://simplehtmldom.sourceforge.net/).
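For comparison, here is a sketch of the parser-based approach. It uses PHP's bundled DOM extension (DOMDocument and DOMXPath) rather than Simple HTML DOM Parser, so it assumes the DOM extension is enabled. The function name get_post_div_html is hypothetical. The XPath expression selects the first DIV whose class list contains a token starting with "post-", regardless of attribute order, quoting style, or line breaks.

```php
<?php
// Hypothetical helper: return the inner HTML of the first DIV that has a
// class token starting with "post-", or null when there is none.
function get_post_div_html(string $htm): ?string
{
    $doc = new DOMDocument();

    // Real-world markup is rarely perfect; collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($htm);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // normalize-space() collapses line breaks inside the class attribute;
    // the leading space makes " post-" match only at the start of a token.
    $nodes = $xpath->query(
        '//div[contains(concat(" ", normalize-space(@class)), " post-")]'
    );
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }

    // Serialize the children so the wrapping DIV itself is left out.
    $out = '';
    foreach ($nodes->item(0)->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

// Ray's four equivalent opening tags, each followed by made-up content.
$variants = [
    '<div class="post-333 post hentry category-libros" id="post-333">',
    '<div id="post-333" class="post-333 post hentry category-libros">',
    '<div class=\'post-333 post hentry category-libros\' id="post-333">',
    "<div\nclass=\"post-333\npost hentry\ncategory-libros\"\nid=\"post-333\">",
];

foreach ($variants as $tag) {
    echo get_post_div_html($tag . '<p>Hello</p></div>'), PHP_EOL;
}
```

Because the parser tracks nesting itself, the manual count of opening and closing DIV tags is no longer needed. Note that loadHTML() may misread non-ASCII text when the page does not declare its character encoding.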