Getting content inside DIV with dynamic class name and ID.

The attached function gets the content inside the defined DIV. It works perfectly as long as the DIV tag is known exactly.

The defined DIV here is:

$str = '<div class="post-333 post hentry category-libros" id="post-333">';

When I define the DIV, I want to ignore everything after "post-", that is: 333 post hentry category-libros" id="post-333". In other words, I want to get the content inside any DIV whose opening tag starts with <div class="post- regardless of the rest of the class name and the id.

Please help.
Thank you
function get_content ($url) {
// FIND ALL OF THE DESIRED DIV
$htm = file_get_contents($url);
$str = '<div class="post-333 post hentry category-libros" id="post-333">';
$arr = explode($str, $htm);
$new = $arr[1];
$len = strlen($new);

// ACCUMULATE THE OUTPUT STRING HERE
$out = NULL;

// WE ARE INSIDE ONE DIV TAG
$cnt = 1;

// UNTIL THE END OF STRING OR UNTIL WE ARE OUT OF ALL DIV TAGS
while ($len)
{
    // COPY A CHARACTER
    $chr = substr($new,0,1);

    // IF THE DIV NESTING LEVEL INCREASES OR DECREASES
    if (substr($new,0,4) == '<div')  $cnt++;
    if (substr($new,0,5) == '</div') $cnt--;

    // ACTIVATE THIS TO FOLLOW THE COUNT OF NESTING LEVELS
    // echo " $cnt";

    // WHEN THE NESTING LEVEL GOES BACK TO ZERO
    if (!$cnt) break;

    // WHEN THE NESTING LEVEL IS STILL POSITIVE
    $len--;
    $out .= $chr;
    $new = substr($new,1);
}

return $out;
}


FernanditosAsked:
 
StingRaYCommented:
Ah, sorry, I misunderstood you.

You can use preg_split instead of explode.

function get_content ($url) {
// FIND ALL OF THE DESIRED DIV
$htm = file_get_contents($url);

$str = '{<div class="post-[^"]+"[^>]+>}';
$arr = preg_split($str, $htm);
$new = $arr[1];
$len = strlen($new);

// ACCUMULATE THE OUTPUT STRING HERE
$out = NULL;

// WE ARE INSIDE ONE DIV TAG
$cnt = 1;

// UNTIL THE END OF STRING OR UNTIL WE ARE OUT OF ALL DIV TAGS
while ($len)
{
    // COPY A CHARACTER
    $chr = substr($new,0,1);

    // IF THE DIV NESTING LEVEL INCREASES OR DECREASES
    if (substr($new,0,4) == '<div')  $cnt++;
    if (substr($new,0,5) == '</div') $cnt--;

    // ACTIVATE THIS TO FOLLOW THE COUNT OF NESTING LEVELS
    // echo " $cnt";

    // WHEN THE NESTING LEVEL GOES BACK TO ZERO
    if (!$cnt) break;

    // WHEN THE NESTING LEVEL IS STILL POSITIVE
    $len--;
    $out .= $chr;
    $new = substr($new,1);
}

return $out;
}
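
To see what the preg_split step hands to the nesting loop, here is a minimal sketch. The sample markup and the helper name split_after_post_div() are assumptions made for illustration only; the pattern is the one from the function above. The limit of 2 and the check for a missing match are additions, since $arr[1] does not exist when no matching DIV is on the page.

```php
<?php
// Sketch only: split_after_post_div() is a hypothetical helper name.
function split_after_post_div($htm)
{
    // Same pattern as above, using { } as the regex delimiters.
    $pattern = '{<div class="post-[^"]+"[^>]+>}';

    // Limit of 2: split only at the first matching opening tag.
    $arr = preg_split($pattern, $htm, 2);

    // When the pattern does not match, preg_split returns one element only.
    if ($arr === false || count($arr) < 2) {
        return null;
    }

    // Everything after the opening tag, still including the closing </div>.
    return $arr[1];
}

// Hypothetical sample markup standing in for the fetched page.
$sample = '<p>before</p><div class="post-42 post hentry" id="post-42"><p>Hello</p></div>';
var_dump(split_after_post_div($sample));
var_dump(split_after_post_div('<p>no post here</p>'));
```

The returned string is what the while loop then walks through to find the matching closing tag.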

 
Ray PaseurCommented:
Sorry - I do not keep track of things from one question to the next.  Please post the test data that you want us to use, thanks.
 
FernanditosAuthor Commented:
Thank you Ray, find the test data here: http://www.frostwave.com/data.html

I want my function to get all content inside DIV:
 
<div class="post-333 post hentry category-libros" id="post-333">

The part after "post-" (here: 333 post hentry category-libros" id="post-333") is dynamic and will always change, so I need to match only the first part of the DIV, starting with <div class="post-, and ignore the rest of the class name and the id.

Thank you so much for your support.
 
StingRaYCommented:
If you can use jQuery, this problem is easily solved with the following selector.

$('div[class^="post"]')

For example...

alert($('div[class^="post"]').html());
 
FernanditosAuthor Commented:
I have in mind something like:

$str = '<div class="post-(.*)" id="(.*)">';

 
FernanditosAuthor Commented:
@StingRaY that worked great, I will test once again. Thank you.

$str = '{<div class="post-[^"]+"[^>]+>}';
 
Ray PaseurCommented:
@Fernanditos: I believe that solution can work as long as you are only looking for one <div> per page, and the attributes of the <div> tags are all on one line and in an exact order.  Good test data, including edge cases, is fairly important when you're working with external input.  Consider what your programming will do with these, which are valid and equivalent HTML statements.

<div class="post-333 post hentry category-libros" id="post-333">
<div id="post-333" class="post-333 post hentry category-libros">
<div class='post-333 post hentry category-libros' id="post-333">
<div
    class="post-333
               post hentry
               category-libros"
    id="post-333">

Executive summary: Using regular expressions to parse HTML is not a very professional approach.  A state engine is more reliable.
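
One way to avoid the attribute-order, quoting and line-break problems shown in the four variants above is to let a real HTML parser read the page. The following is only a sketch, assuming PHP's bundled DOM extension (DOMDocument and DOMXPath) is available; the helper name get_post_div_content() and the sample markup are made up for illustration. It returns the inner HTML of the first DIV that has a class token beginning with "post-".

```php
<?php
// Sketch only: get_post_div_content() is a hypothetical helper, not code from this thread.
function get_post_div_content($html)
{
    if ($html === '') {
        return null;
    }

    $doc = new DOMDocument();

    // Real pages are rarely valid HTML; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Find <div> elements with a class token that begins with "post-".
    // normalize-space() collapses line breaks and repeated spaces inside the attribute.
    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query('//div[contains(concat(" ", normalize-space(@class)), " post-")]');

    if ($nodes === false || $nodes->length === 0) {
        return null;
    }

    // Serialize the child nodes to get the inner HTML, nested <div> tags included.
    $out = '';
    foreach ($nodes->item(0)->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

// The four equivalent opening tags from above, each followed by the same sample content.
$inner = '<p>Hello</p><div>nested</div></div><p>outside</p>';
$variants = array(
    '<div class="post-333 post hentry category-libros" id="post-333">',
    '<div id="post-333" class="post-333 post hentry category-libros">',
    "<div class='post-333 post hentry category-libros' id=\"post-333\">",
    "<div\n    class=\"post-333\n        post hentry\n        category-libros\"\n    id=\"post-333\">",
);
foreach ($variants as $open) {
    echo get_post_div_content($open . $inner), "\n";
}
```

To use it on a live page, pass in the result of file_get_contents($url). The parser keeps track of nested DIV tags, so the character-by-character counting loop is not needed with this approach.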

If you are parsing HTML to try to get information from a web publisher you might want to consider asking the publishers if they expose an API.  That way you would have a formal interface which is much more dependable than trying to scrape HTML.  If the publisher wants you to have their information they will almost certainly want to expose an API that is versioned and dependable.

Anyway, good luck with your project. ~Ray
 
StingRaYCommented:
@Fernanditos: Ray is correct. My solution is not the best one. A different approach would be better, for example the Simple HTML DOM Parser (http://simplehtmldom.sourceforge.net/).
Question has a verified solution.
