Fernanditos
asked on
Getting content inside DIV with dynamic class name and ID.
The attached function gets the content inside the defined DIV. It works perfectly.
The defined DIV here is:
$str = '<div class="post-333 post hentry category-libros" id="post-333">';
When I define the DIV, I want to ignore the 333 post hentry category-libros" id="post-333" part. In other words, I want to get the content inside the DIV whose opening tag starts with <div class="post- and ignore anything after "post-".
That is, match the DIV whose class name starts with "post-" and ignore the rest of the class name and the id.
Please help.
Thank you
function get_content ($url) {
    // FIND ALL OF THE DESIRED DIV
    $htm = file_get_contents($url);
    $str = '<div class="post-333 post hentry category-libros" id="post-333">';
    $arr = explode($str, $htm);
    // IF THE DIV WAS NOT FOUND THERE IS NOTHING TO COPY
    if (!isset($arr[1])) return NULL;
    $new = $arr[1];
    $len = strlen($new);
    // ACCUMULATE THE OUTPUT STRING HERE
    $out = NULL;
    // WE ARE INSIDE ONE DIV TAG
    $cnt = 1;
    // UNTIL THE END OF STRING OR UNTIL WE ARE OUT OF ALL DIV TAGS
    while ($len)
    {
        // COPY A CHARACTER
        $chr = substr($new,0,1);
        // IF THE DIV NESTING LEVEL INCREASES OR DECREASES
        if (substr($new,0,4) == '<div') $cnt++;
        if (substr($new,0,5) == '</div') $cnt--;
        // ACTIVATE THIS TO FOLLOW THE COUNT OF NESTING LEVELS
        // echo " $cnt";
        // WHEN THE NESTING LEVEL GOES BACK TO ZERO
        if (!$cnt) break;
        // WHEN THE NESTING LEVEL IS STILL POSITIVE
        $len--;
        $out .= $chr;
        $new = substr($new,1);
    }
    return $out;
}
Sorry - I do not keep track of things from one question to the next. Please post the test data that you want us to use, thanks.
ASKER
Thank you Ray, find the test data here: http://www.frostwave.com/data.html
I want my function to get all content inside DIV:
<div class="post-333 post hentry category-libros" id="post-333">
The value 333 post hentry category-libros" id="post-333" is dynamic and will always change, so I need to check only the first part of the DIV, starting with <div class="post- and ignore the rest of the class name and the id.
Thank you so much for your support.
If you use jQuery, this is easily solved with an attribute-starts-with selector:
$('div[class^="post"]')
For example...
alert($('div[class^="post"]').html());
ASKER
I have in mind something like:
$str = '<div class="post-(.*)" id="(.*)">';
ASKER CERTIFIED SOLUTION
ASKER
@StingRaY that worked great, I will test once again. Thank you.
$str = '{<div class="post-[^"]+"[^>]+>}';
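For readers who cannot see the accepted answer: a pattern like this cannot be passed to explode(), which only splits on literal strings. A minimal sketch of one way it might be used is below, assuming preg_split() takes the place of explode() and the nesting counter from the original function is kept. The function name get_post_div_content and the sample HTML are made up for illustration, and the function takes the HTML string directly so it can be tried without fetching a URL.

```php
<?php
// Hypothetical helper (not from the thread): returns the inner HTML of the
// first div whose opening tag matches the "post-" pattern.
function get_post_div_content(string $htm): string
{
    // [^"]+ absorbs the dynamic part of the class value; [^>]+ absorbs the
    // id attribute and anything else up to the closing ">".
    $pattern = '{<div class="post-[^"]+"[^>]+>}';

    // Split once, at the first matching opening tag.
    $arr = preg_split($pattern, $htm, 2);
    if ($arr === false || count($arr) < 2) {
        return ''; // no matching div found
    }
    $new = $arr[1];

    $out = '';
    $cnt = 1; // we start inside one div
    $len = strlen($new);
    for ($i = 0; $i < $len; $i++) {
        // Track the div nesting level
        if (substr($new, $i, 4) === '<div')  $cnt++;
        if (substr($new, $i, 5) === '</div') $cnt--;
        // Back at level zero: the matching closing tag has been reached
        if ($cnt === 0) break;
        $out .= $new[$i];
    }
    return $out;
}

// Made-up sample markup for illustration
$html = '<body><div class="post-42 post hentry" id="post-42">'
      . '<p>Hi</p><div>inner</div></div><div>other</div></body>';
echo get_post_div_content($html), PHP_EOL;
```

The sketch only returns the first matching div and inherits every limitation of the pattern itself.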
@Fernanditos: I believe that solution can work as long as you are only looking for one <div> per page, and the attributes of the <div> tag are all on one line and in an exact order. Good test data, including edge cases, is fairly important when you're working with external input. Consider what your program will do with these, all of which are valid and equivalent HTML:
<div class="post-333 post hentry category-libros" id="post-333">
<div id="post-333" class="post-333 post hentry category-libros">
<div class='post-333 post hentry category-libros' id="post-333">
<div
class="post-333
post hentry
category-libros"
id="post-333">
Executive summary: Using regular expressions to parse HTML is not a very professional approach. A state engine is more reliable.
If you are parsing HTML to try to get information from a web publisher you might want to consider asking the publishers if they expose an API. That way you would have a formal interface which is much more dependable than trying to scrape HTML. If the publisher wants you to have their information they will almost certainly want to expose an API that is versioned and dependable.
Anyway, good luck with your project. ~Ray
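To make the point about edge cases concrete, here is a small sketch that runs the pattern posted earlier in the thread against the four equivalent opening tags listed above. The array keys are just labels chosen for this example.

```php
<?php
// Pattern posted earlier in the thread
$pattern = '{<div class="post-[^"]+"[^>]+>}';

// The four equivalent opening tags from the post above
$variants = [
    'original'      => '<div class="post-333 post hentry category-libros" id="post-333">',
    'id first'      => '<div id="post-333" class="post-333 post hentry category-libros">',
    'single quotes' => "<div class='post-333 post hentry category-libros' id=\"post-333\">",
    'multi-line'    => "<div\nclass=\"post-333\npost hentry\ncategory-libros\"\nid=\"post-333\">",
];

$results = [];
foreach ($variants as $name => $tag) {
    // preg_match() returns 1 when the pattern matches
    $results[$name] = preg_match($pattern, $tag) === 1;
    printf("%-14s %s\n", $name, $results[$name] ? 'match' : 'no match');
}
```

Only the first form matches. The other three fail because the pattern hard-codes the attribute order, the double quotes, and the single space after "<div".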
@Fernanditos: Ray is correct. The regex solution is not the best one. A better approach would be to use an HTML parser, for example Simple HTML DOM Parser (http://simplehtmldom.sourceforge.net/).
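As a rough illustration of the parser-based approach, the sketch below uses PHP's built-in DOMDocument and DOMXPath classes rather than Simple HTML DOM Parser, so this is a swapped-in technique and not that library's API. The function name get_post_div_dom is hypothetical, and the DOM extension is assumed to be enabled. Note that libxml may normalize the markup it returns, so the output is not guaranteed to be byte-for-byte identical to the source.

```php
<?php
// Hypothetical helper (not from the thread): inner HTML of the first div that
// has a class name starting with "post-", found with PHP's DOM extension.
function get_post_div_dom(string $htm): string
{
    if ($htm === '') {
        return ''; // loadHTML() rejects an empty string
    }
    $doc = new DOMDocument();
    // Real-world pages are rarely valid; keep parse warnings quiet.
    $prev = libxml_use_internal_errors(true);
    $doc->loadHTML($htm);
    libxml_clear_errors();
    libxml_use_internal_errors($prev);

    $xpath = new DOMXPath($doc);
    // normalize-space() collapses newlines inside the class attribute; the
    // prepended space lets " post-" match a class name anywhere in the list.
    $query = '//div[contains(concat(" ", normalize-space(@class)), " post-")]';
    $nodes = $xpath->query($query);
    if ($nodes === false || $nodes->length === 0) {
        return '';
    }

    // Serialize the children of the first match to get its inner HTML.
    $out = '';
    foreach ($nodes->item(0)->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

// Ray's four equivalent opening tags, each followed by made-up sample content
$openers = [
    '<div class="post-333 post hentry category-libros" id="post-333">',
    '<div id="post-333" class="post-333 post hentry category-libros">',
    "<div class='post-333 post hentry category-libros' id=\"post-333\">",
    "<div\nclass=\"post-333\npost hentry\ncategory-libros\"\nid=\"post-333\">",
];
foreach ($openers as $opener) {
    echo get_post_div_dom($opener . 'Hello <b>world</b></div>'), PHP_EOL;
}
```

Because the parser works on the document tree, attribute order, quote style, line breaks and nested divs no longer matter, which is the reliability gain Ray describes.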