Regex: anchor the text between divs

I tried this code to remove a div section from the page's HTML, but it's still not working:
$banner = '#'            // Opening delimiter
	.'<div'              // Opening of the div tag
	.'(.*)'              // Any characters within the div tag
	.'id="banner"'       // With the specific id "banner"
	.'(.*)'              // Any characters after the id, up to the closing div
	.'</div>'            // The closing div tag
	.'#';                // Closing delimiter
$buffer = preg_replace($banner, '', $buffer);


Nura111 asked:
Derokorian commented:
Can you say how it's not working? Is it not removing anything? Removing too much? Throwing an error?
Nura111 (author) commented:
It's not removing anything. Don't we need to add something for the spaces between the div and the id?
Nura111 (author) commented:
??
Derokorian commented:
I can't get it to work either. I made myself this little script to try it out, but I can't figure it out:

<?php

$buffer = <<<EOD

<html>
	<head>
		<title>Example</title>
	</head>
	<body>
		<div id="container">
			<div id="header">
				<h1>Example Page</h1><br />
				<div id="banner">
					<a href="www.domain.com/some/link.html"><h2>Banner Image</h2></a>
				</div>
			</div>
			<div id="nav">
				<a href="/page1.html">Link 1</a>
				<a href="/page2.html">Link 2</a>
				<a href="/page3.html">Link 3</a>
				<a href="/page4.html">Link 4</a>
			</div>
			<div id="content">
				<h1>Welcome to my example</h1>
				<p>Content WEEEEE content content content</p>
			</div>
			<div id="footer">
				&copy; 2011 examples
			</div>
		</div>
	</body>
</html>

EOD;

$buffer = preg_replace('#<div id="banner">(.*)</div>#','', $buffer);

echo $buffer;



Even with this overly simplified regex, which should look for that specific div tag, the first closing tag, and anything in between, it's not working. I'll keep trying because I'm bored today, but maybe someone can swoop in with the save.
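A likely explanation, worth confirming on your own data: in PCRE, `.` does not match a newline unless the `s` modifier is set. In the test page, the banner's `<div>` and its `</div>` are on different lines, so `(.*)` can never reach the closing tag and `preg_replace()` returns the string unchanged. A minimal sketch with a short, hypothetical `$html` string:

```php
<?php
// Hypothetical multi-line input: the opening and closing tags are on different lines.
$html = "<div id=\"banner\">\n<h2>Banner Image</h2>\n</div>";

// Without the s modifier, "." stops at "\n", so the pattern cannot match at all.
$withoutS = preg_replace('#<div id="banner">(.*)</div>#', '', $html);

// With the s modifier, "." also matches "\n", so the whole div is matched and removed.
$withS = preg_replace('#<div id="banner">(.*)</div>#s', '', $html);

var_dump($withoutS === $html); // bool(true): nothing was removed
var_dump($withS);              // string(0) ""
```

Adding `s` on its own is not the whole fix, though: on a full page the greedy `.*` then runs to the last `</div>`, which is the behaviour reported a few comments further down.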
Nura111 (author) commented:
OK, I'm looking into it as well. Let me know if you find something.
Nura111 (author) commented:
I can tell you that ([^>]+)> is not working.

<div([^>]+)id="banner" is working.
Nura111 (author) commented:
/<div([^>]+)id="banner">/ is working, but matching everything in between doesn't seem to work, either with .* or with (.*).
Nura111 (author) commented:
$buffer = preg_replace('/<div([^>]+)id="banner">(.*)<\/div>/s','test', $buffer); just removes everything up to the final </div> in the page. For some reason it doesn't find the first </div> and stop there.
käµfm³d 👽 commented:
In terms of find/replace, ".*" is very dangerous. The reason is that it is a greedy operation: it consumes as much as possible and only gives characters back when the rest of the pattern would otherwise fail to match. As an example, let's say you had this text:

<div><div id="banner"><span>My Title</span></div></div>



Now let's say you ran your code on it. You would end up with nothing at all, whereas you probably intended to keep the outer <div>s. When using dot-star, you typically want the non-greedy version:  .*?

The question mark makes the dot-star non-greedy.
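A small sketch of the difference on the example string above (the variable names are just for illustration). Note that the non-greedy version also uses `[^>]*` before the id rather than `.*?`: a lazy quantifier there would still let the match start at the outer `<div` and stretch across its `>`.

```php
<?php
$html = '<div><div id="banner"><span>My Title</span></div></div>';

// Greedy: the original pattern; the final (.*) runs to the LAST </div> in the string.
$greedy = preg_replace('#<div(.*)id="banner"(.*)</div>#', '', $html);

// Non-greedy: (.*?) stops at the FIRST </div> that lets the match succeed,
// and [^>]* keeps the match from starting inside the outer <div>.
$lazy = preg_replace('#<div([^>]*)id="banner"(.*?)</div>#', '', $html);

var_dump($greedy); // string(0) ""
var_dump($lazy);   // string(11) "<div></div>"
```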

For your particular situation, I would suggest this pattern:

$banner = '#<div[^>]*?id="banner"[^>]*>(?:[^<]|<(?!/div>))*</div>#';



...which means:
$banner = '#'                // Opening delimiter
         .'<div'             // Opening of the div tag
         .'[^>]*?'           // Any number of characters that are not a closing bracket - non-greedy
         .'id="banner"'      // With the specific id "banner"
         .'[^>]*>'           // Any number of characters that are not a closing bracket, followed by a closing bracket
         .'(?:'              // Start of non-capturing group
         .'[^<]|<(?!/div>)'  // Either a character that is not an opening bracket OR an opening bracket that is NOT followed by the string "/div>"
         .')'                // End of non-capturing group
         .'*'                // Zero or more of the previous -- applies to the entire non-capturing group
         .'</div>'           // Closing div tag
         .'#';               // Closing delimiter
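A quick check of the suggested pattern, using a hypothetical multi-line snippet modeled on Derokorian's test page. Because `[^<]` also matches newlines, the pattern works without the `s` modifier. The second call illustrates the limitation discussed later in the thread: a `<div>` nested inside the banner ends the match early.

```php
<?php
// kaufmed's suggested pattern, unchanged.
$banner = '#<div[^>]*?id="banner"[^>]*>(?:[^<]|<(?!/div>))*</div>#';

// Hypothetical multi-line snippet: a header div containing the banner div.
$html = "<div id=\"header\">\n"
      . "<div id=\"banner\">\n<a href=\"/\"><h2>Banner Image</h2></a>\n</div>\n"
      . "</div>";

// Removes only the banner div; the surrounding header div is kept.
$result = preg_replace($banner, '', $html);
echo $result, "\n";

// Limitation: the inner </div> is taken as the banner's closing tag.
$nested   = '<div id="banner"><div>x</div>y</div>';
$leftover = preg_replace($banner, '', $nested);
var_dump($leftover); // string(7) "y</div>"
```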

käµfm³d 👽 commented:
Correction: I originally put the "Opening delimiter" description on the wrong line. The first two lines should have the same descriptions as in your original pattern.
Nura111 (author) commented:
OK, found it:

$buffer = preg_replace('/<div([^>]+)id="banner">(.*?)<\/div>/s','test', $buffer); is working.
I'm not really sure how to explain it, though: what is the difference between (.*?) and (.*)?
käµfm³d 👽 commented:
"I'm not really sure how to explain it: what is the difference between (.*?) and (.*)?"
Here's another example. Take the string "HELLO WORLD!". Using the pattern ".*L" you get:

HELLO WORL



Using the pattern ".*?L" you get:

HEL
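The same comparison, sketched with `preg_match()` so the matched text can be printed. Matching is case-sensitive by default, hence the uppercase `L` in both patterns.

```php
<?php
$subject = 'HELLO WORLD!';

// .* grabs as much as it can, then backs off until it reaches the LAST "L".
preg_match('/.*L/', $subject, $greedy);

// .*? grabs as little as it can, stopping at the FIRST "L".
preg_match('/.*?L/', $subject, $lazy);

echo $greedy[0], "\n"; // HELLO WORL
echo $lazy[0], "\n";   // HEL
```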

Nura111 (author) commented:
Oh, sorry, I didn't see the comment. Might mine not work in other cases? It works fine for me now.

Also, what about when I want to remove a div that contains nested tags, for example:
<div id="news-container">  content </div>

Can I do it with a regex? How can I catch the final </div>?


   <div id="news-container">

                       <div id="news-cap"><a href="/rssfeed.xml"><img src="images/rss.gif" alt="Daly City News" title="Daly City News" style="margin-top: 18px; margin-left: 80px" border="0" /></a></div>
                    <div id="news-content"><div id="news-container"><div class="news-title"></div><div class="news-content">Coming Soon</div></div></div>
                </div>
Derokorian commented:
<3 kaufmed
Nura111 (author) commented:
What does that mean?
käµfm³d 👽 commented:
"Can I do it with a regex? How can I catch the final </div>?"
You might be able to, provided your HTML is well-formed (as in well-formed XML). Generally, it's not advisable to try and parse HTML using regex, because regex cannot deal with things like unbalanced tags.

For your last example, if you wanted to remove the last <div>, and you knew that it was the one with class="news-content" and had no nested tags, then I believe you could do:


$buffer = preg_replace('#'              // Opening delimiter
            .'('                    // Start of capturing group -- capture group 1
            .'<div'                 // Literal text
            .' +'                   // One or more spaces
            .'id="news-container">' // Literal text
            .'(?:'                  // Start of non-capturing group
            .'[^<]'                 // Any character NOT an opening bracket
            .'|'                    // OR
            .'<'                    // An opening bracket...
            .'(?!'                  // ...NOT followed by...
            .'div'                  // ...literal text...
            .' +'                   // ...one or more spaces...
            .'class="news-content">'// ...literal text
            .')'                    // End of NOT followed by
            .')'                    // End of non-capturing group
            .'*'                    // Zero or more of the preceding item; in this case, the entire non-capturing group
            .')'                    // End of capture group 1
            .'<div'                 // Literal text
            .' +'                   // One or more spaces
            .'class="news-content">'// Literal text
            .'[^<]*'                // Zero or more of any character NOT an opening bracket
            .'</div>'               // Literal text
            .'#',                   // Closing delimiter
            '$1',                   // Replace with the text stored in capture group 1
            $buffer);

Nura111 (author) commented:
OK, thanks. I actually did it in the end by using

$buffer = preg_replace('/<div([^>]+)id="news-container">(.*?)<\/div><\/div><\/div>(.*?)<\/div>/s','test2', $buffer);


But there isn't any general way to handle nested tags using regex, right?
Ray Paseur commented:
Have a look at this article and see if you can show us a representative collection of test data that you want this regular expression to operate on.
http://www.experts-exchange.com/Web_Development/Web_Languages-Standards/PHP/A_7830-A-Quick-Tour-of-Test-Driven-Development.html

After you have read that, you may want to take heed of kaufmed's wise advice, "... it's not advisable to try and parse HTML using regex..."  The thing you want is probably a "state engine."  It uses the opening tags to set its "state" and hopes for closing tags to reset its state.  Depending on the state, it behaves differently as it examines each letter of the string.
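A minimal sketch of the "state engine" idea, assuming well-formed HTML in which `<div` and `</div>` only appear as real tags (not inside comments, scripts, or attribute values). The hypothetical function `removeDivById()` keeps one piece of state, the current `<div>` nesting depth: an opening tag raises it, a closing tag lowers it, and the target div ends when the depth returns to zero. For brevity it jumps from tag to tag with `preg_match()` offsets instead of examining every character, and it returns the input unchanged if the target is missing or never closed. The sample input is a trimmed, hypothetical version of the news-container markup above.

```php
<?php
// Hypothetical helper: removes the <div> whose id is $id, together with every
// <div> nested inside it, by tracking nesting depth (the "state").
function removeDivById(string $html, string $id): string
{
    // A regex is fine for locating the single opening tag of the target div.
    $open = '#<div\b[^>]*\sid="' . preg_quote($id, '#') . '"[^>]*>#i';
    if (!preg_match($open, $html, $m, PREG_OFFSET_CAPTURE)) {
        return $html; // target not found: leave the input unchanged
    }
    $start = $m[0][1];
    $pos   = $start + strlen($m[0][0]);
    $depth = 1; // state: inside one open <div>

    while ($depth > 0) {
        // Jump to the next <div ...> or </div> after the current position.
        if (!preg_match('#<div\b|</div\s*>#i', $html, $tag, PREG_OFFSET_CAPTURE, $pos)) {
            return $html; // unbalanced input: no matching </div>, so change nothing
        }
        [$text, $offset] = $tag[0];
        $depth += ($text[1] === '/') ? -1 : 1; // closing tag leaves a level, opening tag enters one
        $pos = $offset + strlen($text);
    }

    // Cut from the target's opening tag through its matching </div>.
    return substr($html, 0, $start) . substr($html, $pos);
}

$html = '<div id="news-container"><div id="news-cap">cap</div>'
      . '<div id="news-content"><div class="news-content">Soon</div></div></div>'
      . '<p>after</p>';

echo removeDivById($html, 'news-content'), "\n";   // <div id="news-container"><div id="news-cap">cap</div></div><p>after</p>
echo removeDivById($html, 'news-container'), "\n"; // <p>after</p>
```

Unlike the regex versions above, this does not need to know in advance how deeply the target div is nested.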
Nura111 (author) commented:
OK. What do you mean by
"The thing you want is probably a "state engine."  It uses the opening tags to set its "state" and hopes for closing tags to reset its state.  Depending on the state, it behaves differently as it examines each letter of the string."


The article also uses regex, though.
käµfm³d 👽 commented:
Interpret "state machine" as a parser for a context-free grammar, i.e. something that can keep track of nested tags.
Ray Paseur commented:
Yes, Nura111, in the article it's also regex.  The article (as stated in the article) "is not really about regular expressions -- it is about how to use TDD to your advantage."  TDD, in the simplest sense, is the creation of a test data set and a test methodology before writing complicated and potentially brittle code.  That's why we want to see your test data.