Regex to anchor text between divs

I tried this code to remove a div section from a page, but it's still not working:
$banner = '#' // Opening delimiter
	.'<div'   // opening of div
	.'(.*)' // any character within the div tag
	.'id="banner"' // with specific ID of banner
	.'(.*)' // any character after id of the div and the closing div
	.'</div>' // the closing div tag
	.'#'; // closing delmiter
	 $buffer = preg_replace($banner,'', $buffer);

Asked by: Nura111
 
Derokorian Commented:
Can you say how it's not working? Is it not removing anything? Removing too much? Throwing an error?
 
Nura111 (Author) Commented:
It's not removing anything. Don't we need to add something for the spaces between the div and the id?
 
Nura111 (Author) Commented:
??
 
Derokorian Commented:
I can't get it to work either. I wrote this little script to test it, but I can't figure it out:

<?php

$buffer = <<<EOD

<html>
	<head>
		<title>Example</title>
	</head>
	<body>
		<div id="container">
			<div id="header">
				<h1>Example Page</h1><br />
				<div id="banner">
					<a href="www.domain.com/some/link.html"><h2>Banner Image</h2></a>
				</div>
			</div>
			<div id="nav">
				<a href="/page1.html">Link 1</a>
				<a href="/page2.html">Link 2</a>
				<a href="/page3.html">Link 3</a>
				<a href="/page4.html">Link 4</a>
			</div>
			<div id="content">
				<h1>Welcome to my example</h1>
				<p>Content WEEEEE content content content</p>
			</div>
			<div id="footer">
				&copy; 2011 examples
			</div>
		</div>
	</body>
</html>

EOD;

$buffer = preg_replace('#<div id="banner">(.*)</div>#','', $buffer);

echo $buffer;

Even with the overly simplified regex, which should look for that specific div tag, the first closing tag, and anything in between, it's not working. I will keep trying because I am bored today, but maybe someone can swoop in with the save.
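One reason the simplified pattern matches nothing here: in PCRE, "." does not match a newline unless the "s" (dot-all) modifier is set, and the banner div in the test script spans several lines. The sketch below uses a shortened, hypothetical version of the sample markup to show the difference. Note that with "s" the ".*" is still greedy, so on a full page it would run to the last closing div.

```php
<?php
// Shortened stand-in for the test page above (hypothetical sample data).
$html = "<div id=\"banner\">\n  <h2>Banner Image</h2>\n</div>\n<p>Content</p>";

// Without the "s" modifier, "." stops at the first newline, so nothing matches
// and preg_replace returns the input unchanged.
$withoutS = preg_replace('#<div id="banner">(.*)</div>#', '', $html);

// With the "s" modifier, "." also matches newlines and the div is removed.
$withS = preg_replace('#<div id="banner">(.*)</div>#s', '', $html);

var_dump($withoutS === $html); // bool(true)
var_dump($withS);              // the string "\n<p>Content</p>"
```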
 
Nura111 (Author) Commented:
OK, I'm looking into it as well. Let me know if you find something.
 
Nura111 (Author) Commented:
I can tell you that ([^>]+)> is not working, but

<div([^>]+)id="banner" is working.
 
Nura111 (Author) Commented:
/<div([^>]+)id="banner">/ is working, but then matching everything in between does not seem to work with either .* or (.*)
 
Nura111 (Author) Commented:
$buffer = preg_replace('/<div([^>]+)id="banner">(.*)<\/div>/s','test', $buffer); just removes everything up to the final </div> on the page. For some reason it doesn't find the first </div> and stop there.
 
käµfm³d 👽 Commented:
In terms of find/replace, ".*" is very dangerous. The reason is that it is a greedy operation, and it will try to consume as much as possible before deciding it found a match. As an example, let's say you had this text:

<div><div id="banner"><span>My Title</span></div></div>

Now let's say you ran your code. You would end up with a result of nothing, whereas you might have intended to leave the outer <div>s. When using dot-star, you typically end up wanting the non-greedy version:  .*?

The question mark makes the dot-star non-greedy.
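As a rough illustration of the point above, the following sketch runs the asker's original greedy pattern against the one-line sample, then a non-greedy variant. The second pattern is an assumption made for this example (it bounds the opening tag with [^>]* so the match cannot start at the outer div); it is not the pattern recommended later in this thread.

```php
<?php
$html = '<div><div id="banner"><span>My Title</span></div></div>';

// The asker's original pattern: both (.*) groups are greedy, so the match
// starts at the outer <div and runs to the last </div>.
$greedy = preg_replace('#<div(.*)id="banner"(.*)</div>#', '', $html);

// Illustrative non-greedy variant: [^>]* keeps the match inside one opening
// tag, and .*? stops at the first </div> that follows.
$lazy = preg_replace('#<div[^>]*id="banner"[^>]*>.*?</div>#', '', $html);

var_dump($greedy); // string(0) ""
var_dump($lazy);   // string(11) "<div></div>"
```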

For your particular situation, I would suggest this pattern:

$banner = '#<div[^>]*?id="banner"[^>]*>(?:[^<]|<(?!/div>))*</div>#';

...which means:
$banner = '#'
         .'<div'            // Opening delimiter
         .'[^>]*?'          // Any number of characters which are not a closing bracket - non-greedy
         .'id="banner"'     // With specific id of banner
         .'[^>]*>'          // Any number of characters which are not a closing bracket, followed by a closing bracket
         .'(?:'             // Start of non-capturing group
         .'[^<]|<(?!/div>)' // Either a character that is not an opening bracket OR an opening bracket that is NOT followed by the string "/div>"
         .')'               // End of non-capturing group
         .'*'               // Zero or more of the previous -- applies to the entire non-capturing group
         .'</div>'          // Closing div tag
         .'#';              // Closing delimiter
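A short usage sketch of the suggested pattern, run against a trimmed, hypothetical version of the test page from earlier in the thread. Because the negated class [^<] also matches newlines, no "s" modifier is needed. One caveat worth keeping in mind: the pattern stops at the first closing div, so it assumes the banner contains no nested div.

```php
<?php
// Trimmed stand-in for the earlier test page (hypothetical sample data).
$buffer = <<<EOD
<div id="header">
    <h1>Example Page</h1>
    <div id="banner">
        <a href="/some/link.html"><h2>Banner Image</h2></a>
    </div>
</div>
EOD;

$banner = '#<div[^>]*?id="banner"[^>]*>(?:[^<]|<(?!/div>))*</div>#';
$result = preg_replace($banner, '', $buffer);

// The banner div is gone; the surrounding header div is left intact.
echo $result;
```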

 
käµfm³d 👽 Commented:
I put the "opening delimiter" description on the wrong line. The first two lines have the same descriptions as in your original pattern.
 
Nura111 (Author) Commented:
OK, found it:

$buffer = preg_replace('/<div([^>]+)id="banner">(.*?)<\/div>/s','test', $buffer); is working.
I'm not really sure how to explain it, though. What is the difference between (.*?) and (.*)?
 
käµfm³d 👽 Commented:
"not really sure how to explain it what is the difference between (.*?) and (.*)"
Here's another example. Take the string "HELLO WORLD!". Using the pattern ".*L" you get:

HELLO WORL

Using the pattern ".*?L" you get:

HEL
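The same comparison can be checked directly with preg_match. This is only a small sketch; the variable names are illustrative.

```php
<?php
$subject = 'HELLO WORLD!';

// Greedy: .* grabs as much as it can, then backs off to the LAST "L".
preg_match('/.*L/', $subject, $greedy);

// Non-greedy: .*? grabs as little as it can, stopping at the FIRST "L".
preg_match('/.*?L/', $subject, $lazy);

var_dump($greedy[0]); // string(10) "HELLO WORL"
var_dump($lazy[0]);   // string(3) "HEL"
```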

 
Nura111 (Author) Commented:
Oh sorry, I didn't see the comment. Is it possible that mine won't work in other cases? It works well for me now.

Also, what about when I want to remove a nested div tag, for example:
<div id="news-container">  content </div>

Can I do it with a regex? How can I catch the final div?


   <div id="news-container">

                       <div id="news-cap"><a href="/rssfeed.xml"><img src="images/rss.gif" alt="Daly City News" title="Daly City News" style="margin-top: 18px; margin-left: 80px" border="0" /></a></div>
                    <div id="news-content"><div id="news-container"><div class="news-title"></div><div class="news-content">Coming Soon</div></div></div>
                </div>
 
Derokorian Commented:
<3 kaufmed
 
Nura111 (Author) Commented:
What does that mean?
 
käµfm³d 👽 Commented:
"can I do it with a regex how can I catch the final div"
You might be able to, provided your HTML is well structured (as in XML-well-formed). Generally, it's not advisable to try to parse HTML using regex because regex cannot deal with things like unbalanced tags.
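For comparison, here is a minimal sketch of the non-regex route using PHP's bundled DOM extension (DOMDocument and DOMXPath). This is a swapped-in technique rather than anything proposed in the thread, it assumes the dom extension is enabled, and the sample markup is hypothetical. Because the parser builds a tree, nested divs inside the target are removed along with it.

```php
<?php
// Hypothetical sample markup; the banner contains a nested <div>.
$html = '<div id="header"><h1>Example Page</h1>'
      . '<div id="banner"><div class="inner">Banner Image</div></div>'
      . '</div>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// Copy the node list to an array first so removal does not disturb iteration.
foreach (iterator_to_array($xpath->query('//div[@id="banner"]')) as $node) {
    $node->parentNode->removeChild($node);
}

// saveHTML() returns the whole document, including the html/body wrapper
// that loadHTML() adds around a fragment.
$result = $doc->saveHTML();
echo $result;
```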

For your last example, if you wanted to remove the last <div>, and you knew that it was the one with class="news-content" and that it contained no nested tags, then I believe you could do:


$output = preg_replace('#'          // Opening delimiter
            .'('                    // Start of capturing group -- capture group 1
            .'<div'                 // Literal text
            .' +'                   // One or more spaces
            .'id="news-container">' // Literal text
            .'(?:'                  // Start of non-capturing group
            .'[^<]'                 // Any character NOT an opening bracket
            .'|'                    // OR
            .'<'                    // An opening bracket...
            .'(?!'                  // ...NOT followed by...
            .'div'                  // ...literal text...
            .' +'                   // ...one or more spaces...
            .'class="news-content">'// ...literal text
            .')'                    // End of NOT followed by
            .')'                    // End of non-capturing group
            .'*'                    // Zero or more of the thing to the left; in this case, the entire non-capturing group
            .')'                    // End of capture group 1
            .'<div'                 // Literal text
            .' +'                   // One or more spaces
            .'class="news-content">'// Literal text
            .'[^<]*'                // Zero or more of any character NOT an opening bracket
            .'</div>'               // Literal text
            .'#',                   // Closing delimiter
            '$1',                   // Replace with stuff stored in capture group 1
            $input);

 
Nura111 (Author) Commented:
OK, thanks. I actually did it in the end by using

$buffer = preg_replace('/<div([^>]+)id="news-container">(.*?)<\/div><\/div><\/div>(.*?)<\/div>/s','test2', $buffer);


But there isn't any general way to do it for nested tags using regex, right?
 
Ray Paseur Commented:
Have a look at this article and see if you can show us a representative collection of test data that you want this regular expression to operate on.
http://www.experts-exchange.com/Web_Development/Web_Languages-Standards/PHP/A_7830-A-Quick-Tour-of-Test-Driven-Development.html

After you have read that, you may want to take heed of kaufmed's wise advice, "... it's not advisable to try and parse HTML using regex..."  The thing you want is probably a "state engine."  It uses the opening tags to set its "state" and hopes for closing tags to reset its state.  Depending on the state, it behaves differently as it examines each letter of the string.
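To make the "state engine" idea a little more concrete, here is one possible sketch, not Ray Paseur's own code. The function name removeDivById is hypothetical. It keeps a single piece of state, the current nesting depth, raising it on every opening div and lowering it on every closing div until the div that was opened first is closed again. It walks tag by tag rather than letter by letter, and it assumes balanced tags and no div-like text inside comments, scripts, or attribute values.

```php
<?php
// Hypothetical helper: remove the first <div> with the given id, including
// any nested <div>s, by tracking nesting depth instead of using one regex.
function removeDivById(string $html, string $id): string
{
    $open = '#<div\b[^>]*\sid="' . preg_quote($id, '#') . '"[^>]*>#i';
    if (!preg_match($open, $html, $m, PREG_OFFSET_CAPTURE)) {
        return $html;                        // no such div
    }
    $start = $m[0][1];                       // where the target div begins
    $pos   = $start + strlen($m[0][0]);      // just past its opening tag
    $depth = 1;                              // state: currently inside one div

    // Visit each following <div ...> or </div> and update the depth.
    while ($depth > 0
        && preg_match('#<(/?)div\b[^>]*>#i', $html, $t, PREG_OFFSET_CAPTURE, $pos)) {
        $depth += ($t[1][0] === '/') ? -1 : 1;
        $pos = $t[0][1] + strlen($t[0][0]);
    }
    if ($depth !== 0) {
        return $html;                        // unbalanced tags: leave input alone
    }
    return substr($html, 0, $start) . substr($html, $pos);
}

$html = '<div id="a"><div id="news-container"><div>x<div>y</div></div></div><p>z</p></div>';
$result = removeDivById($html, 'news-container');
var_dump($result); // string(26) "<div id="a"><p>z</p></div>"
```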
 
Nura111 (Author) Commented:
OK. What do you mean by
"The thing you want is probably a "state engine."  It uses the opening tags to set its "state" and hopes for closing tags to reset its state.  Depending on the state, it behaves differently as it examines each letter of the string."


The article uses regex as well.
 
käµfm³d 👽 Commented:
Interpret "state machine" as context-free grammar.
 
Ray Paseur Commented:
Yes, Nura111, the article uses regex as well.  The article (as stated in the article) "is not really about regular expressions -- it is about how to use TDD to your advantage."  TDD in the simplest sense is the creation of a test data set and a test methodology before writing complicated and potentially brittle code.  That's why we want to see your test data.