regex html div

Posted on 2006-10-23
Last Modified: 2009-12-16
I have read an HTML file into one long string.

I need a regex that gets me everything between <div id="change"> and its matching closing </div>.

There may also be other DIVs nested inside this one.

Question by:cophi

Expert Comment

by:VovinE
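This is not possible with a standard regular expression, because a regular expression cannot keep count of how many nested <div> tags have been opened and closed.
If there were no nested divs inside your div, the regular expression would look something like this (use it with RegexOptions.Singleline so that . also matches line breaks):

"<div id=\"change\">(.*?)</div>"

If you also need to handle nested closing </div> tags, a regular expression on its own is not the way to do it :)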
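A minimal C# sketch of the non-nested case, showing why it breaks once divs are nested. LazyDivDemo and ExtractNonNested are hypothetical names used only for illustration; Regex.Match, RegexOptions and Match.Groups are the standard System.Text.RegularExpressions members.

```csharp
using System.Text.RegularExpressions;

public static class LazyDivDemo
{
    // Hypothetical helper: returns the text between <div id="change"> and the
    // FIRST </div> after it. This is only correct when the target div has no
    // nested divs, because the lazy (.*?) stops at the earliest closing tag.
    public static string ExtractNonNested(string html)
    {
        Match m = Regex.Match(html, "<div id=\"change\">(.*?)</div>",
                              RegexOptions.Singleline | RegexOptions.IgnoreCase);
        return m.Success ? m.Groups[1].Value : "";
    }
}
```

For <div id="change">hello</div> it returns hello, but for <div id="change">a<div>b</div>c</div> it returns a<div>b, cutting the content off at the inner closing tag.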

Expert Comment

by:_TAD_

HTML, if it is well-formed (as XHTML is), is really just an XML document.

Try loading your HTML file as XML and then use XPath navigation to find the div element with the proper id attribute.
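A minimal C# sketch of this suggestion, assuming the page is well-formed XML with no DOCTYPE and no default XHTML namespace (if xmlns="http://www.w3.org/1999/xhtml" is present, //div would need an XmlNamespaceManager and a prefix). XPathDivDemo and GetChangeDivInnerXml are hypothetical names; XmlDocument.LoadXml, SelectSingleNode and InnerXml are standard System.Xml members.

```csharp
using System.Xml;

public static class XPathDivDemo
{
    // Hypothetical helper. Assumes the input is well-formed XML without a default
    // namespace; XmlDocument.LoadXml throws XmlException on ordinary tag-soup HTML.
    public static string GetChangeDivInnerXml(string xhtml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xhtml);

        // The XPath engine handles nesting for us: the element's InnerXml
        // includes any nested divs up to its own matching end tag.
        XmlNode div = doc.SelectSingleNode("//div[@id='change']");
        return div == null ? "" : div.InnerXml;
    }
}
```

Because the XML parser builds a tree, nested divs need no special handling. The trade-off is that real-world HTML is often not well-formed, in which case LoadXml fails.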

Accepted Solution

by:VovinE
You can, however, use a regular expression to find the div tags and write a small parser around it that tracks how deeply nested each tag is.

Here is a sample parser that does what you want:

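    using System.Collections.Generic;
    using System.Text.RegularExpressions;

    public class DivExtractor
    {
        // Group 1: an opening <div ...> tag (group 2 holds its attributes).
        // Group 3: a closing </div> tag. \b keeps tags like <divider> from matching.
        private readonly Regex regex =
            new Regex(@"(<div\b([^>]*)>)|(</div\s*>)", RegexOptions.IgnoreCase);

        private List<int> extractions; // (start, end) positions of each extraction
        private int level;             // nesting depth inside the target div

        // Returns the content of the first <div id="change">, or "" if there is none.
        public string GetDiv(string content)
        {
            string[] r = GetDivs(content);
            return r.Length > 0 ? r[0] : "";
        }

        // Returns the content of every top-level <div id="change"> ... </div>.
        public string[] GetDivs(string content)
        {
            extractions = new List<int>();
            level = 0;
            foreach (Match m in regex.Matches(content))
                DivMatched(m);

            List<string> results = new List<string>();
            int i = 0;
            // Positions come in (start, end) pairs; an unclosed div leaves
            // an unpaired start position, which is ignored.
            while (i < extractions.Count - 1)
            {
                int start = extractions[i++];
                int len = extractions[i++] - start;
                results.Add(content.Substring(start, len));
            }
            return results.ToArray();
        }

        private void DivMatched(Match m)
        {
            if (m.Groups[1].Success)
            {
                if (level > 0)
                    level++; // a div nested inside the target: one level deeper
                else if (m.Groups[2].Value.Contains("id=\"change\""))
                {
                    extractions.Add(m.Index + m.Length); // content starts after the opening tag
                    level++;
                }
            }
            else if (m.Groups[3].Success && level > 0)
            {
                level--;
                if (level == 0)
                    extractions.Add(m.Index); // content ends before the matching </div>
            }
        }
    }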


Usage is very simple:

new DivExtractor().GetDiv(html)  // returns the content of the first <div id="change"> (without the div tags themselves)
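For completeness: the .NET regex engine, unlike most others, supports balancing groups, a non-standard extension that can count nesting depth inside a single pattern. The sketch below is an alternative to the parser above, not part of the accepted answer. BalancedDivDemo and GetChangeDiv are hypothetical names, and the pattern assumes the target tag is written as <div id="change"> (whitespace allowed) and that closing tags are plain </div>.

```csharp
using System.Text.RegularExpressions;

public static class BalancedDivDemo
{
    // .NET balancing groups: (?<depth>) pushes a capture for each nested <div ...>,
    // (?<-depth>) pops one for each </div> (and fails when nothing is open), and
    // (?(depth)(?!)) rejects the match if any nested div is still open.
    private static readonly Regex ChangeDiv = new Regex(
        @"<div\s+id=""change""\s*>" +
        @"(?<content>(?>(?!</?div\b).|<div\b[^>]*>(?<depth>)|</div>(?<-depth>))*(?(depth)(?!)))" +
        @"</div>",
        RegexOptions.Singleline | RegexOptions.IgnoreCase);

    // Hypothetical helper: content of the first <div id="change"> without its own tags, or "".
    public static string GetChangeDiv(string html)
    {
        Match m = ChangeDiv.Match(html);
        return m.Success ? m.Groups["content"].Value : "";
    }
}
```

For <div id="change">a<div>b</div>c</div><div>x</div> it returns a<div>b</div>c. The parser class is usually easier to read and debug; the single pattern mainly saves writing the depth-tracking loop by hand.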
