Solved

regex html div

Posted on 2006-10-23
Last Modified: 2009-12-16
I have read an HTML file into one long string.

I need a regex that gets me everything between <div id="change"> and its matching closing </div>.

There may also be other DIVs nested inside this one.
Question by:cophi
3 Comments
 
LVL 6

Expert Comment

by:VovinE
ID: 17790188
This is not possible with a single plain regular expression, because finding the matching closing tag means counting nested tags.
If there were no closing divs inside your div, the regular expression would look something like this:

"<div id=\"change\">[\\s\\S]*?</div>"

If you also need to handle nested closing divs, then a regular expression alone is not the way to get it :)
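To make the limitation concrete, here is a minimal sketch of the simple, non-nested approach. The class and method names (`SimpleDivMatch`, `InnerUpToFirstClose`) are made up for illustration, and the pattern is a variant that uses a lazy `.*?` with `RegexOptions.Singleline` so that it can span line breaks. It assumes the opening tag is written exactly as `<div id="change">`.

```csharp
using System.Text.RegularExpressions;

public static class SimpleDivMatch
{
    // Returns the text between <div id="change"> and the FIRST </div>
    // that follows it, or "" if the opening tag is not found.
    // Hypothetical helper: it does not understand nesting.
    public static string InnerUpToFirstClose(string html)
    {
        Match m = Regex.Match(
            html,
            "<div id=\"change\">(.*?)</div>",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
        return m.Success ? m.Groups[1].Value : "";
    }
}
```

With flat content this returns what you expect, but as soon as another div is nested inside, the lazy match stops at the inner closing tag and the result is cut short.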
 
LVL 22

Expert Comment

by:_TAD_
ID: 17790435

HTML, if it is well-formed (that is, valid XHTML), can be treated as XML.

Try reading your HTML file as XML data and then use XPath navigation to find the div tag with the proper attribute.
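A minimal sketch of that idea, assuming the input really is well-formed XML: every tag closed, attributes quoted, no HTML-only entities such as `&nbsp;`, and no default XHTML namespace or DOCTYPE on the root (those would need extra handling). The class and method names are made up for illustration; `XmlDocument.LoadXml` and `SelectSingleNode` are the standard `System.Xml` calls.

```csharp
using System.Xml;

public static class XPathDivExtractor
{
    // Returns the inner markup of the first <div id="change"> element,
    // or "" if no such element exists.
    // Throws XmlException if the input is not well-formed XML.
    public static string GetInnerXml(string xhtml)
    {
        var doc = new XmlDocument();
        doc.PreserveWhitespace = true; // keep whitespace inside the div as-is
        doc.LoadXml(xhtml);

        // XPath: any div element whose id attribute equals "change".
        XmlNode node = doc.SelectSingleNode("//div[@id='change']");
        return node != null ? node.InnerXml : "";
    }
}
```

Because the XML parser builds a tree, nested divs are handled automatically. The trade-off is that real-world HTML is often not well-formed, in which case `LoadXml` throws instead of returning a result.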
 
LVL 6

Accepted Solution

by:
VovinE earned 500 total points
ID: 17826740
You can use a regular expression as the tokenizer for a small parser that does this for you, counting how deeply the divs are nested as it goes.

Here is a sample parser which does what you want:

    using System.Collections.Generic;
    using System.Text.RegularExpressions;

    public class DivExtractor
    {
        private Regex regex;
        public DivExtractor()
        {
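            // Group 1: an opening <div ...> tag; group 2: its attributes; group 3: a closing </div>.
            // \b keeps tags such as <divider> from matching; \s* tolerates "</div >".
            regex = new Regex("(<div\\b([^>]*)>)|(</div\\s*>)", RegexOptions.IgnoreCase);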
        }

        public string GetDiv(string content)
        {
            string[] r = GetDivs(content);
            if (r.Length > 0)
                return r[0];
            else
                return "";

        }
        public string[] GetDivs(string content)
        {
            extractions = new List<int>();
            level = 0;
            // Replace is used only to call DivMatched once per tag; its return value is discarded.
            regex.Replace(content, DivMatched);
            List<string> results = new List<string>();
            int i = 0;
            while (i < extractions.Count-1)
            {
                int start = extractions[i++];
                int len = extractions[i++] - start;
                results.Add(content.Substring(start, len));
            }
            return results.ToArray();
        }

        private List<int> extractions;
        private int level;

        private string DivMatched(Match m)
        {
            if (m.Groups[1].Success)
            {
                if (level > 0)
                    level++;
                else if (m.Groups[2].Value.Contains("id=\"change\""))
                {
                    extractions.Add(m.Index + m.Length); // store starting extraction position
                    level++;
                }
            }
            else if (m.Groups[3].Success && level > 0)
            {
                level--;
                if (level == 0)
                {
                    extractions.Add(m.Index); // store closing index position
                }
            }
            return "";
        }
    }


Usage is very simple:

string inner = new DivExtractor().GetDiv(html);  // content of the first matching div, without the div tags themselves
