Solved

regex html div

Posted on 2006-10-23
Last Modified: 2009-12-16
I have read an HTML file into one long string.

I need a regex that gets me everything between <div id="change"> and its matching closing </div>.

Other divs may also be nested inside this one.
Question by:cophi
LVL 6

Expert Comment

by:VovinE
ID: 17790188
This is not possible with a plain regular expression, because it cannot keep track of nested tags.
If there were no closing divs inside your div, the regular expression would look something like this:

"<div id=\"change\">[A-Za-z<>/\\\"' \t\n\r]*?</div>"

If you also need to handle nested closing divs, then a regular expression is not the way to do it :)
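For completeness: the .NET regex engine has balancing groups, an extension that classical regular expressions lack, so in C# a single pattern can count nested divs. The sketch below is only an illustration, not a robust HTML parser. It assumes the opening tag is written as `<div id="change"` with double quotes and `id` as the first attribute, and that the divs are properly nested. The class and method names (`BalancedDivDemo`, `Extract`) are made up for this example.

```csharp
using System.Text.RegularExpressions;

public static class BalancedDivDemo
{
    // Balancing groups: every nested <div> pushes onto the "depth" stack,
    // every nested </div> pops it. The final </div> can only match once
    // the stack is empty again.
    private const string Pattern = @"
        <div\s+id=""change""[^>]*>
        (?<content>
            (?>
                <div\b[^>]*> (?<depth>)
              | </div>       (?<-depth>)
              | (?!</?div\b) .
            )*
            (?(depth)(?!))
        )
        </div>";

    private static readonly Regex DivRegex = new Regex(
        Pattern,
        RegexOptions.IgnoreCase | RegexOptions.Singleline | RegexOptions.IgnorePatternWhitespace);

    // Returns the content between the target div's tags, or "" if it is not found.
    public static string Extract(string html)
    {
        Match m = DivRegex.Match(html);
        return m.Success ? m.Groups["content"].Value : "";
    }
}
```

Calling `BalancedDivDemo.Extract(html)` returns the text between the opening tag and its matching close, with any nested divs left intact.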
 
LVL 22

Expert Comment

by:_TAD_
ID: 17790435

HTML, if it is well-formed (XHTML), can be parsed as XML.

Try loading your HTML file as XML and then use XPath to find the div element with the right id attribute.
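A minimal sketch of that idea, assuming the markup really is well-formed XML: every tag closed, attribute values quoted, no default XHTML namespace, and no HTML-only entities such as `&nbsp;`. Typical real-world HTML will make `LoadXml` throw. The class and method names (`XPathDivDemo`, `GetInnerXml`) are made up for this example.

```csharp
using System.Xml;

public static class XPathDivDemo
{
    // Loads the markup as XML and returns the inner markup of the div
    // whose id attribute is "change", or "" if there is no such div.
    public static string GetInnerXml(string xhtml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xhtml); // throws XmlException if the input is not well-formed

        XmlNode node = doc.SelectSingleNode("//div[@id='change']");
        return node == null ? "" : node.InnerXml;
    }
}
```

Because the XML parser builds a tree, nested divs need no special handling: `InnerXml` returns everything inside the selected element.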
 
LVL 6

Accepted Solution

by:
VovinE earned 500 total points
ID: 17826740
You can still use regular expressions to write a small parser that does this for you.

Here is a sample parser that does what you want:

    using System.Collections.Generic;
    using System.Text.RegularExpressions;

    public class DivExtractor
    {
        // Group 1 = an opening <div ...> tag, group 2 = its attributes,
        // group 3 = a closing </div> tag. \b keeps tags such as <divider> from matching.
        private readonly Regex regex =
            new Regex("(<div\\b([^>]*)>)|(</div>)", RegexOptions.IgnoreCase);

        // Returns the content of the first div with id="change", or "" if there is none.
        public string GetDiv(string content)
        {
            string[] r = GetDivs(content);
            return r.Length > 0 ? r[0] : "";
        }

        // Returns the content (without the enclosing tags) of every div with id="change".
        public string[] GetDivs(string content)
        {
            List<string> results = new List<string>();
            int level = 0; // nesting depth inside the target div; 0 means outside it
            int start = 0; // index just past the target div's opening tag

            foreach (Match m in regex.Matches(content))
            {
                if (m.Groups[1].Success)
                {
                    if (level > 0)
                        level++; // a div nested inside the target div
                    else if (m.Groups[2].Value.Contains("id=\"change\""))
                    {
                        start = m.Index + m.Length; // store starting extraction position
                        level = 1;
                    }
                }
                else if (level > 0)
                {
                    level--;
                    if (level == 0)
                    {
                        // the matching closing tag: extract everything before it
                        results.Add(content.Substring(start, m.Index - start));
                    }
                }
            }
            return results.ToArray();
        }
    }


Usage is very simple:

new DivExtractor().GetDiv(html)  // extracts the content of the first matching div (without the div tags)

