Solved

Regular expression: pick up the last occurrence

Posted on 2006-11-21
Last Modified: 2010-04-16
Hi,

How do I pick up the last occurrence of a matching string with a regular expression? For example, I have the string:

<a href=url1><a href=url2>......<end>

I want to pick up the last url before "<end>"

I use the regular expression pattern "<a href=(.+?)><end>", but it gives me "url1><a href=url2>......" instead of "url2".
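
For reference, the lazy quantifier does not help here because the regex engine always tries the leftmost starting position first. From the first "<a href=", the lazy group simply grows until "><end>" can match. Below is a minimal C# sketch of two possible workarounds, tested only against the sample string from the question. The class and method names (LastAnchorDemo, LastUrlGreedy, LastUrlFromMatches) are made up for illustration, and the patterns assume unquoted href values that contain no ">" character.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LastAnchorDemo
{
    // Option 1: a greedy ".*" in front consumes as much as it can, so the
    // engine backtracks to the LAST "<a href=" that is still followed by "<end>".
    public static string LastUrlGreedy(string input)
    {
        Match m = Regex.Match(input, @".*<a href=([^>]+)>.*?<end>", RegexOptions.Singleline);
        return m.Success ? m.Groups[1].Value : null;
    }

    // Option 2: cut the input at the first "<end>", collect every anchor
    // before it, and keep the last one.
    public static string LastUrlFromMatches(string input)
    {
        int end = input.IndexOf("<end>", StringComparison.Ordinal);
        string head = end >= 0 ? input.Substring(0, end) : input;
        MatchCollection all = Regex.Matches(head, @"<a href=([^>]+)>");
        return all.Count > 0 ? all[all.Count - 1].Groups[1].Value : null;
    }

    public static void Main()
    {
        string input = "<a href=url1><a href=url2>......<end>";
        Console.WriteLine(LastUrlGreedy(input));      // url2
        Console.WriteLine(LastUrlFromMatches(input)); // url2
    }
}
```

Option 1 keeps everything in a single pattern. Option 2 is easier to adapt if you later need all of the URLs rather than only the last one.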
Question by:yeshengl
LVL 96

Expert Comment

by:Bob Learned
ID: 17998952
I find that parsing the HTML into a document object is far easier than using regular expressions.

Here is a previous VB.NET question that highlights what I mean:

  http://www.experts-exchange.com/Programming/Programming_Languages/Dot_Net/VB_DOT_NET/Q_21767542.html

I have equivalent C# code.

Bob
 
LVL 15

Expert Comment

by:ozymandias
ID: 17999850
Bob, could you post that C# code anyway?
I'd like to have a play.
I can post a separate Q if you like...topic area of your choice :)
Thanks.
 
LVL 96

Accepted Solution

by:Bob Learned (earned 50 total points)
ID: 18169812
Sorry, I lost track of this one :(

using System;
using System.Collections;
using System.Threading;
using System.Runtime.InteropServices;
using System.Windows.Forms;

public class HtmlAnchor
{
  public string HRef = "";
  public string Class = "";
  public string Text = "";
}

public class HtmlImage
{
  public string Src = "";
}

public class HtmlDocument
{
  private ArrayList _anchors = new ArrayList();
  private ArrayList _images = new ArrayList();
 
  [ComImport(), Guid("0000010c-0000-0000-C000-000000000046"), InterfaceType(ComInterfaceType.InterfaceIsIUnknown)]
  interface IPersist
  {

    void GetClassID(ref Guid pClassId);
  }
  [ComImport(), Guid("7FD52380-4E07-101B-AE2D-08002B2EC713"), InterfaceType(ComInterfaceType.InterfaceIsIUnknown)]
  interface IPersistStreamInit : IPersist
  {

    new void GetClassID(ref Guid pClassId);

    [PreserveSig()]
    int IsDirty();

    void Load(UCOMIStream pStm);

    void Save(UCOMIStream pStm, [MarshalAs(UnmanagedType.Bool)] bool fClearDirty);

    void GetMaxSize(ref long pCbSize);

    void InitNew();
  }
  private mshtml.HTMLDocument m_document;
  private string m_url = "";

  public HtmlDocument(string url)
  {
    m_url = url;
    Thread thread = new Thread(new ThreadStart(StartGetDocument));
    thread.Start();
    while (m_document == null || m_document.readyState != "complete")
    {
      Application.DoEvents();
    }
    this.FindAnchors(m_document);
  }

  private void StartGetDocument()
  {
    mshtml.HTMLDocument doc = new mshtml.HTMLDocument();
    IPersistStreamInit ips = (IPersistStreamInit)doc;
    ips.InitNew();
    m_document = (mshtml.HTMLDocument)doc.createDocumentFromUrl(m_url, "\0");
  }

  public HtmlDocument(mshtml.HTMLDocument document)
  {
    this.FindAnchors(document);
  }

  private void FindAnchors(mshtml.HTMLDocument document)
  {
    // Select by tag name; getElementsByName would match on the "name" attribute instead.
    foreach (mshtml.HTMLAnchorElementClass element in document.getElementsByTagName("a"))
    {
      HtmlAnchor anchor = new HtmlAnchor();
      anchor.HRef = GetAttribute(element, "href");
      anchor.Class = GetAttribute(element, "class");
      anchor.Text = element.innerText;
      _anchors.Add(anchor);
    }
  }

  private void FindImages(mshtml.HTMLDocument document)
  {
    // Select by tag name; getElementsByName would match on the "name" attribute instead.
    foreach (mshtml.HTMLImgClass element in document.getElementsByTagName("img"))
    {
      HtmlImage image = new HtmlImage();
      image.Src = GetAttribute(element, "src");
      _images.Add(image);
    }
  }

  private string GetAttribute(mshtml.IHTMLElement element, string attribName)
  {
    if (element.getAttribute(attribName, 0) != null)
    {
      return element.getAttribute(attribName, 0).ToString();
    }
    return "";
  }

  public HtmlAnchor[] Anchors
  {
    get
    {
      return (HtmlAnchor[])(_anchors.ToArray(typeof(HtmlAnchor)));
    }
  }
}

Bob
 
LVL 15

Expert Comment

by:ozymandias
ID: 18363488
Thanks for that code, btw :)