Solved

regular expression, pick up the last occurrence

Posted on 2006-11-21
Last Modified: 2010-04-16
Hi,

How do I pick up the last occurrence of a matching string with a regular expression? For example, I have the string:

<a href=url1><a href=url2>......<end>

I want to pick up the last URL before "<end>".

I use the regular expression pattern "<a href=(.+?)><end>", and it gives me "url1><a href=url2>......" instead of "url2".
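
For what it's worth, the lazy `.+?` does not help here because the regex engine still anchors the match at the leftmost `<a href=` and only then expands until it reaches `><end>`. One possible way around it, sketched below in C#, is to cut the input at the marker and search the remainder from the right with `RegexOptions.RightToLeft`, using `[^>]+` so the capture cannot run past the end of a tag. The class and method names (`LastLinkFinder`, `LastHrefBefore`) are made up for illustration, and the sketch assumes unquoted `href` values exactly as in the sample string.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class LastLinkFinder
{
  // Returns the href of the last <a href=...> tag that appears before
  // the first occurrence of marker, or null when no such tag exists.
  public static string LastHrefBefore(string input, string marker)
  {
    // Only look at the text in front of the marker (all of it if the marker is absent).
    int end = input.IndexOf(marker, StringComparison.Ordinal);
    string head = end >= 0 ? input.Substring(0, end) : input;

    // RightToLeft makes the first match the rightmost one;
    // [^>]+ keeps the capture inside a single tag.
    Match m = Regex.Match(head, "<a href=([^>]+)>", RegexOptions.RightToLeft);
    return m.Success ? m.Groups[1].Value : null;
  }
}
```

Calling `LastLinkFinder.LastHrefBefore("<a href=url1><a href=url2>......<end>", "<end>")` returns "url2". If "<end>" always follows the last tag directly, simply changing the original pattern to "<a href=([^>]+)><end>" should also work.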
Question by:yeshengl
6 Comments
 
LVL 96

Expert Comment

by:Bob Learned
ID: 17998952
I find that parsing HTML is far easier than using regular expressions.

Here is a previous VB.NET question that highlights what I mean:

  http://www.experts-exchange.com/Programming/Programming_Languages/Dot_Net/VB_DOT_NET/Q_21767542.html

I have equivalent C# code.

Bob
LVL 15

Expert Comment

by:ozymandias
ID: 17999850
Bob, could you post that C# code anyway?
I'd like to have a play.
I can post a separate Q if you like...topic area of your choice :)
Thanks.
LVL 96

Accepted Solution

by:
Bob Learned earned 200 total points
ID: 18169812
Sorry, I lost track of this one :(

using System;
using System.Collections;
using System.Threading;
using System.Runtime.InteropServices;
using System.Windows.Forms;

public class HtmlAnchor
{
  public string HRef = "";
  public string Class = "";
  public string Text = "";
}

public class HtmlImage
{
  public string Src = "";
}

public class HtmlDocument
{
  private ArrayList _anchors = new ArrayList();
  private ArrayList _images = new ArrayList();
 
  [ComImport(), Guid("0000010c-0000-0000-C000-000000000046"), InterfaceType(ComInterfaceType.InterfaceIsIUnknown)]
  interface IPersist
  {

    void GetClassID(ref Guid pClassId);
  }
  [ComImport(), Guid("7FD52380-4E07-101B-AE2D-08002B2EC713"), InterfaceType(ComInterfaceType.InterfaceIsIUnknown)]
  interface IPersistStreamInit : IPersist
  {

    new void GetClassID(ref Guid pClassId);

    [PreserveSig()]
    int IsDirty();

    void Load(UCOMIStream pStm);

    void Save(UCOMIStream pStm, [MarshalAs(UnmanagedType.Bool)] bool fClearDirty);

    void GetMaxSize(ref long pCbSize);

    void InitNew();
  }
  private mshtml.HTMLDocument m_document;
  private string m_url = "";

  public HtmlDocument(string url)
  {
    m_url = url;
    Thread thread = new Thread(new ThreadStart(StartGetDocument));
    thread.Start();
    while (m_document == null || m_document.readyState != "complete")
    {
      Application.DoEvents();
    }
    this.FindAnchors(m_document);
  }

  private void StartGetDocument()
  {
    mshtml.HTMLDocument doc = new mshtml.HTMLDocument();
    IPersistStreamInit ips = (IPersistStreamInit)doc;
    ips.InitNew();
    m_document = (mshtml.HTMLDocument)doc.createDocumentFromUrl(m_url, "\0");
  }

  public HtmlDocument(mshtml.HTMLDocument document)
  {
    this.FindAnchors(document);
  }

  private void FindAnchors(mshtml.HTMLDocument document)
  {
    // getElementsByTagName selects by tag; getElementsByName would match the name attribute instead.
    foreach (mshtml.IHTMLElement element in document.getElementsByTagName("a"))
    {
      HtmlAnchor anchor = new HtmlAnchor();
      anchor.HRef = GetAttribute(element, "href");
      anchor.Class = GetAttribute(element, "class");
      anchor.Text = element.innerText;
      _anchors.Add(anchor);
    }
  }

  private void FindImages(mshtml.HTMLDocument document)
  {
    // Select <img> elements by tag name rather than by name attribute.
    foreach (mshtml.IHTMLElement element in document.getElementsByTagName("img"))
    {
      HtmlImage image = new HtmlImage();
      image.Src = GetAttribute(element, "src");
      _images.Add(image);
    }
  }

  private string GetAttribute(mshtml.IHTMLElement element, string attribName)
  {
    if (element.getAttribute(attribName, 0) != null)
    {
      return element.getAttribute(attribName, 0).ToString();
    }
    return "";
  }

  public HtmlAnchor[] Anchors
  {
    get
    {
      return (HtmlAnchor[])(_anchors.ToArray(typeof(HtmlAnchor)));
    }
  }
}

Bob
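
To tie this back to the original question: once the anchors are in an array, the last link is just the final element. The sketch below is only an illustration under assumptions. `AnchorUtil` and `LastHRef` are hypothetical names, and the small `HtmlAnchor` class is a stand-in copy of the one in the post above so that the snippet compiles on its own.

```csharp
using System;

// Stand-in copy of the HtmlAnchor class from the post above;
// omit it when compiling alongside that code.
public class HtmlAnchor
{
  public string HRef = "";
  public string Class = "";
  public string Text = "";
}

// Hypothetical helper, not part of the posted solution.
public static class AnchorUtil
{
  // Returns the HRef of the last anchor, or "" when the array is null or empty.
  public static string LastHRef(HtmlAnchor[] anchors)
  {
    if (anchors == null || anchors.Length == 0)
    {
      return "";
    }
    return anchors[anchors.Length - 1].HRef;
  }
}
```

With the class above this would presumably be called as `AnchorUtil.LastHRef(new HtmlDocument(url).Anchors)`. Note that this yields the last anchor in the whole document, not the last one before a marker such as "<end>".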
LVL 15

Expert Comment

by:ozymandias
ID: 18363488
Thanks for that code, btw :)