Go Premium for a chance to win a PS4. Enter to Win

x
?
Solved

regular expression, pick up the last occurrance

Posted on 2006-11-21
6
Medium Priority
?
356 Views
Last Modified: 2010-04-16
Hi,

How do I pick up the last occurrance of a matching string by regular expression. For example: I have string:

<a href=url1><a href=url2>......<end>

I want to pick up the last url before "<end>"

I use regular expression pattern: "<a href=(.+?)><end>" and it gives me "url1><a href=url2>......".
0
Comment
Question by:yeshengl
  • 2
  • 2
6 Comments
 
LVL 96

Expert Comment

by:Bob Learned
ID: 17998952
I find that parsing HTML is far easier than using regular expressions.

Here is a previous VB.NET question that highlights what I mean:

  http://www.experts-exchange.com/Programming/Programming_Languages/Dot_Net/VB_DOT_NET/Q_21767542.html

I have equivalent C# code.

Bob
0
 
LVL 15

Expert Comment

by:ozymandias
ID: 17999850
Bob, could you post that C# code anyway ?
I'd like to have a play.
I can post a separate Q if you like...topic area of your choice :)
Thanks.
0
 
LVL 96

Accepted Solution

by:
Bob Learned earned 200 total points
ID: 18169812
Sorry, I lost track of this one :(

using System;
using System.Collections;
using System.Threading;
using System.Runtime.InteropServices;
using System.Windows.Forms;

public class HtmlAnchor
{
  public string HRef = "";
  public string Class = "";
  public string Text = "";
}

public class HtmlImage
{
  public string Src = "";
}

public class HtmlDocument
{
  private ArrayList _anchors = new ArrayList();
  private ArrayList _images = new ArrayList();
 
  [ComImport(), Guid("0000010c-0000-0000-C000-000000000046"), InterfaceType(ComInterfaceType.InterfaceIsIUnknown)]
  interface IPersist
  {

    void GetClassID(ref Guid pClassId);
  }
  [ComImport(), Guid("7FD52380-4E07-101B-AE2D-08002B2EC713"), InterfaceType(ComInterfaceType.InterfaceIsIUnknown)]
  interface IPersistStreamInit : IPersist
  {

    new void GetClassID(ref Guid pClassId);

    [PreserveSig()]
    int IsDirty();

    void Load(UCOMIStream pStm);

    void Save(UCOMIStream pStm, [MarshalAs(UnmanagedType.Bool)] bool fClearDirty);

    void GetMaxSize(ref long pCbSize);

    void InitNew();
  }
  private mshtml.HTMLDocument m_document;
  private string m_url = "";

  public HtmlDocument(string url)
  {
    m_url = url;
    Thread thread = new Thread(new ThreadStart(StartGetDocument));
    thread.Start();
    while (m_document == null || m_document.readyState != "complete")
    {
      Application.DoEvents();
    }
    this.FindAnchors(m_document);
  }

  private void StartGetDocument()
  {
    mshtml.HTMLDocument doc = new mshtml.HTMLDocument();
    IPersistStreamInit ips = (IPersistStreamInit)doc;
    ips.InitNew();
    m_document = (mshtml.HTMLDocument)doc.createDocumentFromUrl(m_url, "\0");
  }

  public HtmlDocument(mshtml.HTMLDocument document)
  {
    this.FindAnchors(document);
  }

  private void FindAnchors(mshtml.HTMLDocument document)
  {
    foreach (mshtml.HTMLAnchorElementClass element in document.getElementsByName("a"))
    {
      HtmlAnchor anchor = new HtmlAnchor();
      anchor.HRef = GetAttribute(element, "href");
      anchor.Class = GetAttribute(element, "class");
      anchor.Text = element.innerText;
      _anchors.Add(anchor);
    }
  }

  private void FindImages(mshtml.HTMLDocument document)
  {
    foreach (mshtml.HTMLImgClass element in document.getElementsByName("img"))
    {
      HtmlImage image = new HtmlImage();
      image.Src = GetAttribute(element, "src");
      _images.Add(image);
    }
  }

  private string GetAttribute(mshtml.IHTMLElement element, string attribName)
  {
    if (element.getAttribute(attribName, 0) != null)
    {
      return element.getAttribute(attribName, 0).ToString();
    }
    return "";
  }

  public HtmlAnchor[] Anchors
  {
    get
    {
      return (HtmlAnchor[])(_anchors.ToArray(typeof(HtmlAnchor)));
    }
  }
}

Bob
0
 
LVL 15

Expert Comment

by:ozymandias
ID: 18363488
Thanks for that code, btw :)
0

Featured Post

Ask an Anonymous Question!

Don't feel intimidated by what you don't know. Ask your question anonymously. It's easy! Learn more and upgrade.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Summary: Persistence is the capability of an application to store the state of objects and recover it when necessary. This article compares the two common types of serialization in aspects of data access, readability, and runtime cost. A ready-to…
High user turnover can cause old/redundant user data to consume valuable space. UserResourceCleanup was developed to address this by automatically deleting user folders when the user account is deleted.
Exchange organizations may use the Journaling Agent of the Transport Service to archive messages going through Exchange. However, if the Transport Service is integrated with some email content management application (such as an anti-spam), the admin…
In response to a need for security and privacy, and to continue fostering an environment members can turn to for support, solutions, and education, Experts Exchange has created anonymous question capabilities. This new feature is available to our Pr…
Suggested Courses

972 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question