
Regular expression: pick up the last occurrence

Posted on 2006-11-21
Last Modified: 2010-04-16
Hi,

How do I pick up the last occurrence of a matching string with a regular expression? For example, I have this string:

<a href=url1><a href=url2>......<end>

I want to pick up the last URL before "<end>".

I use the regular expression pattern "<a href=(.+?)><end>", but it gives me "url1><a href=url2>......".
Question by:yeshengl
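For reference, the regex route can also work. The pattern above fails because the engine commits to the leftmost "<a href=" it finds, and the lazy ".+?" then has to stretch across every later tag to reach "><end>". A minimal C# sketch, assuming .NET's System.Text.RegularExpressions and unquoted href values like the ones in the example (the class and method names are hypothetical): replace ".+?" with "[^>]+" so a match cannot cross a tag boundary, then either keep the last match before "<end>" or scan with RegexOptions.RightToLeft so the first match found is the last one.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the last unquoted href value that appears
// before "<end>", or null if there is none.
public static class LastAnchor
{
    // [^>]+ stops at the first '>', so each match covers exactly one
    // <a href=...> tag instead of running across several of them.
    private static readonly Regex Href =
        new Regex(@"<a href=([^>]+)>");

    // Same pattern, but the engine scans from the end of the input, so the
    // first match it reports is the rightmost anchor.
    private static readonly Regex HrefFromRight =
        new Regex(@"<a href=([^>]+)>", RegexOptions.RightToLeft);

    // Only the text before "<end>" is searched (the whole input if "<end>"
    // is absent); anchors after "<end>" are ignored.
    private static string Head(string input)
    {
        int end = input.IndexOf("<end>", StringComparison.Ordinal);
        return end >= 0 ? input.Substring(0, end) : input;
    }

    // Option 1: enumerate every match left to right and keep the last one.
    public static string LastHrefByMatches(string input)
    {
        string last = null;
        foreach (Match m in Href.Matches(Head(input)))
        {
            last = m.Groups[1].Value;
        }
        return last;
    }

    // Option 2: a single right-to-left match.
    public static string LastHrefFromRight(string input)
    {
        Match m = HrefFromRight.Match(Head(input));
        return m.Success ? m.Groups[1].Value : null;
    }
}
```

With the sample string "<a href=url1><a href=url2>......<end>", both methods should return "url2". Quoted values (href="url2") or extra attributes would need a looser pattern, which is where parsing the HTML properly starts to pay off.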
 
LVL 96

Expert Comment

by:Bob Learned
ID: 17998952
I find that parsing HTML is far easier than using regular expressions.

Here is a previous VB.NET question that highlights what I mean:

  http://www.experts-exchange.com/Programming/Programming_Languages/Dot_Net/VB_DOT_NET/Q_21767542.html

I have equivalent C# code.

Bob
 
LVL 15

Expert Comment

by:ozymandias
ID: 17999850
Bob, could you post that C# code anyway?
I'd like to have a play.
I can post a separate Q if you like...topic area of your choice :)
Thanks.
 
LVL 96

Accepted Solution

by:Bob Learned (earned 50 total points)
ID: 18169812
Sorry, I lost track of this one :(

// Requires references to System.Windows.Forms and the Microsoft.mshtml interop assembly.
using System;
using System.Collections;
using System.Threading;
using System.Runtime.InteropServices;
using System.Windows.Forms;

public class HtmlAnchor
{
  public string HRef = "";
  public string Class = "";
  public string Text = "";
}

public class HtmlImage
{
  public string Src = "";
}

public class HtmlDocument
{
  private ArrayList _anchors = new ArrayList();
  private ArrayList _images = new ArrayList();
 
  [ComImport(), Guid("0000010c-0000-0000-C000-000000000046"), InterfaceType(ComInterfaceType.InterfaceIsIUnknown)]
  interface IPersist
  {

    void GetClassID(ref Guid pClassId);
  }
  [ComImport(), Guid("7FD52380-4E07-101B-AE2D-08002B2EC713"), InterfaceType(ComInterfaceType.InterfaceIsIUnknown)]
  interface IPersistStreamInit : IPersist
  {

    new void GetClassID(ref Guid pClassId);

    [PreserveSig()]
    int IsDirty();

    void Load(System.Runtime.InteropServices.ComTypes.IStream pStm);

    void Save(System.Runtime.InteropServices.ComTypes.IStream pStm, [MarshalAs(UnmanagedType.Bool)] bool fClearDirty);

    void GetMaxSize(ref long pCbSize);

    void InitNew();
  }
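  // Written by the worker thread and polled by the constructor, hence volatile.
  private volatile mshtml.HTMLDocument m_document;
  private string m_url = "";

  public HtmlDocument(string url)
  {
    m_url = url;
    // Load the page on a worker thread, then pump Windows messages here until
    // MSHTML reports readyState == "complete". There is no timeout, so the
    // loop never exits if the document does not reach "complete".
    Thread thread = new Thread(new ThreadStart(StartGetDocument));
    thread.Start();
    while (m_document == null || m_document.readyState != "complete")
    {
      Application.DoEvents();
    }
    this.FindAnchors(m_document);
    this.FindImages(m_document);
  }

  private void StartGetDocument()
  {
    // Initialize an empty document through IPersistStreamInit before asking
    // it to create a new document from the URL.
    mshtml.HTMLDocument doc = new mshtml.HTMLDocument();
    IPersistStreamInit ips = (IPersistStreamInit)doc;
    ips.InitNew();
    m_document = (mshtml.HTMLDocument)doc.createDocumentFromUrl(m_url, "\0");
  }

  public HtmlDocument(mshtml.HTMLDocument document)
  {
    this.FindAnchors(document);
    this.FindImages(document);
  }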

  private void FindAnchors(mshtml.HTMLDocument document)
  {
    // getElementsByTagName selects by tag; getElementsByName would select by
    // the name="..." attribute and miss most anchors.
    foreach (mshtml.IHTMLElement element in document.getElementsByTagName("a"))
    {
      HtmlAnchor anchor = new HtmlAnchor();
      anchor.HRef = GetAttribute(element, "href");
      anchor.Class = GetAttribute(element, "class");
      anchor.Text = element.innerText;
      _anchors.Add(anchor);
    }
  }

  private void FindImages(mshtml.HTMLDocument document)
  {
    foreach (mshtml.IHTMLElement element in document.getElementsByTagName("img"))
    {
      HtmlImage image = new HtmlImage();
      image.Src = GetAttribute(element, "src");
      _images.Add(image);
    }
  }

  private string GetAttribute(mshtml.IHTMLElement element, string attribName)
  {
    if (element.getAttribute(attribName, 0) != null)
    {
      return element.getAttribute(attribName, 0).ToString();
    }
    return "";
  }

  public HtmlAnchor[] Anchors
  {
    get
    {
      return (HtmlAnchor[])(_anchors.ToArray(typeof(HtmlAnchor)));
    }
  }

  public HtmlImage[] Images
  {
    get
    {
      return (HtmlImage[])(_images.ToArray(typeof(HtmlImage)));
    }
  }
}

Bob
 
LVL 15

Expert Comment

by:ozymandias
ID: 18363488
Thanks for that code, btw :)