Mutsop

regex expression vb.net

Hi,

I need to design a regular expression for my program.
The program retrieves data from a web page, which works :D

But of course only part of the source code is needed.

Let's say this is part of the whole source:
<div class="lijst"><div class="kop"><h3>Zoekresultaat</h3></div><div class="oms"><ul>
<li><a href="/uwzaakleiden/graydon_zoek.jsp?btw=855104785">BOLLAERT MAURICE</a>
<div>
Ondernemingsnummer: BE0855104785
<br>Adres: LAARNESTEENWEG 102 - 9230 WETTEREN
<br>Juridische vorm: Eénmanszaak
</div>
</li></ul></div></div>


I would need:

BOLLAERT MAURICE
Ondernemingsnummer: BE0855104785
Adres: LAARNESTEENWEG 102 - 9230 WETTEREN
Juridische vorm: Eénmanszaak

So, as you can see, I need all of it without the <br> tags. Also, as you might have noticed, it's a list, and I only need the results from the first <li>.

Can anyone help me with this?
Didier Vx

If the HTML format is always the same, you can use plain string parsing (which I have used successfully in many projects).

A very good online regexp builder is available at:
http://myregexp.com/
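
The string-parsing idea can be sketched roughly as follows. This is only an illustration, not code from the thread: the module name `FirstListItemParser` and the function `GetFirstListItem` are made up for the example, and it assumes the opening tag is exactly `<li>` (no attributes) and that list items are not nested.

```vbnet
Imports System

Module FirstListItemParser
    ' Hypothetical helper: returns the raw HTML between the first <li> and the
    ' next </li>, or an empty string when either tag is missing.
    ' Assumes "<li>" carries no attributes and list items are not nested.
    Function GetFirstListItem(ByVal html As String) As String
        Dim openTag As String = "<li>"
        Dim closeTag As String = "</li>"

        Dim startIndex As Integer = html.IndexOf(openTag, StringComparison.OrdinalIgnoreCase)
        If startIndex < 0 Then Return String.Empty
        startIndex += openTag.Length

        Dim endIndex As Integer = html.IndexOf(closeTag, startIndex, StringComparison.OrdinalIgnoreCase)
        If endIndex < 0 Then Return String.Empty

        Return html.Substring(startIndex, endIndex - startIndex)
    End Function

    Sub Main()
        ' Shortened sample input; a doubled quote ("") is a literal quote in VB.
        Dim html As String = "<ul><li><a href=""/x"">BOLLAERT MAURICE</a></li><li>second</li></ul>"
        Console.WriteLine(GetFirstListItem(html))
    End Sub
End Module
```

The result still contains the inner tags (`<a>`, `<div>`, `<br>`), so a second pass is needed to strip them.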
Mutsop (ASKER)

Well, I'm no expert in regular expressions, and I hoped someone could help me with the actual code,
or at least give me a start on how to get the text between the <li> tags.
ASKER CERTIFIED SOLUTION
tolgaong
Try this:
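' Goes at the top of the file; needed for Regex, Match and MatchCollection.
Imports System.Text.RegularExpressions

' Sample HTML. Inside a VB string literal, a double quote is written as "".
Dim source As String = "<div class=""lijst""><div class=""kop""><h3>Zoekresultaat</h3></div><div class=""oms""><ul>" & _
                        "<li><a href=""/uwzaakleiden/graydon_zoek.jsp?btw=855104785"">BOLLAERT MAURICE</a>" & _
                        "<div>" & _
                        "Ondernemingsnummer: BE0855104785" & _
                        "<br>Adres: LAARNESTEENWEG 102 - 9230 WETTEREN" & _
                        "<br>Juridische vorm: Eénmanszaak" & _
                        "</div>" & _
                        "</li></ul></div></div>"

' Match every run of text (no "<") that follows a tag after an <li> and is
' followed by a tag before a </li>. (?s) lets "." span line breaks.
Dim matches As MatchCollection = Regex.Matches(source, "(?<=(?s)<li>.*?>)[^<]+(?=(?s)<.*?</li>)")

For Each item As Match In matches
    ' Skip matches that contain only whitespace.
    If item.Value.Trim().Length > 0 Then
        MessageBox.Show(item.Value.Trim())
    End If
Next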
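
The pattern above returns text from every list item on the page. Since the question asks for the first <li> only, one possible variation is to isolate that item first and then strip its tags. The sketch below is an assumption-laden illustration, not part of the original answer: `FirstListItemText` and `ExtractFirstItemLines` are invented names, and it assumes the opening tag is a bare `<li>` and that items are not nested.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module FirstListItemText
    ' Hypothetical helper: returns the text lines found inside the first
    ' <li> ... </li> element, with all tags removed and whitespace trimmed.
    Function ExtractFirstItemLines(ByVal html As String) As List(Of String)
        Dim lines As New List(Of String)()

        ' Regex.Match returns only the first match; Singleline lets "." cross line breaks.
        Dim firstItem As Match = Regex.Match(html, "<li>(.*?)</li>", _
            RegexOptions.Singleline Or RegexOptions.IgnoreCase)
        If Not firstItem.Success Then Return lines

        ' Replace every tag with a line feed so each text run lands on its own line.
        Dim withoutTags As String = Regex.Replace(firstItem.Groups(1).Value, "<[^>]+>", vbLf)

        Dim separators As Char() = New Char() {ControlChars.Lf, ControlChars.Cr}
        For Each piece As String In withoutTags.Split(separators, StringSplitOptions.RemoveEmptyEntries)
            Dim trimmed As String = piece.Trim()
            If trimmed.Length > 0 Then lines.Add(trimmed)
        Next

        Return lines
    End Function

    Sub Main()
        Dim source As String = "<ul><li><a href=""/x"">BOLLAERT MAURICE</a><div>" & _
            "Ondernemingsnummer: BE0855104785<br>Adres: LAARNESTEENWEG 102 - 9230 WETTEREN" & _
            "</div></li><li>another result</li></ul>"

        For Each line As String In ExtractFirstItemLines(source)
            Console.WriteLine(line)
        Next
    End Sub
End Module
```

Splitting the job into "find the first item" and "strip the tags" keeps each pattern simple, at the cost of two passes over the text.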

Mutsop (ASKER)

That works amazingly well!!!! :D Thanks a lot.
I totally forgot that to get a quote inside a string you have to write two of them in a row :D
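
For reference, a minimal illustration of that quoting rule (the module name `QuoteEscaping` is made up for the example):

```vbnet
Imports System

Module QuoteEscaping
    Sub Main()
        ' Inside a VB string literal, "" stands for one literal double quote.
        Dim tag As String = "<div class=""lijst"">"
        Console.WriteLine(tag)   ' prints <div class="lijst">
    End Sub
End Module
```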
Mutsop (ASKER)

@tolgaong

Sorry, I gave the points to the wrong person... I asked a moderator to change it!
You solved your problem; that is the main point. And the moderator will solve mine. :)
Thanks, Mutsop
Mutsop (ASKER)

Thanks yet again and sorry for the inconvenience :)