jitz

regEx not parsing correctly

I'm trying to write my own BBCode to HTML converter in VB.Net 2008 using some code I found as a base.  So far everything is working well, but for some reason I can't get RegEx to replace the "Quote" with a "blockquote".

Can someone take a look at this code and see if you can tell me what I'm doing wrong?  This is driving me nuts!

The BBCode looks something like:
[QUOTE=SomeDude]Some example text goes here[/QUOTE]

I used almost the exact same code for the Font Size and Color and it works fine.

I sure hope someone can help!  Thanks!
Public Function BBtoHTML(BBCode as string) as String
  Dim regExp As System.Text.RegularExpressions.Regex
  Dim Ret as string = BBCode
 
  regExp = New Regex("\[QUOTE=([^\]]+)\]([^\]]+)\[\/QUOTE\]")
  Ret = regExp.Replace(Ret, "<blockquote style=""background-color: #CCCCCC; border-width: thin"">" & "Originally Posted by <strong>$1</strong><br /><em>$2</em></blockquote>") 
 
  Return Ret
End Function


ddrudik

I would try:
"\[QUOTE=([^\]]+)\]([^\[]+)\[\/QUOTE\]"
Use the non-greedy quantifier (+?) and [^\[] instead of [^\]] in the second group:

\[QUOTE=([^\]]+?)\]([^\[+?)\[\/QUOTE\]


Sorry, I forgot a bracket:

\[QUOTE=([^\]]+?)\]([^\[]+?)\[\/QUOTE\]


Adding ? to the pattern would suggest that [^\]]+ or [^\[]+ might overmatch or otherwise benefit from being non-greedy, which I don't see being the case here: a negated character class already stops at the first bracket it excludes.

The actual issue with the original pattern was that the second ([^\]]+) needed to be ([^\[]+).

Here's the working code with my pattern:
Imports System.Text.RegularExpressions
Module Module1
    Sub Main()
        Console.WriteLine(BBtoHTML("[QUOTE=SomeDude]Some example text goes here[/QUOTE]"))
    End Sub
    Public Function BBtoHTML(ByVal BBCode As String) As String
        Dim regExp As System.Text.RegularExpressions.Regex
        regExp = New Regex("\[QUOTE=([^\]]+)\]([^\[]+)\[\/QUOTE\]")
        Return regExp.Replace(BBCode, "<blockquote style=""background-color: #CCCCCC; border-width: thin"">Originally Posted by <strong>$1</strong><br /><em>$2</em></blockquote>")
    End Function
End Module


jitz

ASKER

Sorry, I was sent out of town for some work.  I'll try these suggestions and get back as soon as I can.

Thanks for all the help guys!
jitz

ASKER

I tried the example and it does work, but only if another quote isn't embedded within a quote. Does that make sense?

I've attached my test BBCode and my basic function for the conversion. There are still a few things in the BBCode that I haven't written any code for, but I'm including them just in case.  Like I said earlier, I found some of this code on the web and have been trying to add to it.  Also, is there any way to use the regex without regard for case?

Here's my test BBCode (kept in an Access database, all one string):

[B]The[/B] [FONT="Comic Sans MS"][SIZE="1"]quick[/SIZE][/FONT] [U]brown[/U] [I]fox [COLOR="Red"]jumped[/COLOR] [COLOR="DarkOrchid"]over[/COLOR] the lazy dogs back![/i]
[QUOTE=SomeDude]This is a quote![QUOTE=AnotherDude]This is a quote within a quote[QUOTE=ThirdDude]This is a quote within a quote within a quote[/QUOTE][/QUOTE][/QUOTE]

[IMG]http://upload.wikimedia.org/wikipedia/en/thumb/2/24/Lenna.png/200px-Lenna.png[/IMG]

[URL="http://www.microsoft.com"]Test Link[/URL]

[LIST=1]
[*]Item 1
[*]Item 2
[*]Item 3
[/LIST]

[LIST]
[*]Item 1
[*]Item 2
[*]Item 3
[/LIST]

[INDENT]Indented text[/INDENT]

Thanks!

Public Function ConvertBBCodeToHTML(ByVal BBCode As String) As String
        Dim regExp As Regex
        Dim Ret As String = BBCode
 
        '//Regex for URL tag without anchor
        regExp = New Regex("\[URL\]([^\]]+)\[\/URL\]")
        Ret = regExp.Replace(Ret, "<a href=""$1"">$1</a>")
 
        '//Regex for URL with anchor
        regExp = New Regex("\[URL=([^\]]+)\]([^\]]+)\[\/URL\]")
        Ret = regExp.Replace(Ret, "<a href=""$1"">$2</a>")
 
        '//Image regex
        regExp = New Regex("\[IMG\]([^\]]+)\[\/IMG\]")
        Ret = regExp.Replace(Ret, "<img src=""$1"" />")
 
        '//Bold text
        regExp = New Regex("\[B\](.+?)\[\/B\]")
        Ret = regExp.Replace(Ret, "<b>$1</b>")
 
        '//Italic text
        regExp = New Regex("\[I\](.+?)\[\/I\]")
        Ret = regExp.Replace(Ret, "<i>$1</i>")
 
        '//Underline text
        regExp = New Regex("\[U\](.+?)\[\/U\]")
        Ret = regExp.Replace(Ret, "<u>$1</u>")
 
        '//Font size
        regExp = New Regex("\[SIZE=([^\]]+)\]([^\]]+)\[\/SIZE\]")
        Ret = regExp.Replace(Ret, "<font size=$1>$2</font>")
 
        '//Font name
        regExp = New Regex("\[FONT=([^\]]+)\]([^\]]+)\[\/FONT\]")
        Ret = regExp.Replace(Ret, "<font face=$1>$2</font>")
 
        '//Font color
        regExp = New Regex("\[COLOR=([^\]]+)\]([^\]]+)\[\/COLOR\]")
        Ret = regExp.Replace(Ret, "<font color=$1>$2</font>")
 
        '//Quote
        regExp = New Regex("\[QUOTE=([^\]]+)\]([^\[]+)\[\/QUOTE\]")
        Ret = regExp.Replace(Ret, "<blockquote style=""background-color: #E4E4E4; border-width: thin"">Originally Posted by <strong>$1</strong><br /><em>$2</em></blockquote>")
 
        Ret = Ret.Replace(vbNewLine, "<br />")
        Return Ret
    End Function


ASKER CERTIFIED SOLUTION
ddrudik

This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
jitz

ASKER

I couldn't quite figure out how to get the MatchEvaluator to work in my situation, but I'm obviously not the greatest coder anyway.  :)

This may not be the right way to do it, but it definitely works!  Now I just have to fix a few font issues and such and the "Quote" looks like it's good to go!

Oh, Btw, I found the RegexOptions.IgnoreCase option and that helped tremendously!

Thanks again!
regExp = New Regex("\[QUOTE=([^\]]+)\]([^\[]+)\[\/QUOTE\]", RegexOptions.Multiline Or RegexOptions.IgnoreCase)
 
Do While regExp.IsMatch(Ret) = True
        Ret = regExp.Replace(Ret, "<blockquote style=""background-color: #E4E4E4; border-width: thin"">Originally Posted by <strong>$1</strong><br /><em>$2</em></blockquote>")
Loop
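
For anyone following along, here is a minimal, self-contained sketch of why the loop above copes with nested quotes. Because the body group ([^\[]+) refuses any "[", each pass can only match the innermost quotes; replacing them removes their brackets, so the enclosing quote becomes matchable on the next pass. The module and function names (NestedQuoteDemo, ConvertQuotes), the sample input, and the shortened blockquote markup are assumptions made for illustration, not code from this thread.

```vb
Imports System
Imports System.Text.RegularExpressions

Module NestedQuoteDemo
    ' Hypothetical helper; the pattern is the one discussed in this thread.
    Public Function ConvertQuotes(ByVal bbCode As String) As String
        ' IgnoreCase lets [quote=...] and [QUOTE=...] both match.
        Dim quoteRegex As New Regex("\[QUOTE=([^\]]+)\]([^\[]+)\[/QUOTE\]", RegexOptions.IgnoreCase)
        Dim result As String = bbCode

        ' Each pass converts only quotes whose body contains no "[",
        ' that is, the innermost ones. The replacement text contains no
        ' "[/QUOTE]", so every pass leaves fewer tags and the loop ends.
        Do While quoteRegex.IsMatch(result)
            result = quoteRegex.Replace(result, "<blockquote>Originally Posted by <strong>$1</strong><br /><em>$2</em></blockquote>")
        Loop

        Return result
    End Function

    Sub Main()
        ' Assumed sample input: one quote nested inside another, mixed case.
        Console.WriteLine(ConvertQuotes("[QUOTE=SomeDude]Outer[quote=AnotherDude]Inner[/quote][/QUOTE]"))
    End Sub
End Module
```

One caveat with this approach: a quote whose body still contains any other unconverted tag (for example [LIST] or [INDENT]) never matches, since that tag's "[" blocks the body group. Running the quote conversion last, as in the function above, avoids this for the tags that are already handled.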


Thanks for the question and the points.