thyros

Regex to Extract First 3 Whole Words from String in .NET

I have a dynamic variable value which I want to truncate to the first 3 whole words.  The full string is output using a code snippet like this:

<%=RenderContextTag() %>

So for example, let's say the value of this string is:  Wella High Hair Crystal Styler

In Perl, the following regular expression match successfully pulls out the first 3 whole words:

((?:\W*\w+){0,3}).*

$1 = Wella High Hair

However, when I tried to adapt this to an aspx .net page it was giving syntax errors:

<%=Regex.Replace(RenderContextTag(),"((?:\W*\w+){0,3}).*", "$1")%>

I have tried testing it with \b and literal spaces using @ \\w but I can't seem to figure it out.  What is the proper syntax to extract the first 3 whole words from a variable string?
HainKurt

Check this code.

It looks like there are several Regex classes in scope that conflict with each other, so use the fully qualified namespace and class name:
Imports System.Text.RegularExpressions.Regex

Partial Class Regex1
    Inherits System.Web.UI.Page

    Protected Sub Page_Load(ByVal sender As Object, ByVal e As System.EventArgs) Handles Me.Load
        Dim pattern As String = "((?:\W*\w+){0,3}).*"
        Dim RenderContextTag As String = "Wella High Hair at EE by Hain Kurt!"
        Dim rv As String

        rv = System.Text.RegularExpressions.Regex.Replace(RenderContextTag, pattern, "$1")
        Response.Write(rv)
    End Sub
End Class

or use this

<%=System.Text.RegularExpressions.Regex.Replace("this will work, by Hain Kurt!", "((?:\W*\w+){0,3}).*", "$1")%>

and add

Imports System.Text.RegularExpressions.Regex

to code behind...
thyros

ASKER

I don't have access to the code behind, but when trying to use your inline code example it gives this error:

Compiler Error Message: CS1009: Unrecognized escape sequence

and it highlights this line:

Line 264:<%=System.Text.RegularExpressions.Regex.Replace("this will work, by Hain Kurt!", "((?:\W*\w+){0,3}).*", "$1")%>

I have tried adding @\\w and this removes the compiler error message but it doesn't truncate to 3 whole words so there is still something wrong with the syntax.

Post the code you have.

<%=System.Text.RegularExpressions.Regex.Replace("this will work, by Hain Kurt!", "((?:\W*\w+){0,3}).*", "$1")%>

should work, tested locally... maybe you are trying to use this inside some server controls...

kaufmed

HainKurt gave you VB code and you need C# = )

In C#, backslashes inside an ordinary string literal are escape characters (much like Perl and C). Change your code to either of the following:
<%=Regex.Replace(RenderContextTag(),"((?:\\W*\\w+){0,3}).*", "$1")%>

--OR--

<%=Regex.Replace(RenderContextTag(), @"((?:\W*\w+){0,3}).*", "$1")%>
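To illustrate the two spellings above, here is a minimal sketch as a standalone console program rather than an ASPX page. The class name `EscapeDemo` and the method `FirstThreeWords` are made up for this example; only `Regex.Replace` and the pattern come from the thread.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EscapeDemo
{
    // Ordinary string literal: every regex backslash has to be doubled,
    // otherwise the compiler reports CS1009 (unrecognized escape sequence).
    public const string Escaped = "((?:\\W*\\w+){0,3}).*";

    // Verbatim string literal (@"..."): backslashes are taken literally.
    public const string Verbatim = @"((?:\W*\w+){0,3}).*";

    // Hypothetical helper: keeps whatever group 1 captured and drops the rest.
    public static string FirstThreeWords(string input)
    {
        return Regex.Replace(input, Verbatim, "$1");
    }

    public static void Main()
    {
        // Both literals produce the same pattern text.
        Console.WriteLine(Escaped == Verbatim);  // True
        Console.WriteLine(FirstThreeWords("Wella High Hair Crystal Styler"));  // Wella High Hair
    }
}
```

Either literal hands the regex engine the same pattern, so the choice is purely about readability. The verbatim form keeps the pattern identical to the Perl original.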

I didn't read far enough down  : )

What if you say "not a space followed by a space"?
<%=Regex.Replace(RenderContextTag(),"((?:[^ ]+ ){0,3}).*", "$1")%>
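A small sketch of how this space-based pattern behaves, compared with the `(?: |$)` variant quoted further down the thread. The class and method names are assumptions for illustration, and the inputs are sample strings.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SpacePatternDemo
{
    // Each word must be followed by a literal space.
    public static string SpaceOnly(string s)
    {
        return Regex.Replace(s, "((?:[^ ]+ ){0,3}).*", "$1");
    }

    // Each word must be followed by a space OR the end of the string.
    public static string SpaceOrEnd(string s)
    {
        return Regex.Replace(s, "((?:[^ ]+(?: |$)){0,3}).*", "$1");
    }

    public static void Main()
    {
        // Long input: both variants keep three words plus a trailing space.
        Console.WriteLine("[" + SpaceOnly("Wella High Hair Crystal Styler") + "]");   // [Wella High Hair ]
        Console.WriteLine("[" + SpaceOrEnd("Wella High Hair Crystal Styler") + "]");  // [Wella High Hair ]

        // Short input: the space-only variant drops the last word,
        // because "High" is not followed by a space.
        Console.WriteLine("[" + SpaceOnly("Wella High") + "]");   // [Wella ]
        Console.WriteLine("[" + SpaceOrEnd("Wella High") + "]");  // [Wella High]
    }
}
```

The difference only shows up when the input has three words or fewer, which is where the end-of-string alternative matters.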

SOLUTION
kaufmed
This solution is only available to members.
thyros

ASKER

Kaufmed, your most recent code snippet works nicely in the .NET C# page:

<%=Regex.Replace(RenderContextTag(),"((?:[^ ]+(?: |$)){0,3}).*", "$1")%>
<%=Server.UrlEncode(Regex.Replace(RenderContextTag(), "((?:[^ ]+(?: |$)){0,3}).*", "$1"))%>

With Server.UrlEncode applied, I notice there is a trailing space, which gets encoded as a + character, i.e.

Wella High Hair Crystal Styler
becomes
Wella+High+Hair+

Although I think it will still serve my purpose as is, it would probably return better results from the api I am querying if we can strip the trailing space.  Could you work your sorcery on that? :)
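The accepted answer is not shown in this thread, so the following is only a sketch of two possible ways to drop the trailing space, not necessarily what was posted. The class name, method names, and the anchored pattern are assumptions for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TrimDemo
{
    // Option 1: keep the pattern from the thread and trim the result afterwards.
    public static string WithTrimEnd(string s)
    {
        return Regex.Replace(s, "((?:[^ ]+(?: |$)){0,3}).*", "$1").TrimEnd(' ');
    }

    // Option 2 (assumed pattern): capture one word plus up to two more
    // "spaces + word" pairs, so the capture never ends in a space.
    public static string WithAnchoredPattern(string s)
    {
        return Regex.Replace(s, "^ *([^ ]+(?: +[^ ]+){0,2}).*$", "$1");
    }

    public static void Main()
    {
        Console.WriteLine("[" + WithTrimEnd("Wella High Hair Crystal Styler") + "]");          // [Wella High Hair]
        Console.WriteLine("[" + WithAnchoredPattern("Wella High Hair Crystal Styler") + "]");  // [Wella High Hair]
        Console.WriteLine("[" + WithAnchoredPattern("Wella") + "]");                           // [Wella]
    }
}
```

In the inline page expression, the first option would amount to calling `.TrimEnd()` on the `Regex.Replace(...)` result before passing it to `Server.UrlEncode`.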
ASKER CERTIFIED SOLUTION
This solution is only available to members.
You may also want to switch back to "word" characters since "not a space" can encompass, well, everything not a space! You would need to examine your data and make that call : )

To be clear, I am saying:

((?:\w+(?: |$)){0,3}).*
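To make the trade-off concrete, here is a rough sketch comparing the two character classes on an input that contains punctuation. The sample string and the names `WordVsNonSpaceDemo`, `WordChars`, and `NonSpace` are invented for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WordVsNonSpaceDemo
{
    // \w+ : a "word" is letters, digits, and underscore only.
    public static string WordChars(string s)
    {
        return Regex.Replace(s, @"((?:\w+(?: |$)){0,3}).*", "$1");
    }

    // [^ ]+ : a "word" is any run of characters that are not spaces.
    public static string NonSpace(string s)
    {
        return Regex.Replace(s, @"((?:[^ ]+(?: |$)){0,3}).*", "$1");
    }

    public static void Main()
    {
        string input = "Wella High-Hair Crystal";  // hypothetical product name

        // The hyphen is not a word character, so matching stops after "Wella ".
        Console.WriteLine("[" + WordChars(input) + "]");  // [Wella ]

        // "High-Hair" counts as one token here, so all three tokens are kept.
        Console.WriteLine("[" + NonSpace(input) + "]");   // [Wella High-Hair Crystal]
    }
}
```

With `\w`, any hyphen, apostrophe, or other punctuation inside a name ends the match early, so which class fits depends on what the real values look like.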
thyros

ASKER

HainKurt and Kaufmed, thank you so much for your help. This is really helpful and I am very grateful.
NP. Glad to help  = )