Regex to Extract First 3 Whole Words from String in .NET

I have a dynamic variable value which I want to truncate to the first 3 whole words.  The full string is output using a code snippet like this:

<%=RenderContextTag() %>

So for example, let's say the value of this string is:  Wella High Hair Crystal Styler

In Perl, the following regular expression match successfully pulls out the first 3 whole words:

((?:\W*\w+){0,3}).*

$1 = Wella High Hair

However, when I tried to adapt this to an ASPX .NET page, it gave syntax errors:

<%=Regex.Replace(RenderContextTag(),"((?:\W*\w+){0,3}).*", "$1")%>

I have tried testing it with \b and literal spaces using @ \\w but I can't seem to figure it out.  What is the proper syntax to extract the first 3 whole words from a variable string?
Asked by: thyros
 
käµfm³d 👽 commented:
Throw a Trim() onto the end  = )
<%=Server.UrlEncode(Regex.Replace(RenderContextTag(), "((?:[^ ]+(?: |$)){0,3}).*", "$1").Trim())%>
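In case anyone wants to check the trailing "+" outside an .aspx page, here is a rough console sketch. It assumes System.Net.WebUtility.UrlEncode as a stand-in for Server.UrlEncode (both encode a space as "+"), and the class and method names are made up for illustration.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class TrimDemo
{
    // Pattern from the thread: up to 3 runs of non-space characters,
    // each followed by a space or the end of the string.
    public const string Pattern = "((?:[^ ]+(?: |$)){0,3}).*";

    // Hypothetical helper: keeps the first 3 space-separated words.
    public static string FirstThreeWords(string input)
    {
        // Trim() drops the space that follows the third word.
        return Regex.Replace(input, Pattern, "$1").Trim();
    }

    public static void Main()
    {
        string sample = "Wella High Hair Crystal Styler";
        string untrimmed = Regex.Replace(sample, Pattern, "$1");

        Console.WriteLine(WebUtility.UrlEncode(untrimmed));               // Wella+High+Hair+
        Console.WriteLine(WebUtility.UrlEncode(FirstThreeWords(sample))); // Wella+High+Hair
    }
}
```

The Trim() runs before the encoding, so the space is gone before it can turn into a "+".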

HainKurt (Sr. System Analyst) commented:
Check this code.

It looks like there are a number of Regex classes in the system that conflict with each other, so use the fully qualified namespace and class:
Imports System.Text.RegularExpressions.Regex

Partial Class Regex1
    Inherits System.Web.UI.Page

    Protected Sub Page_Load(ByVal sender As Object, ByVal e As System.EventArgs) Handles Me.Load
        Dim pattern As String = "((?:\W*\w+){0,3}).*"
        Dim RenderContextTag As String = "Wella High Hair at EE by Hain Kurt!"
        Dim rv As String

        rv = System.Text.RegularExpressions.Regex.Replace(RenderContextTag, pattern, "$1")
        Response.Write(rv)
    End Sub
End Class

HainKurt (Sr. System Analyst) commented:
or use this

<%=System.Text.RegularExpressions.Regex.Replace("this will work, by Hain Kurt!", "((?:\W*\w+){0,3}).*", "$1")%>

and add

Imports System.Text.RegularExpressions.Regex

to code behind...
 
thyros (author) commented:
I don't have access to the code behind, but when trying to use your inline code example it gives this error:

Compiler Error Message: CS1009: Unrecognized escape sequence

and it highlights this line:

Line 264:<%=System.Text.RegularExpressions.Regex.Replace("this will work, by Hain Kurt!", "((?:\W*\w+){0,3}).*", "$1")%>

I have tried adding @\\w and this removes the compiler error message but it doesn't truncate to 3 whole words so there is still something wrong with the syntax.
 
HainKurt (Sr. System Analyst) commented:
post the code you have

<%=System.Text.RegularExpressions.Regex.Replace("this will work, by Hain Kurt!", "((?:\W*\w+){0,3}).*", "$1")%>

should work, tested locally... maybe you are trying to use this inside some server controls...

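käµfm³d 👽 commented:
HainKurt gave you VB code and you need C#  = )

In C# string literals, backslashes are escape characters (much like Perl and C), so \W and \w are rejected as unrecognized escape sequences. Either double each backslash or use a verbatim string (the @ prefix). Change your code to either of the following: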
<%=Regex.Replace(RenderContextTag(),"((?:\\W*\\w+){0,3}).*", "$1")%>

--OR--

<%=Regex.Replace(RenderContextTag(), @"((?:\W*\w+){0,3}).*", "$1")%>
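If it helps to see the difference outside the page, here is a small console sketch of the two spellings. The class and member names are made up for illustration, and the third constant is only one plausible reading of the earlier @ plus \\w attempt, not necessarily what was actually typed.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EscapeDemo
{
    // Regular literal: each backslash is doubled so the regex engine receives \W and \w.
    public const string Escaped = "((?:\\W*\\w+){0,3}).*";

    // Verbatim literal: backslashes are passed through as-is.
    public const string Verbatim = @"((?:\W*\w+){0,3}).*";

    // Both styles at once: the regex engine receives \\w,
    // which means a literal backslash followed by the letter w.
    public const string DoubledInVerbatim = @"((?:\\W*\\w+){0,3}).*";

    // Hypothetical helper wrapping the Regex.Replace call from the thread.
    public static string FirstThree(string input, string pattern)
    {
        return Regex.Replace(input, pattern, "$1");
    }

    public static void Main()
    {
        const string sample = "Wella High Hair Crystal Styler";

        Console.WriteLine(Escaped == Verbatim);                               // True
        Console.WriteLine(FirstThree(sample, Verbatim));                      // Wella High Hair
        Console.WriteLine("[" + FirstThree(sample, DoubledInVerbatim) + "]"); // []
    }
}
```

The first two constants are the same string, so either one can go in the Regex.Replace call. The third compiles cleanly but no longer matches any words, since the sample contains no backslashes.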

käµfm³d 👽 commented:
I didn't read far enough down  : )

What if you say "one or more non-space characters followed by a space"?
<%=Regex.Replace(RenderContextTag(),"((?:[^ ]+ ){0,3}).*", "$1")%>

käµfm³d 👽 commented:
I think the last one might need a slight tweak in case the string has 3 words or fewer, since the final word is then followed by the end of the string rather than a space:
<%=Regex.Replace(RenderContextTag(),"((?:[^ ]+(?: |$)){0,3}).*", "$1")%>
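To show what the tweak changes, here is a quick console sketch comparing the two patterns on a two-word string. The class name, helper, and sample input are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ShortInputDemo
{
    // Earlier pattern: every word must be followed by a space.
    public const string SpaceOnly = "((?:[^ ]+ ){0,3}).*";

    // Tweaked pattern: a word may be followed by a space or by the end of the string.
    public const string SpaceOrEnd = "((?:[^ ]+(?: |$)){0,3}).*";

    // Hypothetical helper wrapping the Regex.Replace call from the thread.
    public static string Apply(string input, string pattern)
    {
        return Regex.Replace(input, pattern, "$1");
    }

    public static void Main()
    {
        const string shortSample = "Wella High";

        // The last word has no space after it, so the earlier pattern drops it.
        Console.WriteLine("[" + Apply(shortSample, SpaceOnly) + "]");  // [Wella ]

        // With (?: |$) the last word is kept.
        Console.WriteLine("[" + Apply(shortSample, SpaceOrEnd) + "]"); // [Wella High]
    }
}
```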

thyros (author) commented:
Kaufmed, your most recent code snippet works nicely in the .NET C# page

<%=Regex.Replace(RenderContextTag(),"((?:[^ ]+(?: |$)){0,3}).*", "$1")%>
<%=Server.UrlEncode(Regex.Replace(RenderContextTag(), "((?:[^ ]+(?: |$)){0,3}).*", "$1"))%>

With the server url encoding, I notice there is a trailing space / + character, i.e.

Wella High Hair Crystal Styler
becomes
Wella+High+Hair+

Although I think it will still serve my purpose as is, it would probably return better results from the api I am querying if we can strip the trailing space.  Could you work your sorcery on that? :)
 
käµfm³d 👽 commented:
You may also want to switch back to "word" characters since "not a space" can encompass, well, everything not a space! You would need to examine your data and make that call : )

To be clear, I am saying:

((?:\w+(?: |$)){0,3}).*
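As a rough illustration of that trade-off, the console sketch below runs both patterns on a made-up hyphenated product name. The sample string, class name, and helper are assumptions for the example, so test against your real data before picking one.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WordVsNonSpaceDemo
{
    // "Not a space" version: a word is any run of non-space characters.
    public const string NonSpace = "((?:[^ ]+(?: |$)){0,3}).*";

    // "Word character" version: a word is a run of \w characters only.
    public const string WordChars = @"((?:\w+(?: |$)){0,3}).*";

    // Hypothetical helper: apply a pattern and trim the trailing space.
    public static string Apply(string input, string pattern)
    {
        return Regex.Replace(input, pattern, "$1").Trim();
    }

    public static void Main()
    {
        // Plain words: both patterns agree.
        const string plain = "Wella High Hair Crystal Styler";
        Console.WriteLine(Apply(plain, NonSpace));  // Wella High Hair
        Console.WriteLine(Apply(plain, WordChars)); // Wella High Hair

        // Made-up hyphenated input: the hyphen is not a word character,
        // so the \w version stops counting after the first word.
        const string hyphenated = "Wella High-Hair Crystal Styler";
        Console.WriteLine(Apply(hyphenated, NonSpace));  // Wella High-Hair Crystal
        Console.WriteLine(Apply(hyphenated, WordChars)); // Wella
    }
}
```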
 
thyros (author) commented:
HainKurt and Kaufmed, thank you so much for your help, this is really helpful and I am very grateful, thank you.
 
käµfm³d 👽 commented:
NP. Glad to help  = )
Question has a verified solution.
