  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 812

Regex to Extract First 3 Whole Words from String in .NET

I have a dynamic variable value which I want to truncate to the first 3 whole words.  The full string is output using a code snippet like this:

<%=RenderContextTag() %>


So, for example, let's say the value of this string is: Wella High Hair Crystal Styler

In Perl, the following regular expression match successfully pulls out the first 3 whole words:

((?:\W*\w+){0,3}).*


$1 = Wella High Hair

However, when I tried to adapt this to an ASPX .NET page, it gave syntax errors:

<%=Regex.Replace(RenderContextTag(),"((?:\W*\w+){0,3}).*", "$1")%>


I have tried testing it with \b and literal spaces using @ \\w but I can't seem to figure it out.  What is the proper syntax to extract the first 3 whole words from a variable string?
Asked:
thyros
2 Solutions
 
HainKurt (Sr. System Analyst) commented:
Check this code.

It looks like there are several Regex classes in the system that conflict with each other, so use the fully qualified namespace and class name:
Imports System.Text.RegularExpressions.Regex

Partial Class Regex1
    Inherits System.Web.UI.Page

    Protected Sub Page_Load(ByVal sender As Object, ByVal e As System.EventArgs) Handles Me.Load
        Dim pattern As String = "((?:\W*\w+){0,3}).*"
        Dim RenderContextTag As String = "Wella High Hair at EE by Hain Kurt!"
        Dim rv As String

        rv = System.Text.RegularExpressions.Regex.Replace(RenderContextTag, pattern, "$1")
        Response.Write(rv)
    End Sub
End Class

 
HainKurt (Sr. System Analyst) commented:
Or use this:

<%=System.Text.RegularExpressions.Regex.Replace("this will work, by Hain Kurt!", "((?:\W*\w+){0,3}).*", "$1")%>

and add

Imports System.Text.RegularExpressions.Regex

to the code-behind.
 
thyros (Author) commented:
I don't have access to the code behind, but when trying to use your inline code example it gives this error:

Compiler Error Message: CS1009: Unrecognized escape sequence

and it highlights this line:

Line 264:<%=System.Text.RegularExpressions.Regex.Replace("this will work, by Hain Kurt!", "((?:\W*\w+){0,3}).*", "$1")%>


I have tried adding @\\w, and this removes the compiler error message, but it doesn't truncate to 3 whole words, so there is still something wrong with the syntax.
 
HainKurt (Sr. System Analyst) commented:
Post the code you have.

<%=System.Text.RegularExpressions.Regex.Replace("this will work, by Hain Kurt!", "((?:\W*\w+){0,3}).*", "$1")%>

should work; I tested it locally. Maybe you are trying to use this inside a server control...

 
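käµfm³d 👽 commented:
HainKurt gave you VB code and you need C# = )

In C#, backslashes in regular string literals are escape characters (much like Perl and C). Either double each backslash or use a verbatim string literal (prefixed with @). Change your code to either of the following: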
<%=Regex.Replace(RenderContextTag(),"((?:\\W*\\w+){0,3}).*", "$1")%>

--OR--

<%=Regex.Replace(RenderContextTag(), @"((?:\W*\w+){0,3}).*", "$1")%>
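To see why both forms behave the same, here is a minimal standalone sketch. It is a console program rather than an ASPX page, and the class and method names are made up for illustration. The third method shows one possible reason the @\\w attempt compiled but did not truncate, assuming the backslashes were doubled inside a verbatim string: there, \\w reaches the regex engine as an escaped backslash followed by a literal w.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; not part of the original page.
public static class EscapingDemo
{
    // Regular string literal: every regex backslash is doubled for the C# compiler.
    public static string WithDoubledBackslashes(string input)
    {
        return Regex.Replace(input, "((?:\\W*\\w+){0,3}).*", "$1");
    }

    // Verbatim string literal: backslashes reach the regex engine unchanged.
    public static string WithVerbatimString(string input)
    {
        return Regex.Replace(input, @"((?:\W*\w+){0,3}).*", "$1");
    }

    // Verbatim AND doubled: the regex now looks for literal backslash characters,
    // so the group matches nothing and ".*" swallows the whole input.
    public static string VerbatimAndDoubled(string input)
    {
        return Regex.Replace(input, @"((?:\\W*\\w+){0,3}).*", "$1");
    }

    public static void Main()
    {
        const string sample = "Wella High Hair Crystal Styler";
        Console.WriteLine("[" + WithDoubledBackslashes(sample) + "]"); // [Wella High Hair]
        Console.WriteLine("[" + WithVerbatimString(sample) + "]");     // [Wella High Hair]
        Console.WriteLine("[" + VerbatimAndDoubled(sample) + "]");     // []
    }
}
```

So pick one style of escaping, not both at once.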

 
käµfm³d 👽 commented:
I didn't read far enough down  : )

What if you say "not a space followed by a space"?
<%=Regex.Replace(RenderContextTag(),"((?:[^ ]+ ){0,3}).*", "$1")%>

 
käµfm³d 👽 commented:
I think the last one might need a slight tweak for the case where a string has 3 words or fewer:
<%=Regex.Replace(RenderContextTag(),"((?:[^ ]+(?: |$)){0,3}).*", "$1")%>
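A small console sketch of what the tweak changes, using made-up class and method names. With the earlier pattern every word must be followed by a space, so the last word of a short string falls out of the group and is eaten by the trailing .* instead. Allowing "space or end of string" keeps it.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; not part of the original page.
public static class ShortStringDemo
{
    // Earlier pattern: each word must be followed by a space.
    public static string SpaceOnly(string input)
    {
        return Regex.Replace(input, "((?:[^ ]+ ){0,3}).*", "$1");
    }

    // Tweaked pattern: each word is followed by a space OR the end of the string.
    public static string SpaceOrEnd(string input)
    {
        return Regex.Replace(input, "((?:[^ ]+(?: |$)){0,3}).*", "$1");
    }

    public static void Main()
    {
        const string shortValue = "Wella High";
        Console.WriteLine("[" + SpaceOnly(shortValue) + "]");  // [Wella ]
        Console.WriteLine("[" + SpaceOrEnd(shortValue) + "]"); // [Wella High]

        // With more than three words both patterns agree, trailing space included.
        const string longValue = "Wella High Hair Crystal Styler";
        Console.WriteLine("[" + SpaceOnly(longValue) + "]");   // [Wella High Hair ]
        Console.WriteLine("[" + SpaceOrEnd(longValue) + "]");  // [Wella High Hair ]
    }
}
```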

 
thyros (Author) commented:
Kaufmed, your most recent code snippet works nicely in the .NET C# page

<%=Regex.Replace(RenderContextTag(),"((?:[^ ]+(?: |$)){0,3}).*", "$1")%>
<%=Server.UrlEncode(Regex.Replace(RenderContextTag(), "((?:[^ ]+(?: |$)){0,3}).*", "$1"))%>


With Server.UrlEncode, I notice there is a trailing space, which is encoded as a + character, i.e.

Wella High Hair Crystal Styler
becomes
Wella+High+Hair+

Although I think it will still serve my purpose as is, it would probably return better results from the api I am querying if we can strip the trailing space.  Could you work your sorcery on that? :)
 
käµfm³d 👽 commented:
Throw a Trim() onto the end  = )
<%=Server.UrlEncode(Regex.Replace(RenderContextTag(), "((?:[^ ]+(?: |$)){0,3}).*", "$1").Trim())%>
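For anyone who wants to check the effect outside a page, here is a rough console sketch. It assumes System.Net.WebUtility.UrlEncode as a stand-in for Server.UrlEncode (both encode a space as +), and the class and method names are invented for the example.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical demo class; WebUtility.UrlEncode stands in for Server.UrlEncode.
public static class TrimDemo
{
    private const string Pattern = "((?:[^ ]+(?: |$)){0,3}).*";

    // Encodes the regex result as-is, trailing space included.
    public static string EncodeWithoutTrim(string input)
    {
        return WebUtility.UrlEncode(Regex.Replace(input, Pattern, "$1"));
    }

    // Trims first, so the space captured after the third word never reaches the encoder.
    public static string EncodeWithTrim(string input)
    {
        return WebUtility.UrlEncode(Regex.Replace(input, Pattern, "$1").Trim());
    }

    public static void Main()
    {
        const string sample = "Wella High Hair Crystal Styler";
        Console.WriteLine(EncodeWithoutTrim(sample)); // Wella+High+Hair+
        Console.WriteLine(EncodeWithTrim(sample));    // Wella+High+Hair
    }
}
```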

 
käµfm³d 👽 commented:
You may also want to switch back to "word" characters since "not a space" can encompass, well, everything not a space! You would need to examine your data and make that call : )

To be clear, I am saying:

((?:\w+(?: |$)){0,3}).*
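One way to "examine your data" is to run the candidate patterns over a sample that contains punctuation. The sketch below uses a made-up value with a comma and invented names. Note that the \w version requires a space or the end of the string directly after each word, so a comma right after the first word stops the group from matching at all and the result is empty.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; the sample value is invented for illustration.
public static class WordClassDemo
{
    // Applies a pattern and trims any trailing space left by the capture group.
    public static string FirstThree(string pattern, string input)
    {
        return Regex.Replace(input, pattern, "$1").Trim();
    }

    public static void Main()
    {
        const string sample = "Wella, High Hair Crystal Styler";

        // Word characters followed by space-or-end: the comma breaks the first word.
        Console.WriteLine("[" + FirstThree(@"((?:\w+(?: |$)){0,3}).*", sample) + "]");   // []

        // Anything that is not a space: punctuation stays attached to its word.
        Console.WriteLine("[" + FirstThree("((?:[^ ]+(?: |$)){0,3}).*", sample) + "]");  // [Wella, High Hair]

        // Original Perl-style pattern: \W* absorbs separators before each word.
        Console.WriteLine("[" + FirstThree(@"((?:\W*\w+){0,3}).*", sample) + "]");       // [Wella, High Hair]
    }
}
```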
 
thyros (Author) commented:
HainKurt and Kaufmed, thank you so much for your help. This is really helpful and I am very grateful.
 
käµfm³d 👽 commented:
NP. Glad to help  = )