Solved

Regex to Extract First 3 Whole Words from String in .NET

Posted on 2011-02-17
Last Modified: 2012-06-27
I have a dynamic variable value which I want to truncate to the first 3 whole words.  The full string is output using a code snippet like this:

<%=RenderContextTag() %>

So for example, let's say the value of this string is:  Wella High Hair Crystal Styler

In Perl, the following regular expression successfully pulls out the first 3 whole words:

((?:\W*\w+){0,3}).*

$1 = Wella High Hair

However, when I tried to adapt this to an ASPX (.NET) page, it gave syntax errors:

<%=Regex.Replace(RenderContextTag(),"((?:\W*\w+){0,3}).*", "$1")%>

I have tried testing it with \b and literal spaces using @ \\w but I can't seem to figure it out.  What is the proper syntax to extract the first 3 whole words from a variable string?
Question by:thyros
12 Comments
 
LVL 51

Expert Comment

by:HainKurt
ID: 34918766
check this code

It looks like there are several Regex classes in the system that conflict with each other; use the fully qualified namespace and class name.
Imports System.Text.RegularExpressions.Regex

Partial Class Regex1
    Inherits System.Web.UI.Page

    Protected Sub Page_Load(ByVal sender As Object, ByVal e As System.EventArgs) Handles Me.Load
        Dim pattern As String = "((?:\W*\w+){0,3}).*"
        Dim RenderContextTag As String = "Wella High Hair at EE by Hain Kurt!"
        Dim rv As String

        rv = System.Text.RegularExpressions.Regex.Replace(RenderContextTag, pattern, "$1")
        Response.Write(rv)
    End Sub
End Class

 
LVL 51

Expert Comment

by:HainKurt
ID: 34918797
or use this

<%=System.Text.RegularExpressions.Regex.Replace("this will work, by Hain Kurt!", "((?:\W*\w+){0,3}).*", "$1")%>

and add

Imports System.Text.RegularExpressions.Regex

to code behind...
 

Author Comment

by:thyros
ID: 34919942
I don't have access to the code behind, but when trying to use your inline code example it gives this error:

Compiler Error Message: CS1009: Unrecognized escape sequence

and it highlights this line:

Line 264:<%=System.Text.RegularExpressions.Regex.Replace("this will work, by Hain Kurt!", "((?:\W*\w+){0,3}).*", "$1")%>

I have tried adding @\\w and this removes the compiler error message but it doesn't truncate to 3 whole words so there is still something wrong with the syntax.
 
LVL 51

Expert Comment

by:HainKurt
ID: 34920003
post the code you have

<%=System.Text.RegularExpressions.Regex.Replace("this will work, by Hain Kurt!", "((?:\W*\w+){0,3}).*", "$1")%>

should work, tested locally... maybe you are trying to use this inside some server controls...

 
LVL 74

Expert Comment

by:käµfm³d 👽
ID: 34920226
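HainKurt gave you VB code and you need C#  = )

In C#, a backslash inside a regular string literal starts an escape sequence (much like Perl and C), and \W and \w are not valid C# escapes, which is what triggers CS1009. Either double the backslashes or use a verbatim string literal (prefixed with @):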
<%=Regex.Replace(RenderContextTag(),"((?:\\W*\\w+){0,3}).*", "$1")%>

--OR--

<%=Regex.Replace(RenderContextTag(), @"((?:\W*\w+){0,3}).*", "$1")%>
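
As a rough illustration of the two spellings above, here is a minimal console sketch (not ASPX page code). The class name EscapeDemo and the helper FirstThreeWords are made up for this example; only Regex.Replace comes from .NET. It suggests that both literals produce the same pattern text and that the pattern does keep the first three words of the sample value.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; the names here are illustrative, not from the thread.
public static class EscapeDemo
{
    // Regular literal: every backslash is doubled so the regex engine receives \W and \w.
    public const string Escaped = "((?:\\W*\\w+){0,3}).*";

    // Verbatim literal: backslashes are kept as typed, so it reads like the Perl original.
    public const string Verbatim = @"((?:\W*\w+){0,3}).*";

    // Replaces the whole input with capture group 1 (up to three words).
    public static string FirstThreeWords(string input)
    {
        return Regex.Replace(input, Verbatim, "$1");
    }

    public static void Main()
    {
        Console.WriteLine(Escaped == Verbatim);                                // True
        Console.WriteLine(FirstThreeWords("Wella High Hair Crystal Styler")); // Wella High Hair
    }
}
```

Note that combining the two styles, as in @"\\w", asks the regex engine for a literal backslash followed by the letter w, which would explain a pattern that compiles but does not truncate.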

 
LVL 74

Expert Comment

by:käµfm³d 👽
ID: 34920262
I didn't read far enough down  : )

What if you say "not a space followed by a space"?
<%=Regex.Replace(RenderContextTag(),"((?:[^ ]+ ){0,3}).*", "$1")%>


 
LVL 74

Assisted Solution

by:käµfm³d 👽
käµfm³d   👽 earned 500 total points
ID: 34920343
I think the last one might need a slight tweak for the case where a string has 3 words or fewer, since the final word then has no trailing space to match:
<%=Regex.Replace(RenderContextTag(),"((?:[^ ]+(?: |$)){0,3}).*", "$1")%>
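
To make the difference concrete, the following console sketch runs both patterns on a two-word value. The class ShortInputDemo and the method Apply are hypothetical names for this example, and the sample value is an assumption; the brackets in the output only make trailing spaces visible.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class comparing the two patterns from the comments above.
public static class ShortInputDemo
{
    // Every word must be followed by a space.
    public const string SpaceOnly = "((?:[^ ]+ ){0,3}).*";

    // A word may be followed by a space or by the end of the string.
    public const string SpaceOrEnd = "((?:[^ ]+(?: |$)){0,3}).*";

    public static string Apply(string pattern, string input)
    {
        return Regex.Replace(input, pattern, "$1");
    }

    public static void Main()
    {
        string shortValue = "Wella High"; // fewer than three words
        Console.WriteLine("[" + Apply(SpaceOnly, shortValue) + "]");  // [Wella ]
        Console.WriteLine("[" + Apply(SpaceOrEnd, shortValue) + "]"); // [Wella High]
    }
}
```

With the space-only pattern, the last word is dropped because nothing follows it; allowing $ as an alternative keeps it.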

 

Author Comment

by:thyros
ID: 34920610
Kaufmed, your most recent code snippet works nicely in the .NET C# page

<%=Regex.Replace(RenderContextTag(),"((?:[^ ]+(?: |$)){0,3}).*", "$1")%>
<%=Server.UrlEncode(Regex.Replace(RenderContextTag(), "((?:[^ ]+(?: |$)){0,3}).*", "$1"))%>

With the server url encoding, I notice there is a trailing space / + character, i.e.

Wella High Hair Crystal Styler
becomes
Wella+High+Hair+

Although I think it will still serve my purpose as is, it would probably return better results from the api I am querying if we can strip the trailing space.  Could you work your sorcery on that? :)
 
LVL 74

Accepted Solution

by:
käµfm³d   👽 earned 500 total points
ID: 34920699
Throw a Trim() onto the end  = )
<%=Server.UrlEncode(Regex.Replace(RenderContextTag(), "((?:[^ ]+(?: |$)){0,3}).*", "$1").Trim())%>
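
For anyone who wants to try this outside a page, here is a small console sketch of the same idea. It is an approximation under stated assumptions: Server.UrlEncode needs a web context, so System.Net.WebUtility.UrlEncode is swapped in as a stand-in (it also encodes spaces as +), and the names QueryTermDemo and FirstThreeWords are invented for the example.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical demo class; WebUtility.UrlEncode stands in for Server.UrlEncode here.
public static class QueryTermDemo
{
    // Keeps up to three space-separated words, then trims the trailing space
    // that the pattern leaves behind when more words follow.
    public static string FirstThreeWords(string input)
    {
        return Regex.Replace(input, "((?:[^ ]+(?: |$)){0,3}).*", "$1").Trim();
    }

    public static void Main()
    {
        string value = "Wella High Hair Crystal Styler";
        Console.WriteLine(WebUtility.UrlEncode(FirstThreeWords(value))); // Wella+High+Hair
    }
}
```

Trim() also removes leading whitespace; TrimEnd() would be enough if only the trailing space matters.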

 
LVL 74

Expert Comment

by:käµfm³d 👽
ID: 34920723
You may also want to switch back to "word" characters since "not a space" can encompass, well, everything not a space! You would need to examine your data and make that call : )

To be clear, I am saying:

((?:\w+(?: |$)){0,3}).*
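
One way to make that call is to run both variants against a value containing punctuation. The sketch below does so; the sample value and the names WordClassDemo and Apply are assumptions for illustration only. It suggests the \w variant can return an empty string when the first word contains a non-word character such as an apostrophe, because no word is directly followed by a space or the end of the string at that point.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class comparing "not a space" with "word characters".
public static class WordClassDemo
{
    public const string NotSpace = @"((?:[^ ]+(?: |$)){0,3}).*";
    public const string WordChars = @"((?:\w+(?: |$)){0,3}).*";

    public static string Apply(string pattern, string input)
    {
        return Regex.Replace(input, pattern, "$1").Trim();
    }

    public static void Main()
    {
        string value = "L'Oreal Elvive Shampoo 250ml"; // made-up sample with an apostrophe
        Console.WriteLine("[" + Apply(NotSpace, value) + "]");  // [L'Oreal Elvive Shampoo]
        Console.WriteLine("[" + Apply(WordChars, value) + "]"); // []
    }
}
```

So the word-character version is stricter about what counts as a word, but it drops everything when the leading word contains punctuation.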
 

Author Closing Comment

by:thyros
ID: 34921977
HainKurt and Kaufmed, thank you so much for your help. This is really helpful and I am very grateful.
 
LVL 74

Expert Comment

by:käµfm³d 👽
ID: 34922036
NP. Glad to help  = )