Solved

Regex to Extract First 3 Whole Words from String in .NET

Posted on 2011-02-17
Last Modified: 2012-06-27
I have a dynamic variable value which I want to truncate to the first 3 whole words.  The full string is output using a code snippet like this:

<%=RenderContextTag() %>


So for example, let's say the value of this string is:  Wella High Hair Crystal Styler

In Perl, the following regular expression match successfully pulls out the first 3 whole words:

((?:\W*\w+){0,3}).*


$1 = Wella High Hair

However, when I tried to adapt this to an ASPX .NET page, it gave syntax errors:

<%=Regex.Replace(RenderContextTag(),"((?:\W*\w+){0,3}).*", "$1")%>


I have tried testing it with \b and literal spaces using @ \\w but I can't seem to figure it out.  What is the proper syntax to extract the first 3 whole words from a variable string?
Question by:thyros
12 Comments
 
LVL 51

Expert Comment

by:HainKurt
ID: 34918766
Check this code.

It looks like there are several Regex classes in the system that conflict with each other, so use the fully qualified namespace and class name:
Imports System.Text.RegularExpressions.Regex

Partial Class Regex1
    Inherits System.Web.UI.Page

    Protected Sub Page_Load(ByVal sender As Object, ByVal e As System.EventArgs) Handles Me.Load
        Dim pattern As String = "((?:\W*\w+){0,3}).*"
        Dim RenderContextTag As String = "Wella High Hair at EE by Hain Kurt!"
        Dim rv As String

        rv = System.Text.RegularExpressions.Regex.Replace(RenderContextTag, pattern, "$1")
        Response.Write(rv)
    End Sub
End Class

 
LVL 51

Expert Comment

by:HainKurt
ID: 34918797
or use this

<%=System.Text.RegularExpressions.Regex.Replace("this will work, by Hain Kurt!", "((?:\W*\w+){0,3}).*", "$1")%>

and add

Imports System.Text.RegularExpressions.Regex

to code behind...
 

Author Comment

by:thyros
ID: 34919942
I don't have access to the code behind, but when trying to use your inline code example it gives this error:

Compiler Error Message: CS1009: Unrecognized escape sequence

and it highlights this line:

Line 264:<%=System.Text.RegularExpressions.Regex.Replace("this will work, by Hain Kurt!", "((?:\W*\w+){0,3}).*", "$1")%>


I have tried adding @\\w and this removes the compiler error message but it doesn't truncate to 3 whole words so there is still something wrong with the syntax.
 
LVL 51

Expert Comment

by:HainKurt
ID: 34920003
post the code you have

<%=System.Text.RegularExpressions.Regex.Replace("this will work, by Hain Kurt!", "((?:\W*\w+){0,3}).*", "$1")%>

should work, tested locally... maybe you are trying to use this inside some server controls...

 
LVL 75

Expert Comment

by:käµfm³d 👽
ID: 34920226
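HainKurt gave you VB code and you need C#  = )

In C#, a backslash inside a regular string literal starts an escape sequence (much like Perl and C), so \W and \w produce the CS1009 error. Either double the backslashes or use a verbatim (@) string literal. Change your code to either of the following: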
<%=Regex.Replace(RenderContextTag(),"((?:\\W*\\w+){0,3}).*", "$1")%>

--OR--

<%=Regex.Replace(RenderContextTag(), @"((?:\W*\w+){0,3}).*", "$1")%>
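To illustrate the two spellings outside of a page, here is a minimal console sketch. The class name EscapeDemo and the helper FirstThreeWords are made up for this illustration, and the sample string stands in for whatever RenderContextTag() returns. It also shows what I would expect from combining @ with doubled backslashes, which may be what the @\\w attempt ran into: the regex engine then looks for a literal backslash, so nothing is captured.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class, not part of the original page.
public static class EscapeDemo
{
    // Regular string literal: the C# compiler turns each \\ into one backslash.
    public const string Escaped = "((?:\\W*\\w+){0,3}).*";

    // Verbatim string literal: backslashes reach the regex engine unchanged.
    public const string Verbatim = @"((?:\W*\w+){0,3}).*";

    // Both styles mixed: the regex engine receives \\W and \\w, meaning a
    // literal backslash followed by the letters W or w, so normal words never match.
    public const string DoubledInVerbatim = @"((?:\\W*\\w+){0,3}).*";

    public static string FirstThreeWords(string input, string pattern)
    {
        // "$1" keeps only what the first capture group matched.
        return Regex.Replace(input, pattern, "$1");
    }

    public static void Main()
    {
        string sample = "Wella High Hair Crystal Styler";
        Console.WriteLine(Escaped == Verbatim);                               // True
        Console.WriteLine(FirstThreeWords(sample, Verbatim));                 // Wella High Hair
        Console.WriteLine(FirstThreeWords(sample, DoubledInVerbatim).Length); // 0
    }
}
```

So the two corrected forms above are the same pattern; pick whichever reads better in the page.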

 
LVL 75

Expert Comment

by:käµfm³d 👽
ID: 34920262
I didn't read far enough down  : )

What if you say "one or more non-space characters, followed by a space"?
<%=Regex.Replace(RenderContextTag(),"((?:[^ ]+ ){0,3}).*", "$1")%>

 
LVL 75

Assisted Solution

by:käµfm³d 👽
käµfm³d   👽 earned 500 total points
ID: 34920343
I think the last one might need a slight tweak for the case where a string has three words or fewer:
<%=Regex.Replace(RenderContextTag(),"((?:[^ ]+(?: |$)){0,3}).*", "$1")%>

 

Author Comment

by:thyros
ID: 34920610
Kaufmed, your most recent code snippet works nicely in the C# .NET page:

<%=Regex.Replace(RenderContextTag(),"((?:[^ ]+(?: |$)){0,3}).*", "$1")%>
<%=Server.UrlEncode(Regex.Replace(RenderContextTag(), "((?:[^ ]+(?: |$)){0,3}).*", "$1"))%>


With Server.UrlEncode applied, I notice there is a trailing space, which gets encoded as a + character, i.e.

Wella High Hair Crystal Styler
becomes
Wella+High+Hair+

Although I think it will still serve my purpose as is, it would probably return better results from the api I am querying if we can strip the trailing space.  Could you work your sorcery on that? :)
 
LVL 75

Accepted Solution

by:
käµfm³d   👽 earned 500 total points
ID: 34920699
Throw a Trim() onto the end  = )
<%=Server.UrlEncode(Regex.Replace(RenderContextTag(), "((?:[^ ]+(?: |$)){0,3}).*", "$1").Trim())%>
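For anyone who wants to see the effect outside of an ASPX page, here is a small console sketch. TrimDemo and FirstThreeWords are names invented for this example, and System.Net.WebUtility.UrlEncode is swapped in for Server.UrlEncode on the assumption that no web context is available; it also encodes a space as +.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical demo class, not part of the original page.
public static class TrimDemo
{
    // Same pattern as in the thread: up to three runs of non-space
    // characters, each followed by a space or the end of the string.
    public const string Pattern = "((?:[^ ]+(?: |$)){0,3}).*";

    public static string FirstThreeWords(string input)
    {
        // Trim() drops the space that the third "word + space" run captures.
        return Regex.Replace(input, Pattern, "$1").Trim();
    }

    public static void Main()
    {
        string sample = "Wella High Hair Crystal Styler";
        string untrimmed = Regex.Replace(sample, Pattern, "$1");

        Console.WriteLine("[" + untrimmed + "]");                         // [Wella High Hair ]
        Console.WriteLine(WebUtility.UrlEncode(untrimmed));               // Wella+High+Hair+
        Console.WriteLine(WebUtility.UrlEncode(FirstThreeWords(sample))); // Wella+High+Hair
    }
}
```

The trailing space comes from the pattern itself: each repetition consumes the space after the word, so the capture group ends with one whenever more text follows the third word.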

 
LVL 75

Expert Comment

by:käµfm³d 👽
ID: 34920723
You may also want to switch back to "word" characters since "not a space" can encompass, well, everything not a space! You would need to examine your data and make that call : )

To be clear, I am saying:

((?:\w+(?: |$)){0,3}).*
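As a rough way to make that call, the sketch below runs the three patterns from this thread against HainKurt's sample string, which contains a comma. PatternComparison and Truncate are hypothetical names used only for this comparison.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class, not part of the original page.
public static class PatternComparison
{
    public const string NonSpace = @"((?:[^ ]+(?: |$)){0,3}).*";  // "not a space" version
    public const string WordOnly = @"((?:\w+(?: |$)){0,3}).*";    // "word character" version
    public const string Original = @"((?:\W*\w+){0,3}).*";        // pattern from the question

    public static string Truncate(string input, string pattern)
    {
        return Regex.Replace(input, pattern, "$1").Trim();
    }

    public static void Main()
    {
        string sample = "this will work, by Hain Kurt!";

        // Punctuation counts as part of the word.
        Console.WriteLine(Truncate(sample, NonSpace)); // this will work,

        // "work" is followed by a comma, not a space, so the third word is dropped.
        Console.WriteLine(Truncate(sample, WordOnly)); // this will

        // \W* skips punctuation between words, so three clean words come back.
        Console.WriteLine(Truncate(sample, Original)); // this will work
    }
}
```

If the data can contain punctuation directly after a word, the \w version can silently return fewer than three words, so it is worth checking against real values first.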
 

Author Closing Comment

by:thyros
ID: 34921977
HainKurt and Kaufmed, thank you so much for your help. This is really helpful and I am very grateful.
 
LVL 75

Expert Comment

by:käµfm³d 👽
ID: 34922036
NP. Glad to help  = )