thyros
Regex to Extract First 3 Whole Words from String in .NET
I have a dynamic variable value which I want to truncate to the first 3 whole words. The full string is output using a code snippet like this:
<%=RenderContextTag() %>
So for example, let's say the value of this string is: Wella High Hair Crystal Styler
In Perl, the following regular expression match successfully pulls out the first 3 whole words:
((?:\W*\w+){0,3}).*
$1 = Wella High Hair
However, when I tried to adapt this to an aspx .net page it was giving syntax errors:
<%=Regex.Replace(RenderContextTag(),"((?:\W*\w+){0,3}).*", "$1")%>
I have tried testing it with \b and literal spaces using @ \\w but I can't seem to figure it out. What is the proper syntax to extract the first 3 whole words from a variable string?
or use this
<%=System.Text.RegularExpressions.Regex.Replace("this will work, by Hain Kurt!", "((?:\W*\w+){0,3}).*", "$1")%>
and add
Imports System.Text.RegularExpressions.Regex
to code behind...
ASKER
I don't have access to the code behind, but when trying to use your inline code example it gives this error:
Compiler Error Message: CS1009: Unrecognized escape sequence
and it highlights this line:
Line 264:<%=System.Text.RegularExpressions.Regex.Replace("this will work, by Hain Kurt!", "((?:\W*\w+){0,3}).*", "$1")%>
I have tried adding @\\w and this removes the compiler error message but it doesn't truncate to 3 whole words so there is still something wrong with the syntax.
post the code you have
<%=System.Text.RegularExpressions.Regex.Replace("this will work, by Hain Kurt!", "((?:\W*\w+){0,3}).*", "$1")%>
should work, tested locally... maybe you are trying to use this inside some server controls...
HainKurt gave you VB code and you need C# = )
In C#, backslashes inside a regular string literal are escape characters (much like Perl and C). Change your code to either of the following:
<%=Regex.Replace(RenderContextTag(),"((?:\\W*\\w+){0,3}).*", "$1")%>
--OR--
<%=Regex.Replace(RenderContextTag(), @"((?:\W*\w+){0,3}).*", "$1")%>
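For anyone who wants to check this outside the page, here is a minimal console sketch of the two spellings. The class and method names are made up for illustration, and a plain string stands in for RenderContextTag(). The `Mixed` constant is only one plausible reading of the `@` plus `\\w` attempt mentioned above: in a verbatim string the doubled backslash reaches the regex engine as an escaped literal backslash, so the word part can never match.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; not part of the original page.
public static class FirstWordsDemo
{
    // Regular literal: every regex backslash is doubled for the C# compiler.
    public const string Escaped = "((?:\\W*\\w+){0,3}).*";

    // Verbatim literal: backslashes reach the regex engine untouched.
    public const string Verbatim = @"((?:\W*\w+){0,3}).*";

    // Assumed mistake: verbatim AND doubled, so the regex looks for literal backslashes.
    public const string Mixed = @"((?:\\W*\\w+){0,3}).*";

    // Stand-in for the inline <%= Regex.Replace(...) %> expression.
    public static string FirstThreeWords(string input, string pattern)
    {
        return Regex.Replace(input, pattern, "$1");
    }

    public static void Main()
    {
        const string sample = "Wella High Hair Crystal Styler";

        // Both correct spellings compile to the same pattern text.
        Console.WriteLine(Escaped == Verbatim);                 // True
        Console.WriteLine(FirstThreeWords(sample, Verbatim));   // Wella High Hair

        // The mixed form captures nothing, so the whole string is replaced by "".
        Console.WriteLine(FirstThreeWords(sample, Mixed).Length); // 0
    }
}
```

The verbatim form is usually easier to read because the pattern looks the same as it does in Perl.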
I didn't read far enough down : )
What if you say "not a space followed by a space"?
<%=Regex.Replace(RenderContextTag(),"((?:[^ ]+ ){0,3}).*", "$1")%>
ASKER
Kaufmed, your most recent code snippet works nicely in the .NET C# page.
<%=Regex.Replace(RenderContextTag(),"((?:[^ ]+(?: |$)){0,3}).*", "$1")%>
<%=Server.UrlEncode(Regex.Replace(RenderContextTag(), "((?:[^ ]+(?: |$)){0,3}).*", "$1"))%>
With the server url encoding, I notice there is a trailing space / + character, i.e.
Wella High Hair Crystal Styler
becomes
Wella+High+Hair+
Although I think it will still serve my purpose as is, it would probably return better results from the api I am querying if we can strip the trailing space. Could you work your sorcery on that? :)
You may also want to switch back to "word" characters since "not a space" can encompass, well, everything not a space! You would need to examine your data and make that call : )
To be clear, I am saying:
((?:\w+(?: |$)){0,3}).*
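The accepted answer is not visible in this thread, so the following is only a sketch of two ways the trailing space could be avoided, not the solution that was posted. The class and method names are hypothetical, and the second pattern is an assumption of mine: it matches the separator before each later word instead of after each word, so the capture always ends on a word character.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; not the hidden accepted solution.
public static class TrailingSpaceDemo
{
    // Pattern from the thread: each word also consumes the space after it,
    // so the capture ends with a space whenever more words follow.
    public const string TrailingSeparator = @"((?:\w+(?: |$)){0,3}).*";

    // Assumed alternative: the space is matched in front of words two and three.
    public const string LeadingSeparator = @"^(\w+(?: \w+){0,2}).*$";

    // Option 1: keep the thread's pattern and trim the leftover space.
    public static string WithTrimEnd(string input)
    {
        return Regex.Replace(input, TrailingSeparator, "$1").TrimEnd(' ');
    }

    // Option 2: use a pattern that never captures a trailing space.
    // If the input does not start with a word character, nothing matches
    // and the input is returned unchanged.
    public static string WithLeadingSeparator(string input)
    {
        return Regex.Replace(input, LeadingSeparator, "$1");
    }

    public static void Main()
    {
        const string sample = "Wella High Hair Crystal Styler";
        Console.WriteLine("[" + WithTrimEnd(sample) + "]");          // [Wella High Hair]
        Console.WriteLine("[" + WithLeadingSeparator(sample) + "]"); // [Wella High Hair]
        Console.WriteLine("[" + WithLeadingSeparator("Wella High") + "]"); // [Wella High]
    }
}
```

In the page itself either result would then be passed to Server.UrlEncode as before, which should no longer produce the trailing + character.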
ASKER
HainKurt and Kaufmed, thank you so much for your help. This is really helpful and I am very grateful.
NP. Glad to help = )
looks like there are a number of Regex classes in the system that conflict with each other; use the fully qualified namespace and class name