C# Regex Remove text in middle of string

I have a string of the form
2009-04-18 01:02:03 word1 word2 word3 leave this text alone

I'd like to remove "word1 word2 word3".  This would be easy if those words were at the beginning or end of the string, because I could use start-of-line or end-of-line anchors.

Is there a technique for finding a location in a string, such as the start of word1, and then using that location as one would use an anchor?

Do I have the wrong idea and should be going at it a different way?

abel Commented:
> Could you clarify for me?

Yes, I can. If you work with regular expressions, then each opening parenthesis introduces a new grouping expression and thus a new group. Simply put, counting the opening parentheses from left to right will show you which $x you need.

then: $0 means "everything that matches" which is handy when your match does not match the whole string but only parts.

then: (?:some-sub-expression-here) introduces a non-capturing group; these do not count towards $1, $2 etc. They start with (?:

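Just as a thought or mnemonic: suppose the expression had been (([^ ]+ ){3,8}), which means 3 to 8 words. How would you know what to put in the replacement expression if every captured word ended up in a new $x? It does not work that way: the repeated inner group has one opening parenthesis and therefore one number, and after the match it holds only the last word it matched.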

In .NET it is possible to reach each repetition of a repeated group through code (the group's Captures collection), and more advanced replacement expressions also exist. But for now, just remember to count the opening parentheses; it works in almost every regular expression flavor (JavaScript, Java, .NET, Perl, Ruby, etc.).
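To make the parenthesis counting concrete, here is a small sketch that prints what each numbered group holds for the sample string from the question. The class name GroupNumberingDemo is only an illustration and not part of the thread; the pattern is the one used in the answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupNumberingDemo
{
    // Pattern from the thread. Opening parentheses, left to right, are groups 1, 2, 3 and 4.
    public const string Pattern = "^(.{20})(([^ ]+ ){3})(.*)";

    public static void Main()
    {
        string input = "2009-04-18 01:02:03 word1 word2 word3 leave this text alone";
        Match m = Regex.Match(input, Pattern);

        Console.WriteLine("$1 = [" + m.Groups[1].Value + "]"); // the 20-character timestamp prefix
        Console.WriteLine("$2 = [" + m.Groups[2].Value + "]"); // all three words together
        Console.WriteLine("$3 = [" + m.Groups[3].Value + "]"); // only the last repetition: "word3 "
        Console.WriteLine("$4 = [" + m.Groups[4].Value + "]"); // the rest of the string

        // .NET also remembers every repetition of group 3 in its Captures collection.
        foreach (Capture c in m.Groups[3].Captures)
        {
            Console.WriteLine("capture of group 3: [" + c.Value + "]");
        }
    }
}
```

Group 3 is a single group that matched three times, which is why the words do not spread over $2, $3 and $4.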

-- Abel --
If you mean that you do not know beforehand what word1/2/3 look like, and you know nothing about what "leave this text alone" looks like, it gets a bit difficult. But assuming that:

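  • word1/2/3 are separated by single spaces (the pattern below matches exactly one space after each word)
  • the start is a date/time stamp that always has the same length (19 characters plus the following space, 20 in total)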
  • after removing word1/2/3 you want one space and not two (as in your example)
then you can use this matching regular expression:

^.{20}([^ ]+ ){3}
Add a few capturing parentheses and you can use it like this:

string replacedString = Regex.Replace(yourString, "^(.{20})(([^ ]+ ){3})(.*)", "$1$4");

-- Abel --
Just tested it to be sure. The following works:

// replacing a string using a regex (requires: using System.Text.RegularExpressions;)
string yourString = "2009-04-18 01:02:03 word1 word2 word3 leave this text alone";
string replacedString = Regex.Replace(yourString, "^(.{20})(([^ ]+ ){3})(.*)", "$1$4");
// replacedString now contains "2009-04-18 01:02:03 leave this text alone"
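On the original question of using a location inside the string as an anchor: one possible alternative, not the approach taken in this thread, is a lookbehind assertion. The sketch below assumes the same fixed 20-character prefix; RemoveThreeWords and LookbehindDemo are hypothetical names chosen for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookbehindDemo
{
    // Hypothetical helper: removes the three words that follow the 20-character timestamp prefix.
    public static string RemoveThreeWords(string input)
    {
        // (?<=^.{20}) requires that the match starts right after the first 20 characters,
        // without making those characters part of the match.
        // (?:[^ ]+ ){3} matches three words, each followed by one space; (?: creates no numbered group.
        return Regex.Replace(input, @"(?<=^.{20})(?:[^ ]+ ){3}", "");
    }

    public static void Main()
    {
        string yourString = "2009-04-18 01:02:03 word1 word2 word3 leave this text alone";
        Console.WriteLine(RemoveThreeWords(yourString));
        // prints "2009-04-18 01:02:03 leave this text alone"
    }
}
```

Because only the three words are matched, the replacement string can be empty and no $1$4 bookkeeping is needed.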

josgood (Author) Commented:
Thank you!  That works great.

I almost understand.
^(.{20}) collects the first 20 characters as $1
(([^ ]+ ){3}) collects three words
(.*) collects the rest of the string as $4

$1$4 concatenates the first 20 characters and the $4 "rest of the string" to form the replacement string.

I'm probably exhibiting a fundamental misunderstanding of something, but I would think that
   (([^ ]+ ){3})
would collect three words as $2, $3, and $4.

Could you clarify for me?
PS: good tutorials and references can be found here: http://regular-expressions.info
josgood (Author) Commented:
I'm always impressed by simplicity.

A real expert can explain to you, in words you understand, the actual simplicity behind an apparent complexity.  A real expert shows you why things are simple.

Thank you for doing so for me.

It is *so* cool when you understand something that has been a problem for you!

tx a bunch for the nice compliment! :-)
Question has a verified solution.