Regular expressions in .NET

Given an array or collection of words like the one below, I want to find each word in the string text and replace it with, say, an empty string.



var words = new string[] { "apple", "cat", "red" };
var text = "I have a red apple and a small cat";

foreach (var w in words)
{
    // \b on both sides restricts the match to whole words;
    // each match is replaced with a single space
    text = Regex.Replace(text, @"\b" + w + @"\b", " ");
}

Is there a better way to do this?

Also, I have a set of special characters. If any of these characters is found, it should be replaced by an empty string.
In my example below I have $$$. Since that is the same character, $, appearing several times in a row, I would like the whole run to be replaced only once rather than once per character.
How do I achieve this?

 Regex reg = new Regex("[*&#$^@{}]");
 var result = reg.Replace("I have * , as well as $$$" , " ");
mikhaAsked:
wilcoxonCommented:
Your first solution looks reasonable.

For your second question, this should do it:
Regex reg = new Regex("[*&#$^@{}]+");
var result = reg.Replace("I have * , as well as $$$" , " ");
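
The effect of the + quantifier can be checked in isolation. The following is a minimal sketch, not a definitive implementation; the class name SpecialCharCleaner and the method ReplaceRuns are hypothetical names introduced only for illustration.

```csharp
using System.Text.RegularExpressions;

public static class SpecialCharCleaner
{
    // Character class from the question. The trailing '+' makes a run
    // such as "$$$" count as one match instead of three.
    private static readonly Regex SpecialRuns = new Regex(@"[*&#$^@{}]+");

    // Replaces each run of special characters with one copy of 'replacement'.
    public static string ReplaceRuns(string input, string replacement)
    {
        return SpecialRuns.Replace(input, replacement);
    }
}
```

Calling SpecialCharCleaner.ReplaceRuns("a$$$b*c", " ") yields "a b c": the three dollar signs collapse into a single space. Passing "" as the replacement removes the runs entirely.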


Shaun VermaakTechnical SpecialistCommented:
I would use
Regex reg = new Regex(@"[^a-zA-Z\d\s:]+");
var result = reg.Replace("I have * , as well as $$$" , " ");



This might as well be a string replace. Do you care about casing?
output = Regex.Replace(text, @"\b" + w + @"\b", " ");


mikhaAuthor Commented:
@Shaun - is your first solution looking for all non-alphanumeric characters?

For the second case, casing doesn't matter, but I want an exact word match.
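
A case-insensitive, whole-word replacement along these lines could look like the sketch below. It assumes each word starts and ends with a letter or digit (so that \b behaves as expected); WordRemover and RemoveWord are hypothetical names, and Regex.Escape is added as a precaution in case a word contains regex metacharacters.

```csharp
using System.Text.RegularExpressions;

public static class WordRemover
{
    // Replaces every whole-word occurrence of 'word' with a single space,
    // ignoring case. Substrings of longer words are left alone.
    public static string RemoveWord(string text, string word)
    {
        string pattern = @"\b" + Regex.Escape(word) + @"\b";
        return Regex.Replace(text, pattern, " ", RegexOptions.IgnoreCase);
    }
}
```

With this, "Red", "RED" and "red" are all replaced, while "redder" is left untouched because the trailing \b does not match inside a word.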

mikhaAuthor Commented:
Also, out of curiosity: with a regular expression, can we figure out whether a word or a phrase is within double quotes or not?

Say the user inputs "my name is mikha" vs.

My name is mikha.
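
One possible way to check for double quotes is sketched below. It assumes plain ASCII double quotes with no escaped quotes inside the phrase; QuoteDetector and its methods are hypothetical names used only for illustration.

```csharp
using System.Text.RegularExpressions;

public static class QuoteDetector
{
    // A double quote, any run of non-quote characters (captured), a closing quote.
    private static readonly Regex Quoted = new Regex("\"([^\"]*)\"");

    // True when the whole input (ignoring surrounding whitespace) is one quoted phrase.
    public static bool IsEntirelyQuoted(string input)
    {
        return Regex.IsMatch(input.Trim(), "^\"[^\"]*\"$");
    }

    // Finds the first quoted phrase anywhere in the input.
    // Returns false and sets 'phrase' to an empty string when there is none.
    public static bool TryGetQuotedPhrase(string input, out string phrase)
    {
        Match m = Quoted.Match(input);
        phrase = m.Success ? m.Groups[1].Value : string.Empty;
        return m.Success;
    }
}
```

IsEntirelyQuoted distinguishes the two inputs in the question, and TryGetQuotedPhrase covers the case where only part of the input is quoted.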
wilcoxonCommented:
Yes, roughly: Shaun's first solution matches every run of characters that is not an ASCII letter, a digit, whitespace, or a colon.

I had a thought on how to make your code more efficient for the words.  I don't know .net so you'll likely need to change my added line of code to make it valid .net code.
var words = new string[] { "apple", "cat", "red" };
var text = "I have a red apple and a small cat";
// Assumes the words contain no regex metacharacters
var rx = @"\b(" + string.Join("|", words) + @")\b";  // ends up as \b(apple|cat|red)\b
var output = Regex.Replace(text, rx, " ");



It should be significantly more efficient as the list of words gets longer (unless you have so many words it blows a buffer).
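
A slightly more defensive version of the single-pattern idea is sketched below. It assumes a non-empty word list; ReservedWordFilter, BuildPattern and RemoveWords are hypothetical names. Regex.Escape protects against words that contain metacharacters, and the non-capturing group (?:...) is used only because the captured text is not needed.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class ReservedWordFilter
{
    // Builds one pattern of the form \b(?:word1|word2|...)\b, matched case-insensitively.
    // Assumes 'words' is non-empty; an empty list would match at every word boundary.
    public static Regex BuildPattern(IEnumerable<string> words)
    {
        string alternation = string.Join("|", words.Select(w => Regex.Escape(w)));
        return new Regex(@"\b(?:" + alternation + @")\b", RegexOptions.IgnoreCase);
    }

    // Replaces every listed word with a single space in one pass over the text.
    public static string RemoveWords(string text, IEnumerable<string> words)
    {
        return BuildPattern(words).Replace(text, " ");
    }
}
```

If the word list is fixed, the Regex returned by BuildPattern can be created once and reused for every input instead of being rebuilt on each call.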

mikhaAuthor Commented:
@wilcoxon - thanks again. I have only 15-20 reserved words and a few special characters that I am looking for.

So I think even a for loop is good enough.

I'm guessing the buffer problem you mention comes from how regex looks for a pattern: when using the | operator, it has to scan the input string multiple times.
wilcoxonCommented:
I was talking more about exceeding a string-length (or regex-length) limit if you joined a ton of words together as word1|word2|...|wordX.

With 15-20, the single regex should be more efficient but I would be surprised if it was wall-clock measurable compared to the loop (depends on .net internals though).  I'd probably test each against a large sample of text (if there is such a thing - it's unclear where text is coming from exactly) and see if it makes a difference.  However, if you are short on time, the for loop should be fine.
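
A rough way to run that comparison is sketched below, using Stopwatch from System.Diagnostics. ReplaceBenchmark and its members are hypothetical names, and the timings this produces are only indicative: they depend on the runtime, the input, and regex caching, so no particular result is implied.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

public static class ReplaceBenchmark
{
    // Approach 1: one Regex.Replace call per word.
    public static string WithLoop(string text, string[] words)
    {
        foreach (var w in words)
        {
            text = Regex.Replace(text, @"\b" + Regex.Escape(w) + @"\b", " ");
        }
        return text;
    }

    // Approach 2: a single alternation pattern, one pass over the text.
    public static string WithSingleRegex(string text, string[] words)
    {
        string[] escaped = Array.ConvertAll(words, w => Regex.Escape(w));
        string pattern = @"\b(?:" + string.Join("|", escaped) + @")\b";
        return Regex.Replace(text, pattern, " ");
    }

    // Runs 'action' the given number of times and returns the total elapsed time.
    public static TimeSpan Time(Func<string> action, int iterations)
    {
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        sw.Stop();
        return sw.Elapsed;
    }
}
```

For example, ReplaceBenchmark.Time(() => ReplaceBenchmark.WithLoop(sample, words), 10000) can be compared with the same call using WithSingleRegex, where sample is a representative piece of input text.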
mikhaAuthor Commented:
Thanks again.

I tested this as you suggested, joining the words with the | operator like this:
 word1|word2|...

With about 20 words it works fine.

My only concern is that the text I'm parsing is user input. Right now there is no limit on the number of characters users can type in. Would it make sense to add a limit, say 500 characters or so?
wilcoxonCommented:
If it's user input, it should be fine.  I would not expect there to be issues until text is at least a couple gigabytes in size.

If it is user input, just make sure it is sanitized if it is ever used as anything other than a plain string (e.g. as a regular expression, or to update a database).
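
For the regular-expression case, one common precaution is to pass user text through Regex.Escape before building a pattern from it, so that it is matched literally. The sketch below assumes that literal matching is what is wanted; UserInputSearch and ContainsLiteral are hypothetical names, and the one-second match timeout is an arbitrary illustrative value.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UserInputSearch
{
    // Treats 'userTerm' as literal text rather than as a pattern:
    // characters such as '.', '$' or '(' lose their special meaning.
    public static bool ContainsLiteral(string text, string userTerm)
    {
        var rx = new Regex(
            Regex.Escape(userTerm),
            RegexOptions.IgnoreCase,
            TimeSpan.FromSeconds(1)); // match timeout; throws RegexMatchTimeoutException if exceeded
        return rx.IsMatch(text);
    }
}
```

Without the escape, a term such as a.c would also match "abc", because the dot would be interpreted as "any character".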
mikhaAuthor Commented:
Thank you both for your insights
mikhaAuthor Commented:
@wilcoxon - thanks again.
wilcoxonCommented:
You're welcome.