liljegren asked:

General algorithm for whitespace normalization

Hi all. I often need to normalize the whitespace in strings in the following way:

1. Trim left and right
2. Replace all tabs (0x9) and line breaks (0xA + 0xD) with spaces (0x20)
3. Collapse every sequence of spaces into a single space

The third step is always a problem. The solutions I've been able to come up with are very cumbersome. But this must be a common problem, so I hope there's a general algorithm for it. Show me, please.
ASKER CERTIFIED SOLUTION
AlexFM

Use regular expressions. They are well suited to matching and transforming text.

// requires: using System.Text.RegularExpressions;
string result = Regex.Replace(input, @"((?<=\S)(?<1>(\s))\s*(?=\S))|(\s*)", "$1", RegexOptions.Multiline | RegexOptions.ExplicitCapture);

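The first alternative matches each run of whitespace that has a non-whitespace character to its left and to its right, and captures the first whitespace character of that run as group 1, so the whole run is replaced by that one character. The second alternative, (\s*), matches the leading and trailing whitespace; group 1 is empty there, so those runs are removed. Note that the character kept is whatever the run starts with: a run that begins with a tab or line break stays a tab or line break, so convert tabs and line breaks to spaces (step 2) before this call.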
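
For comparison, the three steps from the question can also be done in a single pass without regular expressions. The following is only a minimal sketch, not something posted in this thread: the class name WhitespaceNormalizer and the helper IsSpace are made up for illustration, and it assumes that only the four characters named in the question (space, tab, 0xA, 0xD) count as whitespace.

```csharp
using System;
using System.Text;

// Hypothetical helper class; the names are illustrative and not from the thread.
static class WhitespaceNormalizer
{
    // Assumption: only the characters listed in the question are whitespace.
    static bool IsSpace(char c) => c == ' ' || c == '\t' || c == '\n' || c == '\r';

    public static string Normalize(string input)
    {
        var sb = new StringBuilder(input.Length);
        bool pendingSpace = false; // a whitespace run was seen since the last kept character

        foreach (char c in input)
        {
            if (IsSpace(c))
            {
                // Steps 2 and 3: remember the run instead of copying it.
                pendingSpace = true;
            }
            else
            {
                // Emit one space per run, but only between two kept characters.
                // Leading runs are skipped here; trailing runs are never flushed (step 1).
                if (pendingSpace && sb.Length > 0) sb.Append(' ');
                sb.Append(c);
                pendingSpace = false;
            }
        }
        return sb.ToString();
    }
}
```

With this sketch, WhitespaceNormalizer.Normalize("  a\t\r\n b  ") returns "a b". A shorter regex variant with roughly the same effect is Regex.Replace(input.Trim(), @"\s+", " "); the difference is that \s and Trim() also treat other Unicode whitespace characters as whitespace.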