This article describes how to remove diacritic marks from characters.
http://www.ahinea.com/en/t
It's for Perl and I need a way to do this in C# ...
What's the easiest way to remove diacritic marks from characters using C#? I would like to have the following function:
string RemoveDiacriticMark(string
Sample use:
RemoveDiacriticMark("é") -> "e"
RemoveDiacriticMark("ü") -> "u"
RemoveDiacriticMark("à") -> "a"
....
Thanks!
This question has been solved and verified by the asker.
Well, you could do something like this in C#.
// Requires: using System.Text.RegularExpressions;
public static string RemoveDiacritics(string input) {
    string Return = input;
    Return = Regex.Replace(Return, "\xe6", "ae"); // ae ligature (U+00E6)
    Return = Regex.Replace(Return, "\xc7", "C"); // uppercase C-cedilla (U+00C7)
    ... same for other single matches ...
    Return = Regex.Replace(Return, "[\xe0\xe1\xe2\xe3\xe4\xe5
    ... same for other multiple matches ...
    return Return;
}
You'll need to fill in the blanks. You can find the codes for all the characters using Character Map. I would fill them all in myself, but there are quite a few, and I don't actually need this function myself. ;)
According to http://www.nongnu.org/unac
I guess it has something to do with the ParseCombiningCharacters method of the StringInfo class, but I cannot figure out the missing parts :-(
You are referring to the unac C code that I referred to in my first post. If you download the C source code, you will see that it builds up enormous arrays of hard-coded Unicode values, so I'm afraid there is no way around the hardcoding. The ParseCombiningCharacters method deals with characters that, even with the enormous Unicode character space, are represented by more than one Unicode character; it is not related to the sort of characters you want to work with.
I've submitted the question to Michael Kaplan, who works at Microsoft. He answered the question on his blog:
http://blogs.msdn.com/mich
Conclusion: using Whidbey (.NET 2.0), it's possible to write a RemoveDiacritics function. In prior versions your options are more limited: a P/Invoke call to the FoldString API with the MAP_COMPOSITE flag.
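For anyone landing here later, the following is a minimal sketch of how such a function is commonly written with the string normalization API that arrived in .NET 2.0. It assumes this is roughly the technique the blog post describes (the post itself is not quoted here); the class name DiacriticsSketch is made up for illustration. The idea is to decompose each character into a base letter plus combining marks (FormD), then drop the combining marks.

```csharp
using System;
using System.Globalization;
using System.Text;

// Hypothetical helper class, named only for this example.
public static class DiacriticsSketch
{
    public static string RemoveDiacritics(string input)
    {
        if (input == null) throw new ArgumentNullException("input");

        // Canonical decomposition: U+00E9 (e-acute) becomes 'e' + U+0301 (combining acute).
        string decomposed = input.Normalize(NormalizationForm.FormD);

        StringBuilder sb = new StringBuilder(decomposed.Length);
        foreach (char c in decomposed)
        {
            // Keep everything except non-spacing (combining) marks.
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
            {
                sb.Append(c);
            }
        }

        // Recompose whatever is left so the result is in a conventional form.
        return sb.ToString().Normalize(NormalizationForm.FormC);
    }
}
```

With this sketch, RemoveDiacritics("é") gives "e", RemoveDiacritics("ü") gives "u" and RemoveDiacritics("à") gives "a". Note that characters without a canonical decomposition, such as the ae ligature (U+00E6), pass through unchanged, so they would still need explicit replacements.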
by: muzzy2003, posted on 2005-02-19 at 09:11:51, ID: 13354109
There is a C library to do this called unac, but it basically stores all the Unicode accented characters and their unaccented equivalents and does the translation on that basis. You could translate this to C#. You can download it from:
/unac-man3 .en.html
http://www.nongnu.org/unac
I don't know of any other way of doing this.
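To make the table-based idea concrete, here is a rough sketch of what a C# translation could look like. It is not a port of unac: the class name DiacriticsTableSketch and the tiny mapping table are invented for illustration, and a real table would need far more entries.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical class; the table below is a small illustrative sample only.
public static class DiacriticsTableSketch
{
    private static readonly Dictionary<char, string> Map = CreateMap();

    private static Dictionary<char, string> CreateMap()
    {
        Dictionary<char, string> map = new Dictionary<char, string>();
        // a with grave, acute, circumflex, tilde, diaeresis, ring
        foreach (char c in "\u00E0\u00E1\u00E2\u00E3\u00E4\u00E5") map[c] = "a";
        // e with grave, acute, circumflex, diaeresis
        foreach (char c in "\u00E8\u00E9\u00EA\u00EB") map[c] = "e";
        // u with grave, acute, circumflex, diaeresis
        foreach (char c in "\u00F9\u00FA\u00FB\u00FC") map[c] = "u";
        map['\u00E7'] = "c";  // lowercase c-cedilla
        map['\u00C7'] = "C";  // uppercase C-cedilla
        map['\u00E6'] = "ae"; // ae ligature expands to two letters
        return map;
    }

    public static string RemoveDiacritics(string input)
    {
        if (input == null) throw new ArgumentNullException("input");

        StringBuilder sb = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            string replacement;
            // Replace mapped characters; copy everything else unchanged.
            if (Map.TryGetValue(c, out replacement)) sb.Append(replacement);
            else sb.Append(c);
        }
        return sb.ToString();
    }
}
```

A single pass over the string with a dictionary lookup avoids running one regular expression per character class, and it handles one-to-many replacements such as the ae ligature in the same table.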