Regular Expressions: Match the base letter of a Unicode string

Hello experts,

Is it possible to match the base letter of a Unicode string?  If so, how do I do it?  For example, say I am looking for the word "hen".  In my text file, I could have "hen" (which will match) and I could have "heñ" (which currently does not match).  I would like my regular expression, or whatever method wraps it, to be able to match both words.

So, is there a regex tactic of which I am not aware that will match the base letter "n" when it comes across the Unicode character ñ (and so on for every base letter)?

Thanks for shedding the light.
Gewgala Asked:
 
gmrsecs Commented:
I did some research and found that the .NET string class has a Normalize method, so you can transform your string before applying the regular expression, like:

s.Normalize(NormalizationForm.FormD)

It should work (though I have not tested it).
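
To see what that call does to the string, here is a minimal sketch (not from the original poster) that prints the code points of "heñ" after normalizing to FormD. The class name NormalizeDemo and the helper CodePoints are made up for illustration; String.Normalize and NormalizationForm.FormD are the real .NET members.

```csharp
using System;
using System.Text;

static class NormalizeDemo
{
    // Hypothetical helper: lists each UTF-16 code unit as U+XXXX.
    public static string CodePoints(string s)
    {
        var sb = new StringBuilder();
        foreach (char c in s)
        {
            if (sb.Length > 0) sb.Append(' ');
            sb.Append("U+").Append(((int)c).ToString("X4"));
        }
        return sb.ToString();
    }

    static void Main()
    {
        string composed = "he\u00F1"; // "heñ" with the precomposed ñ (U+00F1)
        string decomposed = composed.Normalize(NormalizationForm.FormD);

        Console.WriteLine(CodePoints(composed));   // U+0068 U+0065 U+00F1
        Console.WriteLine(CodePoints(decomposed)); // U+0068 U+0065 U+006E U+0303
    }
}
```

After decomposition the ñ becomes a plain n (U+006E) followed by the combining tilde (U+0303), which is what lets a regular expression see the base letter.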
 
gmrsecs Commented:
Basically, if you manage to normalize your string to its canonical decomposed form (I don't know how to do that in .NET), you can use a simple regular expression like:
he\u006E\p{M}*

where \u006E is the Unicode representation of 'n', and \p{M}* matches zero or more diacritic marks. So this regular expression will match 'hen', but also heX (where X is \u006E combined with a diacritic, e.g. \u0301).

Anyway, the problem remains the canonical decomposition.
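
Putting the pattern together with a decomposition step, a rough sketch in C# might look like this. The names BaseLetterMatch and ContainsHen are hypothetical, and it assumes String.Normalize with NormalizationForm.FormD is an acceptable way to get the canonical decomposition.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

static class BaseLetterMatch
{
    // "he", then n (U+006E), then zero or more combining marks.
    // The verbatim string (@) passes the backslashes to the regex engine as-is.
    static readonly Regex HenPattern = new Regex(@"he\u006E\p{M}*");

    // Hypothetical helper: decompose first, then search.
    public static bool ContainsHen(string text)
    {
        string decomposed = text.Normalize(NormalizationForm.FormD);
        return HenPattern.IsMatch(decomposed);
    }

    static void Main()
    {
        Console.WriteLine(ContainsHen("hen"));      // True
        Console.WriteLine(ContainsHen("he\u00F1")); // True  (heñ)
        Console.WriteLine(ContainsHen("hem"));      // False
    }
}
```

Note that this pattern only tolerates marks after the n. To accept marks on every letter (for example "hén"), the same \p{M}* would have to follow each letter in the pattern.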
 
Gewgala (Author) Commented:
Thank you gmrsecs, that's exactly what I needed.  I applied NormalizationForm.FormD to my string, and then ran a regex on the normalized string that stripped out all diacritic marks.  So, for example, I ran this:

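using System.Text;
using System.Text.RegularExpressions;

string s = <contents of file>;
// FormD splits each accented character into its base letter plus combining marks
string decoded = s.Normalize(NormalizationForm.FormD);

// Verbatim string (@) so the backslash in \p{M} reaches the regex engine unchanged
Regex r = new Regex(@"\p{M}+", RegexOptions.Compiled);
decoded = r.Replace(decoded, "");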

The string variable "decoded" now contains the same content as the string variable "s", except that all diacritic marks have been stripped out, so every ñ becomes simply n, and so on.  I am then able to perform my matches on the decoded string and grab everything that I need.
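
For anyone who wants this as a reusable piece, here is one possible sketch of the same two steps wrapped in a helper. The names DiacriticStripper and StripDiacritics are made up for illustration, and the final recomposition to FormC is an optional extra that was not part of the steps above.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

static class DiacriticStripper
{
    // One or more combining marks (Unicode general category M).
    static readonly Regex Marks = new Regex(@"\p{M}+", RegexOptions.Compiled);

    // Hypothetical helper: decompose, drop the marks, recompose what is left.
    public static string StripDiacritics(string text)
    {
        string decomposed = text.Normalize(NormalizationForm.FormD);
        string stripped = Marks.Replace(decomposed, "");
        return stripped.Normalize(NormalizationForm.FormC);
    }

    static void Main()
    {
        Console.WriteLine(StripDiacritics("he\u00F1"));                    // hen
        Console.WriteLine(StripDiacritics("cr\u00E8me br\u00FBl\u00E9e")); // creme brulee

        // ø (U+00F8) has no canonical decomposition, so it is returned unchanged.
        Console.WriteLine(StripDiacritics("s\u00F8t") == "s\u00F8t");      // True
    }
}
```

One thing to be aware of: this only helps for characters that decompose into a base letter plus combining marks. Letters such as ø or ł are single characters with no decomposition, so they pass through untouched and would need their own mapping.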

Thanks!
 
Gewgala (Author) Commented:
Thanks again!
Question has a verified solution.