Regex for rewriting a URL to remove non-ASCII characters

Hi,

I have this regular expression in a CMS that allows URLs containing the Swedish characters å, ä and ö.

So, for example, a URL for a page might look like this:

en-minnesavärd-äppeldag


and the RegEx used in the CMS looks like this:
[^\p{L}\-\!\$\(\)\=\@\d_\'\.]+|\.+$


I would like it to allow only ASCII, so that the URL would have to look like this instead:
en-minnesavard-appeldag


How can I alter the RegEx to accomplish that?

Thanks for the help!

Peter
Asked by: Peter Nordberg (IT Manager)

Terry Woods (IT Guru) commented:
Swap the \p{L} for a-zA-Z:

[^a-zA-Z\-\!\$\(\)\=\@\d_\'\.]+|\.+$

ozo commented:
[^a-zA-Z\-\!\$\(\)\=\@\d_\'\.]+|\.+$

Peter Nordberg (IT Manager), author, commented:
Hi, and thanks for the answer,

This turns a URL like this:
nya-äppelträd-växer-i-trädgården


into this:
nya-ppeltr-d-v-xer-i-tr-dg-rden


Is it possible to have å and ä replaced with a, and ö replaced with o?

Peter

Terry Woods (IT Guru) commented:
Do you mean with a regex, or will a custom function be ok?

Peter Nordberg (IT Manager), author, commented:
With RegEx, because I don't know how to implement it with a custom function in this scenario.

Peter

Terry Woods (IT Guru) commented:
Apparently those characters are called "diacritics".

I can't see how to do that with a regex. If you'd like an alternative technique, apparently this works:

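using System;
using System.Globalization;
using System.Linq;
using System.Text;

string initial = "ÁÂÃÄÅÇÈÉàáâãäåèéêëìíîïòóôõ";

// Decompose each accented letter into its base letter plus combining marks.
string normal = initial.Normalize(NormalizationForm.FormD);

// Drop the combining marks (NonSpacingMark) and keep the base letters.
var withoutDiacritics = normal.Where(
    c => CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark);

string final = new string(withoutDiacritics.ToArray());

Console.WriteLine(initial);
Console.WriteLine(final);

/* OUTPUT

ÁÂÃÄÅÇÈÉàáâãäåèéêëìíîïòóôõ
AAAAACEEaaaaaaeeeeiiiioooo

*/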

Source: http://www.blackwasp.co.uk/RemoveDiacritics.aspx
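
To tie this back to the URL question, here is a minimal sketch that strips the diacritics first and then applies the ASCII-only pattern suggested earlier in the thread. The names UrlSegmentSketch, RemoveDiacritics and ToAsciiSegment are hypothetical and not part of any CMS API. Replacing each match with "-" is an assumption based on the output Peter posted; how the CMS actually applies its regex is not shown in this thread. The sketch also writes 0-9 instead of \d, because in .NET \d also matches non-ASCII decimal digits unless RegexOptions.ECMAScript is set.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper class for illustration only; not a CMS API.
public static class UrlSegmentSketch
{
    // Same technique as the snippet above: decompose, then drop combining marks.
    public static string RemoveDiacritics(string text)
    {
        string decomposed = text.Normalize(NormalizationForm.FormD);
        var kept = decomposed.Where(
            c => CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark);
        return new string(kept.ToArray());
    }

    // Strip diacritics first, then replace anything outside the ASCII
    // whitelist (and any trailing dots) with "-".
    // Assumption: the CMS replaces matches with "-".
    public static string ToAsciiSegment(string text)
    {
        string stripped = RemoveDiacritics(text);
        return Regex.Replace(stripped, @"[^a-zA-Z0-9\-\!\$\(\)\=\@_\'\.]+|\.+$", "-");
    }
}
```

With this sketch, UrlSegmentSketch.ToAsciiSegment("en-minnesavärd-äppeldag") returns "en-minnesavard-appeldag", and "nya-äppelträd-växer-i-trädgården" becomes "nya-appeltrad-vaxer-i-tradgarden". Letters that do not decompose into a base letter plus a combining mark (for example ø) are not converted by the normalization step and would still be caught by the regex.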