Solved

Remove "special characters" but leave all letters (multi-language)

Posted on 2006-11-28
Last Modified: 2008-01-09
Hello,

I would like to remove all "special characters" but leave all letters.

The problem is that I need to support European languages, many of which use letters beyond the 26 that English uses.



These should be REMOVED

< > ( ) ! # $ % ^ & = + ~ ` * " ' ¡ ¤ ¢ £ ¥ ¦ § ¨ © ª «  ¬ ­ ® ™ ¯ ° ± ² ³ ´ µ ¶ · ¸ ¹ º » ¼ ½ ¾ ¿ × ÷

But these should NOT be removed

à á â ã ä å é ê ë ì í î ï ñ ò ó ô õ ö ù ú û ü ý ÿ


There are many more that SHOULD be removed and many more that SHOULD NOT be removed, but the idea is: if it can be used in any European language as part of a word, I want to keep it. If not, I want to get rid of it.

Any ideas?

Thanks!

Question by:hankknight
5 Comments
 
LVL 20

Accepted Solution

by:
Serena Hsi earned 800 total points
ID: 18029041
Well, depending on what software you're using to view/edit the text, every character has a numeric code assigned to it, so characters can be kept or removed by code.

This page lists both the standard ASCII characters and the extended ASCII characters.
http://office.microsoft.com/en-us/help/HA011331361033.aspx
 
LVL 6

Assisted Solution

by:Basilisci
Basilisci earned 800 total points
ID: 18029299
In Java this is easy with java.lang.Character.isLetter()

With JavaScript, it is a bit trickier.

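var str = "Hello\tworld"; // sample input (assumed; str was not defined in the original snippet)
var newstr = "";
// loop over each character
for (var i = 0; i < str.length; i++) {
  // get the Unicode (UTF-16) code for the character
  var c = str.charCodeAt(i);

  // put the code that blocks unwanted ranges here; this is just an example
  // that drops the control characters below the space (code 32)
  if (c < 32) {
    continue;
  }
  newstr += str.charAt(i);
}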

You should then find a Unicode reference to look up the ranges for valid letters. They are usually in contiguous blocks, so that should not be too hard.
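As a rough sketch of that range lookup, the snippet below keeps only characters whose codes fall inside a hand-written table of letter ranges. The names LETTER_RANGES, isLatinLetter and keepLatinLetters are made up for illustration, and the table is an assumption that covers only Basic Latin, Latin-1 Supplement and Latin Extended-A; other scripts (Greek, Cyrillic and so on) would need more rows. Keeping the plain space is also an assumption, so that words stay separated.

```javascript
// Hypothetical range table: an assumption covering common Latin letters only.
var LETTER_RANGES = [
  [0x0041, 0x005A], // A-Z
  [0x0061, 0x007A], // a-z
  [0x00C0, 0x00D6], // À-Ö (stops before the multiplication sign, U+00D7)
  [0x00D8, 0x00F6], // Ø-ö (stops before the division sign, U+00F7)
  [0x00F8, 0x00FF], // ø-ÿ
  [0x0100, 0x017F]  // Latin Extended-A
];

// Returns true if the code falls inside one of the ranges above.
function isLatinLetter(code) {
  for (var i = 0; i < LETTER_RANGES.length; i++) {
    if (code >= LETTER_RANGES[i][0] && code <= LETTER_RANGES[i][1]) {
      return true;
    }
  }
  return false;
}

// Keeps letters and plain spaces, drops everything else.
function keepLatinLetters(str) {
  var out = "";
  for (var i = 0; i < str.length; i++) {
    var c = str.charCodeAt(i);
    if (c === 0x20 || isLatinLetter(c)) {
      out += str.charAt(i);
    }
  }
  return out;
}
```

For example, keepLatinLetters("Crème brûlée!") gives "Crème brûlée", and the × and ÷ signs are dropped because the table skips U+00D7 and U+00F7.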

You could also try a regular expression, but I'm not sure of the exact syntax.
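One possible regular expression, as a sketch: JavaScript engines that support ES2018 Unicode property escapes (an assumption; older browsers do not) can match "any letter" with \p{L} when the u flag is set. The function name stripNonLetters is made up for illustration, and keeping whitespace and combining marks is a design choice, not something the question specifies. Note that Unicode classifies a few characters from the "remove" list, such as ª, º and µ, as letters, so they would survive this filter unless excluded separately.

```javascript
// Assumes an engine with ES2018 Unicode property escapes (\p{...} with the u flag).
function stripNonLetters(str) {
  // \p{L} = any letter, \p{M} = combining marks (for decomposed accents),
  // \s = whitespace. Every character outside those classes is removed.
  return str.replace(/[^\p{L}\p{M}\s]/gu, "");
}
```

For example, stripNonLetters("¿Dónde está <José>?") gives "Dónde está José".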
 
LVL 3

Assisted Solution

by:FreakTrap
FreakTrap earned 400 total points
ID: 18029344
<?php

$string = 'Hello%$#@ world';
// str_replace() returns the modified string, so the result must be assigned
$string = str_replace(array("$", "%", "@", "#"), "", $string);
echo $string;

// Echoes 'Hello world'

?>
 
LVL 6

Assisted Solution

by:Basilisci
Basilisci earned 800 total points
ID: 18029419
If you choose the "heavy, but bulletproof" Unicode approach, you can use the official Unicode reference to find the ranges for real letters (http://www.unicode.org/Public/UNIDATA/UnicodeData.txt).

The third column of this semicolon-separated file gives the general category. If it is "Lu" (uppercase letter) or "Ll" (lowercase letter), you are safe; the remaining letter categories are "Lt", "Lm" and "Lo". More info at http://www.unicode.org/Public/UNIDATA/UCD.html#General_Category_Values

This is probably the approach that is used in Java's isLetter implementation.
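A minimal sketch of reading that file in JavaScript: split each line on semicolons, take the first field as the hex code and the third as the general category, and keep the entries whose category starts with "L". The function name letterCodePoints and the four sample lines are illustrative assumptions written in the file's format, not a full copy of the data. The sketch also ignores the First/Last range pairs the real file uses for some large blocks.

```javascript
// Sample lines in the UnicodeData.txt format (semicolon-separated fields).
// Only field 0 (hex code) and field 2 (general category) are used below.
var sample = [
  "0021;EXCLAMATION MARK;Po;0;ON;;;;;N;;;;;",
  "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;",
  "00D7;MULTIPLICATION SIGN;Sm;0;ON;;;;;N;;;;;",
  "00E9;LATIN SMALL LETTER E WITH ACUTE;Ll;0;L;0065 0301;;;;N;LATIN SMALL LETTER E ACUTE;;00C9;;00C9"
].join("\n");

// Returns the numeric codes of all entries whose category starts with "L".
function letterCodePoints(text) {
  var letters = [];
  var lines = text.split("\n");
  for (var i = 0; i < lines.length; i++) {
    var fields = lines[i].split(";");
    if (fields.length < 3) {
      continue; // skip blank or malformed lines
    }
    if (fields[2].charAt(0) === "L") {
      letters.push(parseInt(fields[0], 16));
    }
  }
  return letters;
}
```

With the sample above, letterCodePoints(sample) returns [65, 233], the codes for A and é. The resulting list could then be merged into contiguous ranges for a lookup table.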
 
LVL 16

Author Comment

by:hankknight
ID: 18030178
Thanks
