Solved

Remove "special characters" but leave all letters (multi-language)

Posted on 2006-11-28
Last Modified: 2008-01-09
Hello,

I would like to remove all "special characters" but leave all letters.

The problem is that I need to support European languages, many of which use letters beyond the 26 that English uses.



These should be REMOVED

< > ( ) ! # $ % ^ & = + ~ ` * " ' ¡ ¤ ¢ £ ¥ ¦ § ¨ © ª «  ¬ ­ ® ™ ¯ ° ± ² ³ ´ µ ¶ · ¸ ¹ º » ¼ ½ ¾ ¿ × ÷

But these should NOT be removed

à á â ã ä å é ê ë ì í î ï ñ ò ó ô õ ö ù ú û ü ý ÿ


There are many more that SHOULD be removed and many more that SHOULD NOT be removed, but the idea is that if a character can be used in any European language as part of a word, I want to keep it. If not, I want to get rid of it.

Any ideas?

Thanks!

Question by:hankknight
5 Comments
 
LVL 19

Accepted Solution

by:
Serena Hsi earned 800 total points
ID: 18029041
Well, depending on what software you're using to view/edit the text, all the ASCII characters have a code assigned to them.

This page has a list of both the standard ASCII list and extended ASCII characters.
http://office.microsoft.com/en-us/help/HA011331361033.aspx
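
To make the "every character has a code" point concrete, here is a minimal JavaScript sketch that prints the numeric code of a few characters. The function name codesOf and the sample string are made up for illustration; the codes shown are Unicode code points, which match the extended (Latin-1) table for values up to 255.

```javascript
// Return "character = code" for each character in the text.
function codesOf(text) {
  return Array.from(text, function (ch) {
    return ch + " = " + ch.codePointAt(0);
  });
}

// \u00e9 is "é" (a letter), \u00d7 is "×" (a symbol).
console.log(codesOf("\u00e9\u00d7$A").join(", "));
// prints: é = 233, × = 215, $ = 36, A = 65
```

Note that letters and symbols are interleaved in the 128 to 255 range, so a single "keep everything above 127" rule would not be enough.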
 
LVL 6

Assisted Solution

by:Basilisci
Basilisci earned 800 total points
ID: 18029299
In Java this is easy with java.lang.Character.isLetter()

With JavaScript, it is a bit trickier.

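var str = "Hello%$#@ world"; // sample input, replace with your own text
var newstr = "";
// loop over each character
for (var i = 0; i < str.length; i++) {
  // get the Unicode (UTF-16) code for the character
  var c = str.charCodeAt(i);

  // put the checks that block unwanted Unicode ranges here;
  // this one is just an example and only skips codes below 30
  if (c < 30) {
    continue;
  }
  newstr += str.charAt(i);
}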

You should then find a Unicode reference to look up the ranges for valid letters. They are usually in contiguous blocks, so that should not be too hard.
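
As a rough sketch of that range lookup, the loop can be filled in as follows. The ranges cover the basic Latin letters, the Latin-1 letters (skipping × at 0xD7 and ÷ at 0xF7) and the Latin Extended-A block; Greek, Cyrillic and other scripts would need more entries. The names LETTER_RANGES, isLetterCode and keepLettersByRange are invented for this example, and keeping the space character is my assumption, not something the question states.

```javascript
// Inclusive [first, last] code ranges that contain only letters.
var LETTER_RANGES = [
  [0x0041, 0x005a], // A-Z
  [0x0061, 0x007a], // a-z
  [0x00c0, 0x00d6], // Latin-1 letters before the multiplication sign
  [0x00d8, 0x00f6], // Latin-1 letters between the two math signs
  [0x00f8, 0x00ff], // Latin-1 letters after the division sign
  [0x0100, 0x017f]  // Latin Extended-A
];

// True if the code falls inside one of the letter ranges.
function isLetterCode(c) {
  for (var i = 0; i < LETTER_RANGES.length; i++) {
    if (c >= LETTER_RANGES[i][0] && c <= LETTER_RANGES[i][1]) {
      return true;
    }
  }
  return false;
}

// Keep letters and plain spaces (assumption), drop everything else.
function keepLettersByRange(str) {
  var out = "";
  for (var i = 0; i < str.length; i++) {
    var c = str.charCodeAt(i);
    if (c === 0x20 || isLetterCode(c)) {
      out += str.charAt(i);
    }
  }
  return out;
}
```

For example, keepLettersByRange("Hello%$#@ wörld") gives "Hello wörld". Characters such as ª, º and µ fall outside the listed ranges, so they are removed as the question asks.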

You could also try a regular expression, but I'm not sure what pattern to use.
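
For what it's worth, current JavaScript engines (ES2018 and later) support Unicode property escapes, which were not available when this thread was written. A hedged sketch, with stripSpecial as a made-up name: it keeps letters (\p{L}), combining marks (\p{M}) and whitespace, which is my assumption about what should survive. Unicode classifies ª, º and µ as letters, so they are removed explicitly to match the question's list.

```javascript
// Remove everything that is not a letter, combining mark or whitespace.
// The second alternative drops three characters that Unicode counts as
// letters but the question wants removed: ª (00AA), µ (00B5), º (00BA).
function stripSpecial(str) {
  return str.replace(/[^\p{L}\p{M}\s]|[\u00aa\u00b5\u00ba]/gu, "");
}

console.log(stripSpecial("H\u00e9llo%$#@ w\u00f6rld!"));
// prints: Héllo wörld
```

To keep digits as well, \p{N} could be added inside the first character class.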
 
LVL 3

Assisted Solution

by:FreakTrap
FreakTrap earned 400 total points
ID: 18029344
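<?php

// Single quotes so that PHP does not try to interpolate the "$"
$string = 'Hello%$#@ world';
// str_replace() returns the new string, so assign the result back
$string = str_replace(array('$', '%', '@', '#'), '', $string);
echo $string;

// Should echo 'Hello world'
// Note: only the characters listed in the array are removed

?>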
 
LVL 6

Assisted Solution

by:Basilisci
Basilisci earned 800 total points
ID: 18029419
If you choose the "heavy, but bulletproof" Unicode approach, you can use the official Unicode reference to find the ranges for real letters (http://www.unicode.org/Public/UNIDATA/UnicodeData.txt).

The third column of this semicolon-separated file gives the general category. If it is "Lu" (uppercase letter) or "Ll" (lowercase letter), you are safe. The other letter categories are "Lt", "Lm" and "Lo". More info at http://www.unicode.org/Public/UNIDATA/UCD.html#General_Category_Values

This is probably the approach that is used in Java's isLetter implementation.
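
A minimal sketch of reading that file in JavaScript, assuming its text has already been loaded into a string. The SAMPLE lines below only imitate the file's format for the demo, and letterCodePoints and keepListed are hypothetical names. The sketch treats every category starting with "L" as a letter and ignores the "First"/"Last" range entries that the real file uses for some large blocks.

```javascript
// A few lines in the UnicodeData.txt format (fields separated by ";").
var SAMPLE = [
  "0024;DOLLAR SIGN;Sc;0;ET;;;;;N;;;;;",
  "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;",
  "00D7;MULTIPLICATION SIGN;Sm;0;ON;;;;;N;;;;;",
  "00E9;LATIN SMALL LETTER E WITH ACUTE;Ll;0;L;0065 0301;;;;N;;;00C9;;00C9"
].join("\n");

// Collect the code points whose general category (third field) is a letter.
function letterCodePoints(unicodeData) {
  var letters = new Set();
  unicodeData.split("\n").forEach(function (line) {
    var fields = line.split(";");
    if (fields.length < 3) {
      return; // skip blank or malformed lines
    }
    if (fields[2].charAt(0) === "L") {
      letters.add(parseInt(fields[0], 16));
    }
  });
  return letters;
}

// Keep only the characters whose code point is in the given set.
function keepListed(str, letters) {
  return Array.from(str).filter(function (ch) {
    return letters.has(ch.codePointAt(0));
  }).join("");
}

var letters = letterCodePoints(SAMPLE);
console.log(keepListed("A$\u00d7\u00e9", letters)); // prints: Aé
```

Building the set once and reusing it keeps the per-character check cheap, at the cost of holding every letter code point in memory.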
 
LVL 16

Author Comment

by:hankknight
ID: 18030178
Thanks
