Solved

Remove "special characters" but leave all letters (multi-language)

Posted on 2006-11-28
Last Modified: 2008-01-09
Hello,

I would like to remove all "special characters" but leave all letters.

The problem is that I need to support European languages, many of which use letters beyond the 26 that English uses.



These should be REMOVED

< > ( ) ! # $ % ^ & = + ~ ` * " ' ¡ ¤ ¢ £ ¥ ¦ § ¨ © ª «  ¬ ­ ® ™ ¯ ° ± ² ³ ´ µ ¶ · ¸ ¹ º » ¼ ½ ¾ ¿ × ÷

But these should NOT be removed

à á â ã ä å é ê ë ì í î ï ñ ò ó ô õ ö ù ú û ü ý ÿ


There are many more that SHOULD be removed and many more that SHOULD NOT be removed, but the idea is: if it can be used in any European language as part of a word, I want to keep it. If not, I want to get rid of it.

Any ideas?

Thanks!

Question by:hankknight
5 Comments
 
LVL 19

Accepted Solution

by:
Serena Hsi earned 200 total points
ID: 18029041
Well, depending on what software you're using to view/edit the text, all the ASCII characters have a code assigned to them.

This page has a list of both the standard ASCII list and extended ASCII characters.
http://office.microsoft.com/en-us/help/HA011331361033.aspx
 
LVL 6

Assisted Solution

by:Basilisci
Basilisci earned 200 total points
ID: 18029299
In Java this is easy with java.lang.Character.isLetter().

With JavaScript, it is a bit trickier.

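var str = "Hello <world>!"; // input text (example value)
var newstr = "";
// loop over each character
for (var i = 0; i < str.length; i++) {
  // get the Unicode code for the character
  var c = str.charCodeAt(i);

  // block the unwanted Unicode ranges here; this single check is just an
  // example and only drops characters with codes below 30
  if (c < 30) {
    continue;
  }
  newstr += str.charAt(i);
}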

You should then find a Unicode reference to look up the ranges for valid letters. They are usually in contiguous blocks, so that should not be too hard.
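
As a rough sketch of what the range check could look like, here is the same loop with a small table of Latin letter ranges. The table, the names LETTER_RANGES, isLetterCode and keepLetters, and the decision to keep plain spaces are all assumptions for illustration; the table is not a complete list of European letters.

```javascript
// Assumed inclusive code ranges for Latin letters; an illustration only,
// not a complete Unicode letter table.
var LETTER_RANGES = [
  [0x0041, 0x005A], // A-Z
  [0x0061, 0x007A], // a-z
  [0x00C0, 0x00D6], // À-Ö
  [0x00D8, 0x00F6], // Ø-ö (skips the multiplication sign at 0x00D7)
  [0x00F8, 0x00FF], // ø-ÿ (skips the division sign at 0x00F7)
  [0x0100, 0x017F]  // Latin Extended-A
];

// True if the code falls inside one of the ranges above.
function isLetterCode(c) {
  for (var i = 0; i < LETTER_RANGES.length; i++) {
    if (c >= LETTER_RANGES[i][0] && c <= LETTER_RANGES[i][1]) {
      return true;
    }
  }
  return false;
}

// Keeps letters and plain spaces (assumption), drops everything else.
function keepLetters(str) {
  var out = "";
  for (var i = 0; i < str.length; i++) {
    var c = str.charCodeAt(i);
    if (c === 0x20 || isLetterCode(c)) {
      out += str.charAt(i);
    }
  }
  return out;
}
```

Greek, Cyrillic and other scripts would need their own ranges added to the table.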

You could also try some Regular Expression syntax, but I'm not sure what.
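
For the regular expression route, one possible sketch uses Unicode property escapes, which only work in JavaScript engines that support them (ES2018 and later). The function name stripNonLetters is made up, and keeping whitespace and combining marks is an assumption about what the asker wants.

```javascript
// Removes every character that is not a letter (\p{L}), a combining
// mark (\p{M}) or whitespace. Requires the "u" flag.
function stripNonLetters(text) {
  return text.replace(/[^\p{L}\p{M}\s]/gu, "");
}
```

Note that Unicode classifies ª, º and µ as letters, so this pattern keeps them; they would need an explicit exclusion to match the list in the question.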
 
LVL 3

Assisted Solution

by:FreakTrap
FreakTrap earned 100 total points
ID: 18029344
<?php

$string = 'Hello%$#@ world';
// str_replace() returns the new string; it does not modify its argument
$string = str_replace(array('$', '%', '@', '#'), '', $string);
echo $string;

// Echoes 'Hello world'

?>
 
LVL 6

Assisted Solution

by:Basilisci
Basilisci earned 200 total points
ID: 18029419
If you choose the "heavy, but bulletproof" Unicode approach, you can use the official Unicode reference to find the ranges for real letters (http://www.unicode.org/Public/UNIDATA/UnicodeData.txt).

The third column in the semicolon-separated file gives the general category; if it is "Lu" (uppercase letter) or "Ll" (lowercase letter), you are safe. More info at http://www.unicode.org/Public/UNIDATA/UCD.html#General_Category_Values

This is probably the approach that is used in Java's isLetter implementation.
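
A minimal sketch of building the ranges from that file might look like this. The sample lines only imitate the UnicodeData.txt layout (code; name; general category; ...), the names letterRanges and inRanges are invented, and the code assumes the lines are sorted by code point. It does not handle the "First"/"Last" range entries the real file uses for large blocks.

```javascript
// Sample lines in the UnicodeData.txt layout; the real file is far longer.
const sample = [
  "0021;EXCLAMATION MARK;Po;0;ON;;;;;N;;;;;",
  "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;",
  "0042;LATIN CAPITAL LETTER B;Lu;0;L;;;;;N;;;;0062;",
  "00D7;MULTIPLICATION SIGN;Sm;0;ON;;;;;N;;;;;",
  "00E9;LATIN SMALL LETTER E WITH ACUTE;Ll;0;L;0065 0301;;;;N;;;00C9;;00C9",
].join("\n");

// Collects inclusive [first, last] ranges of code points whose third
// field is "Lu" or "Ll", merging consecutive code points.
function letterRanges(unicodeData) {
  const ranges = [];
  for (const line of unicodeData.split("\n")) {
    const fields = line.split(";");
    if (fields.length < 3) continue; // skip blank or malformed lines
    if (fields[2] !== "Lu" && fields[2] !== "Ll") continue;
    const cp = parseInt(fields[0], 16);
    const last = ranges[ranges.length - 1];
    if (last && cp === last[1] + 1) {
      last[1] = cp; // extend the current range
    } else {
      ranges.push([cp, cp]); // start a new range
    }
  }
  return ranges;
}

// True if the code point lies in any of the ranges.
function inRanges(ranges, cp) {
  return ranges.some(([lo, hi]) => cp >= lo && cp <= hi);
}
```

Unicode also has the letter categories "Lt", "Lm" and "Lo"; whether to accept those as well depends on which characters should count as letters.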
 
LVL 16

Author Comment

by:hankknight
ID: 18030178
Thanks
