How to convert special characters in a string to their "normal" counterparts

thomers1 asked
Last Modified: 2012-06-21
for a profanity filter, i need to convert special characters (e.g. with accents) in a string to "normal" ones, example:

è to e
í to i
ú to u

etc.

how to do this efficiently?

CERTIFIED EXPERT
Top Expert 2016

Commented:
I would just do
final String find = "èíú";
final String repl = "eiu";

// swap each accented character for its plain counterpart
for (int i = 0; i < find.length(); i++) {
    s = s.replace(find.charAt(i), repl.charAt(i));
}


Author

Commented:
thanks - hmm, i forgot that multiple variations will exist for each "normal" letter

like
Ê Ë È É
should all be replaced with "E" (or converted into lowercase first, and then replaced by "e")

is there something to consider when comparing chars with special characters? (e.g. how do i encode them, if i can't type them on my keyboard).
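A different technique from the find/repl table discussed in this thread is Unicode normalization, which covers every accented variant without listing them. The sketch below uses the standard `java.text.Normalizer` class; the class and method names (`AccentStripper`, `stripAccents`) are made up for illustration, and the string literals use Unicode escapes so the example does not depend on how the source file is saved.

```java
import java.text.Normalizer;

class AccentStripper {
    // NFD splits each accented letter into a base letter plus combining marks;
    // removing the marks (\p{M}) leaves only the base letters.
    static String stripAccents(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // Upper-case E with circumflex, diaeresis, grave, acute, then e-grave, i-acute, u-acute
        String sample = "\u00CA\u00CB\u00C8\u00C9 \u00E8\u00ED\u00FA";
        System.out.println(stripAccents(sample)); // prints "EEEE eiu"
    }
}
```

Letters that have no canonical decomposition (for example ø or ß) pass through unchanged, so a profanity filter may still want a small hand-written table for those.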
CERTIFIED EXPERT
Top Expert 2016
Commented:
Multiples don't matter: see below.

The fact that you can't type them doesn't matter, but they must exist in the encoding in which the code resides. You can always use Unicode escapes for untypable ones. Best to make sure your source code is saved as UTF-8.
find = "ÊËÈÉ";
repl = "EEEE";
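The Unicode-escape suggestion can be sketched as follows. Written this way, the table contains only ASCII characters in the source file, so its length no longer depends on the encoding the compiler assumes. The class name `AccentTable`, the method `replaceAccents`, and the explicit length check are illustrative additions, not part of the answer above.

```java
class AccentTable {
    // E-circumflex, E-diaeresis, E-grave, E-acute, written as Unicode escapes
    static final String FIND = "\u00CA\u00CB\u00C8\u00C9";
    static final String REPL = "EEEE";

    static String replaceAccents(String s) {
        // Fail loudly if the two tables ever get out of step.
        if (FIND.length() != REPL.length()) {
            throw new IllegalStateException("find/repl tables differ in length");
        }
        for (int i = 0; i < FIND.length(); i++) {
            s = s.replace(FIND.charAt(i), REPL.charAt(i));
        }
        return s;
    }

    public static void main(String[] args) {
        System.out.println(FIND.length());                // prints 4
        System.out.println(replaceAccents("CR\u00C8ME")); // prints CREME
    }
}
```

If the literal characters are kept in the source instead, compiling with `javac -encoding UTF-8` tells the compiler how to read them.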



Author

Commented:
thanks! :-)
CERTIFIED EXPERT
Top Expert 2016

Commented:
:-)

Author

Commented:
hmm, seems i was too fast closing this question

in your example, the length of find is 8, while repl is 4 - which means it doesn't work
CERTIFIED EXPERT
Top Expert 2016

Commented:
Well - obviously you need to ensure they're the same length ;-)

Author

Commented:
nope, what i meant is: if i use your example from above with 4 characters each, find.length() returns 8, while repl.length() returns 4.

find = "ÊËÈÉ"
repl = "EEEE"


CERTIFIED EXPERT
Top Expert 2016

Commented:
To be on the safe side:
for(int i = 0;i < Math.min(find.length(),repl.length());i++)


Author

Commented:
i think, the first line does not initialize the find string as UTF-8 encoded.

if i iterate over the characters of find, i get this (see code).

obviously, this can't be used to compare to the repl string.




0: 
1: ä
2: 
3: ã
4: 
5: à
6: 
7: â


CERTIFIED EXPERT
Top Expert 2016

Commented:
Can you show me the code you're running? Also, using the following, please tell me the result of passing 'file.encoding' as a parameter to it

http://technojeeves.com/joomla/index.php/free/54-javasystemproperties
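If the linked utility is not at hand, the same information can be read directly from the JVM. This is a minimal sketch using only standard calls (`System.getProperty` and `Charset.defaultCharset()`); the class name `ShowEncoding` and the helper `describe` are hypothetical.

```java
import java.nio.charset.Charset;

class ShowEncoding {
    // Build a short report of the encoding settings the JVM is running with.
    static String describe() {
        return "file.encoding   = " + System.getProperty("file.encoding") + "\n"
             + "default charset = " + Charset.defaultCharset();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

The value matters here because, on JDKs of that era, javac read source files in the platform default encoding unless `-encoding` was given.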

Author

Commented:
aah, file.encoding is "MacRoman"



final String prepare_find = "ÊËÈÉ";
final String prepare_repl = "eeee";
 
System.out.println("find: " + prepare_find.length());
System.out.println("repl: " + prepare_repl.length());
 
for (int i=0; i<prepare_find.length(); i++) {
   System.out.println(i + ": " + prepare_find.charAt(i));
}
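The doubled length is what you would expect if the file was saved as UTF-8 but compiled as MacRoman: each of the four characters occupies two bytes in UTF-8, and a single-byte charset turns every byte into its own char. The output above fits that reading, since bytes 0x8A, 0x8B, 0x88 and 0x89 are ä, ã, à and â in MacRoman. The sketch below reproduces the effect; `EncodingMismatch` and `misread` are illustrative names, and because the `x-MacRoman` charset is not guaranteed to be present in every Java runtime, ISO-8859-1 is used as a stand-in when it is missing.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingMismatch {
    // Re-read the UTF-8 bytes of a string as if they were in another charset.
    static String misread(String original, Charset wrong) {
        byte[] utf8Bytes = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, wrong);
    }

    public static void main(String[] args) {
        String find = "\u00CA\u00CB\u00C8\u00C9";
        // Prefer MacRoman if this runtime ships it; otherwise any single-byte charset shows the effect.
        Charset singleByte = Charset.isSupported("x-MacRoman")
                ? Charset.forName("x-MacRoman")
                : StandardCharsets.ISO_8859_1;
        String garbled = misread(find, singleByte);
        System.out.println(find.length());    // prints 4
        System.out.println(garbled.length()); // prints 8
    }
}
```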


CERTIFIED EXPERT
Top Expert 2016

Commented:
Ah OK. Refer to the MacRoman chart for your default chars. You might be better to install a full UTF-8 locale in the end

Author

Commented:
"You might be better to install a full UTF-8 locale in the end"

How do i do that on Mac OS X?

CERTIFIED EXPERT
Top Expert 2016

Commented:
Don't know, i'm afraid. I've never been a Mac user, but i'm assuming its latest incarnations support a UTF-8 environment. Having said that, most of the accented chars found in ISO8859-1 should also exist in MacRoman, though at different byte values:

http://technojeeves.com/joomla/index.php/free/48-iso8859-1