  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 17176

PHP function to remove special characters

Hello,

I would like a PHP function to remove all "special characters" but leave all letters.

The problem is that I need to support European languages, many of which use letters beyond the 26 that English uses.


These should be REMOVED

< > ( ) ! # $ % ^ & = + ~ ` * " ' ¡ ¤ ¢ £ ¥ ¦ § ¨ © ª «  ¬ ­ ® ™ ¯ ° ± ² ³ ´ µ ¶ · ¸ ¹ º » ¼ ½ ¾ ¿ × ÷

But these should NOT be removed

à á â ã ä å é ê ë ì í î ï ñ ò ó ô õ ö ù ú û ü ý ÿ


There are many more that SHOULD be removed and many more that SHOULD NOT be removed, but the idea is that if a character can be used in any European language as part of a word, I want to keep it. If not, I want to get rid of it.

Any ideas?

Thanks!
hankknight
5 Solutions
 
ch2 Commented:
You can use preg_replace with an array of all the characters you want to remove.
 
TeRReF Commented:
How about:
<?php

$s = '< > ( ) ! # $ % ^ & = + ~ ` * " \' ¡ ¤ ¢ £ ¥ ¦ § ¨ © ª «  ¬ ­ ® . ¯ ° ± ² ³ ´ µ ¶ · ¸ ¹ º » ¼ ½ ¾ ¿ × ÷ à á â ã ä å é ê ë ì í î ï ñ ò ó ô õ ö ù ú û ü ý ÿ';
$s = preg_replace('/[<>()!#$%\^&=+~`*"\'¡¤¢£¥¦§¨©ª«¬­®\.¯°±²³´µ¶·¸¹º»¼½¾¿×÷]/', '', $s);
print($s);

?>
 
hankknight (Author) Commented:
Thank you both for your ideas.

The problem is that both of you "blacklisted" certain characters.

That is a problem because certain things can fall through the cracks and be missed.

These characters, for example, were missed in the examples:
             ? | {

The only characters that I want to allow are found in the ISO 8859-1 character set in these ranges:
     48 through 57
     65 through 90
     97 through 122
     192 through 246

See:
http://www.utoronto.ca/webdocs/HTMLdocs/NewHTML/iso_table.html
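
For illustration, the whitelist ranges above could be checked byte by byte. This is only a sketch: it assumes the input is an ISO 8859-1 string (one byte per character) and PHP 7 or later, and the function names are made up for the example. Note that, taken literally, 192 through 246 includes 215 (×) and stops before 248 through 255 (ø, ù, ú, û, ü, ý, þ, ÿ), so the ranges may need adjusting to match the lists in the question.

```php
<?php
// Hypothetical helper: true if the byte falls in one of the whitelisted
// ISO 8859-1 ranges from the post (48-57, 65-90, 97-122, 192-246).
function is_allowed_byte(string $byte): bool
{
    $code = ord($byte);
    return ($code >= 48 && $code <= 57)    // digits 0-9
        || ($code >= 65 && $code <= 90)    // A-Z
        || ($code >= 97 && $code <= 122)   // a-z
        || ($code >= 192 && $code <= 246); // accented letters (and 215)
}

// Hypothetical helper: keep only the whitelisted bytes of an
// ISO 8859-1 string and drop everything else.
function keep_allowed_bytes(string $s): string
{
    $out = '';
    for ($i = 0, $n = strlen($s); $i < $n; $i++) {
        if (is_allowed_byte($s[$i])) {
            $out .= $s[$i];
        }
    }
    return $out;
}

// "caf\xE9" is "café" in ISO 8859-1; "?", "|" and "{" are dropped.
echo keep_allowed_bytes("caf\xE9?|{42"), "\n";
```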
 
TeRReF Commented:
Then turn it around and delete all chars that are not in the whitelist:
$s = preg_replace('/[^a-z0-9]/is', '', $s);

Add the allowed chars to this list:
a-z0-9
 
ch2 Commented:
The array:

// Allowed ISO 8859-1 codes: 48-57 (digits), 65-90 (A-Z), 97-122 (a-z), 192-246 (accented letters).
$ar = array_map('chr', array_merge(range(48, 57), range(65, 90), range(97, 122), range(192, 246)));
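
The array by itself does not remove anything, so here is one possible way to apply a whitelist array like the one above. It is a sketch under the same assumptions as the rest of the thread (ISO 8859-1 input, one byte per character); filter_by_whitelist is a hypothetical name, not an existing PHP function.

```php
<?php
// Whitelist of single bytes, built from the ISO 8859-1 code ranges.
$ar = array_map('chr', array_merge(range(48, 57), range(65, 90), range(97, 122), range(192, 246)));

// Hypothetical helper: drop every byte of $s that is not in $allowed.
function filter_by_whitelist(string $s, array $allowed): string
{
    // Flip the array so membership is a key lookup, not a linear search.
    $lookup = array_flip($allowed);
    $out = '';
    foreach (str_split($s) as $byte) {
        if (isset($lookup[$byte])) {
            $out .= $byte;
        }
    }
    return $out;
}

// "[", "]" and "!" are not whitelisted; "\xE9" (é) is.
echo filter_by_whitelist("a[b]\xE9!", $ar), "\n";
```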
 
hankknight (Author) Commented:
Would this work?

$s = preg_replace('/[^a-z0-9À-öø-ÿ]/is', '', $s);

 
TeRReF Commented:
Yes, it will, but I guess you want to keep spaces as well, right? Just add a space to the list.
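
One caveat: literal characters such as À-ö inside the pattern only behave as single characters if the script file and the data share a single-byte encoding. If the text is UTF-8 instead, a Unicode-aware variant with the space added might look like the sketch below. This is an alternative technique, not what was posted in this thread; it assumes valid UTF-8 input and PHP 7 or later, and strip_special_utf8 is a hypothetical name. Also note that \p{L} is broader than the ISO 8859-1 whitelist: it keeps letters from every script, and it treats ª, º and µ as letters.

```php
<?php
// Assumes $s is valid UTF-8. \p{L} matches any Unicode letter, 0-9 the
// ASCII digits, and the literal space keeps spaces. The /u modifier
// switches the pattern and the subject to UTF-8 mode.
function strip_special_utf8(string $s): string
{
    $result = preg_replace('/[^\p{L}0-9 ]/u', '', $s);
    // preg_replace returns null on error (for example malformed UTF-8);
    // returning an empty string in that case is a choice made for this sketch.
    return $result === null ? '' : $result;
}

// \u{E9} is é, \u{E0} is à, \u{D7} is the multiplication sign.
echo strip_special_utf8("d\u{E9}j\u{E0} vu! \u{D7}2"), "\n";
```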
 
hernst42 Commented:
If you know the characters, you can convert their values to hex and use the hex values in the regular expression:
/*
  48 through 57
  65 through 90
  97 through 122
 192 through 246
*/
echo preg_replace('/[^\x30-\x39\x41-\x5a\x61-\x7a\xc0-\xf6]/', '', $text) ."\n";

If you want to keep additional characters, just add their hex values to the list. (It's basically the same as what hankknight wrote, but might be easier to edit and read.)
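
As an example of extending the hex list, the sketch below adds the space (\x20) and the letters from \xF8 to \xFF, and splits the upper range so that \xD7 (×) and \xF7 (÷) are excluded. The exact set of characters is an assumption based on the lists in the question, the input is assumed to be ISO 8859-1, and keep_latin1_letters is a hypothetical name.

```php
<?php
// Hypothetical variant of the hex whitelist:
//   \x20        space
//   \x30-\x39   digits, \x41-\x5A A-Z, \x61-\x7A a-z
//   \xC0-\xD6   accented capitals, stopping before \xD7 (multiplication sign)
//   \xD8-\xF6   remaining letters, stopping before \xF7 (division sign)
//   \xF8-\xFF   the letters after the division sign
function keep_latin1_letters(string $text): string
{
    // No /u modifier: the pattern works on raw bytes, which is what an
    // ISO 8859-1 string consists of.
    return preg_replace(
        '/[^\x20\x30-\x39\x41-\x5A\x61-\x7A\xC0-\xD6\xD8-\xF6\xF8-\xFF]/',
        '',
        $text
    );
}

// "\xD7" (multiplication sign) and "=" are removed; "\xFC" (u with
// diaeresis) and the spaces are kept.
echo keep_latin1_letters("3 \xD7 4 = \xFCber"), "\n";
```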
