Solved

PHP function to remove special characters

Posted on 2006-11-28
Last Modified: 2011-08-18
Hello,

I would like a PHP function to remove all "special characters" but leave all letters.

The problem is that I need to support European languages, many of which use letters beyond the 26 that English uses.


These should be REMOVED

< > ( ) ! # $ % ^ & = + ~ ` * " ' ¡ ¤ ¢ £ ¥ ¦ § ¨ © ª «  ¬ ­ ® ™ ¯ ° ± ² ³ ´ µ ¶ · ¸ ¹ º » ¼ ½ ¾ ¿ × ÷

But these should NOT be removed

à á â ã ä å é ê ë ì í î ï ñ ò ó ô õ ö ù ú û ü ý ÿ


There are many more that SHOULD be removed and many more that SHOULD NOT be removed, but the idea is this: if a character can be used in any European language as part of a word, I want to keep it. If not, I want to get rid of it.

Any ideas?

Thanks!
Question by:hankknight
8 Comments
 
LVL 11

Assisted Solution

by:ch2
ch2 earned 100 total points
ID: 18030404
You can use a preg_replace with an array with all chars you want to remove.
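One way to read this suggestion, as a rough sketch only: keep the unwanted characters in an array and build a character class from it. The `$blacklist` contents below are an illustrative subset of the question's list, not the poster's actual array, and `preg_quote()` is used so that regex metacharacters in the list are escaped.

```php
<?php
// Hypothetical blacklist: only a handful of the characters from the question.
$blacklist = array('<', '>', '(', ')', '!', '#', '$', '%', '^', '&');

// Escape each character for use in a pattern; '/' is the delimiter used below.
$escaped = array_map(function ($ch) {
    return preg_quote($ch, '/');
}, $blacklist);

// Join the escaped characters into a single character class.
$pattern = '/[' . implode('', $escaped) . ']/';

// Single quotes keep "$f" from being treated as a PHP variable.
$clean = preg_replace($pattern, '', 'a<b>(c)!d#e$f%g^h&i');
echo $clean, "\n";
```

Any character missing from the array passes through untouched, which is the weakness of a blacklist.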
 
LVL 29

Expert Comment

by:TeRReF
ID: 18030465
How about:
<?php

$s = '< > ( ) ! # $ % ^ & = + ~ ` * " \' ¡ ¤ ¢ £ ¥ ¦ § ¨ © ª «  ¬ ­ ® . ¯ ° ± ² ³ ´ µ ¶ · ¸ ¹ º » ¼ ½ ¾ ¿ × ÷ à á â ã ä å é ê ë ì í î ï ñ ò ó ô õ ö ù ú û ü ý ÿ';
$s = preg_replace('/[<>()!#$%\^&=+~`*"\'¡¤¢£¥¦§¨©ª«¬­®\.¯°±²³´µ¶·¸¹º»¼½¾¿×÷]/', '', $s);
print($s);

?>
 
LVL 16

Author Comment

by:hankknight
ID: 18030667
Thank you both for your ideas.

The problem is that both of you "blacklisted" certain characters.

That is a problem because certain things can fall through the cracks and be missed.

These characters, for example, were missed in the examples:
             ? | {

The only characters that I want to allow are found in the ISO 8859-1 character set in these ranges:
     48 through 57
     65 through 90
     97 through 122
     192 through 246

See:
http://www.utoronto.ca/webdocs/HTMLdocs/NewHTML/iso_table.html
 
LVL 29

Assisted Solution

by:TeRReF
TeRReF earned 150 total points
ID: 18030732
Then turn it around and delete all chars that are not in the whitelist:
$s = preg_replace('/[^a-z0-9]/is', '', $s);

Add the allowed chars to this list:
a-z0-9
 
LVL 11

Assisted Solution

by:ch2
ch2 earned 100 total points
ID: 18030749
The array, built from the ranges you listed (48-57, 65-90, 97-122 and 192-246):

$ar = array_map('chr', array_merge(range(48, 57), range(65, 90), range(97, 122), range(192, 246)));
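The array by itself does not filter anything, and the post does not say how it was meant to be used. A minimal sketch of one possibility, assuming the input is an ISO 8859-1 byte string: walk the string byte by byte and keep only the bytes found in the array. The helper name `keep_allowed` is made up for this illustration, and the array is rebuilt here from the ranges given earlier in the thread.

```php
<?php
// Allowed ISO 8859-1 byte values: 48-57, 65-90, 97-122, 192-246.
$allowed = array_map('chr', array_merge(
    range(48, 57), range(65, 90), range(97, 122), range(192, 246)
));

// Hypothetical helper: returns $text with every byte not in $allowed dropped.
function keep_allowed($text, array $allowed)
{
    $lookup = array_flip($allowed); // byte => index, so isset() can be used
    $out = '';
    foreach (str_split($text) as $byte) {
        if (isset($lookup[$byte])) {
            $out .= $byte;
        }
    }
    return $out;
}

$input  = "Caf\xE9 (100%) {ok}?";   // \xE9 is the ISO 8859-1 byte for é
$result = keep_allowed($input, $allowed);
echo bin2hex($result), "\n";        // hex dump avoids terminal encoding issues
```

This works on single bytes, so it is not suitable for UTF-8 input, where accented letters take two or more bytes.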
 
LVL 16

Author Comment

by:hankknight
ID: 18030842
Would this work?

$s = preg_replace('/[^a-z0-9À-öø-ÿ]/is', '', $s);

 
LVL 29

Assisted Solution

by:TeRReF
TeRReF earned 150 total points
ID: 18030870
Yes, it will, but I guess you want to keep spaces as well, right? Just add a space to the list.
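A sketch of that pattern with a space added, under two assumptions: the string is ISO 8859-1 rather than UTF-8, and the accented ranges are written as hex escapes (À-ö is \xC0-\xF6, ø-ÿ is \xF8-\xFF) so the result does not depend on how the script file itself is saved. Upper and lower case are listed explicitly instead of relying on the i modifier, whose handling of bytes above 127 depends on the locale tables in use.

```php
<?php
// Assumes ISO 8859-1 input: \xF1 is ñ and \xF3 is ó.
$s = "Se\xF1or L\xF3pez (2006)!";

// Whitelist: ASCII letters, digits, \xC0-\xF6, \xF8-\xFF, and a literal space.
$clean = preg_replace('/[^a-zA-Z0-9\xC0-\xF6\xF8-\xFF ]/', '', $s);
echo bin2hex($clean), "\n"; // hex dump avoids terminal encoding issues
```

Other whitespace, such as tabs or newlines, would still be stripped unless it is added to the class as well.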
 
LVL 48

Accepted Solution

by:
hernst42 earned 250 total points
ID: 18035349
If you know the characters you can convert the values to hex and use the hex-values in the regular expression:
/*
  48 through 57
  65 through 90
  97 through 122
 192 through 246
*/
echo preg_replace('/[^\x30-\x39\x41-\x5a\x61-\x7a\xc0-\xf6]/', '', $text) ."\n";

If you want to keep additional characters, just add their hex values to the list. (It's basically the same as what hankknight wrote, but it might be easier to edit and read.)
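Two caveats about the byte ranges above. They match single bytes, so they assume ISO 8859-1 input, and 192 through 246 includes 215 (×), which the question listed for removal. If the text is UTF-8 instead, a different technique from the one in this answer is available: PCRE's Unicode properties with the u modifier. The following is only a sketch under that assumption; it needs PHP 7.0 or later for the \u{...} string escapes and a PCRE build with Unicode support. Note that \p{L} also classifies ª, º and µ as letters, so they would need to be excluded separately if they must go.

```php
<?php
// Assumes $text is valid UTF-8: "Crème brûlée ×2 £5!"
$text = "Cr\u{00E8}me br\u{00FB}l\u{00E9}e \u{00D7}2 \u{00A3}5!";

// Keep letters (\p{L}), decimal digits (\p{Nd}) and spaces; drop the rest.
// With the u modifier, preg_replace() returns null if $text is not valid UTF-8.
$clean = preg_replace('/[^\p{L}\p{Nd} ]/u', '', $text);
echo $clean, "\n";
```

\p{Nd} is used rather than \p{N} so that characters such as ², ³ and ½ are removed, in line with the question.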
