Solved

PHP function to remove special characters

Posted on 2006-11-28
8
17,107 Views
Last Modified: 2011-08-18
Hello,

I would like a php function to remove all "special characters" but leave all letters.

The problem is that I need to support European languages, many of which use letters beyond the 26 that English uses.


These should be REMOVED

< > ( ) ! # $ % ^ & = + ~ ` * " ' ¡ ¤ ¢ £ ¥ ¦ § ¨ © ª «  ¬ ­ ® ™ ¯ ° ± ² ³ ´ µ ¶ · ¸ ¹ º » ¼ ½ ¾ ¿ × ÷

But these should NOT be removed

à á â ã ä å é ê ë ì í î ï ñ ò ó ô õ ö ù ú û ü ý ÿ


There are many more that SHOULD be removed and many more that SHOULD NOT be removed, but the idea is this: if a character can be used in any European language as part of a word, I want to keep it. If not, I want to get rid of it.

Any ideas?

Thanks!
0
Comment
Question by:hankknight
8 Comments
 
LVL 11

Assisted Solution

by:ch2
ch2 earned 100 total points
ID: 18030404
You can use preg_replace with an array of all the characters you want to remove.
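A minimal sketch of that idea, assuming a short, hypothetical blacklist (the `$remove` list and the `strip_blacklisted` helper are illustrative names, not from this thread). Each character is escaped with preg_quote and turned into its own pattern, because preg_replace accepts an array of patterns:

```php
<?php
// Hypothetical blacklist: only the ASCII symbols from the question.
$remove = array('<', '>', '(', ')', '!', '#', '$', '%', '^', '&', '=', '+', '~', '`', '*', '"', "'");

// Build one pattern per character, escaping regex metacharacters.
$patterns = array();
foreach ($remove as $char) {
    $patterns[] = '/' . preg_quote($char, '/') . '/';
}

// Remove every blacklisted character from $text.
function strip_blacklisted($text, array $patterns) {
    return preg_replace($patterns, '', $text);
}

echo strip_blacklisted('Hello (world)! 100% <b>fun</b>', $patterns), "\n";
// prints: Hello world 100 bfun/b
```

Note that the slash survives because it is not in the list. Anything the list forgets is kept.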
0
 
LVL 29

Expert Comment

by:TeRReF
ID: 18030465
How about:
<?php

$s = '< > ( ) ! # $ % ^ & = + ~ ` * " \' ¡ ¤ ¢ £ ¥ ¦ § ¨ © ª «  ¬ ­ ® . ¯ ° ± ² ³ ´ µ ¶ · ¸ ¹ º » ¼ ½ ¾ ¿ × ÷ à á â ã ä å é ê ë ì í î ï ñ ò ó ô õ ö ù ú û ü ý ÿ';
$s = preg_replace('/[<>()!#$%\^&=+~`*"\'¡¤¢£¥¦§¨©ª«¬­®\.¯°±²³´µ¶·¸¹º»¼½¾¿×÷]/', '', $s);
print($s);

?>
0
 
LVL 16

Author Comment

by:hankknight
ID: 18030667
Thank you both for your ideas.

The problem is that both of you "blacklisted" certain characters.

That is a problem because certain things can fall through the cracks and be missed.

These characters, for example, were missed in the examples:
             ? | {

The only characters that I want to allow are found in the ISO 8859-1 character set in these ranges:
     48 through 57
     65 through 90
     97 through 122
     192 through 246

See:
http://www.utoronto.ca/webdocs/HTMLdocs/NewHTML/iso_table.html
0
 
LVL 29

Assisted Solution

by:TeRReF
TeRReF earned 150 total points
ID: 18030732
Then turn it around and delete all chars that are not in the whitelist:
$s = preg_replace('/[^a-z0-9]/is', '', $s);

Add the allowed chars to this list:
a-z0-9
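As a rough illustration of the whitelist approach, the sketch below wraps the pattern in a hypothetical helper (`keep_ascii_alnum` is an assumed name). The leading ^ inside the brackets negates the class, so every character that is not listed gets removed, including ones nobody thought to blacklist such as ? | {

```php
<?php
// Keep only ASCII letters and digits; the "i" flag makes a-z match A-Z too.
function keep_ascii_alnum($text) {
    return preg_replace('/[^a-z0-9]/i', '', $text);
}

echo keep_ascii_alnum('What? {a|b} = 42!'), "\n";
// prints: Whatab42
```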
0
 
LVL 11

Assisted Solution

by:ch2
ch2 earned 100 total points
ID: 18030749
the array:

// Allowed byte values from the ranges above: 48-57, 65-90, 97-122, 192-246
$ar = array_map('chr', array_merge(range(48, 57), range(65, 90), range(97, 122), range(192, 246)));
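The array on its own does not filter anything, so here is one possible way to apply it as a whitelist. This is only a sketch: it assumes the input is a single-byte ISO 8859-1 string, and the names `$allowed`, `$lookup` and `keep_allowed_bytes` are hypothetical. The list is rebuilt here from the ranges given in the question so the example runs on its own.

```php
<?php
// Allowed bytes, built from the ranges stated in the question.
$allowed = array_map('chr', array_merge(range(48, 57), range(65, 90), range(97, 122), range(192, 246)));

// Flip to byte => index so membership checks are a single isset().
$lookup = array_flip($allowed);

// Copy only whitelisted bytes; assumes one byte per character (ISO 8859-1).
function keep_allowed_bytes($text, array $lookup) {
    $out = '';
    $len = strlen($text);
    for ($i = 0; $i < $len; $i++) {
        if (isset($lookup[$text[$i]])) {
            $out .= $text[$i];
        }
    }
    return $out;
}

// "\xE9" is the ISO 8859-1 byte for e-acute (233), which is inside 192-246.
echo keep_allowed_bytes("Caf\xE9 #1 [ok]", $lookup) === "Caf\xE91ok" ? "ok\n" : "mismatch\n";
```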
0
 
LVL 16

Author Comment

by:hankknight
ID: 18030842
Would this work?

$s = preg_replace('/[^a-z0-9À-öø-ÿ]/is', '', $s);

0
 
LVL 29

Assisted Solution

by:TeRReF
TeRReF earned 150 total points
ID: 18030870
Yes, it will, but I guess you want to keep spaces as well, right? Just add a space to the list.
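A sketch of that pattern with the space added, written with hex escapes so it does not depend on the encoding of the PHP source file. Two things here are assumptions rather than something stated in the thread: the input is taken to be ISO 8859-1, and byte 215 (the multiplication sign, which sits inside 192-246 and was on the "remove" list) is excluded. The helper name `keep_latin1_words` is hypothetical.

```php
<?php
// Keep ASCII letters, digits, spaces, and the ISO 8859-1 letters
// 192-214, 216-246 and 248-255. Bytes 215 and 247 (multiplication
// and division signs) are deliberately left out.
function keep_latin1_words($text) {
    return preg_replace('/[^a-zA-Z0-9 \xC0-\xD6\xD8-\xF6\xF8-\xFF]/', '', $text);
}

// "\xFC" is u-umlaut (252); "\xD7" and "\xF7" are the two math signs.
$result = keep_latin1_words("\xFCber uns\xD7\xF7!");
echo $result === "\xFCber uns" ? "ok\n" : "mismatch\n";
```

Upper and lower case are both listed explicitly, so the pattern does not rely on the "i" flag for bytes above 127.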
0
 
LVL 48

Accepted Solution

by:
hernst42 earned 250 total points
ID: 18035349
If you know the characters you can convert the values to hex and use the hex-values in the regular expression:
/*
  48 through 57
  65 through 90
  97 through 122
 192 through 246
*/
echo preg_replace('/[^\x30-\x39\x41-\x5a\x61-\x7a\xc0-\xf6]/', '', $text) ."\n";

If you want to keep additional chars, just add their hex values to the list. (It's basically the same as what hankknight wrote, but it might be easier to edit and more readable.)
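One caveat worth sketching: the byte ranges above assume a single-byte ISO 8859-1 string. If the text is UTF-8 instead (an assumption, since the thread does not say), accented letters take two bytes and a byte-range pattern would split them. A possible alternative in that case is the "u" modifier with Unicode property classes. The helper name `keep_letters_utf8` is hypothetical.

```php
<?php
// Keep Unicode letters (\p{L}), digits (\p{N}) and spaces in a UTF-8 string.
// preg_replace returns null if $text is not valid UTF-8.
function keep_letters_utf8($text) {
    return preg_replace('/[^\p{L}\p{N} ]/u', '', $text);
}

// "\xC3\xA9" is e-acute and "\xC2\xA9" is the copyright sign in UTF-8.
$result = keep_letters_utf8("Caf\xC3\xA9 \xC2\xA9 2006!");
echo $result === "Caf\xC3\xA9  2006" ? "ok\n" : "mismatch\n";
```

This is broader than the whitelist in the question: it keeps letters from every script, and Unicode classifies a few of the listed symbols, such as the micro sign, as letters, so those would survive.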

0
