Solved

PHP function to remove special characters

Posted on 2006-11-28
Last Modified: 2011-08-18
Hello,

I would like a PHP function to remove all "special characters" but leave all letters.

The problem is that I need to support European languages, many of which use letters beyond the 26 that English uses.


These should be REMOVED

< > ( ) ! # $ % ^ & = + ~ ` * " ' ¡ ¤ ¢ £ ¥ ¦ § ¨ © ª «  ¬ ­ ® ™ ¯ ° ± ² ³ ´ µ ¶ · ¸ ¹ º » ¼ ½ ¾ ¿ × ÷

But these should NOT be removed

à á â ã ä å é ê ë ì í î ï ñ ò ó ô õ ö ù ú û ü ý ÿ


There are many more that SHOULD be removed and many more that SHOULD NOT be removed, but the idea is: if it can be used in any European language as part of a word, I want to keep it. If not, I want to get rid of it.

Any ideas?

Thanks!
Question by:hankknight

8 Comments
 
LVL 11

Assisted Solution

by:ch2
ch2 earned 100 total points
ID: 18030404
You can use preg_replace() with an array of all the characters you want to remove.
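A minimal sketch of that blacklist idea, assuming UTF-8 input and source file (hence the /u modifier) and PHP 7.4+ for the arrow function. The character list is shortened for illustration, and strip_blacklisted() is a hypothetical helper name. preg_quote() escapes each entry so regex metacharacters such as ( ) $ ^ are matched literally.

```php
<?php
// Blacklist approach: one regex pattern per character to strip.
// Assumes UTF-8 input and source file (hence /u) and PHP 7.4+.
// The list is deliberately short; anything NOT listed is kept.
$remove = ['<', '>', '(', ')', '!', '#', '$', '%', '^', '&', '=', '+',
           '~', '`', '*', '"', "'", '¡', '¤', '¢', '£', '¥', '×', '÷'];

// preg_quote() escapes regex metacharacters such as ( ) $ ^ * + so that
// every entry is matched literally.
$patterns = array_map(
    fn(string $c): string => '/' . preg_quote($c, '/') . '/u',
    $remove
);

// Hypothetical helper: preg_replace() applies each pattern in the array in turn.
function strip_blacklisted(string $s, array $patterns): string
{
    return preg_replace($patterns, '', $s);
}

echo strip_blacklisted('Café (100%) à ÷ 2!', $patterns), "\n";
```

Characters that are not in the list, such as ? | {, pass through unchanged, which is the weakness of any blacklist.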
 
LVL 29

Expert Comment

by:TeRReF
ID: 18030465
How about:
<?php

$s = '< > ( ) ! # $ % ^ & = + ~ ` * " \' ¡ ¤ ¢ £ ¥ ¦ § ¨ © ª «  ¬ ­ ® . ¯ ° ± ² ³ ´ µ ¶ · ¸ ¹ º » ¼ ½ ¾ ¿ × ÷ à á â ã ä å é ê ë ì í î ï ñ ò ó ô õ ö ù ú û ü ý ÿ';
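// Assumes this file is saved as UTF-8: the /u modifier makes PCRE treat each
// multi-byte character (¡, ×, ÷, ...) as one character. Without it, their
// individual bytes are removed, which corrupts accented letters such as à and é.
$s = preg_replace('/[<>()!#$%\^&=+~`*"\'¡¤¢£¥¦§¨©ª«¬­®\.¯°±²³´µ¶·¸¹º»¼½¾¿×÷]/u', '', $s);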
print($s);

?>
 
LVL 16

Author Comment

by:hankknight
ID: 18030667
Thank you both for your ideas.

The problem is that both of you "blacklisted" certain characters.

That is a problem because certain things can fall through the cracks and be missed.

These characters, for example, were missed in the examples:
             ? | {

The only characters that I want to allow are found in the ISO 8859-1 character set in these ranges:
     48 through 57
     65 through 90
     97 through 122
     192 through 246

See:
http://www.utoronto.ca/webdocs/HTMLdocs/NewHTML/iso_table.html
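The decimal ranges above can be turned into a regex whitelist mechanically. A minimal sketch, assuming the input is single-byte ISO-8859-1 (so no /u modifier); whitelist_pattern() is a hypothetical helper name. It formats each range as \xHH escapes and negates the class, so everything outside the ranges is removed.

```php
<?php
// Hypothetical helper: build a negated character class from decimal
// ISO-8859-1 code ranges, written as \xHH escapes.
// Assumes single-byte ISO-8859-1 input, so the pattern has no /u modifier.
function whitelist_pattern(array $ranges): string
{
    $parts = [];
    foreach ($ranges as [$from, $to]) {
        // %02x formats the code as two lowercase hex digits, e.g. 192 -> c0.
        $parts[] = sprintf('\\x%02x-\\x%02x', $from, $to);
    }
    return '/[^' . implode('', $parts) . ']/';
}

$pattern = whitelist_pattern([[48, 57], [65, 90], [97, 122], [192, 246]]);
echo $pattern, "\n"; // prints /[^\x30-\x39\x41-\x5a\x61-\x7a\xc0-\xf6]/

// "\xe9" is é and "\xed" is í in ISO-8859-1; the space, ?, ¿ ("\xbf"),
// | and { fall outside every range and are removed.
$clean = preg_replace($pattern, '', "Caf\xe9? \xbfS\xed|{");
```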
 
LVL 29

Assisted Solution

by:TeRReF
TeRReF earned 150 total points
ID: 18030732
Then turn it around and delete all chars that are not in the whitelist:
$s = preg_replace('/[^a-z0-9]/is', '', $s);

Add the allowed chars to this list:
a-z0-9
 
LVL 11

Assisted Solution

by:ch2
ch2 earned 100 total points
ID: 18030749
The array (codes 48-57, 65-90, 97-122 and 192-246, each listed once):

$ar = array_map('chr', array_merge(range(48, 57), range(65, 90), range(97, 122), range(192, 246)));
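An allow-list array like this still has to be applied to the string. A minimal sketch of one way to do that, assuming ISO-8859-1 input where every character is exactly one byte; keep_allowed() is a hypothetical helper name. It walks the string byte by byte and keeps only bytes found in the list.

```php
<?php
// Sketch of applying an allow-list array of single-byte characters.
// Assumes ISO-8859-1 input, where every character is exactly one byte.
$allowed = array_map('chr', array_merge(
    range(48, 57),   // 0-9
    range(65, 90),   // A-Z
    range(97, 122),  // a-z
    range(192, 246)  // À-ö
));

// Hypothetical helper name.
function keep_allowed(string $s, array $allowed): string
{
    // array_flip() turns the list into keys so isset() is a cheap lookup.
    $lookup = array_flip($allowed);
    $out = '';
    for ($i = 0, $n = strlen($s); $i < $n; $i++) {
        if (isset($lookup[$s[$i]])) {
            $out .= $s[$i]; // byte is on the allow-list: keep it
        }
    }
    return $out;
}
```

A single negated character class with preg_replace() does the same job in one call; the loop just makes the allow-list explicit.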
 
LVL 16

Author Comment

by:hankknight
ID: 18030842
Would this work?

$s = preg_replace('/[^a-z0-9À-öø-ÿ]/is', '', $s);

 
LVL 29

Assisted Solution

by:TeRReF
TeRReF earned 150 total points
ID: 18030870
Yes, it will, but I guess you want to keep spaces as well, right? Just add a space to the list.
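If the text is UTF-8 rather than ISO-8859-1, a different technique than the one in this thread may fit the "any European letter" requirement better: Unicode character properties with the /u modifier. A minimal sketch, assuming UTF-8 input; strip_non_letters() is a hypothetical helper name. It uses \p{Nd} (decimal digits) rather than \p{N}, because ², ³ and ½ are Unicode numbers too. Note that a few symbols from the question, such as ª, º and µ, are classified as letters by Unicode and would be kept.

```php
<?php
// Alternative sketch (not from the thread): for UTF-8 input, the /u modifier
// with Unicode properties keeps any letter (\p{L}), any decimal digit
// (\p{Nd}) and plain spaces, and removes everything else.
// strip_non_letters() is a hypothetical helper name.
function strip_non_letters(string $s): string
{
    return preg_replace('/[^\p{L}\p{Nd} ]+/u', '', $s);
}

echo strip_non_letters('Zürich 8001 (CH)'), "\n"; // prints Zürich 8001 CH
```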
 
LVL 48

Accepted Solution

by:hernst42
hernst42 earned 250 total points
ID: 18035349
If you know the character codes, you can convert them to hex and use the hex values in the regular expression:
/*
  48 through 57
  65 through 90
  97 through 122
 192 through 246
*/
echo preg_replace('/[^\x30-\x39\x41-\x5a\x61-\x7a\xc0-\xf6]/', '', $text) ."\n";

If you want to keep additional characters, just add their hex values to the list. (It's basically the same as what hankknight wrote, but it might be easier to read and edit.)
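One detail worth checking in the ranges themselves: in ISO-8859-1, 192-246 (\xc0-\xf6) also contains 215 (\xd7, ×), which the question wants removed, and it stops before 248-255 (\xf8-\xff: ø ù ú û ü ý þ ÿ), several of which the question wants kept. A minimal sketch of a refined class, assuming single-byte ISO-8859-1 input; strip_latin1() is a hypothetical helper name.

```php
<?php
// Sketch, assuming ISO-8859-1 (single-byte) input.
// The upper range is split around × (\xd7) and ÷ (\xf7) so both operators
// are removed while À-Ö, Ø-ö and ø-ÿ are kept.
// strip_latin1() is a hypothetical helper name.
function strip_latin1(string $text): string
{
    return preg_replace('/[^\x30-\x39\x41-\x5a\x61-\x7a\xc0-\xd6\xd8-\xf6\xf8-\xff]/', '', $text);
}
```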
