Geoff Millikan
asked on
Changing UTF8 characters to HTML entities in PHP?
I have user data with valid characters (like the long dash, see attached screen shot) that I need to put on a web page. Of course it's showing up funny. Current encoding on the web page is iso-8859-1 but I suppose I can change it to UTF-8.
The options for getting HTML entities out of UTF-8 characters baffle me, like mb_check_encoding(), utf8_decode(), etc. Some of them require compiled-in support, and I cannot recompile PHP since I'm on Red Hat Linux, so please see the attachment for what's available in phpinfo().
Basically I need a function that makes the funny characters go away, replaced with the correct expression - the long dash, etc.
ScreenShot001.jpg
php-info.txt
ASKER
Come on guys!
MunterMan: I want to encode, not decode!
shobinsun:
> http://www.jainsachin.com/php_manual_tutorial/function.htmlspecialchars.html
> http://in.php.net/htmlentities
htmlspecialchars() encodes normal characters, not weird multibyte ones.
> http://www.seopher.com/articles/html_character_encoding_with_php_made_easy
I mentioned this one in my question, mb_check_encoding(). This uses mb_convert_encoding(), which is great except that it requires compiled-in support per the link below:
http://www.php.net/manual/en/mbstring.installation.php
> http://htmlpurifier.org/docs/enduser-utf8.html
Huh?
ASKER
shell> php -r "echo mb_convert_encoding();"
PHP Fatal error: Call to undefined function mb_convert_encoding() in Command line code on line 1
U+FFFD is not an HTML special character. It is the Unicode replacement character, which is shown in place of an unknown or unprintable character.
Red Hat uses UTF-8 as the default character encoding, so I would try setting the web page encoding to that first. If that does not work, then you can worry about converting strings.
ASKER
shobinsun: urlencode() turns it into more of a mess, so it now looks like this in the web browser: Business+card%97Electronic
MunterMan: Web page HTML is set to UTF-8 now and no change. Output below shows that everything is UTF-8.
Would love to have a solution.
==== HTML page ===
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
....
=====HTML Response Header ====
HTTP/1.x 200 OK
Date: Fri, 24 Apr 2009 00:50:53 GMT
Server: Apache/2.2.3 (Red Hat)
X-Powered-By: PHP/5.1.6
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Connection: close
Transfer-Encoding: chunked
Content-Type: text/html; charset=UTF-8
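A note on the urlencode() output above: `%97` suggests the dash is stored as the single byte 0x97, which is how Windows-1252 encodes the em dash. In UTF-8 the same character is three bytes (E2 80 94). That is an inference from one sample, not something confirmed in this thread. A minimal sketch for checking which high bytes are actually in the data, using only core functions (`high_bytes_hex` and the sample strings are made up for illustration):

```php
<?php
// Hypothetical samples: the same text in two different encodings.
$cp1252 = "Business card\x97Electronic";          // em dash as one Windows-1252 byte
$utf8   = "Business card\xE2\x80\x94Electronic";  // em dash as three UTF-8 bytes

// Return the hex value of every byte >= 0x80 so the encoding can be eyeballed.
function high_bytes_hex($data)
{
    $out = array();
    $len = strlen($data);
    for ($i = 0; $i < $len; $i++) {
        $byte = ord($data[$i]);
        if ($byte >= 0x80) {
            $out[] = sprintf('%02X', $byte);
        }
    }
    return implode(' ', $out);
}

echo high_bytes_hex($cp1252), "\n"; // 97
echo high_bytes_hex($utf8), "\n";   // E2 80 94
```

If the real data prints a lone `97`, it is probably not UTF-8 to begin with, and declaring the page as UTF-8 alone would not fix the display.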
Hi,
Then you can use the str_replace() function too, along with urlencode().
http://in2.php.net/preg_replace
http://php.about.com/od/advancedphp/ss/php_preg_4.htm
Hope this will help you.
Thanks And Regards
ASKER
Turns out RHEL5 can have the multibyte package installed! So I have it up and running, but converting with either of these just ends up stripping the special characters out completely. It doesn't fill in with a space, it just strips the whole character out.
mb_convert_encoding($data, 'ISO-8859-1', 'UTF-8')
mb_convert_encoding($data, 'HTML-ENTITIES', 'UTF-8')
Anyone, thoughts?
Red Hat support says:
Please install the "php-mbstring" package using "yum install php-mbstring" command to use mb_convert_encoding() function.
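If the data really is Windows-1252 rather than UTF-8 (an assumption based on the `%97` seen earlier in the thread), then passing 'UTF-8' as the source encoding would explain the dropped characters, since a lone 0x97 byte is not valid UTF-8. A sketch that converts from Windows-1252 first and then asks htmlentities() for named entities; `to_entities` is a made-up helper name, not something from the thread:

```php
<?php
// Assumes $data is Windows-1252 and that mbstring or iconv is available.
function to_entities($data)
{
    // Step 1: re-encode the raw bytes as UTF-8.
    if (function_exists('mb_convert_encoding')) {
        $utf8 = mb_convert_encoding($data, 'UTF-8', 'Windows-1252');
    } else {
        // iconv is a separate extension that is often enabled.
        $utf8 = iconv('Windows-1252', 'UTF-8', $data);
    }
    // Step 2: replace characters that have named HTML entities.
    return htmlentities($utf8, ENT_QUOTES, 'UTF-8');
}

echo to_entities("Business card\x97Electronic"), "\n";
// Business card&mdash;Electronic
```

The entity form displays the same way whether the page is served as ISO-8859-1 or UTF-8, which is why it may be the safer output while the page encoding is still in question.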
ASKER
Currently using preg_replace() below to take out any special characters and replace them with a space. But that's not what I want to do. I want to show the UTF-8 characters.
Why can't I show UTF-8 characters on a web page without them showing up as boxes? Seems like a simple question.
preg_replace('/[^a-zA-Z0-9()\s-]/', ' ', $data);
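Before stripping or converting anything, it may help to know whether a given string is valid UTF-8 at all. One approach that does not need mbstring is the PCRE `u` modifier: preg_match() does not report a match on a subject that is not well-formed UTF-8. A sketch, assuming PCRE was built with UTF-8 support (`is_valid_utf8` is an illustrative name):

```php
<?php
function is_valid_utf8($data)
{
    // An empty pattern with /u matches any well-formed UTF-8 string;
    // on malformed input preg_match() does not return 1.
    return preg_match('//u', $data) === 1;
}

var_dump(is_valid_utf8("Business card\xE2\x80\x94Electronic")); // bool(true)
var_dump(is_valid_utf8("Business card\x97Electronic"));         // bool(false)
```

A string that fails this check will still show up as boxes on a UTF-8 page, so it would need converting from its real encoding rather than having characters removed.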
Hi,
Look at this:
https://www.experts-exchange.com/questions/23713547/PHP-GET-Special-Characters-Encoding-Function.html
Regards.
Hi,
Also look at this:
http://www.penguin-soft.com/penguin/developer/php/ref.mbstring.html
Hope this gives you a good idea.
Regards
Hi
You can use one of the following options:
1. Inside the HTML head:
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
</head>
<body>
</body>
</html>
2. Inside the PHP code, before any output is sent:
<?php
header('Content-Type: text/html; charset=ISO-8859-1');
?>
Regards
ASKER
Yes, please close as unsolved. Changing the page encoding to UTF-8 or ISO-8859-1 doesn't resolve the issue. Implemented my own solution of stripping out special characters as a short-term fix.
http://uk2.php.net/manual/en/function.html-entity-decode.php