Geoff Millikan

asked on

Changing UTF8 characters to HTML entities in PHP?

I have user data with valid characters (like the long dash; see the attached screenshot) that I need to put on a web page. Of course it's showing up funny. The current encoding of the web page is iso-8859-1, but I suppose I can change it to UTF-8.

The options for getting HTML entities out of UTF-8 characters, like mb_check_encoding(), utf8_decode(), etc., baffle me. Some of them require compiled-in support, and I cannot recompile PHP since I'm on Red Hat Linux, so please see the attached file for what's available in phpinfo().
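Without mbstring, the check that mb_check_encoding() performs can be approximated with PCRE, which ships with core PHP: a pattern with the /u modifier refuses to match a subject that is not valid UTF-8. A minimal sketch, assuming PHP 7 or later; is_valid_utf8() is a hypothetical helper name, not a built-in:

```php
<?php
// Hypothetical helper: true if $s is well-formed UTF-8.
// Uses mb_check_encoding() when the mbstring extension is loaded and
// otherwise falls back to PCRE, whose /u modifier rejects malformed UTF-8.
function is_valid_utf8(string $s): bool
{
    if (function_exists('mb_check_encoding')) {
        return mb_check_encoding($s, 'UTF-8');
    }
    // preg_match() returns false (not 0) when the subject is invalid UTF-8.
    return preg_match('//u', $s) === 1;
}

var_dump(is_valid_utf8("Business card\u{2014}Electronic")); // bool(true)
var_dump(is_valid_utf8("Business card\x97Electronic"));     // bool(false)
```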

Basically I need a function that makes the funny characters go away, replacing them with the correct character: the long dash, etc.
Attachments: ScreenShot001.jpg, php-info.txt
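One way to get the entities the question asks for, without mbstring, is to emit numeric character references: &#8212; is the long dash, and this form works for any Unicode code point and renders correctly even on a page declared as iso-8859-1. A minimal sketch, assuming PHP 7 or later and input that is already valid UTF-8; utf8_to_numeric_entities() is a hypothetical name:

```php
<?php
// Hypothetical helper: escape HTML metacharacters, then replace every
// non-ASCII character with a decimal numeric character reference.
// Assumes $s is valid UTF-8; htmlspecialchars() returns '' otherwise.
function utf8_to_numeric_entities(string $s): string
{
    $escaped = htmlspecialchars($s, ENT_QUOTES, 'UTF-8');

    // With the /u modifier, [^\x00-\x7F] matches one whole multibyte character.
    $result = preg_replace_callback('/[^\x00-\x7F]/u', function (array $m): string {
        $b = array_values(unpack('C*', $m[0])); // raw byte values, 0-indexed
        switch (count($b)) {
            case 2:
                $cp = (($b[0] & 0x1F) << 6) | ($b[1] & 0x3F);
                break;
            case 3:
                $cp = (($b[0] & 0x0F) << 12) | (($b[1] & 0x3F) << 6) | ($b[2] & 0x3F);
                break;
            default: // 4-byte sequence
                $cp = (($b[0] & 0x07) << 18) | (($b[1] & 0x3F) << 12)
                    | (($b[2] & 0x3F) << 6) | ($b[3] & 0x3F);
        }
        return '&#' . $cp . ';';
    }, $escaped);

    return $result ?? '';
}

echo utf8_to_numeric_entities("Business card\u{2014}Electronic"), "\n";
// Business card&#8212;Electronic
```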
Chris Harte

ASKER

Come on guys!

MunterMan: I want to encode, not decode!

shobinsun:

http://www.jainsachin.com/php_manual_tutorial/function.htmlspecialchars.html
http://in.php.net/htmlentities

htmlspecialchars() encodes only the basic HTML characters (&, <, >, and quotes), not multibyte characters like these.
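A short comparison of the two functions, assuming PHP 7 or later. When the charset argument is 'UTF-8', htmlspecialchars() leaves multibyte characters alone, while htmlentities() also replaces every character that has a named HTML entity. Older releases such as PHP 5.1 default that argument to ISO-8859-1, so it needs to be passed explicitly:

```php
<?php
$s = "Business card\u{2014}Electronic <b>";

// Escapes only &, <, >, " and '; the UTF-8 long dash passes through unchanged.
echo htmlspecialchars($s, ENT_QUOTES, 'UTF-8'), "\n";

// Also replaces characters that have a named entity, so the dash becomes &mdash;.
echo htmlentities($s, ENT_QUOTES, 'UTF-8'), "\n";
// Business card&mdash;Electronic &lt;b&gt;
```

On a page served as UTF-8, the htmlspecialchars() result is already enough; the named-entity form matters mainly for pages in another charset.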

http://www.seopher.com/articles/html_character_encoding_with_php_made_easy

I mentioned this one in my question: mb_check_encoding(). It uses mb_convert_encoding(), which is great except that it requires compiled-in support, per the page below:

http://www.php.net/manual/en/mbstring.installation.php

http://htmlpurifier.org/docs/enduser-utf8.html

Huh?



shell> php -r "echo mb_convert_encoding();"
 
PHP Fatal error:  Call to undefined function mb_convert_encoding() in Command line code on line 1


U+FFFD is not an HTML special character. It is the Unicode replacement character, which is displayed in place of an unknown, invalid, or unprintable character.
Red Hat uses UTF-8 as the default character encoding, so I would try setting the web page encoding to that first. If that does not work, then you can worry about converting strings.
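PHP's escaping functions expose the same mechanism. A lone byte such as 0x97 is not a valid UTF-8 sequence: by default htmlspecialchars() with the 'UTF-8' charset returns an empty string for such input, ENT_SUBSTITUTE (PHP 5.4+) puts U+FFFD in place of the bad byte (which is what a browser shows when it decodes that byte as UTF-8), and ENT_IGNORE silently drops it. A small demonstration, assuming PHP 7 or later:

```php
<?php
$bad = "a\x97b"; // 0x97 on its own is not a valid UTF-8 sequence

// Default: invalid input makes the whole result empty.
var_dump(htmlspecialchars($bad, ENT_QUOTES, 'UTF-8'));
// string(0) ""

// ENT_SUBSTITUTE: the bad byte becomes U+FFFD (bytes EF BF BD).
var_dump(bin2hex(htmlspecialchars($bad, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8')));
// string(10) "61efbfbd62"

// ENT_IGNORE: the bad byte is dropped without any trace.
var_dump(htmlspecialchars($bad, ENT_QUOTES | ENT_IGNORE, 'UTF-8'));
// string(2) "ab"
```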
Hi,

Try the urlencode() function.

http://in.php.net/urlencode

Hope this will help you.

Regards
shobinsun: urlencode() makes it even more of a mess; it now looks like this in the web browser: Business+card%97Electronic
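The %97 in that output is a useful clue. A long dash stored as UTF-8 is three bytes and would come out of urlencode() as %E2%80%94; a single %97 means the data holds the byte 0x97, which is most likely the em dash in Windows-1252 rather than a UTF-8 character. A quick check, assuming PHP 7 or later:

```php
<?php
// The em dash (U+2014) encoded as UTF-8 vs. the single Windows-1252 byte 0x97.
echo urlencode("Business card\u{2014}Electronic"), "\n"; // Business+card%E2%80%94Electronic
echo urlencode("Business card\x97Electronic"), "\n";     // Business+card%97Electronic

// bin2hex() is another way to see the raw bytes behind a suspicious character.
echo bin2hex("\u{2014}"), "\n"; // e28094
```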

MunterMan: The web page HTML is set to UTF-8 now, with no change. The output below shows that everything is UTF-8.

Would love to have a solution.
==== HTML page ===
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
....
 
=====HTML Response Header ====
HTTP/1.x 200 OK
Date: Fri, 24 Apr 2009 00:50:53 GMT
Server: Apache/2.2.3 (Red Hat)
X-Powered-By: PHP/5.1.6
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Connection: close
Transfer-Encoding: chunked
Content-Type: text/html; charset=UTF-8


Hi,

Then you can also use the str_replace() function along with urlencode().

http://in2.php.net/preg_replace

http://php.about.com/od/advancedphp/ss/php_preg_4.htm
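Following the str_replace() idea, the few Windows-1252 punctuation bytes that usually cause this can be mapped straight to named entities. A sketch under two assumptions: the input is single-byte Windows-1252 text, not UTF-8 (inside UTF-8 these byte values occur as parts of multibyte characters, so the map would corrupt them), and the list of mapped bytes is illustrative rather than complete. cp1252_punctuation_to_entities() is a hypothetical name; strtr() is used because it applies the whole map in one call:

```php
<?php
// Hypothetical helper for legacy single-byte text (assumed Windows-1252).
// Escape HTML metacharacters first, then swap common "smart" punctuation
// bytes for named entities. Do NOT run this on UTF-8 input.
function cp1252_punctuation_to_entities(string $s): string
{
    // ISO-8859-1 accepts every byte value, so nothing is rejected here.
    $escaped = htmlspecialchars($s, ENT_QUOTES, 'ISO-8859-1');

    return strtr($escaped, [
        "\x85" => '&hellip;', // horizontal ellipsis
        "\x91" => '&lsquo;',  // left single quotation mark
        "\x92" => '&rsquo;',  // right single quotation mark
        "\x93" => '&ldquo;',  // left double quotation mark
        "\x94" => '&rdquo;',  // right double quotation mark
        "\x96" => '&ndash;',  // en dash
        "\x97" => '&mdash;',  // em dash (the "long dash")
    ]);
}

echo cp1252_punctuation_to_entities("Business card\x97Electronic"), "\n";
// Business card&mdash;Electronic
```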


Hope this will help you.

Thanks And Regards
Turns out RHEL5 can have the multibyte package installed! So I have it up and running, but converting to either of these just strips the special characters out entirely. It doesn't fill in a space; it removes the whole character.

mb_convert_encoding($data, 'ISO-8859-1', 'UTF-8')

mb_convert_encoding($data, 'HTML-ENTITIES', 'UTF-8')

Anyone have any thoughts?
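A likely reason both calls drop the dash: the third argument of mb_convert_encoding() names the encoding the input is already in, and 'UTF-8' only fits if the data really is UTF-8. If the dash is stored as the Windows-1252 byte 0x97, which the %97 in the urlencode() output earlier in the thread points to, naming that source encoding keeps it. A sketch, assuming Windows-1252 source data and PHP 7 or later; cp1252_to_utf8() is a hypothetical name, and the iconv() fallback assumes the iconv extension is available:

```php
<?php
// Hypothetical helper: re-encode text assumed to be Windows-1252 as UTF-8.
function cp1252_to_utf8(string $s): string
{
    if (function_exists('mb_convert_encoding')) {
        // Argument order: string, TO encoding, FROM encoding.
        return mb_convert_encoding($s, 'UTF-8', 'Windows-1252');
    }
    // Fallback when mbstring is missing; iconv() returns false on failure.
    $out = iconv('WINDOWS-1252', 'UTF-8', $s);
    return $out === false ? '' : $out;
}

// The result can then be escaped normally for a page served as UTF-8.
echo htmlspecialchars(cp1252_to_utf8("Business card\x97Electronic"), ENT_QUOTES, 'UTF-8'), "\n";
```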
# Red Hat support says:
Please install the "php-mbstring" package using "yum install php-mbstring" command to use mb_convert_encoding() function.


Currently I'm using the preg_replace() below to take out any special characters and replace them with a space. But that's not what I want to do. I want to show the UTF-8 characters.

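Why can't I show UTF-8 characters on a web page without them showing up as boxes? Seems like a simple question.
$data = preg_replace('/[^a-zA-Z0-9()\s-]/', ' ', $data); // keep letters, digits, parentheses, whitespace and hyphens; replace anything else with a space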
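Rather than stripping characters, one alternative is to pick an escaping strategy per string. A minimal sketch that needs only core PHP, resting on two assumptions: the page is served as UTF-8, and any text that is not valid UTF-8 was stored as Windows-1252. htmlentities() accepts 'Windows-1252' as its charset and maps byte 0x97 to &mdash;. display_text() is a hypothetical name:

```php
<?php
// Hypothetical helper: render user text on a UTF-8 page without stripping it.
// Assumption: anything that is not valid UTF-8 was stored as Windows-1252.
function display_text(string $s): string
{
    if (preg_match('//u', $s) === 1) {
        // Already valid UTF-8: only the HTML metacharacters need escaping.
        return htmlspecialchars($s, ENT_QUOTES, 'UTF-8');
    }
    // Legacy bytes: htmlentities() maps Windows-1252 bytes such as 0x97
    // to named entities, which render the same under any page charset.
    return htmlentities($s, ENT_QUOTES, 'Windows-1252');
}

echo display_text("Business card\x97Electronic"), "\n";
// Business card&mdash;Electronic
```

The UTF-8 test is only a heuristic, since some Windows-1252 byte sequences happen to be valid UTF-8; converting the stored data once is the more robust long-term fix.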


ASKER CERTIFIED SOLUTION
shobinsun
Hi,

Also look at this:
http://www.penguin-soft.com/penguin/developer/php/ref.mbstring.html

Hope this gives you a good idea.

Regards
Hi

You can use one of the following options:


1. Inside the HTML <head>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
</head>
<body>
</body>
</html>

2. Inside PHP code (before any output is sent)
<?php
header('Content-Type: text/html; charset=ISO-8859-1');
?>

Regards
Yes, please close as unsolved. Changing the page encoding to UTF-8 or ISO-8859-1 doesn't resolve the issue. I implemented my own solution of stripping out the special characters as a short-term fix.