• Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 33
  • Last Modified:

Query with Encoded Character Stopped Working in PHP 7 Upgrade

I recently upgraded from php version 5.5 to 7.0

This query worked great in PHP 5.5 but no longer works in 7.0.  Other than the upgrade to php 7.0 and switching to MySQLi

"SELECT * FROM axis WHERE code LIKE '%ƒ%'"

Open in new window

What changed between 5.5 and 7.0 that would cause this query to stop working?
Paul Konstanski
Paul Konstanski
  • 6
  • 5
1 Solution
It's hard to say for sure without seeing the bigger picture. For example, is that character hardcoded into the script, or is it passed in via web page input? What extension are you using to query the database?

For what it's worth, I would strongly suggest that if you want to search for that character and you're using UTF-8 for your table data storage, then just use this query instead:

SELECT * FROM axis WHERE code LIKE CONCAT('%', UNHEX('c692'), '%')

"c692" is the hex code for the ƒ character in UTF-8. It should have the same result without the risk of any unwanted mangling from multibyte character handling.

Finally, I'd suggest comparing the php.ini files from your 5.5 and 7.0 installations. There may be an unexpected charset setting somewehere, but again, it's hard to pinpoint without knowing more.
Paul KonstanskiProject SpecialistAuthor Commented:
Thanks for the reply.

  • The character is hardcoded into the script.
  • I tried your query and it did not work.
  • There is a difference in the iconv php.ini settings.

In the OLD PHP 5.5 php settings where the query worked, this setting was:
iconv.input_encoding      ISO-8859-1
iconv.internal_encoding      ISO-8859-1
iconv.output_encoding      ISO-8859-1

But in the new 7.0 installation all three of these are undefined.

Can I make that switch in the php.ini settings to test if that makes a difference. Or is there the potential that it could mess up the existing data in my database?

Okay, so if my suggested query didn't work, then your database table is likely not be using UTF-8 for encoding. And based off what you've indicated with iconv, then there's a good chance the database is using an extended ASCII charset instead (ISO-8859-1 is one of them).

So try my same query but this time do:
SELECT * FROM axis WHERE code LIKE CONCAT('%', UNHEX('83'), '%')

So 0x83 is the same "function" character in almost every extended ASCII charset.

If that works, then that confirms your database table encoding.

Those 3 ini settings are deprecated as of PHP 5.6, and "default_charset" now sets those values as the recommended approach. The default charset defaults to UTF-8 (since that's just the direction things are moving in), but you can change that in your php.ini to be set back to iso-8859-1:

default_charset = "iso-8859-1"

Or you can just leave default_charset alone and there should be "internal_encoding", "input_encoding", and "output_encoding" INI settings a little below default_charset, and you can uncomment those and set them. To be clear, they are basically the same config var names you had but without the "iconv." at the beginning.
Cloud Class® Course: Microsoft Windows 7 Basic

This introductory course to Windows 7 environment will teach you about working with the Windows operating system. You will learn about basic functions including start menu; the desktop; managing files, folders, and libraries.

Paul KonstanskiProject SpecialistAuthor Commented:
Your second query did work. And I think I know what is causing the problem with how things are written into the database.

I would rather stick with working with defaults rather than changing my php.ini settings.  And this affects very little of my overall script. It would be easier for me to simply find the few places where it has to do either a) this query or b) a comparision. Since you've helped me know how to do the query, here is my follow-up question.

How do I do a comparison?  

In the php 5.4 version this worked:

First I did a query to bring me all of the values of the code field. Then I ran thru an foreach loop doing a comparison to find all the ones where the value of the code was "ƒ". The following is a simplified code.

if (strstr($code, "ƒ") return true;
else return false;

Open in new window

The value that were returned from the query were:
[0] = a
[1] = c
[2] = a
[3] = ƒ
[4] = c
[5] = b
[6] =  ƒ

So for the loop it should return true on key value 3 and 6, but false for all the rest.

So in light of what we have discovered about this database, what is the correct way to do the if comparison since if (strstr($code, "ƒ")  does not work?
Well, if you want to convert over to using UTF-8, then it might be as simple as updating the database table collation (character set).

So just convert the table and column encoding to UTF-8 ("utf8_general_ci", for example). You can use a tool like HeidiSQL to help you do it (which is probably safest if you're not familiar with table alterations), or if you want to try it via a manual query:
ALTER TABLE `axis` COLLATE='utf8_general_ci'
  CHANGE COLUMN `code` `code` ...your column definition here.... COLLATE 'utf8_general_ci';

This SHOULD convert the existing ISO-8859-1 data in the code column to UTF-8, at which point, my very first query that I suggested you run should now work (the c692 one), and your own original query (the one that HAD stopped working) should now work.

But before you do any DB changes, make sure you have a full backup. Never alter a database table without a backup.
Paul KonstanskiProject SpecialistAuthor Commented:
According to the database information that I can see when querying the database, it is set to:

Character Set is: "UTF8"
And the COLLATE is set at: "utf8_general_ci"

So that is what has me confused here. Why is my comparison not working?
The tables and columns can have their own collation - run "SHOW CREATE TABLE axis" and paste the output here.

If the table/column has no separate collation, then what might be happening is that the database might be prepped for UTF-8 but PHP is inserting ISO-8859-1. However, that really SHOULD throw a database error, so I'm guessing you have a column-level collation that is not UTF-8.
Paul KonstanskiProject SpecialistAuthor Commented:
Here is the result of your request:

  `order` int(11) NOT NULL,
  `first` varchar(20) NOT NULL DEFAULT '',
  `last` varchar(30) NOT NULL DEFAULT '',
  `email` varchar(64) NOT NULL DEFAULT '',
  `code` text NOT NULL,
  `timestmp` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00' ON UPDATE CURRENT_TIMESTAMP,
  KEY `main` (`order`) USING BTREE

Open in new window

Aha. Okay, so yeah, this:
timestmp` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00' ON UPDATE CURRENT_TIMESTAMP,

Open in new window

...isn't actually valid (0000-00-00 00:00:00 isn't a valid default value for timestamp), so the fact that it's working on your system means that you either have a slightly older version of MySQL installed or you have strict mode turned off on SQL. Otherwise, you would get errors with that setup.

So when strict mode is turned off, the server won't reject invalid data, which means that your database is configured for UTF-8 but the client is inserting non-UTF-8 data and the server is simply allowing it to happen (it would normally reject the value).

So that means your safest bet is to use PHP to fetch the data and convert it to UTF-8 and then update the row with it.

The general flow would look something like this:

$db = new mysqli("database_server", "username", "password", "database_name");
$rs = $db->query("SELECT axID, text FROM axis");
while($row = $rs->fetch_assoc())
  $axID = $row["axID"];
  $text = $row["text"]; // This should be the ISO-8859-1 text value from the DB
  $text = utf8_encode($text); // Convert the ISO-8859-1 text to UTF-8 text
  $db->query("UPDATE axis SET text='" . $db->real_escape_string($text) . "' WHERE axID={$axID}"); // Update the row and push the UTF-8 data to the database

Open in new window

Again, make sure you have a backup before you do this. The above presumes that you do not have a MIX of both UTF-8 and ISO-8859-1. You don't want to run utf8_encode() on content that is already UTF-8 encoded, since you'll get some weird looking characters. Since your previous PHP version was all geared up for ISO-8859-1, this is probably a safe assumption, but it never hurts to have a backup.

Also, it wouldn't hurt to be cautious and temporarily add a tinyint field like "converted" with a default set to 0, and then as run your UPDATE query on each row, also set that row's "converted" field to 1. That way you can know which rows have been processed already, just in case anything goes wrong and you need to restart the process (you can select rows where converted=0). Once you're all finished and everything checks out, just drop that temporary field.
Paul KonstanskiProject SpecialistAuthor Commented:
This is excellent advice. Some of the data in this database is like 8 years old and who knows what was happening back then.

I will spend some time and do some good cleanup.

Thanks again for ALL THE GREAT ADVICE!
Paul KonstanskiProject SpecialistAuthor Commented:
By receiving the sequential help to identify and walk through this issue, I arrived at a good point of giving me a solution to work towards.
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

Join & Write a Comment

Featured Post

Free Tool: Port Scanner

Check which ports are open to the outside world. Helps make sure that your firewall rules are working as intended.

One of a set of tools we are providing to everyone as a way of saying thank you for being a part of the community.

  • 6
  • 5
Tackle projects and never again get stuck behind a technical roadblock.
Join Now