• Status: Solved

JSON String Issue

I have a JSON string that picks up stray quotes and strange characters, and they are breaking it.

See the example in the image: it is some kind of dot in the name. Is there a PHP or JSON function that keeps these characters from breaking the string? Something to cleanse them or escape them with slashes?
Untitled-1.png
Asked by: Nathan Riley

Ray Paseur Commented:
JSON must be UTF-8.  See if these articles can help you figure it out.  Also, please post the JSON string here, including the bad character.  I'll decode it for you, and we can see what can be done about it.
https://www.experts-exchange.com/articles/11880/Unicode-and-Character-Collisions.html
https://www.experts-exchange.com/articles/22519/Understanding-JSON-in-PHP-and-JavaScript-Applications.html
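
One way to test the UTF-8 requirement before anything else is to ask PHP whether the bytes are valid UTF-8 and to look at the error `json_encode()` reports. This is only a minimal sketch: the helper name `encode_or_explain()` and the sample strings are made up for illustration and are not from the question.

```php
<?php
/**
 * Hypothetical helper: JSON-encode a value, or report why encoding failed.
 */
function encode_or_explain($value)
{
    $json = json_encode($value);
    if ($json === false) {
        // json_last_error_msg() describes the most recent JSON failure
        return 'JSON error: ' . json_last_error_msg();
    }
    return $json;
}

$good = "V\xC3\xB6ila"; // "Vöila" as UTF-8 bytes
$bad  = "V\xF6ila";     // the same text as ISO-8859-1 bytes (not valid UTF-8)

var_dump(mb_check_encoding($good, 'UTF-8')); // bool(true)
var_dump(mb_check_encoding($bad, 'UTF-8'));  // bool(false)

echo encode_or_explain($good), PHP_EOL; // "V\u00f6ila"
echo encode_or_explain($bad), PHP_EOL;  // a line starting with "JSON error:"
```

If the second check fails for your data, the problem is the encoding of the stored bytes rather than JSON itself.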
Nathan Riley (Founder/CTO, Author) Commented:
Huh, looks like in the database it is:

MaeL"gorzata

So quotes are messing it up?
Ray Paseur Commented:
Please clarify... There is no space or other character between Mae and L, right?

The quote mark might be the issue, but we need to see this in a little more detail.  The embedded quote mark should be escaped, according to the JSON standard.  The problem with getting "more detail" is that when you copy and paste, it comes with some assumptions - the article about character encoding explains what is happening.  What you see in the browser is encoded according to the character set of the browser.  What you see in the database is encoded according to the character set of the database.  And any data you create in PHP is encoded according to the character set in effect at the time the data was created.  To make matters worse, some text editors will coerce the data into their own encoding scheme.  If any of these character sets are mismatched (e.g., the database has UTF-8 characters, but the browser is using ISO-8859-1), the outcomes are unpredictable.
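
To show what "escaped according to the JSON standard" looks like in practice, here is a small sketch using the name as it was posted above. It assumes, without knowing how your JSON is actually produced, that the string might be assembled by concatenation; the key `name` is invented for the example.

```php
<?php
// The name as reported from the database, with an embedded double quote
$name = 'MaeL"gorzata';

// Building JSON by hand leaves the quote unescaped, so the string ends early
$handmade = '{"name":"' . $name . '"}';
var_dump(json_decode($handmade)); // NULL: not valid JSON

// json_encode() escapes the embedded quote as \" per the JSON standard
$encoded = json_encode(array('name' => $name));
echo $encoded, PHP_EOL; // {"name":"MaeL\"gorzata"}

// Round trip: decoding gives back the original string
$decoded = json_decode($encoded, true);
var_dump($decoded['name'] === $name); // bool(true)
```

If the JSON is already produced by `json_encode()`, the quote mark is handled for you and the character encoding becomes the more likely suspect.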

This means that you cannot depend on what-you-see-is-what-you-get if you're looking at a browser display of the data, or if you're looking at the text once you have copied it into a text editor.

TL;DR All of your character encoding schemes must be consistent from beginning to end.  And if you're using JSON anywhere along the way, that means all of your character encoding schemes must be UTF-8.
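
When one link in the chain cannot be switched to UTF-8, a common workaround is to convert at the boundary before the data reaches `json_encode()`. The sketch below assumes the legacy data is ISO-8859-1, which is only a guess for your case; `to_utf8()` is a hypothetical helper, not a built-in.

```php
<?php
/**
 * Hypothetical helper: return $text as UTF-8.
 * $assumedSource is a guess about the legacy encoding. It is an assumption,
 * not something PHP can detect reliably.
 */
function to_utf8($text, $assumedSource = 'ISO-8859-1')
{
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text; // already valid UTF-8, leave it untouched
    }
    return mb_convert_encoding($text, 'UTF-8', $assumedSource);
}

$legacy = "V\xF6ila";        // "Vöila" as ISO-8859-1 bytes
$utf8   = to_utf8($legacy);

echo bin2hex($utf8), PHP_EOL;     // 56c3b6696c61
echo json_encode($utf8), PHP_EOL; // "V\u00f6ila"
```

Converting like this treats the symptom. If the data comes from MySQL through mysqli, for example, setting the connection character set with `set_charset('utf8mb4')` is usually the cleaner fix, because the bytes then arrive as UTF-8 in the first place.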

If you want to pursue this further, please show us a link that will give us a way to read the information directly into a program, without going through a copy / paste or browser display.  You might be able to dump that row from the database into data with var_export().  Or you might be able to copy it into a flat file, so you can post a link here.  If we can get the information, we can show you how to break it apart into its byte-by-byte representations, and into its character-by-character representations.  Once upon a time, a byte == a character, but the world has changed and this is not true any more.
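
As a quick illustration of the byte versus character difference, this short sketch compares the two counts and prints the bytes behind each character. The sample text is made up; substitute the value read directly from your database.

```php
<?php
mb_internal_encoding('UTF-8');

$text = "V\xC3\xB6ila"; // "Vöila" written as explicit UTF-8 bytes

echo strlen($text), PHP_EOL;    // 6 (bytes)
echo mb_strlen($text), PHP_EOL; // 5 (characters)
echo bin2hex($text), PHP_EOL;   // 56c3b6696c61

// Split into characters, then show the bytes behind each one
$chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
foreach ($chars as $pos => $chr) {
    echo $pos, ' ', $chr, ' ', bin2hex($chr), PHP_EOL;
}
```

The `bin2hex()` output is plain ASCII, so it survives copy and paste without being re-encoded along the way, which makes it a safe thing to post here.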

This is the sort of script I would use to examine the character encoding and the hex byte values.
<?php // demo/hexdump_unicode_v.php
/**
 * Expand and display a string variable in hexadecimal notation
 * Note: Output will make more sense with a monospace font!
 * http://php.net/manual/en/function.mb-split.php#99851
 *
 * http://iconoun.com/demo/hexdump_unicode_v.php?q=Data:%E7%81%AB%E8%BD%A6%E7%A5%A8!
 *
 * Useful: http://www.utf8-chartable.de/unicode-utf8-table.pl?start=1536&number=1024&utf8=0x&unicodeinhtml=hex
 * Refer2: https://www.experts-exchange.com/articles/11880/Unicode-and-Character-Collisions.html
 *
 * @param string $str The variable to expand and display
 * @return void (direct browser output)
 */
error_reporting(E_ALL);

// SET UP PHP TO USE UTF-8
mb_internal_encoding('UTF-8');
mb_regex_encoding('UTF-8');


class Letter
{
    /** @var string ONE (POSSIBLY MULTI-BYTE) CHARACTER */
    public $chr;

    /** @var string[] THE HEX NIBBLES OF THE CHARACTER'S BYTES */
    public $hex = array();

    public function __construct($chr)
    {
        $this->chr = $chr;
        foreach ($this->usplit($chr) as $byte)
        {
            $this->hex = array_merge($this->hex, $this->gethex($byte));
        }
    }

    public function usplit($chr)
    {
        // SPLIT INTO SINGLE BYTES; AN EMPTY STRING YIELDS AN EMPTY ARRAY
        return ($chr === '') ? array() : str_split($chr);
    }

    public function gethex($chr)
    {
        // GET THE HEX NIBBLE VALUES IN AN ARRAY
        return str_split(implode('', unpack('H*', $chr)));
    }
}


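class Hexdump
{
    public $str;
    public $arr;
    public $len;
    public $dat = array();

    public function __construct($str)
    {
        $this->str = $str;
        $this->arr = $this->mb_str_split($str);
        $this->len = mb_strlen($str);
        foreach ($this->arr as $uchr)
        {
            $this->dat[] = new Letter($uchr);
        }
    }

    public function mb_str_split($ustr)
    {
        if ($ustr === '') return array();

        // SPLIT BETWEEN CHARACTERS; THE /u FLAG FAILS ON INVALID UTF-8,
        // SO FALL BACK TO A BYTE-BY-BYTE SPLIT IN THAT CASE
        $arr = preg_split('/(?<!^)(?!$)/u', $ustr);
        return ($arr === false) ? str_split($ustr) : $arr;
    }

    public function render($br = PHP_EOL)
    {
        echo $br . " Pos   Chr \tHex";

        foreach ($this->dat as $poz => $chr)
        {
            echo $br;
            echo str_pad($poz, 4, ' ', STR_PAD_LEFT);
            echo '    ';
            // ESCAPE FOR HTML; INVALID BYTES ARE SHOWN AS U+FFFD
            echo htmlspecialchars($chr->chr, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
            echo " \t";
            echo implode('', $chr->hex);
        }
        echo $br;
    }
}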


// DEMONSTRATE IT WITH THE REQUEST ARGUMENT
echo '<meta charset="utf-8" />';
echo '<pre>';

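// ACCEPT ONLY A NON-EMPTY STRING (NOT AN ARRAY) FROM THE REQUEST
$q = (isset($_GET['q']) && is_string($_GET['q']) && $_GET['q'] !== '') ? $_GET['q'] : 'Vöila';

// ESCAPE THE RAW DUMP SO REQUEST DATA CANNOT INJECT MARKUP
ob_start();
var_dump($q);
echo htmlspecialchars(ob_get_clean(), ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');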

$y = new Hexdump($q);
$y->render();
