Solved

JSON String Issue

Posted on 2016-11-16
Last Modified: 2016-11-23
I have a JSON string that picks up stray quotes and strange characters, and they are breaking it.

The attached image shows an example: some kind of dot in the name. Is there a PHP or JSON function that would keep these characters from breaking the string, something to cleanse or escape them?
Untitled-1.png
Question by:N R
Expert Comment

by:Ray Paseur
ID: 41890420
JSON must be UTF-8.  See if these articles can help you figure it out.  Also, please post the JSON string here, including the bad character.  I'll decode it for you, and we can see what can be done about it.
https://www.experts-exchange.com/articles/11880/Unicode-and-Character-Collisions.html
https://www.experts-exchange.com/articles/22519/Understanding-JSON-in-PHP-and-JavaScript-Applications.html
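
As a rough illustration of the UTF-8 requirement, the sketch below feeds json_encode() a string that is not valid UTF-8 and reports the error. The sample value is hypothetical (a single 0xB3 byte standing in for a non-ASCII letter); it is not the data from the question.

```php
<?php
// HYPOTHETICAL SAMPLE: THE BYTE 0xB3 IS NOT A VALID UTF-8 SEQUENCE ON ITS OWN
$name = "Ma\xB3gorzata";

// json_encode() RETURNS FALSE WHEN THE INPUT IS NOT VALID UTF-8
$json         = json_encode(array('name' => $name));
$errorCode    = json_last_error();
$errorMessage = json_last_error_msg();

if ($json === false) {
    echo 'json_encode() failed: ' . $errorMessage . PHP_EOL;
}
```

Checking the return value and json_last_error() right after the call is usually the quickest way to tell an encoding problem apart from a quoting problem.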
Author Comment

by:N R
ID: 41890560
Huh, looks like in the database it is:

MaeL"gorzata

So quotes are messing it up?
Accepted Solution

by:Ray Paseur (earned 500 total points)
ID: 41890592
Please clarify... There is no space or other character between Mae and L, right?

The quote mark might be the issue, but we need to see this in more detail.  According to the JSON standard, an embedded quote mark must be escaped.  The problem with getting "more detail" is that copy and paste comes with hidden assumptions; the article about character encoding explains what is happening.  What you see in the browser is encoded according to the character set of the browser.  What you see in the database is encoded according to the character set of the database.  And any data you create in PHP is encoded according to the character set in effect at the time the data was created.  To make matters worse, some text editors will coerce the data into their own encoding scheme.  If any of these character sets are mismatched (e.g., the database has UTF-8 characters, but the browser is using ISO-8859-1), the outcomes are unpredictable.
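
To illustrate the escaping rule, here is a minimal sketch that compares a hand-built JSON string with one produced by json_encode().  It assumes the value is exactly what was pasted from the database above, which may not match the real bytes; the variable names are made up for the example.

```php
<?php
// ASSUMED VALUE: COPIED FROM THE POST, THE REAL DATABASE BYTES MAY DIFFER
$name = 'MaeL"gorzata';

// BUILDING JSON BY HAND LEAVES THE EMBEDDED QUOTE UNESCAPED, SO DECODING FAILS
$handmade = '{"name":"' . $name . '"}';
$broken   = json_decode($handmade, true);

// json_encode() ESCAPES THE QUOTE AS \" AND THE VALUE SURVIVES A ROUND TRIP
$encoded = json_encode(array('name' => $name));
$decoded = json_decode($encoded, true);

var_dump($broken);
echo $encoded . PHP_EOL;
```

If the JSON is being assembled by string concatenation anywhere in the application, switching that step to json_encode() is probably the simplest way to get the escaping right.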

This means that you cannot depend on what-you-see-is-what-you-get if you're looking at a browser display of the data, or if you're looking at the text once you have copied it into a text editor.

TL;DR All of your character encoding schemes must be consistent from beginning to end.  And if you're using JSON anywhere along the way, that means all of your character encoding schemes must be UTF-8.
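
One way to enforce this in PHP, sketched under assumptions: ensure_utf8() is a hypothetical helper, not part of PHP, and the ISO-8859-1 default is only a guess at the source encoding.  You would need to substitute whatever encoding your data actually uses.

```php
<?php
// HYPOTHETICAL HELPER: RETURN $text AS UTF-8, CONVERTING ONLY WHEN IT IS NOT ALREADY VALID UTF-8
function ensure_utf8($text, $assumed = 'ISO-8859-1')
{
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text;
    }
    // $assumed IS A GUESS; A WRONG GUESS PRODUCES VALID UTF-8 WITH THE WRONG CHARACTERS
    return mb_convert_encoding($text, 'UTF-8', $assumed);
}

$fromLatin1  = ensure_utf8("V\xF6ila");      // THE LETTER o-UMLAUT AS ONE ISO-8859-1 BYTE
$alreadyUtf8 = ensure_utf8("V\xC3\xB6ila");  // THE SAME LETTER AS TWO UTF-8 BYTES

echo json_encode(array($fromLatin1, $alreadyUtf8)) . PHP_EOL;
```

Converting at the edges like this is a patch, not a cure.  The durable fix is still to make the database, the connection, PHP, and the browser all agree on UTF-8.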

If you want to pursue this further, please show us a link that will give us a way to read the information directly into a program, without going through a copy / paste or browser display.  You might be able to dump that row from the database with var_export().  Or you might be able to copy it into a flat file, so you can post a link here.  If we can get the information, we can show you how to break it apart into its byte-by-byte representations, and into its character-by-character representations.  Once upon a time, a byte == a character, but the world has changed and this is not true any more.
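
As a quick sketch of the byte-versus-character difference, the snippet below counts both for a short sample string and dumps the bytes in hex.  The sample string is an assumption chosen for illustration; the mbstring extension is assumed to be available.

```php
<?php
mb_internal_encoding('UTF-8');

// SAMPLE STRING: FIVE CHARACTERS, ONE OF WHICH TAKES TWO BYTES IN UTF-8
$str = "V\xC3\xB6ila";

$bytes = strlen($str);      // COUNTS BYTES
$chars = mb_strlen($str);   // COUNTS CHARACTERS

// ONE HEX PAIR PER BYTE
$hex = implode(' ', str_split(bin2hex($str), 2));

echo "bytes=$bytes chars=$chars hex=$hex" . PHP_EOL;

// var_export() PRODUCES A PHP LITERAL OF THE VALUE THAT CAN BE SAVED TO A FILE
echo var_export($str, true) . PHP_EOL;
```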

This is the sort of script I would use to examine the character encoding and the hex byte values.
<?php // demo/hexdump_unicode_v.php
/**
 * Expand and display a string variable in hexadecimal notation
 * Note: Output will make more sense with a monospace font!
 * http://php.net/manual/en/function.mb-split.php#99851
 *
 * http://iconoun.com/demo/hexdump_unicode_v.php?q=Data:%E7%81%AB%E8%BD%A6%E7%A5%A8!
 *
 * Useful: http://www.utf8-chartable.de/unicode-utf8-table.pl?start=1536&number=1024&utf8=0x&unicodeinhtml=hex
 * Refer2: https://www.experts-exchange.com/articles/11880/Unicode-and-Character-Collisions.html
 *
 * @param string $str The variable to expand and display
 * @return void (direct browser output)
 */
error_reporting(E_ALL);

// SET UP PHP TO USE UTF-8
mb_internal_encoding('UTF-8');
mb_regex_encoding('UTF-8');


class Letter
{
    // DECLARED EXPLICITLY: DYNAMIC PROPERTIES ARE DEPRECATED AS OF PHP 8.2
    public $chr;
    public $hex = array();

    public function __construct($chr)
    {
        $this->chr = $chr;
        $bytes     = $this->usplit($chr);
        foreach ($bytes as $byte)
        {
            $this->hex = array_merge($this->hex, $this->gethex($byte));
        }
    }

    public function usplit ($chr)
    {
        // SPLIT THE CHARACTER INTO ITS INDIVIDUAL BYTES
        $arr = array(); // INITIALIZED SO AN EMPTY STRING RETURNS AN EMPTY ARRAY
        $len = strlen($chr);
        while ($len) {
            $arr[] = substr($chr, 0, 1);
            $chr   = substr($chr, 1, $len);
            $len   = strlen($chr);
        }
        return $arr;
    }

    public function gethex($chr)
    {
        // GET THE HEX NIBBLE VALUES IN AN ARRAY
        $ret = str_split(implode('', unpack('H*', $chr)));
        return $ret;
    }
}


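class Hexdump
{
    // DECLARED EXPLICITLY: DYNAMIC PROPERTIES ARE DEPRECATED AS OF PHP 8.2
    public $str;
    public $arr;
    public $len;
    public $dat = array();

    public function __construct($str)
    {
        $this->str = $str;
        $this->arr = $this->mb_str_split($str);
        $this->len = mb_strlen($str);
        foreach ($this->arr as $uchr)
        {
            $this->dat[] = new Letter($uchr);
        }
    }

    public function mb_str_split($ustr)
    {
        // SPLIT BETWEEN UNICODE CHARACTERS
        $parts = preg_split('/(?<!^)(?!$)/u', $ustr);
        // preg_split() RETURNS FALSE ON INVALID UTF-8; FALL BACK TO SINGLE BYTES
        return ($parts === false) ? str_split($ustr) : $parts;
    }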

    public function render($br = PHP_EOL)
    {
        echo $br . " Pos   Chr \tHex";

        foreach ($this->dat as $poz => $chr)
        {
            echo $br;
            echo str_pad($poz, 4, ' ', STR_PAD_LEFT);
            echo '    ';
            echo $chr->chr;
            echo " \t";
            echo implode('', $chr->hex);
        }
        echo $br;
    }
}


// DEMONSTRATE IT WITH THE REQUEST ARGUMENT
echo '<meta charset="utf-8" />';
echo '<pre>';

$q = !empty($_GET['q']) ? $_GET['q'] : 'Vöila';
var_dump($q);

$y = new Hexdump($q);
$y->render();
