Solved

JSON String Issue

Posted on 2016-11-16
Last Modified: 2016-11-23
I have a JSON string that is picking up stray quotes and strange characters, and they are breaking it.

See the example in the attached image: it's some kind of dot in the name. Is there a PHP or JSON function that keeps these characters from breaking the string? Something to cleanse or escape them?
Untitled-1.png
Question by:N R
LVL 110

Expert Comment

by:Ray Paseur
ID: 41890420
JSON must be UTF-8.  See if these articles can help you figure it out.  Also, please post the JSON string here, including the bad character.  I'll decode it for you, and we can see what can be done about it.
https://www.experts-exchange.com/articles/11880/Unicode-and-Character-Collisions.html
https://www.experts-exchange.com/articles/22519/Understanding-JSON-in-PHP-and-JavaScript-Applications.html
 
LVL 11

Author Comment

by:N R
ID: 41890560
Huh, looks like in the database it is:

MaeL"gorzata

So quotes are messing it up?
 
LVL 110

Accepted Solution

by:
Ray Paseur earned 500 total points
ID: 41890592
Please clarify... There is no space or other character between Mae and L, right?

The quote mark might be the issue, but we need to see this in a little more detail.  The embedded quote mark should be escaped, according to the JSON standard.  The problem with getting "more detail" is that when you copy and paste, the text comes with some assumptions - the article about character encoding explains what is happening.  What you see in the browser is encoded according to the character set of the browser.  What you see in the database is encoded according to the character set of the database.  And any data you create in PHP is encoded according to the character set in effect at the time the data was created.  To make matters worse, some text editors will coerce the data into their own encoding scheme.  If any of these character sets are mismatched (e.g., the database has UTF-8 characters, but the browser is using ISO-8859-1) the outcomes are unpredictable.
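To illustrate the escaping point, here is a minimal sketch. It assumes the value really is the plain ASCII text shown in the thread, which may not match the actual bytes in the database. It compares json_encode(), which escapes the embedded quote, with a JSON string assembled by hand, which does not.

```php
<?php
// Assumption: the value exactly as pasted in the thread; the real bytes may differ.
$name = 'MaeL"gorzata';

// json_encode() escapes the embedded double quote as \"
$json = json_encode(array('name' => $name));
echo $json, PHP_EOL;

// Concatenating the JSON by hand leaves the quote unescaped,
// so the result is not valid JSON and json_decode() returns NULL
$broken  = '{"name":"' . $name . '"}';
$decoded = json_decode($broken);
var_dump($decoded);
echo json_last_error_msg(), PHP_EOL;

// The properly encoded string round-trips back to the original value
$roundTrip = json_decode($json, true);
var_dump($roundTrip['name'] === $name);
```

If the JSON is currently built by string concatenation, switching to json_encode() may be all that is needed for the quote mark itself.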

This means that you cannot depend on what-you-see-is-what-you-get if you're looking at a browser display of the data, or if you're looking at the text once you have copied it into a text editor.

TL;DR All of your character encoding schemes must be consistent from beginning to end.  And if you're using JSON anywhere along the way, that means all of your character encoding schemes must be UTF-8.
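As a rough sketch of what "must be UTF-8" means in practice: the byte string below is made up for illustration (an ISO-8859-1 encoded "ö"), and it assumes the mbstring extension is available. json_encode() rejects it until it has been converted to UTF-8.

```php
<?php
// Assumption: a made-up ISO-8859-1 byte string; 0xF6 is "ö" in that charset.
$latin1 = "V\xF6ila";

// The raw bytes are not valid UTF-8, so json_encode() fails
$validBefore = mb_check_encoding($latin1, 'UTF-8');
$encoded     = json_encode($latin1);
$error       = json_last_error();
var_dump($validBefore);                 // bool(false)
var_dump($encoded);                     // bool(false)
var_dump($error === JSON_ERROR_UTF8);   // bool(true)

// Convert from the known source encoding to UTF-8, then encode again
$utf8       = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');
$validAfter = mb_check_encoding($utf8, 'UTF-8');
$json       = json_encode($utf8);
var_dump($validAfter);                  // bool(true)
var_dump(json_decode($json) === $utf8); // bool(true)
```

The conversion only works if you know the real source encoding. Guessing it wrong produces valid UTF-8 that still shows the wrong characters.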

If you want to pursue this further, please show us a link that will give us a way to read the information directly into a program, without going through a copy / paste or browser display.  You might be able to dump that row from the database into data with var_export().  Or you might be able to copy it into a flat file, so you can post a link here.  If we can get the information, we can show you how to break it apart into its byte-by-byte representations, and into its character-by-character representations.  Once upon a time, a byte == a character, but the world has changed and this is not true any more.
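Here is a small sketch of the byte-versus-character difference. The name is a guess used only for illustration (a UTF-8 encoded "ł" written as escape sequences so the file's own encoding does not matter), and it assumes the mbstring extension is available.

```php
<?php
mb_internal_encoding('UTF-8');

// Assumption: a hypothetical name containing "ł" (U+0142), UTF-8 bytes C5 82
$name = "Ma\xC5\x82gorzata";

$byteCount = strlen($name);     // counts bytes
$charCount = mb_strlen($name);  // counts characters
$hex       = bin2hex($name);    // byte-by-byte view of the string

echo $byteCount, PHP_EOL; // 11
echo $charCount, PHP_EOL; // 10
echo $hex, PHP_EOL;       // 4d61c582676f727a617461
```

Running bin2hex() on the value read straight from the database row is a quick way to see the actual bytes without a browser or text editor in between.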

This is the sort of script I would use to examine the character encoding and the hex byte values.
<?php // demo/hexdump_unicode_v.php
/**
 * Expand and display a string variable in hexadecimal notation
 * Note: Output will make more sense with a monospaced font!
 * http://php.net/manual/en/function.mb-split.php#99851
 *
 * http://iconoun.com/demo/hexdump_unicode_v.php?q=Data:%E7%81%AB%E8%BD%A6%E7%A5%A8!
 *
 * Useful: http://www.utf8-chartable.de/unicode-utf8-table.pl?start=1536&number=1024&utf8=0x&unicodeinhtml=hex
 * Refer2: https://www.experts-exchange.com/articles/11880/Unicode-and-Character-Collisions.html
 *
 * @param string $str The variable to expand and display
 * @return none (direct browser output)
 */
error_reporting(E_ALL);

// SET UP PHP TO USE UTF-8
mb_internal_encoding('UTF-8');
mb_regex_encoding('UTF-8');


class Letter
{
    // DECLARED EXPLICITLY: DYNAMIC PROPERTIES ARE DEPRECATED AS OF PHP 8.2
    public $chr;
    public $hex = array();

    public function __construct($chr)
    {
        $this->chr = $chr;
        $bytes     = $this->usplit($chr);
        foreach ($bytes as $byte)
        {
            $this->hex = array_merge($this->hex, $this->gethex($byte));
        }
    }

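    public function usplit($chr)
    {
        // SPLIT THE (POSSIBLY MULTI-BYTE) CHARACTER INTO SINGLE BYTES
        $arr = array(); // INITIALIZED SO AN EMPTY STRING RETURNS AN EMPTY ARRAY
        $len = strlen($chr);
        while ($len) {
            $arr[] = substr($chr, 0, 1);
            $chr   = substr($chr, 1, $len);
            $len   = strlen($chr);
        }
        return $arr;
    }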

    public function gethex($chr)
    {
        // GET THE HEX NIBBLE VALUES IN AN ARRAY
        $ret = str_split(implode('', unpack('H*', $chr)));
        return $ret;
    }
}


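class Hexdump
{
    // DECLARED EXPLICITLY: DYNAMIC PROPERTIES ARE DEPRECATED AS OF PHP 8.2
    public $str;
    public $arr;
    public $len;
    public $dat = array();

    public function __construct($str)
    {
        $this->str = $str;
        $this->arr = $this->mb_str_split($str);
        $this->len = mb_strlen($str);
        foreach ($this->arr as $uchr)
        {
            $this->dat[] = new Letter($uchr);
        }
    }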

    public function mb_str_split($ustr)
    {
        return preg_split('/(?<!^)(?!$)/u', $ustr);
    }

    public function render($br = PHP_EOL)
    {
        echo $br . " Pos   Chr \tHex";

        foreach ($this->dat as $poz => $chr)
        {
            echo $br;
            echo str_pad($poz, 4, ' ', STR_PAD_LEFT);
            echo '    ';
            echo $chr->chr;
            echo " \t";
            echo implode('', $chr->hex);
        }
        echo $br;
    }
}


// DEMONSTRATE IT WITH THE REQUEST ARGUMENT
echo '<meta charset="utf-8" />';
echo '<pre>';

$q = !empty($_GET['q']) ? $_GET['q'] : 'Vöila';
var_dump($q);

$y = new Hexdump($q);
$y->render();
