Link to home
Start Free TrialLog in
Avatar of Steve Jennings
Steve Jennings

asked on

Need Help Parsing "Improperly formed(?)" JSON with Python

I am reading a JSON file that is passed to me from an upstream provider who has ignored my requests for help parsing the file they send. It looks (example) like this:

[
    {
        "Risk Level": "Low",
        "CreatedTime": "2019-04-02T10:39:57",
        "Product": "Hosting",
        "Communication Plan": "send email",
        "CI Cost Center": "1111"
    },
    {
        "Risk Level": "Low",
        "CreatedTime": "2019-04-18T07:31:13",
        "Product": "Hosting",
        "Communication Plan": "email",
        "Related Customer Impact": "None Expected",
        "Requester": "peter@hhhhhhh.com"
    },
    {
        "CreatedTime": "2019-08-17T04:13:10",
        "Risk Level": "Minimal",
        "Business Service Destination": "fsdfdsfdsfsfddsf",
        "DB Server Name Destination": "sss",
        "Product": "Hosting",
        "Related Customer Impact": ""
    }
]

Open in new window

First, the key:value are not always in order. And, second, the key:value pairs don't always appear for each "record".

Even if the "answer" is "that's not valid JSON" I still have to parse through this and get data for some of the keys.

So if I want "Risk Level", "CreatedTime", and "Related Customer Impact" what's the way to do this? I have tried "IF" statements, but I encounter "KeyError" because I don't know the value. So if I try something like this, I get a "KeyError"

if obj[counter]["Risk Level"]:
    var1 = obj[counter]["Risk Level"]

Open in new window


Any thoughts?

Thanks,
Steve
Avatar of ste5an
ste5an
Flag of Germany image

Order of key-value pairs does not matter.

As JSON is an object serilization, missing pairs indicate NULL properties and must be handled during deserilization of your objects.
Avatar of Steve Jennings
Steve Jennings

ASKER

Fair enough.

I contacted the file supplier and they've acknowledged that the "out of order" condition is now "fixed" and I have a file that shows that. However, I don't know how to handle the NULL values.

key1:value-a
key2:value-b
key3:value-c
key4:value-d

key1:value-aa
key3:value-bb
key4:value-cc

Assume that's the file. I need to open it, read it and produce a csv that would look like:

value-a,value-b,value-c,value-d
value-aa,,value-bb,value-cc

. . . and I can't seem to figure out how to do that.

Can you help?

Thanks,
Steve
Well, your class should do it during initialization while the JSON is deserilized to it.
ste5an . . . I know you are trying to help, but I'm not able to turn your answer into something I can actually work with.

Thanks,
Steve
JavaScript Onject Notation (JSON) is a serialization format for JS objects. Thus "JSON" represents an object or an array of objects or an array of values.

In your given case it is an array of objects.

So you need to deserialize it to objects of that class. Thus, do you have that class? Have you you written a custom decoder function for json.loads()`?
Here's what I have so far:

body = {
    "username" : "",      
    "password" : ""
}
headers = {'Content-Type': 'application/json', 'Authorization': 'Basic lotsofcharacters'}    
url = 'http://some-url/requests?status=Completed'
response = requests.get(url, headers=headers,  verify=False)
js = json.loads(response.text)
print(js)

Open in new window




Where is your class for the data?
Not actually sure what you are asking.

That simple python code retrieves the data I am looking for but I need to be able to upload that data into a MySQL table . . . meaning that I have to figure out how to get the values from the key:value pair, ensure that I know which value I have so that the value is inserted into the correct column. The only way I know how to do that (sorta) is iterate through the JSON file and write it out as CSV. I have working code that does that. But the new "vendor" who is supplying the data now, sends "incomplete" values. Like:

key1:value-a
key2:value-b
key3:value-c
key4:value-d

key1:value-aa
key3:value-bb
key4:value-cc

The Python code I hacked together before worked just fine when the data looked like:
key1:value-a
key2:value-b
key3:value-c
key4:value-d

key1:value-aa
key2:NULL or blank or ''
key3:value-bb
key4:value-cc

When I ran the original scripts against the data supplied by the new vendor, I got key errors. And that's when I noticed that some of the blocks had 50 lines, some had 55, lines, etc.

That's where I am struggling. I was simply looking at the length of the array and then iterating through the array and assigning values like:

for i in range(len(hostnames)):
      result = get_devices_attribute(get_devices_attribute_url, hostnames[i], headers, token)
      resultsHost.append(result)
rh=resultsHost
for u in range(len(resultsHost)):
      r1 = rh[u]['name'].strip()
      r2 = rh[u]['mgmtIP'].strip()
      r3 = rh[u]['mgmtIntf'].strip()
      r4 = rh[u]['subTypeName'].strip()
      r5 = rh[u]['vendor'].strip()
      . . .
     r103 = rh[u]['LastDateofSupport'].strip()

Open in new window


This code throws the "KeyError" with the new vendor data.

Thanks,
Steve

ASKER CERTIFIED SOLUTION
Avatar of pepr
pepr

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Thanks pepr . . . I really appreciate the time you took to help. I am a novice and a bit of a hack. And my boss knows that my python "skill" is really a reflection of how proficient I am with Google experts-exchange.

Thanks again!
:) It is not about your boss, it is about you. You know, every track has its start and should have some direction. Python is only one small track like that (for you). Feel free to ask details to understand all the bits of it. ;)
Hey pepr . . . thanks again for your help. I am getting the output I need, very helpful. What's the "f" that precedes {k} designate?

print(f'{k}: {rec.get(k, "N/A")}')

That said, when I try to write the data out to a file or even pipe the output from the screen to a file I get a codec issue. I am guessing that I need to specify utf-8 for the file output? But it's not clear to me why piping the output to a file would return that error.

Traceback (most recent call last):
  File "get-apex-CX-vEE.py", line 135, in <module>
    print(rec)
  File "C:\work\python3\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 1836-1843: character maps to <undefined>

Open in new window

For the f-string... It is a formatted string literal -- new feature since Python 3.6 (see https://docs.python.org/3/reference/lexical_analysis.html#f-strings). Whatever you close to curly braces inside, it is evaluated as an expression, converted to string, and inserted instead of that placeholder. If you know C#, then it is the equivalent of the C# $"Text {expression} text"

The f-string literal is the enhancement of the earlier string formatting via .format() or .format_map(). Using that it would be written as print('{0}: {1}',format(k, rec.get(k, "N/A"))) . (In C# it would be string.Format("{0}: {1}", k, expr);), In Python, the number in the curly braces can be left out -- if you keep the order.

In both cases, the number (index) in curly braces or the expression in f-string can be followed by colon and some formatting specification. (See https://docs.python.org/3/library/string.html#format-string-syntax ).

And the oldest way of template string literals in Python uses the %. These days, it is good to know that it exists (to be able to read old sources); however, I would recommend to use the newest for the new Python, or the earlier str.format for earlier versions of Python.

For the encoding, you are right. The data contain the character that is outside the 1252 code page. Use the encoding argument in the open function, like that:
with open('output_file.txt', 'w', encoding='utf-8') as f:
    f.write(my_content)

Open in new window