
How can I parse this file with python?

Steve Jennings
Last Modified: 2020-08-04
I have been fighting this all day long, and I am stuck trying to read this simple file that looks like a dictionary. And I am sure the answer is simple:

I am getting a file returned from a "requests.get(url)" that looks like this:

{"unmapped_path": "10.114.193.250/Port-channel0/ifHCInOctets_ifHCInOctets", "metadata": {"systemlocation": "Singapore", "interfacealias": "", "interfacespeed": "4294967295", "inbound": 5242880, "metric": "InBound Traffic", "vendor": "cisco", "device": "10.114.193.250", "interface": "Port-channel0", "capacity": 20000000000, "inbound_percent": 110}
{"unmapped_path": "10.114.193.251/Port-channel1/ifHCInOctets_ifHCInOctets", "metadata": {"systemlocation": "Singapore", "interfacealias": "", "interfacespeed": "4294967295", "inbound": 5242880, "metric": "InBound Traffic", "vendor": "cisco", "device": "10.114.193.251", "interface": "Port-channel1", "capacity": 20000000000, "inbound_percent": 110}
{"unmapped_path": "10.114.193.252/Port-channel2/ifHCInOctets_ifHCInOctets", "metadata": {"systemlocation": "Singapore", "interfacealias": "", "interfacespeed": "4294967295", "inbound": 5242880, "metric": "InBound Traffic", "vendor": "cisco", "device": "10.114.193.252", "interface": "Port-channel2", "capacity": 20000000000, "inbound_percent": 110}

The sender won't change the format. It isn't JSON, but it looks like a dictionary.
But when I read it into a dictionary, I can't access the elements in the dictionary.

temp = {}
r = requests.get(url3)
temp = (r.text)
for rec in temp:
    print(temp[rec])
I get a "TypeError: string indices must be integers"

Any help would be appreciated.

Thanks.
Steve







noci (Software Engineer)
CERTIFIED EXPERT
Distinguished Expert 2019

Commented:
Can you provide the complete Python example? The imports are missing.
A file: URL, or the source URL that can be used to reproduce this, would also help.

aikimark (Social distance; Wear a mask; Don't touch your face; Wash your hands for 20 seconds)
CERTIFIED EXPERT
Top Expert 2014
Commented:
CERTIFIED EXPERT

Commented:
I agree with aikimark. The Python data structure representation and JSON format are actually fairly close together. His regular expression just adds the missing closing curly braces to the lines, adds the comma separator to the rows as items, and adds square brackets around to form a list of the structures represented by the lines.
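
The repair described above can be sketched roughly as follows. This is not aikimark's actual code (which is not shown here); it is a minimal illustration that uses plain string operations instead of a regular expression, and it assumes every non-empty line is missing exactly one closing curly brace, as in the original sample. The function name repair_to_json_list and the shortened sample rows are made up for the example.

```python
import json

def repair_to_json_list(text):
    """Turn brace-truncated, one-record-per-line text into a list of dicts.

    Assumption: each non-empty line lacks exactly one closing '}'.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    repaired = [ln + "}" for ln in lines]          # add the missing brace
    return json.loads("[" + ",".join(repaired) + "]")  # comma-join, wrap in [ ]

# Shortened stand-in for the text returned by requests.get(...).text
sample = (
    '{"unmapped_path": "10.114.193.250/Port-channel0/ifHCInOctets_ifHCInOctets", '
    '"metadata": {"device": "10.114.193.250", "inbound_percent": 110}\n'
    '{"unmapped_path": "10.114.193.251/Port-channel1/ifHCInOctets_ifHCInOctets", '
    '"metadata": {"device": "10.114.193.251", "inbound_percent": 110}\n'
)

rows = repair_to_json_list(sample)
for row in rows:
    print(row["metadata"]["device"], row["metadata"]["inbound_percent"])
```

If the supplier later fixes the missing brace, json.loads will raise an error on the doubled brace, so the repair step would need to be dropped at that point.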
Steve Jennings (Sr Manager Cloud Networking Ops)
CERTIFIED EXPERT

Author

Commented:
There seems to have been a change in the format that I am receiving since Friday. Pepr, I get this error:

    File "get-HiCap-data-v3.py", line 34, in <module>
    d = eval(line + '}')
  File "<string>", line 2
    }
    ^
SyntaxError: unexpected EOF while parsing

And aikimark . . . I am trying your code now.

This is what I am receiving if I write it to a file:

{"unmapped_path": "10.105.10.64/Ethernet161/1/12/ifHCInOctets_ifHCInOctets", "metadata": {"systemlocation": "snmplocation", "interfacealias": "aczfs031b-net1", "interfacespeed": "4294967295", "inbound": 9663676416, "interfacedescription": "Ethernet161/1/12", "metric": "InBound Traffic", "hostname": "dist-sw02a", "value": 9663676416, "vendor": "cisco", "interface": "Ethernet161/1/12", "timezone": "", "device": "10.105.10.64", "capacity": 10000000000, "inbound_percent": 97}, "data": [[1596456955, null], [1596457020, 7001078798], [1596457080, 5054347871], [1596457140, 6970346828], [1596457200, 8347806307], [1596457260, 7076578300], [1596457320, 4875459756], [1596457440, 6923953745], [1596457560, 3409092847], [1596457620, 6943747324], [1596457680, 6974957236], [1596457740, 11287738184], [1596457860, 3464244663], [1596457920, 12467410820], [1596458040, 5845677000], [1596458100, 5412771022], [1596458160, 7043456500], [1596458220, 5498123426], [1596458280, 5321339317], [1596458340, 7012415954], [1596458400, 4719530574], [1596458460, 6975671393], [1596458520, 5452257668], [1596458580, 8672553170], [1596458640, 4528727835], [1596458700, 6742393380], [1596458760, 5400859943], [1596458820, 6992241053], [1596458880, 6857915825], [1596458940, 4696731832], [1596459000, 6809773098], [1596459060, 9145766966], [1596459120, 7059487685], [1596459240, 6479606739], [1596459300, 5209883281], [1596459360, 6193712014], [1596459420, 5269491362], [1596459480, 7524409384], [1596459540, 10231040685], [1596459600, 8019016639], [1596459660, 10392080138], [1596459720, 10307152968], [1596459840, 8675656868], [1596459900, 7989568524], [1596459960, 7955499685], [1596460020, 10328306586], [1596460080, 7265077216], [1596460140, 10450654060], [1596460200, 10355050145], [1596460260, 8018259353], [1596460320, 9878570136], [1596460557, null]], "name": "10.105.10.64/Ethernet161/1/12/InBound Traffic", "label": "bps"}
{"unmapped_path": "10.106.10.66/Ethernet1/44/ifHCInOctets_ifHCInOctets", "metadata": {"systemlocation": "snmplocation", "interfacealias": "TRUNK to core-sw2 eth1/20", "interfacespeed": "4294967295", "inbound": 8589934592, "interfacedescription": "Ethernet1/44", "metric": "InBound Traffic", "hostname": "dist-sw01b", "value": 8589934592, "vendor": "cisco", "device": "10.106.10.66", "timezone": "", "interface": "Ethernet1/44", "capacity": 10000000000, "inbound_percent": 92}, "data": [[1596456955, null], [1596457020, 7906688701], [1596457080, 10453209698], [1596457140, 5788106532], [1596457200, 7303121727], [1596457260, 5482912162], [1596457320, 2546381550], [1596457380, 2134702837], [1596457440, 2972452229], [1596457500, 2597055214], [1596457560, 2393311935], [1596457620, 3195650670], [1596457680, 3856349909], [1596457740, 2328834192], [1596457800, 1993907831], [1596457860, 2237315030], [1596457920, 3051397872], [1596458040, 2397011713], [1596458100, 3614543500], [1596458160, 2302210222], [1596458220, 2365656253], [1596458280, 5620162834], [1596458340, 3307931372], [1596458400, 4133159102], [1596458460, 2588138965], [1596458520, 2572050168], [1596458580, 3274356206], [1596458640, 2427299362], [1596458700, 7671976395], [1596458760, 7359520977], [1596458880, 4678998710], [1596458940, 3062521617], [1596459000, 2928877598], [1596459060, 2293510112], [1596459120, 2343755276], [1596459180, 1986449454], [1596459240, 3212935152], [1596459300, 2113009038], [1596459360, 2526363777], [1596459420, 8462112291], [1596459480, 9171041752], [1596459540, 6997549558], [1596459600, 6881218282], [1596459660, 6196797625], [1596459720, 6726023608], [1596459780, 5590395217], [1596459840, 8048504533], [1596459900, 5990535361], [1596459960, 8358542330], [1596460020, 7053954382], [1596460080, 9488643843], [1596460140, 7377026233], [1596460200, 6936087963], [1596460260, 7694498401], [1596460320, 9273909895], [1596460380, 6022752117], [1596460440, 6921795240], [1596460500, 9396898346], 
[1596460557, null]], "name": "10.106.10.66/Ethernet1/44/InBound Traffic", "label": "bps"}
{"unmapped_path": "10.87.193.19/Ethernet1/3/ifHCInOctets_ifHCInOctets", "metadata": {"systemlocation": "snmplocation", "interfacealias": "aazfs002a:eth2", "interfacespeed": "4294967295", "inbound": 8589934592, "interfacedescription": "Ethernet1/3", "metric": "InBound Traffic", "hostname": "stor-sw116a", "value": 8589934592, "vendor": "cisco", "device": "10.87.193.19", "timezone": "", "interface": "Ethernet1/3", "capacity": 10000000000, "inbound_percent": 94}, "data": [[1596456955, null], [1596457020, 8789776771], [1596457080, 9150822470], [1596457140, 9472269563], [1596457200, 9636267985], [1596457260, 9201136349], [1596457320, 9250378847], [1596457380, 9086751815], [1596457440, 9149845065], [1596457500, 8511353945], [1596457560, 8678892110], [1596457620, 8733865306], [1596457680, 10891687402], [1596457740, 9363020515], [1596457800, 8579050235], [1596457860, 8786686872], [1596457920, 8661790517], [1596457980, 9536981101], [1596458100, 8252261260], [1596458160, 10650757570], [1596458220, 7953043225], [1596458280, 10226473386], [1596458340, 8753754414], [1596458400, 9138655305], [1596458460, 9206519650], [1596458520, 8919798124], [1596458580, 8809992579], [1596458640, 9329646766], [1596458700, 10320919486], [1596458760, 8900793185], [1596458820, 9560706790], [1596458880, 8439200669], [1596458940, 7892123033], [1596459000, 9267826228], [1596459060, 9385872718], [1596459120, 9277385505], [1596459180, 8898641469], [1596459240, 9096790058], [1596459300, 10409763928], [1596459360, 8541465862], [1596459420, 8605433445], [1596459480, 8633420030], [1596459540, 8573940594], [1596459600, 9467618541], [1596459660, 9408529070], [1596459720, 9184827575], [1596459780, 7142228763], [1596459840, 9687795534], [1596459900, 9311195640], [1596459960, 9150388973], [1596460020, 8813940679], [1596460080, 8924906501], [1596460140, 8988858629], [1596460200, 8971898239], [1596460260, 9100562138], [1596460320, 10346417659], [1596460380, 6665454335], [1596460440, 9050798217], [1596460500, 
9455588437], [1596460557, null]], "name": "10.87.193.19/Ethernet1/3/InBound Traffic", "label": "bps"}

CERTIFIED EXPERT

Commented:
I will look at it later, but it seems they have fixed the missing closing curly brace. They also added the data as a list of lists. Given that, I suggest following aikimark's approach: read all the rows into a list of lines, join the lines with a comma separator, and wrap the result in [ and ]. The resulting string should then parse as JSON.
CERTIFIED EXPERT

Commented:
The JSON solution for the new data could be:
import json

fname = 'data2.txt'

# Extract only non-empty lines.
lines = []
with open(fname) as f:
    for line in f:
        line = line.rstrip()
        if line:
            lines.append(line)

# Join the lines using comma and wrap in square brackets to express "the list of rows"
s = '[' + ','.join(lines) + ']'

# Parse the string as JSON content.
rows = json.loads(s)

# Process each row.
for row in rows:
    ##print(row)
    print('-' * 70)
    for k, v in row.items():
        if isinstance(v, dict):
            print(f'{k}:')
            for kk, vv in v.items():
                print(f'\t{kk}:\t{vv}')
        elif isinstance(v, list):
            print(f'{k}:')
            for elem in v:
                print(f'\t{elem}')
        else:
            print(f'{k}:\t{v}')


It produces output like
----------------------------------------------------------------------
unmapped_path:  10.105.10.64/Ethernet161/1/12/ifHCInOctets_ifHCInOctets
metadata:
        systemlocation: snmplocation
        interfacealias: aczfs031b-net1
        interfacespeed: 4294967295
        inbound:        9663676416
        interfacedescription:   Ethernet161/1/12
        metric: InBound Traffic
        hostname:       dist-sw02a
        value:  9663676416
        vendor: cisco
        interface:      Ethernet161/1/12
        timezone:
        device: 10.105.10.64
        capacity:       10000000000
        inbound_percent:        97
data:
        [1596456955, None]
        [1596457020, 7001078798]
...snip...
        [1596460320, 9878570136]
        [1596460557, None]
name:   10.105.10.64/Ethernet161/1/12/InBound Traffic
label:  bps
----------------------------------------------------------------------
unmapped_path:  10.106.10.66/Ethernet1/44/ifHCInOctets_ifHCInOctets
metadata:
        systemlocation: snmplocation
        interfacealias: TRUNK to core-sw2 eth1/20
        interfacespeed: 4294967295
        inbound:        8589934592
        interfacedescription:   Ethernet1/44
        metric: InBound Traffic
        hostname:       dist-sw01b
        value:  8589934592
        vendor: cisco
        device: 10.106.10.66
        timezone:
        interface:      Ethernet1/44
        capacity:       10000000000
        inbound_percent:        92
data:
        [1596456955, None]
        [1596457020, 7906688701]
        [1596457080, 10453209698]
...snip...
        [1596460500, 9396898346]
        [1596460557, None]
name:   10.106.10.66/Ethernet1/44/InBound Traffic
label:  bps
----------------------------------------------------------------------
unmapped_path:  10.87.193.19/Ethernet1/3/ifHCInOctets_ifHCInOctets
metadata:
        systemlocation: snmplocation
        interfacealias: aazfs002a:eth2
        interfacespeed: 4294967295
        inbound:        8589934592
        interfacedescription:   Ethernet1/3
        metric: InBound Traffic
        hostname:       stor-sw116a
        value:  8589934592
        vendor: cisco
        device: 10.87.193.19
        timezone:
        interface:      Ethernet1/3
        capacity:       10000000000
        inbound_percent:        94
data:
        [1596456955, None]
        [1596457020, 8789776771]
...snip...
        [1596460500, 9455588437]
        [1596460557, None]
name:   10.87.193.19/Ethernet1/3/InBound Traffic
label:  bps
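
As a variation on the script above, each non-empty line can also be parsed on its own with json.loads, which works directly on the response text rather than a file. Iterating over r.text itself yields single characters, which is why the loop in the question raised the TypeError; splitting into lines first avoids that. The sketch below is only an illustration: the helper names parse_records and busy_interfaces, the threshold value, and the shortened sample rows are assumptions, and sample_text stands in for r.text.

```python
import json

def parse_records(text):
    """Parse text holding one JSON object per line; blank lines are skipped."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if line:
            records.append(json.loads(line))
    return records

def busy_interfaces(records, threshold=90):
    """Return (device, interface, inbound_percent) for rows at or above threshold."""
    result = []
    for rec in records:
        meta = rec.get("metadata", {})
        pct = meta.get("inbound_percent")
        if pct is not None and pct >= threshold:
            result.append((meta.get("device"), meta.get("interface"), pct))
    return result

# Stand-in for r.text; rows are shortened from the sample data in this thread.
sample_text = (
    '{"metadata": {"device": "10.105.10.64", "interface": "Ethernet161/1/12", '
    '"inbound_percent": 97}, "data": [[1596456955, null]], "label": "bps"}\n'
    '\n'
    '{"metadata": {"device": "10.106.10.66", "interface": "Ethernet1/44", '
    '"inbound_percent": 92}, "data": [[1596456955, null]], "label": "bps"}\n'
)

records = parse_records(sample_text)
for device, interface, pct in busy_interfaces(records, threshold=95):
    print(device, interface, pct)
```

Parsing line by line also means a single malformed row raises an error that points at that row, rather than failing the whole joined string.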



The eval solution relies on eval(), which is generally considered dangerous unless you are sure the data comes from a reliable source. It also requires replacing the text 'null' with 'None'.

I prefer the JSON solution. Otherwise, the eval-based solution would look similar:
with open(fname) as f:
    for line in f:
        line = line.replace('null', 'None')
        d = eval(line)
        print('-' * 70)
        for k, v in d.items():
            if isinstance(v, dict):
                print(f'{k}:')
                for kk, vv in v.items():
                    print(f'\t{kk}:\t{vv}')
            elif isinstance(v, list):
                print(f'{k}:')
                for elem in v:
                    print(f'\t{elem}')
            else:
                print(f'{k}:\t{v}')


noci (Software Engineer)
CERTIFIED EXPERT
Distinguished Expert 2019

Commented:
The unconstrained replacement of null with None might be harmful if "null" is part of a name elsewhere in the data.
It would work with the provided example data.
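
A small sketch of that risk, for illustration only: the interface alias "nullroute-uplink" is a made-up value, not taken from the supplied data. It shows the blind text replacement changing a string value, while json.loads leaves the string alone and still maps the bare null to None.

```python
import json

# Hypothetical row in which "null" also occurs inside a string value.
line = '{"interfacealias": "nullroute-uplink", "data": [[1596456955, null]]}'

# Blind textual replacement rewrites the substring inside the string as well.
replaced = line.replace('null', 'None')
via_eval = eval(replaced)  # tolerable here only because the input is a literal in this script
print(via_eval["interfacealias"])   # Noneroute-uplink

# json.loads understands null natively and does not touch string contents.
via_json = json.loads(line)
print(via_json["interfacealias"])   # nullroute-uplink
print(via_json["data"][0][1])       # None
```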
Steve Jennings (Sr Manager Cloud Networking Ops)
CERTIFIED EXPERT

Author

Commented:
Thanks everyone for the comments. The supplier once again changed the format so that none of the code you've provided now works. I managed to hack it to get what I need, but now I have reverted to simply scanning a line for the presence of "unmapped_path" and "ifHCInOctets" or "ifHCOutOctets". If those strings appear on a line it provides me with the basic data I need to determine that an interface is sustaining high bandwidth use.

I've learned a lot from each of you. Thanks for your help.

Steve
aikimark (Social distance; Wear a mask; Don't touch your face; Wash your hands for 20 seconds)
CERTIFIED EXPERT
Top Expert 2014

Commented:
Here is the regex pattern to use and sample code for that.
import re
rgx = re.compile(r"unmapped_path|ifHCInOctets|ifHCOutOctets")

# For each line read (linevariable holds the current line):
if rgx.search(linevariable):
    pass  # take action
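
Note that the pattern above matches a line containing any one of the three strings. If the line should contain "unmapped_path" together with one of the two counters, as Steve describes, a stricter pattern could look like the sketch below. It assumes the field still has the layout seen in the earlier samples ("unmapped_path": "<device>/<interface>/<counter>_<counter>"), which may no longer hold after the supplier's latest format change. The names PATH_RE and scan_line are made up for the example.

```python
import re

# Assumed layout, taken from the earlier samples in this thread:
#   "unmapped_path": "<device>/<interface>/<counter>_<counter>"
# The interface part may itself contain slashes (e.g. Ethernet161/1/12).
PATH_RE = re.compile(
    r'"unmapped_path":\s*"([^/"]+)/([^"]+?)/(ifHC(?:In|Out)Octets)_'
)

def scan_line(line):
    """Return (device, interface, counter) if the line matches, else None."""
    m = PATH_RE.search(line)
    if m is None:
        return None
    return m.group(1), m.group(2), m.group(3)

sample_lines = [
    '{"unmapped_path": "10.105.10.64/Ethernet161/1/12/ifHCInOctets_ifHCInOctets", "metadata": {',
    '[1596460557, null]], "label": "bps"}',
]

for line in sample_lines:
    hit = scan_line(line)
    if hit:
        print(hit)
```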



Glad we could help.