Storing and Loading Data with JSON
We’ve already learned about pickle, so why do we need another way to serialize Python objects to disk or a network connection (and deserialize them again)? There are three major reasons to prefer JSON over pickle:
- When you’re unpickling data, you’re essentially allowing your data source to execute arbitrary Python commands. If the data is trustworthy (say, stored in a sufficiently protected directory), that may not be a problem, but it’s really easy to accidentally leave a file unprotected (or to read something from the network). In these cases, you want to load data, not execute potentially malicious Python code!
- Pickled data is not easy to read, and virtually impossible for humans to write. For example, the pickled version of {"answer": [42]} looks like this:
(dp0
S'answer'
p1
(lp2
I42
as.
In contrast, the JSON representation of {"answer": [42]} is simply {"answer": [42]}. If you can read Python, you can read JSON: nearly every JSON document is also a valid Python literal (only true, false, and null are spelled differently).
- Pickle is Python-specific. In fact, by default, the bytes generated by Python 3’s pickle cannot be read by a Python 2.x application! JSON, on the other hand, can be read by virtually any programming language - just scroll down on the official homepage to see implementations in all major and some minor languages.
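To see why the first point matters, here is a minimal, harmless sketch of the attack (the Evil class name is made up for illustration): it abuses pickle’s __reduce__ hook so that merely loading the data executes an expression of the attacker’s choosing.

```python
import pickle

class Evil(object):
    def __reduce__(self):
        # On unpickling, pickle is told to call eval("6 * 7") -
        # a real attacker could just as easily call os.system.
        return (eval, ("6 * 7",))

payload = pickle.dumps(Evil())
print(pickle.loads(payload))  # 42 - code ran while merely "loading data"
```

Nothing in the payload looks like a dangerous program, yet unpickling it runs code. json.loads has no such hook.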
So how do you get the JSON representation of an object? It’s simple: just call json.dumps:
import json
obj = {u"answer": [42.2], u"abs": 42}
print(json.dumps(obj))
# output: {"answer": [42.2], "abs": 42}
Often, you want to write to a file or a network stream. In both Python 2.x and 3.x you can call dump to do that, but in 3.x dump writes to a character stream, whereas 2.x expects a byte stream.
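Here is a quick sketch of dump in Python 3, using an io.StringIO as a stand-in for a real file or socket (any character stream works the same way):

```python
import io
import json

obj = {u"answer": [42.2], u"abs": 42}

# json.dump writes directly to a (character) stream
# instead of returning a string like json.dumps does.
buf = io.StringIO()
json.dump(obj, buf)
print(buf.getvalue())
# output: {"answer": [42.2], "abs": 42}
```

With a real file, the same call would be json.dump(obj, open('data.json', 'w')).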
Let’s look at how to load what we wrote. Fittingly, the loading functions are called loads (to load from a string) and load (to load from a stream):
import json
obj_json = u'{"answer": [42.2], "abs": 42}'
obj = json.loads(obj_json)
print(repr(obj))
When the objects we load and store grow larger, we puny humans often need some hints on where a new sub-object starts. To get these, simply pass an indent size, like this:
import json
obj = {u"answer": [42.2], u"abs": 42}
print(json.dumps(obj, indent=4))
Now, the output will be a beautiful
{
    "abs": 42,
    "answer": [
        42.2
    ]
}
I often use this indentation feature to debug complex data structures.
The price of JSON’s interoperability is that we cannot store arbitrary Python objects. In fact, JSON can only store the following objects:
- character strings
- numbers
- booleans (True/False)
- None
- lists
- dictionaries with character string keys
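A consequence worth knowing: other built-in types get coerced on the way in. A short sketch - tuples are stored as lists, and non-string dictionary keys come back as strings:

```python
import json

# The tuple is stored as a JSON list, and the integer key 1 is
# coerced to the string "1", so a round trip does not restore them.
data = {1: (u'a', u'b')}
restored = json.loads(json.dumps(data))
print(restored)
# output: {'1': ['a', 'b']}
```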
Every object that’s not one of these must be converted - that includes every object of a custom class. Say we have an object alice as follows:
class User(object):
    def __init__(self, name, password):
        self.name = name
        self.password = password

alice = User('Alice A. Adams', 'secret')
then converting this object to JSON will fail:
>>> import json
>>> json.dumps(alice)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python3.3/json/__init__.py", line 236, in dumps
return _default_encoder.encode(obj)
File "/usr/lib/python3.3/json/encoder.py", line 191, in encode
chunks = self.iterencode(o, _one_shot=True)
File "/usr/lib/python3.3/json/encoder.py", line 249, in iterencode
return _iterencode(o, 0)
File "/usr/lib/python3.3/json/encoder.py", line 173, in default
raise TypeError(repr(o) + " is not JSON serializable")
TypeError: <__main__.User object at 0x7f2eccc88150> is not JSON serializable
Fortunately, there is a simple hook for these conversions: simply pass a conversion function as the default argument:
def jdefault(o):
    return o.__dict__
print(json.dumps(alice, default=jdefault))
# outputs: {"password": "secret", "name": "Alice A. Adams"}
o.__dict__ is a simple catch-all for user-defined objects, but we can also add support for other types. For example, let’s add support for sets by treating them like lists:
def jdefault(o):
    if isinstance(o, set):
        return list(o)
    return o.__dict__
pets = set([u'Tiger', u'Panther', u'Toad'])
print(json.dumps(pets, default=jdefault))
# outputs the pets in arbitrary set order, e.g.: ["Tiger", "Panther", "Toad"]
For more options and details (ensure_ascii and sort_keys may be interesting options to set), have a look at the official documentation of the json module. json is available by default in Python 2.6 and newer; before that, you can use simplejson as a fallback.