5

I would like my SpreadSheet class below to be considered a dict subclass by the isinstance() built-in, but when I try to register it as such, an AttributeError exception is thrown (also shown below).

What is a (or the) way to do something like this?

Note: My question is similar to Is it possible to be a virtual subclass of a built in type?, but its accepted answer doesn't address the titular question asked (so please don't vote to close this as a duplicate).

The primary motivation for wanting to do this is to allow an instance of the class to be passed to json.dump() and be treated just like a Python dict. This is needed because — for reasons I don't understand — the JSONEncoder class uses isinstance(value, dict), rather than isinstance(value, Mapping).
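To make the motivation concrete, here is a minimal sketch of the problem. The `Cells` class is a hypothetical stand-in (not the real SpreadSheet), kept small so it can be run on its own. It implements the full Mapping interface, yet json.dumps() still rejects it:

```python
import json
from collections.abc import Mapping


class Cells(Mapping):
    """Hypothetical read-only mapping used only for this illustration."""

    def __init__(self, data):
        self._data = dict(data)

    def __getitem__(self, key):
        return self._data[key]

    def __len__(self):
        return len(self._data)

    def __iter__(self):
        return iter(self._data)


cells = Cells({'a1': 5})
print(isinstance(cells, Mapping))  # True: it is a Mapping ...
print(isinstance(cells, dict))     # False: ... but not a dict

try:
    json.dumps(cells)
    rejected = False
except TypeError:
    # The encoder does not recognize the object and raises TypeError.
    rejected = True
print(rejected)  # True
```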

from collections.abc import MutableMapping


class SpreadSheet(MutableMapping):
    def __init__(self, tools=None, **kwargs):
        self._cells = {}
        self._tools = {'__builtins__': None}
        if tools is not None:
            self._tools.update(tools)  # Add caller supplied functions.

    def clear(self):
        return self._cells.clear()

    def __contains__(self, k):
        return k in self._cells

    def __setitem__(self, key, formula):
        self._cells[key] = formula

    def __getitem__(self, key ):
        return eval(self._cells[key], self._tools, self)

    def __len__(self):
        return len(self._cells)

    def __iter__(self):
        return iter(self._cells)

    def __delitem__(self, k):
        del self._cells[k]

    def getformula(self, key):
        return self._cells[key]

type(dict).register(SpreadSheet)  # Register class as dict subclass.

ss = SpreadSheet()
print(f'isinstance(ss, dict): {isinstance(ss, dict)}')  # Result should be True.

Exception:

Traceback (most recent call last):
  File "spreadsheet.py", line 35, in <module>
    type(dict).register(SpreadSheet)  # Register class as dict subclass.
AttributeError: type object 'type' has no attribute 'register'
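A short sketch of why the call fails, as far as I can tell: register() is provided by the ABCMeta metaclass, and dict's metaclass is plain type, so there is nothing to call. The `Registered` class below is hypothetical and exists only to show that registration works for an ABC such as Mapping but has no effect on isinstance(..., dict):

```python
from abc import ABCMeta
from collections.abc import Mapping

# dict's metaclass is plain `type`, which has no register() method.
print(type(dict) is type)         # True
print(hasattr(type, 'register'))  # False

# Mapping's metaclass is ABCMeta, which is what provides register().
print(type(Mapping) is ABCMeta)   # True


class Registered:
    """Hypothetical class registered as a virtual subclass of Mapping."""


Mapping.register(Registered)
print(isinstance(Registered(), Mapping))  # True: virtual subclassing works for ABCs
print(isinstance(Registered(), dict))     # False: it says nothing about dict
```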

Chosen Solution

As the accepted answer to Is it possible to be a virtual subclass of a built in type? says, it's impossible as primitive types are essentially immutable.

However, it is possible to make json.dump() treat a Mapping just like a dict by patching the module, as shown in the second approach in @jsbueno's answer. The implementation below patches the encoder in a slightly different, simpler way that gives equivalent results. A bonus is that it also prevents the C-optimized encoder from being used (which silently fails).

from collections.abc import Mapping, MutableMapping
from functools import partial
import json


class SpreadSheet(MutableMapping):
    def __init__(self, tools=None, **kwargs):
        self._cells = {}
        self._tools = {'__builtins__': None}  # Prevent eval() from supplying.
        if tools is not None:
            self._tools.update(tools)  # Add any caller-supplied functions.

    def clear(self):
        return self._cells.clear()

    def __contains__(self, key):
        return key in self._cells

    def __setitem__(self, key, formula):
        self._cells[key] = formula

    def __getitem__(self, key):
        return eval(self._cells[key], self._tools, self)

    def __len__(self):
        return len(self._cells)

    def __iter__(self):
        return iter(self._cells)

    def __delitem__(self, key):
        del self._cells[key]

    def getformula(self, key):
        """ Return raw un-evaluated contents of cell. """
        return self._cells[key]

    def update(self, *args, **kwargs):
        for k, v in dict(*args, **kwargs).items():
            self[k] = v


# Monkey-patch the json module

# Changes check for isinstance(obj, dict) to isinstance(obj, Mapping)
# https://github.com/python/cpython/blob/3.8/Lib/json/encoder.py#L321
# This changes the default value of the function's dict= keyword to be
# a Mapping instead of a dict. The isinstance() call uses whatever it's
# set to.
_new__make_iterencode = partial(json.encoder._make_iterencode, dict=Mapping)

json.encoder._make_iterencode = _new__make_iterencode
json.encoder.c_make_encoder = None  # Disables use of C version of make encoder


if __name__ == '__main__':

    import json
    from math import cos, sin, pi, tan

    # A small set of safe built-ins.
    tools = dict(len=len, sin=sin, cos=cos, pi=pi, tan=tan)

    ss = SpreadSheet(tools)
    ss['a1'] = '5'
    ss['a2'] = 'a1*6'
    ss['a3'] = 'a2*7'
    ss['b1'] = 'sin(pi/4)'

    print()
    print('isinstance(SpreadSheet(tools), dict) -> {}'.format(isinstance(ss, dict)))
    print()
    print('Static Contents via getformula():')
    print(json.dumps({k: ss.getformula(k) for k in ss.keys()}, indent=4))
    print()
    print('Dynamic Contents via __getitem__():')
    print("  ss['a1'] -> {!r}".format(ss['a1']))
    print("  ss['a2'] -> {!r}".format(ss['a2']))
    print("  ss['a3'] -> {!r}".format(ss['a3']))
    print("  ss['b1'] -> {!r}".format(ss['b1']))
    print()
    print("via json.dumps(ss, indent=4):")
    print(json.dumps(ss, indent=4))
    print()
    print("via json.dumps(ss):")  # Works, too.
    print(json.dumps(ss))  # -> {}
    print()
    print('dict(**ss): {}'.format(dict(**ss)))  # Gets dynamic contents.

martineau
  • 3
    Have you considered subclassing? `class SpreadSheet(MutableMapping, dict):`? – juanpa.arrivillaga Sep 17 '19 at 23:09
  • @juanpa: That works as far as `isinstance` is concerned, my only concern is that it may be pulling in something incompatible or undesired into the implementation. Wow, can't believe I missed something so simple — thanks! – martineau Sep 17 '19 at 23:16
  • Well, I would have the same reservation. But I believe it won't due to inheriting from `MutableMapping` first, but not totally sure. – juanpa.arrivillaga Sep 17 '19 at 23:17
  • @juanpa: Adding `dict` as another superclass does have undesired side-effects, so doing what you suggested doesn't appear feasible. – martineau Sep 18 '19 at 14:48
  • 2
    You shouldn't try to make your object be considered a `dict` subclass unless it's actually a `dict` subclass. Things would try to *use* `dict` parts that your objects don't have. – user2357112 Sep 18 '19 at 17:18
  • 1
    If you want to control how your objects are JSON-encoded, the way to do that is to pass a custom `default` function to `json.dump`, or to use a custom JSON encoder class with an override for the `default` method. – user2357112 Sep 18 '19 at 17:21
  • @user2357112: My understanding is that if I implement all the [Abstract Methods shown](https://docs.python.org/3/library/collections.abc.html#collections-abstract-base-classes) for `MutableMapping`, then my class _will_ effectively have all the (public) parts. I want to affect code to which I can't provide a `default` function. – martineau Sep 18 '19 at 17:24
  • 2
    All the public parts, perhaps, but C-level code will expect the private parts to be there too. If you could do what you're trying to do, it would break the `json` [C accelerator](https://github.com/python/cpython/blob/3.7/Modules/_json.c) and lots of other code too, probably leading to a segfault. – user2357112 Sep 18 '19 at 17:30
  • 2
    @user2357112: In that case the answer to my question may be "it's impossible". – martineau Sep 18 '19 at 17:35

3 Answers

5

So, first things first: the "obvious way to do it" is to give the JSON encoder a default function that creates a dict out of the custom mapping class while serializing:

Given

from collections.abc import MutableMapping
import json


class IdentaDict(MutableMapping):
    __getitem__ = lambda s, i: i
    __setitem__ = lambda s, i, v: None
    __delitem__ = lambda s, i: None
    __len__ = lambda s: 1
    __iter__ = lambda s: iter(['test_value'])

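def default(obj):
    # Called by the encoder for objects it cannot serialize on its own.
    if isinstance(obj, MutableMapping):
        return dict(obj)
    raise TypeError()

# Pass an instance, not the class itself.
print(json.dumps(IdentaDict(), default=default))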

will just work.
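The same idea can also be packaged as a JSONEncoder subclass and passed through the cls argument of json.dumps(). This is only a sketch: `MappingEncoder` and `Doubler` are hypothetical names made up for the illustration, and the conversion still builds a real dict for every mapping it meets.

```python
import json
from collections.abc import Mapping


class MappingEncoder(json.JSONEncoder):
    """Hypothetical encoder that converts any Mapping to a plain dict."""

    def default(self, obj):
        if isinstance(obj, Mapping):
            return dict(obj)
        return super().default(obj)  # Raises TypeError for unknown types.


class Doubler(Mapping):
    """Hypothetical mapping whose values are computed on lookup."""

    def __init__(self, keys):
        self._keys = list(keys)

    def __getitem__(self, key):
        if key not in self._keys:
            raise KeyError(key)
        return key * 2

    def __len__(self):
        return len(self._keys)

    def __iter__(self):
        return iter(self._keys)


encoded = json.dumps(Doubler(['a', 'b']), cls=MappingEncoder)
print(encoded)  # {"a": "aa", "b": "bb"}
```

Like the default function, this only helps when the caller controls the json.dumps() call.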

second

If for some reason, this is not desirable (maybe creating a dict out of the CustomDict is not feasible, or would be too expensive), it is possible to monkeypatch the machinery of Python's json.encoder, so that it uses the appropriate call to isinstance:


from collections.abc import MutableMapping
from functools import partial
from unittest.mock import patch

import json

class IdentaDict(MutableMapping):
   ...

a = IdentaDict()

new_iterencoder = partial(
    json.encoder._make_iterencode,
    isinstance=lambda obj, cls: isinstance(obj, MutableMapping if cls == dict else cls)
)

with patch("json.encoder.c_make_encoder", None), patch("json.encoder._make_iterencode", new_iterencoder):
    print(json.dumps(a))

(Note that while at it, I also disabled the native C encoder, so that the "pass indent to force Python encoder" hack is not needed. One never knows when an eager Python volunteer will implement indent in the C Json serializer and break that)

Also, the "mock.patch" thing is only needed if one plays Mr. RightGuy and is worried about restoring the default behavior. Otherwise, just overriding both members of json.encoder in the application setup makes the changes process-wide, so they work for all json.dump[s] calls with no changes needed at the call sites, which might be more convenient.

third

Now, answering the actual question: what is possible is to have a mechanism that will create an actual subclass of "dict", but implementing all the methods needed by dict. Instead of redoing the work done by collections.abc.MutableMapping, it should be OK to just copy over both user methods and generated methods to the dict class:

import json
from abc import ABCMeta
from collections.abc import MutableMapping

class RealBase(ABCMeta):
    def __new__(mcls, name, bases, namespace, *, realbase=dict, **kwargs):
        abc_cls = super().__new__(mcls, name, bases, namespace, **kwargs)
        for attr_name in dir(abc_cls):
            attr = getattr(abc_cls, attr_name)
            if getattr(attr, "__module__", None) == "collections.abc" and attr_name not in namespace:
                namespace[attr_name] = attr
        return type.__new__(mcls, name, (realbase,), namespace)


class IdentaDict(MutableMapping, metaclass=RealBase):
    __getitem__ = lambda s, i: i
    __setitem__ = lambda s, i, v: None
    __delitem__ = lambda s, i: None
    __len__ = lambda s: 1
    __iter__ = lambda s: iter(['test_value'])

This will make the class work as expected and return True for isinstance(IdentaDict(), dict). However, the C JSON encoder will then try to use native dict APIs to get its values, so json.dump(...) will not raise, but will fail unless the Python JSON encoder is forced. Maybe this is why the instance check in json.encoder is for a strict "dict":

In [76]: a = IdentaDict()

In [77]: a                                                                                                                         
Out[77]: {'test_value': 'test_value'}

In [78]: isinstance(a, dict)                                                                                                       
Out[78]: True

In [79]: len(a)                                                                                                                    
Out[79]: 1

In [80]: json.dumps(a)                                                                                                             
Out[80]: '{}'

In [81]: print(json.dumps(a, indent=4))                                                                                            
{
    "test_value": "test_value"
}

(Another side-effect of this metaclass is that, as the value returned by __new__ is not an instance of ABCMeta, the metaclass __init__ won't be called. But people coding with multiple metaclass composition would have to be aware of such issues. This would be easy to work around by explicitly calling mcls.__init__ at the end of __new__.)

jsbueno
  • Very comprehensive answer, thank you. The third approach/hack will suffice in my usage scenario even with its limitation(s), I guess. The root cause of the whole issue is that strict instance type checking in the `json.encoder` which seems _very_ unpythonic and annoying to me — especially in a standard library module. Is this optimized C Json Encoder implementation publicly documented somewhere? Seems like it should either work exactly like the pure Python version or at least not fail silently. – martineau Sep 23 '19 at 22:33
  • 1
    Yes, I'd say the C JSON encoder behavior is buggy. Especially since the class in the example _is_ a dictionary instance, and its contents are "invisible" to the encoder. Also, the fact that passing the "indent=X" argument triggers the Python encoder instead was only found by looking directly at the stdlib code, after puzzling over why the example in the answer you wrote would not work without it. – jsbueno Sep 24 '19 at 16:56
  • 1
    Upon further reflection, I've changed my mind and have gone with something like the monkey-patching in your **second** approach (sans the `mock.patch`ing). I changed the way the `_make_iterencode()` replacement is created so it is supplied with `dict=Mapping` instead of `isinstance=lambda...` because it's simpler and works just as well. – martineau Sep 24 '19 at 18:32
1

I think I found a way to do it, based on a modified version of the suggestion in this answer to the question How to “perfectly” override a dict?.

Disclaimer: As the answer's author states, it's a "monstrosity", so I probably would never actually use it in production code.

Here's the result:

from __future__ import print_function
try:
    from collections.abc import Mapping, MutableMapping  # Python 3
except ImportError:
    from collections import Mapping, MutableMapping  # Python 2


class SpreadSheet(MutableMapping):
    def __init__(self, tools=None, **kwargs):
        self.__class__ = dict  # see https://stackoverflow.com/a/47361653/355230

        self._cells = {}
        self._tools = {'__builtins__': None}
        if tools is not None:
            self._tools.update(tools)  # Add caller supplied functions.

    @classmethod
    def __class__(cls):  # see https://stackoverflow.com/a/47361653/355230
        return dict

    def clear(self):
        return self._cells.clear()

    def __contains__(self, key):
        return key in self._cells

    def __setitem__(self, key, formula):
        self._cells[key] = formula

    def __getitem__(self, key):
        return eval(self._cells[key], self._tools, self)

    def __len__(self):
        return len(self._cells)

    def __iter__(self):
        return iter(self._cells)

    def __delitem__(self, key):
        del self._cells[key]

    def getformula(self, key):
        """ Return raw un-evaluated contents of cell. """
        return self._cells[key]

    def update(self, *args, **kwargs):
        for k, v in dict(*args, **kwargs).items():
            self[k] = v

#    # Doesn't work.
#    type(dict).register(SpreadSheet)  # Register class as dict subclass.


if __name__ == '__main__':

    import json
    from math import cos, sin, pi, tan

    # A small set of safe built-ins.
    tools = dict(len=len, sin=sin, cos=cos, pi=pi, tan=tan)

    ss = SpreadSheet(tools)
    ss['a1'] = '5'
    ss['a2'] = 'a1*6'
    ss['a3'] = 'a2*7'
    ss['b1'] = 'sin(pi/4)'

    print()
    print('isinstance(SpreadSheet(tools), dict) -> {}'.format(isinstance(ss, dict)))
    print()
    print('Static Contents via getformula():')
    print(json.dumps({k: ss.getformula(k) for k in ss.keys()}, indent=4))
    print()
    print('Dynamic Contents via __getitem__():')
    print("  ss['a1'] -> {!r}".format(ss['a1']))
    print("  ss['a2'] -> {!r}".format(ss['a2']))
    print("  ss['a3'] -> {!r}".format(ss['a3']))
    print("  ss['b1'] -> {!r}".format(ss['b1']))
    print()
    print("via json.dumps(ss, indent=4):")
    print(json.dumps(ss, indent=4))

Output:

isinstance(SpreadSheet(tools), dict) -> True

Static Contents via getformula():
{
    "a1": "5",
    "a2": "a1*6",
    "a3": "a2*7",
    "b1": "sin(pi/4)"
}

Dynamic Contents via __getitem__():
  ss['a1'] -> 5
  ss['a2'] -> 30
  ss['a3'] -> 210
  ss['b1'] -> 0.7071067811865475

via json.dumps(ss, indent=4):
{
    "a1": 5,
    "a2": 30,
    "a3": 210,
    "b1": 0.7071067811865475
}

Note: I got the idea for this class from an old ActiveState recipe by Raymond Hettinger.

martineau
  • This is wicked!! – jsbueno Sep 19 '19 at 13:05
  • 1
    However, it only works if "indent" is passed. Otherwise the C implementation of the json encoder is called, and it is not fooled by this trick. Also, this fools `ss.__class__` and `isinstance(ss, dict)` but not `type(ss)`. Assigning `__class__` in the class body to a simple descriptor such as `type("", (), {"__get__": lambda s, i, o: dict})` will work for all three (and no need to set `__class__` on the instance). – jsbueno Sep 19 '19 at 13:47
  • If you want to lie about `__class__`, a better way to do that would be with a `@property` that returns `dict`. It won't fool C-level `PyDict_Check` checks, though, which will cause weird bugs and inconsistent behavior depending on whether you hit C or Python code paths. – user2357112 Sep 19 '19 at 17:36
  • @jsbueno: No amount of `__class__` shadowing will fool `type` calls. `type` always gives the actual type, completely ignoring `__class__`. – user2357112 Sep 19 '19 at 17:37
  • Yes. Sorry for that. The descriptor approach obviously doesn't do that. Anyway, experimenting with that, it just came up that shadowing `__class__` is indeed as nasty as it can be: it can't be "undone" in pure Python code, it breaks the `__class__` info on subclasses, and so on. – jsbueno Sep 19 '19 at 20:47
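The point made in the comments above can be checked with a small sketch. `FakeDict` is a hypothetical class invented for the illustration; it shadows `__class__` with a property, which is enough for isinstance() but not for type():

```python
class FakeDict:
    """Hypothetical class that claims to be a dict via __class__."""

    @property
    def __class__(self):
        return dict


fake = FakeDict()
print(isinstance(fake, dict))  # True: isinstance() falls back to __class__
print(type(fake) is dict)      # False: type() reports the real type
print(type(fake).__name__)     # FakeDict
```

As noted in the comments, C-level checks look at the real type, so this kind of shadowing should not be expected to fool the C JSON encoder.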
0

You can do something like:

import json

def json_default(obj):
    if isinstance(obj, SpreadSheet):
        return obj._cells
    raise TypeError

cheet = SpreadSheet()    
cheet['a'] = 5
cheet['b'] = 23
cheet['c'] = -4


print(json.dumps(cheet, default=json_default))

Output:

{"a": 5, "b": 23, "c": -4}

The key is the function json_default, which tells the json encoder how to serialize your class!

Omar
  • Subclassing doesn't really work because there's no suitable superclass (other than the abstract base class `MutableMapping`) — so I see no advantage to using `UserDict`. I've edited my question and added an explanation of the motives behind what I want to do. – martineau Sep 18 '19 at 14:42
  • Omar: Sorry, but it doesn't. I am aware that one could provide a `default` function to `json.dumps()` to make it treat the class like a `dict`, as well as do something similar with a custom `JSONEncoder` subclass. My goal however is for `isinstance(SpreadSheet(), dict)` to return `True` automatically so neither is necessary (since I can't change the [code that does](https://github.com/python/cpython/blob/3.7/Lib/json/encoder.py#L321) that in the `json` module). – martineau Sep 18 '19 at 17:09