Working with Python Dataclasses and Dataclass Wizard
If you're a Python coder, you're probably familiar with the Zen of Python. Three of its 19 guiding principles state that "explicit is better than implicit," "readability counts," and "simple is better than complex." Whether you're creating a Python package or integrating an existing one, you aim to find the most Pythonic way to do your task, both functionally and efficiently. Python's dataclasses library provides an attractive approach to quickly and easily creating classes. It includes a suite of tools that speed up development and make your code more legible, whether you're working on a Data Science or software development project. However, given that there is no magic wand without a wizard, the dataclass wizard package provides dataclasses with additional powers that can enhance your code in a Pythonic style. In this post, we will dive into these two packages to take our work to the next level.
Dataclasses Package
To use dataclasses, we import and apply the @dataclass decorator. This decorator lets us specify whether instances should be frozen (frozen=True), whether an __init__ method should be generated (init=True), and whether the class should be slotted (slots=True, available since Python 3.10). Moreover, although the field object is not required for creating dataclasses objects, we can use it to give attributes extra powers, such as default values, default factories for mutable data types like lists and dictionaries, and control over whether the attribute is part of the constructor (__init__) and/or the class representation (__repr__).
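As a quick illustration of these decorator and field options, here is a minimal sketch. The class name Point and its attributes are hypothetical and chosen only for this example; it shows a frozen dataclass with one attribute excluded from both the constructor and the representation.

```python
from dataclasses import dataclass, field, FrozenInstanceError
from typing import List


@dataclass(frozen=True)
class Point:  # hypothetical class, for illustration only
    x: int = 0
    y: int = 0
    # Not accepted by __init__ and not shown by __repr__.
    tags: List[str] = field(default_factory=list, repr=False, init=False)


point = Point(x=1, y=2)
print(point)  # Point(x=1, y=2)

frozen_error = None
try:
    point.x = 10  # frozen instances reject attribute assignment
except FrozenInstanceError as exc:
    frozen_error = exc
    print("Cannot modify a frozen instance:", exc)
```

Note that frozen=True only blocks assignment to attributes; the list held by tags can still be mutated in place.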
For our exploration, we'll use the dataclasses package to generate slotted classes. If you are unfamiliar with Python's slot mechanism, don't worry; you can still follow the post. Please feel free to explore the concept of slots in the following post ⬇️:
Harnessing the Power of Slots for Optimizing Memory Usage in Python
For example, let's create the slotted class ClassA. This class's public attributes attr1, attr2, attr3, and attr4 will be used by the constructor. attr1 and attr2 will be integers, while attr3 and attr4 will be a list and a dictionary, respectively. Let's allow the user to initialize ClassA instances without passing any attribute values. In addition to the attributes listed above, ClassA will include a private attribute, _attr5, that will not be part of the class constructor or representation, so the user will not see or interact with it.
from dataclasses import dataclass, field
from typing import Dict, Optional, List


@dataclass(slots=True)  # slots=True requires Python 3.10+
class ClassA:
    attr1: Optional[int] = field(default=None)
    attr2: Optional[int] = field(default=None)
    # Mutable defaults must be created through a factory
    attr3: Optional[List[int]] = field(default_factory=list)
    attr4: Optional[Dict[str, int]] = field(default_factory=dict)
    # Private attribute: hidden from __init__ and __repr__
    _attr5: Optional[int] = field(default=None, repr=False, init=False)

    def __post_init__(self):
        # Runs right after the generated __init__
        if self.attr3:
            self._attr5 = self.attr3[0]


class_a_instance = ClassA()
print(class_a_instance)
Output:
ClassA(attr1=None, attr2=None, attr3=[], attr4={})
By providing default values for integers and factories for dictionaries and lists, the field object aids in our class design, as shown in the above code snippet. The instance representation also clearly omits the private attribute. Moreover, to avoid writing complex constructor methods, the dataclasses package supports a __post_init__ method that executes extra logic after the instance is initialized.
class_a_instance.attr6 = 5
Output:
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
----> 1 class_a_instance.attr6 = 5

AttributeError: 'ClassA' object has no attribute 'attr6'
The slot mechanism prevents the creation of attributes that are not defined in the class. As the code snippet above shows, if the user attempts to assign a new attribute to an instance, Python raises an AttributeError. Attempting to access the __dict__ attribute triggers another AttributeError. The rationale behind this is that, unlike regular Python classes, slotted classes don't create a per-instance dictionary to store attributes.
class_a_instance.__dict__
Output:
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
----> 1 class_a_instance.__dict__

AttributeError: 'ClassA' object has no attribute '__dict__'
So what if we wanted to convert the instances to a dictionary so we could store them later?
Well, by making use of the __slots__ attribute, we can locate the declared attributes and retrieve their values using the getattr function. To make the code more robust in the event of an inheritance pattern, we can iterate over the Method Resolution Order (MRO) of the instance's class to find its ancestors and their slots, as demonstrated in the code below.
{s: getattr(class_a_instance, s)
 for s in {s for cls in type(class_a_instance).__mro__
           for s in getattr(cls, "__slots__", ())}}
Output:
{'attr3': [], 'attr2': None, 'attr4': {}, 'attr1': None, '_attr5': None}
At this point, you should be asking yourself: Is there a more Pythonic approach?
There is: the dataclasses package provides the asdict function, which generates dictionary objects from dataclasses instances.
from dataclasses import asdict
class_a_dict = asdict(class_a_instance)
print(class_a_dict)
Output:
{'attr1': None, 'attr2': None, 'attr3': [], 'attr4': {}, '_attr5': None}
asdict recursively transforms the object into a dictionary. Nonetheless, you can see that _attr5, the private attribute, is also part of the dictionary representation. Fortunately, asdict accepts the dict_factory argument, which lets us customize the resulting dictionary by passing a function that receives the list of (name, value) pairs.
def dict_factory(data):
    # data is a list of (attribute name, attribute value) pairs
    return {attr_key: attr_val for attr_key, attr_val in data
            if not attr_key.startswith("_")}


class_a_dict = asdict(class_a_instance, dict_factory=dict_factory)
print(class_a_dict)
Output:
{'attr1': None, 'attr2': None, 'attr3': [], 'attr4': {}}
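It may help to know that asdict calls the dict_factory function once for every dataclass it encounters, not just for the outermost one. The sketch below illustrates this with two hypothetical classes, Inner and Outer, and a hypothetical factory named public_only; private attributes are dropped at both levels.

```python
from dataclasses import dataclass, field, asdict


def public_only(pairs):
    # pairs is a list of (name, value) tuples for ONE dataclass instance.
    return {key: value for key, value in pairs if not key.startswith("_")}


@dataclass
class Inner:  # hypothetical class, for illustration only
    value: int = 0
    _cache: int = 0


@dataclass
class Outer:  # hypothetical class, for illustration only
    name: str = ""
    _token: str = ""
    inner: Inner = field(default_factory=Inner)


outer = Outer(name="a", _token="secret", inner=Inner(value=1, _cache=99))
public_dict = asdict(outer, dict_factory=public_only)
print(public_dict)  # {'name': 'a', 'inner': {'value': 1}}
```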
Let's explore a "dummy" example to show how asdict can incorporate attributes belonging to parent classes in the case of an inheritance pattern, without the need to create a customized dict_factory function that loops over the MRO.
from dataclasses import dataclass, field, asdict
from typing import Optional


@dataclass(slots=True)
class Parent:
    attr1: Optional[int] = field(default=None)
    attr2: Optional[int] = field(default=None)


@dataclass(slots=True)
class Child(Parent):
    attr3: Optional[int] = field(default=None)
    attr4: Optional[int] = field(default=None)


child_instance = Child(attr1=1, attr2=2, attr3=3, attr4=4)
child_dict = asdict(child_instance)
print(child_dict)
Output:
{'attr1': 1, 'attr2': 2, 'attr3': 3, 'attr4': 4}
Even in the case of a composite pattern, asdict looks for attributes recursively not only in the attributes of the composite class but also in its constituent classes, generating a dictionary representation without requiring a customized dict_factory function, as illustrated in the code snippet below.
from dataclasses import dataclass, field, asdict
from typing import Optional, List


@dataclass(slots=True)
class ClassA:
    attr1: Optional[int] = field(default=None)
    attr2: Optional[int] = field(default=None)


@dataclass(slots=True)
class ClassB:
    attr3: Optional[int] = field(default=None)
    attr4: Optional[int] = field(default=None)
    attr5: Optional[List[ClassA]] = field(default_factory=list)


instance_b = ClassB(attr3=3, attr4=4,
                    attr5=[ClassA(attr1=1, attr2=2)])
dict_b = asdict(instance_b)
print(dict_b)
Output:
{'attr3': 3, 'attr4': 4, 'attr5': [{'attr1': 1, 'attr2': 2}]}
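One more detail worth keeping in mind: asdict builds new containers instead of reusing the ones stored on the instance, so the resulting dictionary can be modified without touching the original object. The sketch below uses a hypothetical Basket class to illustrate this.

```python
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class Basket:  # hypothetical class, for illustration only
    items: List[str] = field(default_factory=list)


basket = Basket(items=["apple"])
snapshot = asdict(basket)

# Changing the dictionary does not affect the dataclass instance.
snapshot["items"].append("pear")
print(basket.items)       # ['apple']
print(snapshot["items"])  # ['apple', 'pear']
```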
In some scenarios, we need immutable values, and we make use of Enum objects to define immutable, related sets of constant and finite values. The code snippet below shows how to use dataclasses objects and Enum subclasses together. asdict recursively generates the dictionary object without raising any errors. However, the AttrType member isn't converted to a serializable value; it stays in the dictionary as an Enum member.
from dataclasses import dataclass, field, asdict
from typing import Optional
from enum import Enum


class AttrType(Enum):
    TYPE1 = "type1"
    TYPE2 = "type2"


@dataclass(slots=True)
class Parent:
    attr1: Optional[int] = field(default=None)
    attr2: Optional[int] = field(default=None)


@dataclass(slots=True)
class Child(Parent):
    attr3: Optional[int] = field(default=None)
    attr4: Optional[int] = field(default=None)
    attr5: Optional[AttrType] = field(default=None)


child_instance = Child(attr1=1, attr2=2,
                       attr3=3, attr4=4,
                       attr5=AttrType(value="type1"))
child_dict = asdict(child_instance)
print(child_dict)
Output:
{'attr1': 1, 'attr2': 2, 'attr3': 3, 'attr4': 4, 'attr5': <AttrType.TYPE1: 'type1'>}
Again, we can use a dict_factory function to resolve this problem by checking whether each attribute value is an Enum instance.
def dict_factory(data):
    # Replace Enum members with their underlying values
    return {attr_key: (attr_val.value if isinstance(attr_val, Enum) else attr_val)
            for attr_key, attr_val in data}


child_instance = Child(attr1=1, attr2=2,
                       attr3=3, attr4=4,
                       attr5=AttrType(value="type1"))
child_dict = asdict(child_instance, dict_factory=dict_factory)
print(child_dict)
Output:
{'attr1': 1, 'attr2': 2, 'attr3': 3, 'attr4': 4, 'attr5': 'type1'}
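The factory above only inspects the value stored directly in each attribute, so an Enum member sitting inside a list or dictionary would pass through unchanged. One way to cover that case is sketched below; the names Color, Palette, to_plain, and enum_aware_factory are hypothetical and introduced only for this example.

```python
from dataclasses import dataclass, field, asdict
from enum import Enum
from typing import List


class Color(Enum):  # hypothetical enum, for illustration only
    RED = "red"
    BLUE = "blue"


def to_plain(value):
    """Recursively replace Enum members with their values (hypothetical helper)."""
    if isinstance(value, Enum):
        return value.value
    if isinstance(value, (list, tuple)):
        # Note: tuples come back as lists in this simple sketch.
        return [to_plain(item) for item in value]
    if isinstance(value, dict):
        return {to_plain(k): to_plain(v) for k, v in value.items()}
    return value


def enum_aware_factory(pairs):
    return {key: to_plain(value) for key, value in pairs}


@dataclass
class Palette:  # hypothetical class, for illustration only
    main: Color = Color.RED
    extras: List[Color] = field(default_factory=list)


palette = Palette(main=Color.BLUE, extras=[Color.RED, Color.BLUE])
palette_dict = asdict(palette, dict_factory=enum_aware_factory)
print(palette_dict)  # {'main': 'blue', 'extras': ['red', 'blue']}
```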
Surely by now you're asking how the dataclass wizard can bestow superpowers and extra vitamins upon our dataclasses objects.