Working with Python Dataclasses and Dataclass Wizard

Author:Murphy  |  View: 25211  |  Time: 2025-03-22 22:41:18

If you're a Python coder, you're probably familiar with the Zen of Python. Three of its 19 guiding principles state that "explicit is better than implicit," "readability counts," and "simple is better than complex." Whether you're creating a Python package or integrating an existing one, you aim to find the most Pythonic way to do the task, both functionally and efficiently. Python's dataclasses module provides an attractive approach to quickly and easily creating classes. It includes a suite of tools that help you write code faster and keep it legible, whether you're working on a Data Science or software development project. However, given that there is no magic wand without a wizard, the dataclass wizard package provides dataclasses with additional powers that can enhance your code in a Pythonic style. In this post, we will dive into these two packages to take our work to the next level.


Dataclasses Package

To use dataclasses, we import and apply the @dataclass decorator. Its parameters let us specify whether instances should be frozen (frozen=True), whether an __init__ method is generated (init=True), and whether the class is slotted (slots=True, available from Python 3.10). Moreover, although the field function is not required for creating dataclasses, we can use it to configure each attribute: its default value, a default factory for mutable types such as lists and dictionaries, and whether the attribute is part of the constructor (__init__) and/or the class representation (__repr__).

For our exploration, we'll use the dataclasses package to generate slotted classes. If you are unfamiliar with Python's slot mechanism, don't worry; you can still follow the post. Please feel free to explore the concept of slots in the following post ⬇️:

Harnessing the Power of Slots for Optimizing Memory Usage in Python

For example, let's create the slotted class ClassA. This class's public attributes attr1, attr2, attr3, and attr4 will be accepted by the constructor. attr1 and attr2 will be integers, while attr3 and attr4 will be a list and a dictionary, respectively. Let's allow the user to initialize ClassA instances without passing any attribute values. In addition to the attributes listed above, ClassA will include a private attribute, _attr5, that will not be part of the class constructor or representation, so the user will not see or set it directly.

from dataclasses import dataclass, field
from typing import Dict, Optional, List

@dataclass(slots=True)
class ClassA:
  attr1: Optional[int] = field(default=None)
  attr2: Optional[int] = field(default=None)
  attr3: Optional[List[int]] = field(default_factory=list)
  attr4: Optional[Dict[str, int]] = field(default_factory=dict)
  _attr5: Optional[int] = field(default=None, repr=False, init=False)

  def __post_init__(self):
    if self.attr3:
      self._attr5 = self.attr3[0]

class_a_instance = ClassA()
print(class_a_instance)

Output:

ClassA(attr1=None, attr2=None, attr3=[], attr4={})

By providing default values for integers and factories for dictionaries and lists, the field function aids in our class design, as shown in the above code snippet. It is also clear that the instance representation doesn't include the private attribute. Moreover, to avoid writing complex constructor methods, dataclasses call a __post_init__ method, if the class defines one, to run extra logic right after the generated __init__ finishes.

class_a_instance.attr6 = 5

Output:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
 in ()
----> 1 class_a_instance.attr6 = 5

AttributeError: 'ClassA' object has no attribute 'attr6'

The slot mechanism prevents the creation of attributes that are not declared in the class. As the snippet above shows, assigning a new attribute to an instance of a slotted dataclass raises an AttributeError. Accessing the __dict__ attribute triggers another AttributeError because, unlike regular Python classes, slotted classes don't create a per-instance dictionary to store attributes.

class_a_instance.__dict__

Output:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
 in ()
----> 1 class_a_instance.__dict__

AttributeError: 'ClassA' object has no attribute '__dict__'

So, what if we wanted to convert the instances to a dictionary so we could store them later?

Well, by making use of the __slots__ attribute, we can locate the declared attributes and retrieve their values using the getattr function. To make the code more robust in the event of an inheritance pattern, we can iterate over the Method Resolution Order (MRO) of the instance's class to find its ancestors and their slots, as demonstrated in the code below. Because the names are collected in a set, the order of the keys may vary between runs.

{s: getattr(class_a_instance, s)
            for s in {s for cls in type(class_a_instance).__mro__
                      for s in getattr(cls, "__slots__", ())}}

Output:

{'attr3': [], 'attr2': None, 'attr4': {}, 'attr1': None, '_attr5': None}

At this point, you should be asking yourself: Is there a more Pythonic approach?

Indeed, the dataclasses package provides the asdict function, which generates dictionary objects from dataclass instances.

from dataclasses import asdict

class_a_dict = asdict(class_a_instance)
print(class_a_dict)

Output:

{'attr1': None, 'attr2': None, 'attr3': [], 'attr4': {}, '_attr5': None}

asdict recursively transforms the object into a dictionary. Nonetheless, you can see that _attr5, the private attribute, is also part of the dictionary representation. Fortunately, asdict accepts a dict_factory argument: a callable that receives a list of (name, value) pairs and returns the resulting dictionary, which lets us customize the representation.

def dict_factory(data):
  return {attr_key: attr_val for attr_key, attr_val in data if not attr_key.startswith("_")}

class_a_dict = asdict(class_a_instance, dict_factory=dict_factory)
print(class_a_dict)

Output:

{'attr1': None, 'attr2': None, 'attr3': [], 'attr4': {}}

Let's explore a "dummy" example to show how asdict can recursively incorporate attributes belonging to parent classes in the case of an inheritance pattern without the need to create a customized dict_factory function that loops over the MRO.

from dataclasses import dataclass, field, asdict
from typing import Dict, Optional, List

@dataclass(slots=True)
class Parent:
  attr1: Optional[int] = field(default=None)
  attr2: Optional[int] = field(default=None)

@dataclass(slots=True)
class Child(Parent):
  attr3: Optional[int] = field(default=None)
  attr4: Optional[int] = field(default=None)

child_instance = Child(attr1=1, attr2=2, attr3=3, attr4=4)
child_dict = asdict(child_instance)
print(child_dict)

Output:

{'attr1': 1, 'attr2': 2, 'attr3': 3, 'attr4': 4}

Even in the case of a composite pattern, asdict looks for attributes recursively not only in the attributes of the composite class but also in its constituent classes, generating a dictionary representation without requiring a customized dict_factory function, as illustrated in the code snippet below.

from dataclasses import dataclass, field, asdict
from typing import Dict, Optional, List

@dataclass(slots=True)
class ClassA:
  attr1: Optional[int] = field(default=None)
  attr2: Optional[int] = field(default=None)

@dataclass(slots=True)
class ClassB:
  attr3: Optional[int] = field(default=None)
  attr4: Optional[int] = field(default=None)
  attr5: Optional[List[ClassA]] = field(default_factory=list)

instance_b = ClassB(attr3=3, attr4=4,
                    attr5=[ClassA(attr1=1, attr2=2)])
dict_b = asdict(instance_b)
print(dict_b)

Output:

{'attr3': 3, 'attr4': 4, 'attr5': [{'attr1': 1, 'attr2': 2}]}

In some scenarios, we need to use immutable data types, and we make use of Enum objects to define immutable, related sets of constant values. The code snippet below shows how to use dataclasses and Enum subclasses together. asdict recursively generates the dictionary object without raising any errors. However, the AttrType member is left in the dictionary as an Enum object instead of being converted to its underlying value.

from dataclasses import dataclass, field, asdict
from typing import Dict, Optional, List
from enum import Enum

class AttrType(Enum):
  TYPE1 = "type1"
  TYPE2 = "type2"

@dataclass(slots=True)
class Parent:
  attr1: Optional[int] = field(default=None)
  attr2: Optional[int] = field(default=None)

@dataclass(slots=True)
class Child(Parent):
  attr3: Optional[int] = field(default=None)
  attr4: Optional[int] = field(default=None)
  attr5: Optional[AttrType] = field(default=None)

child_instance = Child(attr1=1, attr2=2,
                       attr3=3, attr4=4,
                       attr5=AttrType(value="type1"))
child_dict = asdict(child_instance)
print(child_dict)

Output:

{'attr1': 1,
 'attr2': 2,
 'attr3': 3,
 'attr4': 4,
 'attr5': <AttrType.TYPE1: 'type1'>}

Again, we can resolve this problem with a dict_factory function that checks whether each attribute value is an Enum instance and, if so, replaces it with its value.

def dict_factory(data):
  return {attr_key: (attr_val.value if isinstance(attr_val, Enum) else attr_val) for attr_key, attr_val in data}

child_instance = Child(attr1=1, attr2=2,
                       attr3=3, attr4=4,
                       attr5=AttrType(value="type1"))
child_dict = asdict(child_instance, dict_factory=dict_factory)
print(child_dict)

Output:

{'attr1': 1, 'attr2': 2, 'attr3': 3, 'attr4': 4, 'attr5': 'type1'}

Surely by now you're asking how the dataclass wizard can bestow superpowers and extra vitamins upon our dataclasses objects.

