How to gradually start adding type hints to a Python project


Wouldn’t it be cool to start adding types to an existing code base? Yes, but usually that requires a change to the underlying data types we use. For example, we could have code that relies heavily on untyped dictionary manipulation:

from math import sqrt

def v2_modulus(v):
    return sqrt(v["x"] ** 2 + v["y"] ** 2)

This is a simple case, but the code could also contain .get, del, .clear or any other dict calls, so my first approach wouldn’t be to create a dataclass or any other class. We can use TypedDict instead:

from typing import TypedDict

class Vec2(TypedDict):
    x: float
    y: float

def v2_modulus(v: Vec2):
    return sqrt(v["j"] ** 2 + v["n"] ** 2)

Ohoo, sorry, I made a mistake while retyping this: I wrote v["j"] and v["n"], which clearly are not valid keys for our new Vec2. Luckily, mypy noticed it for me:

error: TypedDict "Vec2" has no key 'j'
error: TypedDict "Vec2" has no key 'n'
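To see why this is a low-risk first step, here is a minimal sketch (assuming Python 3.8+ for typing.TypedDict) showing that a Vec2 value is still a plain dict at runtime, so existing .get and del calls keep working:

```python
from math import sqrt
from typing import TypedDict


class Vec2(TypedDict):
    x: float
    y: float


def v2_modulus(v: Vec2) -> float:
    return sqrt(v["x"] ** 2 + v["y"] ** 2)


# The annotation is only for the type checker; the value is an ordinary dict.
v: Vec2 = {"x": 3.0, "y": 4.0}

is_plain_dict = type(v) is dict   # no wrapper class is created
x_value = v.get("x")              # regular dict methods are still available
result = v2_modulus(v)
```

The annotations only affect what mypy reports; nothing changes in how the dictionary behaves when the program runs.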

What if, instead of using dictionaries, we were one of the functional-minded cool kids who like immutability and namedtuples? Well, we are in a better situation, because mypy can check whether we access a defined attribute or not. But it won’t catch type errors:

from collections import namedtuple

Vec2 = namedtuple("Vec2", "x y")
v = Vec2(1, 2)
v.x + v.z
v.x + {}

It can’t catch v.x + {} because namedtuple carries no type information. It only defines field names:

error: "Vec2" has no attribute "z"

But you may have guessed it: for every untyped data structure there is a typed counterpart. Introducing typing.NamedTuple:

from typing import NamedTuple

class Vec2(NamedTuple):
    x: float
    y: float

v = Vec2(1, 2)
v.x + v.z
v.x + {}

This will catch all the problems with the previous code:

error: "Vec2" has no attribute "z"
error: Unsupported operand types for + ("float" and "Dict[<nothing>, <nothing>]")
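The switch is cheap because typing.NamedTuple produces the same kind of object as collections.namedtuple. A small sketch of what keeps working at runtime (the variable names are only illustrative):

```python
from typing import NamedTuple


class Vec2(NamedTuple):
    x: float
    y: float


v = Vec2(1.0, 2.0)

# Still a tuple: unpacking and _replace behave as with collections.namedtuple.
x, y = v
moved = v._replace(x=5.0)

# Still immutable: assigning to a field raises AttributeError.
try:
    v.x = 10.0  # type: ignore[misc]
except AttributeError:
    immutable = True
else:
    immutable = False
```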

What if we only used raw tuples? Well, it’s easier:

from typing import Tuple

Vec2 = Tuple[float, float]

v: Vec2 = (1, 2)
v[0] + v[1]
v.x + v.z

The change here is that instead of creating an object of a given type, we need to add a type hint to the newly created variable to indicate its type.

error: "Tuple[float, float]" has no attribute "x"
error: "Tuple[float, float]" has no attribute "z"
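The alias can also annotate function parameters, so callers can keep passing bare tuples. A minimal sketch, where the modulus helper is a hypothetical example and not part of any existing code:

```python
from math import sqrt
from typing import Tuple

# A type alias: no new class is created, Vec2 is just another name
# for "a tuple of exactly two floats".
Vec2 = Tuple[float, float]


def modulus(v: Vec2) -> float:
    x, y = v
    return sqrt(x ** 2 + y ** 2)


# Call sites do not need to change at all.
length = modulus((3.0, 4.0))
```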

Now imagine that we want to fetch an Account based on its id:

class Account: ...
def get_acc(acc_id: int) -> Account: ...
get_acc(1+2)

Defining acc_id as an int, a string, or whatever “real” type it happens to be is a really bad idea. An int is just an int and shouldn’t double as an account identifier, at least from a typing perspective. For this kind of situation we can use NewType:

from typing import NewType

AccountId = NewType("AccountId", int)
class Account: ...
def get_acc(acc_id: AccountId) -> Account: ...
get_acc(AccountId(1)+2)

That would produce an error, because the result of AccountId(1) + 2 is a plain int again:

error: Argument 1 to "get_acc" has incompatible type "int"; expected "AccountId"
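For completeness, a sketch of the call written so that it type-checks. The Account constructor here is a made-up stand-in, since the original only shows the class stub. At runtime NewType simply returns its argument, so this costs almost nothing:

```python
from typing import NewType

AccountId = NewType("AccountId", int)


class Account:
    # Hypothetical constructor, only here to make the sketch runnable.
    def __init__(self, acc_id: AccountId) -> None:
        self.acc_id = acc_id


def get_acc(acc_id: AccountId) -> Account:
    return Account(acc_id)


# Do the arithmetic on plain ints first, then wrap the result explicitly.
raw_id = 1 + 2
acc = get_acc(AccountId(raw_id))

# AccountId(...) returns its argument unchanged: the value is still an int.
still_int = type(acc.acc_id) is int
```

The explicit AccountId(...) call marks the one place where a raw number is promoted to an identifier.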

Structural Duck-Typing

Believe it or not, duck typing is nothing more than the implicit definition of an interface: when the interface is not honored, an exception is thrown at runtime. What would happen if we had a lot of duck-typed code, like our Vector2 example:

from dataclasses import dataclass
from typing import NamedTuple
from math import sqrt


@dataclass
class Vector2:
    x: float
    y: float


class Vector3(NamedTuple):
    x: float
    y: float
    z: float


class Vector4:
    def __init__(self, x, y, z, w):
        self.x = x
        self.y = y
        self.z = z
        self.w = w


def modulus(v2):
    return sqrt(v2.x ** 2 + v2.y ** 2)


modulus(Vector2(1,1))
modulus(Vector3(1,1,1))
modulus(Vector4(1,1,1,1))

We really don’t want to start touching those Vector types at all, but what we could do is to add some extra typing to modulus to indicate that in reality it only semantically works for the Vector2 type.

from typing import Protocol

class V2(Protocol):
    @property
    def x(self) -> float: ...
    @property
    def y(self) -> float: ...


def modulus(v2: V2):
    return sqrt(v2.x ** 2 + v2.y ** 2)

V2 is a protocol with two read-only (@property) attributes: x and y. The read-only part is important in this example because of the use of NamedTuple in Vector3, whose fields cannot be assigned to.
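Keep in mind that protocol matching is structural: any object exposing readable x and y attributes satisfies V2, whatever else it carries. A self-contained sketch using two of the vector classes from above:

```python
from dataclasses import dataclass
from math import sqrt
from typing import NamedTuple, Protocol


class V2(Protocol):
    @property
    def x(self) -> float: ...
    @property
    def y(self) -> float: ...


@dataclass
class Vector2:
    x: float
    y: float


class Vector3(NamedTuple):
    x: float
    y: float
    z: float


def modulus(v2: V2) -> float:
    return sqrt(v2.x ** 2 + v2.y ** 2)


# Neither class inherits from V2; having x and y is enough.
m2 = modulus(Vector2(3.0, 4.0))
# Vector3 also matches structurally; its z component is simply ignored.
m3 = modulus(Vector3(3.0, 4.0, 12.0))
```

So the protocol documents what modulus needs, but it does not by itself reject a Vector3.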

At the time of writing, structural typing is not supported for modules by mypy: mypy#5018. Worst-case scenario, you can wrap modules in classes and call it a day.
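A rough sketch of that workaround, using the standard math module as the wrapped module. SqrtProvider, MathWrapper and hypotenuse are hypothetical names invented for this illustration:

```python
import math
from typing import Protocol


class SqrtProvider(Protocol):
    def sqrt(self, x: float) -> float: ...


class MathWrapper:
    """Thin class wrapper so a module's function can satisfy a Protocol."""

    def sqrt(self, x: float) -> float:
        # Delegate straight to the wrapped module.
        return math.sqrt(x)


def hypotenuse(a: float, b: float, provider: SqrtProvider) -> float:
    return provider.sqrt(a ** 2 + b ** 2)


h = hypotenuse(3.0, 4.0, MathWrapper())
```

The wrapper adds one level of indirection, but the protocol can then be checked against it like any other class.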