## Expert Python topics you should know

What are the main topics that distinguish an advanced developer from a merely effective-enough Python programmer? You are good at Python when your code is elegantly simple and idiomatic.

Each language and community has its own way of solving certain kinds of problems. That specific way of doing things is what we call idiomatic. We want our code to be idiomatic because not only will it be easier to understand, but we will also be solving problems with well-known and tested techniques. Being idiomatic means writing simple code that relies on existing solutions for common problems. We don't reinvent the wheel.

In this post I will describe the main topics that can make your code more idiomatic, and some advanced features you need to be familiar with as an advanced Python developer.

## Multiple Python versions

There will be situations where you need multiple versions of Python. You may be fine using your system's default Python 2 or 3, but sometimes a client or project requires a very specific version. You may also work on several projects, each of which uses a different specific version. In this scenario you need a way to manage your Python versions. This is not the same as managing dependency versions: I'm talking about the version of the Python language itself.

The solution to this problem is very simple: just use pyenv. With it, you can easily have any version you want at your disposal.

```shell
$ pyenv versions        # lists all installed versions
$ pyenv install 3.7.4   # installs a specific version
$ pyenv global 3.7.4    # activates that version globally
$ pyenv local 3.7.4     # sets the version for the current directory
```


pyenv also installs the development headers you will need when building C/C++ extensions, but you shouldn't worry about their exact path: CMake's find_package will help you with that.

```cmake
find_package(Python3 COMPONENTS Development)
target_include_directories(<project_name>
    PUBLIC ${Python3_INCLUDE_DIRS})
```

## Dunder methods

Dunder (or magic) methods are methods whose names start and end with a double underscore, like __init__ or __str__. They are the mechanism we use to interact directly with Python's data model. A language's data model describes:

• How values are stored in memory.
• Object identity, equality and truthiness.
• Name resolution and function/method dispatching.
• Basic types and type/value composition.
• Evaluation order and eagerness/laziness.

Basically, dunder methods let us hook into core concepts of the Python language. You can also see them as a mechanism for implementing behaviours, much like interface methods. For a detailed description of many useful dunders and related concepts, I recommend reading this guide.

## @ Function decorators

Decorators are nothing more than a special case of higher-order functions, with @ syntax support. Applying one is essentially function composition. We can use decorators not only on normal functions but also on class methods.

```python
from functools import wraps

def add10(f):
    @wraps(f)
    def g(*args, **kwargs):
        return f(*args, **kwargs) + 10
    return g

@add10
def add1(a):
    return a + 1

add1(0)  # 11
```

wraps from functools is itself another decorator: it copies the metadata (name, docstring, etc.) of the original wrapped function onto the wrapper.

## Interfaces

Interfaces help us enforce that code which commits to providing certain characteristics actually implements them.
One of the characteristics we can enforce is the definition of specific methods, as we would with a normal Java interface:

```python
from abc import ABC, abstractmethod

class Animal(ABC):
    @abstractmethod
    def make_sound(self):
        return "indistinguishable noise"

class Cat(Animal):
    def make_sound(self):
        return "miauu"

class Dog(Animal):
    def make_something(self):
        return "eat"
```

We used the @abstractmethod decorator to enforce the definition of a specific method in child classes:

```
>>> Cat().make_sound()
'miauu'
>>> Dog()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Can't instantiate abstract class Dog with abstract methods make_sound
```

But this enforcement only kicks in when we try to create an instance. If we want to be even stricter, we can use a metaclass to make the script fail while the class definition itself is being loaded:

```python
class Animal(type):
    def __new__(cls, name, bases, body):
        # body is the namespace of the class being created
        if 'make_sound' not in body:
            raise TypeError('no make_sound method')
        return super().__new__(cls, name, bases, body)

class Cat(metaclass=Animal):
    def make_sound(self):
        return "miauu"

class Dog(metaclass=Animal):
    def make_something(self):
        return "eat"
```

```
Traceback (most recent call last):
  [...]
    class Dog(metaclass=Animal):
TypeError: no make_sound method
```

In the same way that instantiating a class creates an object, instantiating a metaclass creates a class. Metaclasses are a way of controlling the creation of classes. This example also indirectly shows that the __new__ dunder is responsible for creating the instance, while __init__ initializes the instance previously created by __new__.

Since Python 3.6, instead of defining __new__ in a metaclass, we can simply use __init_subclass__.
Then our interface example would look like this:

```python
from _collections_abc import _check_methods  # private CPython helper

class Animal:
    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        if _check_methods(cls, 'make_sound') is NotImplemented:
            raise TypeError("make_sound not implemented")

class Cat(Animal):
    [...]  # same as in the previous example

class Dog(Animal):
    [...]  # same as in the previous example
```

If you use _check_methods you will get extra style points, but keep in mind that it is a private helper (note the leading underscore) and could change between Python versions.

## Context manager

A context manager is, in mundane words, a class with __enter__ and __exit__ methods. Context managers give us support for the RAII pattern through the with statement. An important thing to remember when implementing __exit__ is to check the exception values it receives, because you get to choose whether the exception raised inside the with block propagates or not: if you return a true value, the exception is suppressed. Under no circumstances should you re-raise the passed-in exception inside __exit__; propagating it is the caller's responsibility.

For example, let's suppose we had nothing better to do than use the low-level http.client library; we could wrap HTTPConnection inside a context manager:

```python
from http.client import HTTPConnection
from contextlib import AbstractContextManager  # not really necessary but looks cool

class Conn(AbstractContextManager):
    def __init__(self, host):
        self.host = host

    def __enter__(self):
        self.conn = HTTPConnection(self.host, 80)
        return self.conn

    def __exit__(self, *args):
        # returning None (a false value) lets any exception propagate
        self.conn.close()

with Conn('example.com') as conn:
    conn.request('GET', '/')
    res = conn.getresponse()
    print(res.status, res.reason)
```

That code is very verbose, but we can fix that using the contextlib module; I recommend reading its whole documentation. If we want, we can use a generator instead of a full AbstractContextManager class implementation:
```python
from http.client import HTTPConnection
from contextlib import contextmanager

@contextmanager
def Conn(host):
    conn = HTTPConnection(host, 80)
    try:
        yield conn
    finally:
        conn.close()
```

We can also get rid of the Conn class completely by using closing:

```python
from contextlib import closing
from http.client import HTTPConnection

with closing(HTTPConnection("example.com", 80)) as conn:
    conn.request('GET', '/')
    res = conn.getresponse()
    print(res.status, res.reason)
```

## Asynchronous programming

When we use async and await we are doing cooperative concurrency (not parallelism). If you are not familiar with those terms, you may want to check some documentation or a tutorial first. In practice we have a bunch of async functions and an event loop, and that's it.

But what happens when you actually want a truly parallel operation? What if some important client wants custom, crazy high-performance network code? How can we create low-level parallel code that, from Python's point of view, looks like asynchronous code?

First we need a way to turn blocking code into a coroutine. The following make_async function does exactly that:

```python
import asyncio
import inspect
from functools import wraps

def make_async(g):
    @wraps(g)
    async def f(*args, **kwargs):
        loop = asyncio.get_running_loop()
        # None selects the loop's default ThreadPoolExecutor, shared across calls
        return await loop.run_in_executor(None, lambda: g(*args, **kwargs))

    # Replace g with its async version in the caller's module namespace
    frm = inspect.stack()[1]
    mod = inspect.getmodule(frm[0])
    setattr(mod, g.__name__, f)
    return f
```

A very neat function, right? Yeah, but it only runs things truly in parallel if g releases the GIL. The following code, which uses pybind11, defines a C++ module x with a send_message function that releases the GIL inside it.
```cpp
#include <pybind11/pybind11.h>
#include <chrono>
#include <string>
#include <thread>

std::string send_message(std::string input) {
    pybind11::gil_scoped_release release;  // GIL RELEASE
    std::this_thread::sleep_for(std::chrono::seconds(1));
    return input + " done!";
}

PYBIND11_MODULE(x, m) {
    m.def("send_message", &send_message, "sends something through the network");
}
```

The pybind11::gil_scoped_release object releases the GIL when it is constructed and reacquires it when it is destroyed, at the end of the function's scope.

```python
import asyncio
from x import send_message

make_async(send_message)  # using the function from above; rebinds send_message

async def send(msg):
    print("sending", msg)
    result = await send_message(msg)
    print("sent", result)
    return result

async def main():
    await asyncio.gather(*(send(str(i)) for i in range(3)))

asyncio.run(main())
```

And the output is what you would expect:

```
sending 0
sending 1
sending 2
sent 0 done!
sent 1 done!
sent 2 done!
```

## Profiling

Profiling is one of those techniques we reach for when we have really fucked something up. It's a great tool to know, but by the time you need it you will usually be suffering, trying to figure out why you are getting some esoteric crashes or why something isn't working as it should.

For call trees and CPU time we can use cProfile and KCacheGrind:

```shell
$ python -m cProfile -o script.profile main.py
$ pyprof2calltree -i script.profile -o script.calltree
$ kcachegrind script.calltree
```
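If you only want to profile part of a program, cProfile can also be driven from inside the code. The following is a minimal sketch, not a recommended tool: profiled() and slow_sum() are hypothetical names, and the context manager simply wraps the standard cProfile.Profile enable/disable calls. When given a path, it dumps the stats in the same format as python -m cProfile -o, so the file can be converted with pyprof2calltree as shown above.

```python
import cProfile
import io
import pstats
from contextlib import contextmanager


@contextmanager
def profiled(path=None):
    """Profile the enclosed block with cProfile (hypothetical helper).

    If `path` is given, the raw stats are written there in the same
    format produced by `python -m cProfile -o`.
    """
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        yield profiler
    finally:
        profiler.disable()
        if path is not None:
            profiler.dump_stats(path)


def slow_sum(n):
    # Hypothetical workload, just to have something worth profiling.
    return sum(i * i for i in range(n))


with profiled() as prof:
    slow_sum(100_000)

# Print the 5 most expensive entries, ordered by cumulative time.
out = io.StringIO()
pstats.Stats(prof, stream=out).sort_stats("cumulative").print_stats(5)
report = out.getvalue()
print(report)
```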


But cProfile, profile and (in Python 2) hotshot aren't that useful if we have multi-threaded code or if a bottleneck comes from non-explicit function calls. A much more effective profiler is yappi, and I really mean it: you won't go back to cProfile after playing around with yappi. Don't take my word for it; you can see that the PyCharm IDE uses yappi by default if you have it installed.

To use yappi we need to add some code to our script:

```python
import yappi

yappi.start(builtins=True)
# ... run the code you want to profile here ...
# (a context manager would be great for this)
yappi.stop()

func_stats = yappi.get_func_stats()
func_stats.save('script.calltree', type='callgrind')
yappi.clear_stats()
```


After the profiling ends we can open the profiling file with:

```shell
$ kcachegrind script.calltree
```


Another important aspect of profiling is recording memory usage.

We can take a memory snapshot at any moment with pympler:

```python
from pympler import muppy, summary

all_objects = muppy.get_objects()
summary.print_(summary.summarize(all_objects), limit=100)
```


Beyond one-off snapshots, pympler's ClassTracker can track the creation, lifetime and size of instances of selected classes over time.

## Network analysis

Some applications are very hard to understand, and we need to start treating them as black boxes. Or maybe we have a very obscure problem when sending data through the network.

The most practical way of analysing them would be to modify your software to record every request it receives or sends; in a perfect world you would be able to turn that feature on and off while in production.

But most mortals don't understand their own systems well enough, nor want to make that time investment. Even in that case, there are things we can do:

• Incoming traffic: ensure that our server receives plain HTTP traffic. We can do this by placing it behind a load balancer or reverse proxy that terminates TLS, so that clients are still served over HTTPS. With that in place we can simply use Wireshark to read the incoming traffic.
• Outgoing traffic: we need a proxy, and if we are making HTTPS requests we need to install custom certificates so that we can be "the man in the middle". mitmproxy does exactly this.

In normal scenarios, where you understand and control the codebase, you should log and analyse the traffic internally without relying on the previous tricks, especially because you may not be able to run a "MITM attack" against a production server under heavy load without slowing everything down.
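As a minimal sketch of the "record every request and switch it on and off in production" idea, assuming your network calls go through functions you control, you could route them through a dedicated logger whose level you change at runtime. The logger name myapp.traffic and the log_traffic, send_request and set_traffic_logging helpers are hypothetical; how you trigger the switch in production (an admin endpoint, a signal handler, a config reload) is up to you.

```python
import functools
import logging

# Hypothetical logger dedicated to request/response traffic. It starts
# "off": its level is above DEBUG, so traffic records are not emitted.
traffic_log = logging.getLogger("myapp.traffic")
traffic_log.setLevel(logging.WARNING)


def log_traffic(func):
    """Record the arguments and result of a function that talks to the network."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        # Cheap level check, so disabled traffic logging costs very little.
        if traffic_log.isEnabledFor(logging.DEBUG):
            traffic_log.debug("%s args=%r kwargs=%r result=%r",
                              func.__name__, args, kwargs, result)
        return result
    return wrapper


@log_traffic
def send_request(path, payload):
    # Hypothetical stand-in for real network code.
    return {"path": path, "status": 200, "echo": payload}


def set_traffic_logging(enabled):
    """Flip traffic recording on or off at runtime."""
    traffic_log.setLevel(logging.DEBUG if enabled else logging.WARNING)
```

Only the logger's level changes, so whatever handlers you attach (file, network, etc.) stay configured as usual.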

## Logging

Most of the time, logging just works and you don't need to worry about it. Under heavy load, though, we can approach logging by:

• Not logging at all, and relying only on metrics instead.
• Sending the logs through the network: logging locally puts your server and your code under extra load, and you may need code specially designed to handle the logging.
• Avoiding logging to a disk that you use for something else, so that the logs don't compete for I/O with your other tasks.
• Rotating your local logs if you write them yourself. You could use RotatingFileHandler, but logrotate is better.
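For reference, this is roughly what the standard library's RotatingFileHandler looks like in use. It's a minimal sketch: the logger name, file name, 1 KB size limit and three backups are arbitrary demo values, and a temporary directory stands in for your real log directory.

```python
import logging
import os
import tempfile
from logging.handlers import RotatingFileHandler

log_dir = tempfile.mkdtemp()  # throwaway directory for the demo
log_path = os.path.join(log_dir, "app.log")

# Roll over once the file would exceed ~1 KB, keeping at most 3 old files
# (app.log.1, app.log.2, app.log.3); older ones are deleted.
handler = RotatingFileHandler(log_path, maxBytes=1024, backupCount=3)
handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))

logger = logging.getLogger("rotating_demo")  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.addHandler(handler)

for i in range(200):
    logger.info("message number %d", i)

handler.close()
print(sorted(os.listdir(log_dir)))
```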

Obviously, how you approach any logging problem depends on how often you log, how important the logs are, and how big they are. In most cases you can just log and forget about it.

## How to gradually start adding type hints to a Python project

Wouldn't it be cool to start adding types to an existing code base? Yes, but usually that requires changing the underlying data types we use. For example, we could have code that relies heavily on untyped dictionary manipulation:

```python
from math import sqrt

def v2_modulus(v):
    return sqrt(v["x"] ** 2 + v["y"] ** 2)
```


This is a simple case, but the code could also contain .get, del, .clear or any other dict calls, so my first approach wouldn't be to create a dataclass or any other class. Instead we can use TypedDict (available in the typing module since Python 3.8):

```python
from math import sqrt
from typing import TypedDict

class Vec2(TypedDict):
    x: float
    y: float

def v2_modulus(v: Vec2):
    return sqrt(v["j"] ** 2 + v["n"] ** 2)
```


Ohoo, sorry, I made a mistake while retyping this: I wrote v["j"] and v["n"], which are clearly not valid keys for our new Vec2. Luckily, mypy noticed it:

```
error: TypedDict "Vec2" has no key 'j'
error: TypedDict "Vec2" has no key 'n'
```
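Part of what makes TypedDict such a gentle first step is that it only exists for the type checker: at runtime the values are still plain dicts, so existing calls like .get keep working. A minimal sketch with the keys fixed:

```python
from math import sqrt
from typing import TypedDict


class Vec2(TypedDict):
    x: float
    y: float


def v2_modulus(v: Vec2) -> float:
    return sqrt(v["x"] ** 2 + v["y"] ** 2)


# Calling the TypedDict class just builds an ordinary dict...
v = Vec2(x=3.0, y=4.0)
# ...and a plain dict literal with the right keys type-checks as Vec2 too.
w: Vec2 = {"x": 6.0, "y": 8.0}

print(type(v) is dict, v2_modulus(v), v2_modulus(w), v.get("x"))
```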


What if, instead of dictionaries, we were one of those functional-minded cool kids who like immutability and use namedtuples? Well, we are in a better situation, because mypy can check whether we access a defined attribute or not. But it won't catch type errors:

```python
from collections import namedtuple

Vec2 = namedtuple("Vec2", "x y")
v = Vec2(1, 2)
v.x + v.z
v.x + {}
```


It can't catch v.x + {} because collections.namedtuple carries no type information: it only defines field names. The only error reported is:

```
error: "Vec2" has no attribute "z"
```


As you may have guessed, for every untyped data structure there is a typed counterpart. Introducing typing.NamedTuple:

```python
from typing import NamedTuple

class Vec2(NamedTuple):
    x: float
    y: float

v = Vec2(1, 2)
v.x + v.z
v.x + {}
```


This will catch all the problems with the previous code:

```
error: "Vec2" has no attribute "z"
error: Unsupported operand types for + ("float" and "Dict[<nothing>, <nothing>]")
```
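The functional-minded crowd keeps its runtime behaviour too: a typing.NamedTuple class is still a tuple subclass, so indexing and unpacking keep working, and its fields are read-only. A short sketch:

```python
from typing import NamedTuple


class Vec2(NamedTuple):
    x: float
    y: float


v = Vec2(1, 2)

# Still a tuple: old code using indexes or unpacking keeps working.
x, y = v
print(isinstance(v, tuple), v[0], y)

# Immutable: "updating" a field returns a new instance.
w = v._replace(y=5)
print(v, w)

# Assigning to a field fails at runtime.
try:
    v.x = 10  # type: ignore
    mutated = True
except AttributeError:
    mutated = False
print("mutated:", mutated)
```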


What if we only used raw tuples? Well, it’s easier:

```python
from typing import Tuple

Vec2 = Tuple[float, float]

v: Vec2 = (1, 2)
v[0] + v[1]
v.x + v.z
```


The difference here is that, instead of creating an object of a given type, we add a type hint to the newly created variable to indicate its type:

```
error: "Tuple[float, float]" has no attribute "x"
error: "Tuple[float, float]" has no attribute "z"
```


Now imagine that we want to fetch an Account based on its ID:

```python
class Account: ...

def get_acc(acc_id: int) -> Account: ...

get_acc(1 + 2)
```


Having acc_id defined as an int, a string or whatever "real" type it happens to be is a really bad idea: any int is accepted, even the result of 1 + 2. An int is an int and shouldn't be used as an account identifier, at least from a typing perspective. For this kind of situation we can use NewType:

```python
from typing import NewType

AccountId = NewType("AccountId", int)

class Account: ...

def get_acc(acc_id: AccountId) -> Account: ...

get_acc(AccountId(1) + 2)
```


That would produce an error:

```
error: Argument 1 to "get_acc" has incompatible type "int"; expected "AccountId"
```
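NewType costs essentially nothing at runtime: AccountId(42) simply returns the int it was given, and arithmetic on it produces a plain int, which is exactly why the call above is rejected. A quick sketch:

```python
from typing import NewType

AccountId = NewType("AccountId", int)

acc = AccountId(42)

# No wrapper object is created: at runtime this is just the int 42.
print(type(acc) is int, acc == 42)

# Arithmetic returns a plain int, so mypy sees the result as "int",
# not "AccountId"; that's why get_acc(AccountId(1) + 2) is rejected.
shifted = acc + 1
print(type(shifted).__name__)
```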


## Structural Duck-Typing

Believe it or not, duck typing is nothing more than the implicit definition of an interface which, when not honored, results in an exception at runtime. What happens if we have a lot of duck-typed code, like our Vector2 example:

```python
from dataclasses import dataclass
from typing import NamedTuple
from math import sqrt

@dataclass
class Vector2:
    x: float
    y: float

class Vector3(NamedTuple):
    x: float
    y: float
    z: float

class Vector4:
    def __init__(self, x, y, z, w):
        self.x = x
        self.y = y
        self.z = z
        self.w = w

def modulus(v2):
    return sqrt(v2.x ** 2 + v2.y ** 2)

modulus(Vector2(1, 1))
modulus(Vector3(1, 1, 1))
modulus(Vector4(1, 1, 1, 1))
```
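To make the "exception when the implicit interface is not honored" part concrete, here is a small sketch with a hypothetical Point1D class that only has an x attribute. Nothing complains until modulus actually reaches for .y:

```python
from math import sqrt


def modulus(v2):
    return sqrt(v2.x ** 2 + v2.y ** 2)


class Point1D:
    # Hypothetical type that only honors half of the implicit interface.
    def __init__(self, x):
        self.x = x


error = None
try:
    modulus(Point1D(3))
except AttributeError as exc:
    # Duck typing only fails when the missing attribute is accessed.
    error = exc

print("duck typing failed at runtime:", error)
```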


We really don't want to start touching those Vector types at all, but we could add some extra typing to modulus to make explicit what it semantically needs: something that behaves like a 2D vector, with x and y attributes.

```python
from math import sqrt
from typing import Protocol

class V2(Protocol):
    @property
    def x(self) -> float: ...
    @property
    def y(self) -> float: ...

def modulus(v2: V2) -> float:
    return sqrt(v2.x ** 2 + v2.y ** 2)
```


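V2 is a protocol with two read-only (@property) attributes: x and y. The read-only part is important in this example because Vector3 is a NamedTuple, whose fields cannot be assigned; a protocol declaring plain, settable x: float and y: float attributes would not accept it.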
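If you also want to use the protocol at runtime, typing.runtime_checkable makes isinstance() work with it. Keep in mind that the check is shallow: it only verifies that the attributes exist, not their types. A sketch reusing two of the vector classes from above:

```python
from dataclasses import dataclass
from typing import NamedTuple, Protocol, runtime_checkable


@runtime_checkable
class V2(Protocol):
    @property
    def x(self) -> float: ...
    @property
    def y(self) -> float: ...


@dataclass
class Vector2:
    x: float
    y: float


class Vector3(NamedTuple):
    x: float
    y: float
    z: float


# isinstance() only checks that x and y are present on the object.
print(isinstance(Vector2(1, 1), V2))
print(isinstance(Vector3(1, 1, 1), V2))
print(isinstance(object(), V2))
```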

At the time of writing, mypy does not support structural typing for modules (mypy#5018). In the worst case, you can wrap the module in a class and call it a day.