Writing a mypy Plugin

2019-03-22

By Max Woerner Chase

So, as I've mentioned, I'm working on a mypy plugin for Structured Data. First, I'll quickly go over why this is needed in the first place. A typical ADT definition in Structured Data looks like this:

import typing

from structured_data import adt

T = typing.TypeVar("T")

class BinaryTree(typing.Generic[T]):
    """A data type that you probably don't want to actually use!"""

    Leaf: adt.Ctor
    Node: adt.Ctor["BinaryTree[T]", T, "BinaryTree[T]"]

To be honest, this is actually a little more advanced than the code I've tried to get it to accept. Some of this might be outright wrong. All the same, it has the issue of "indexing Ctor", which makes mypy raise errors. Variadic generic types exist, but they can't be user-defined currently.

This is exactly one of the use cases that the mypy documentation describes for one of the plugin hook methods, right at the top of the "Current list of plugin hooks" section. So, I just need to define a custom hook to handle this stuff. Simple, right?

Anyway, this has turned out to be so far from simple that I don't think I've actually changed mypy's behavior yet. This post is about what it took for me to get to, maybe, black triangle.

First, a bit of background: the implementation of Structured Data is extremely modularized. Not in the sense that there are swappable components of some kind, but in the sense that there are a lot of Python modules, even though the public interface resides entirely in three modules: adt (algebraic (actually sum) data types), data (example data types), and match (destructuring matches). These modules rely on each other and on thirteen internal modules, some of which define just a single short class or function. I forget what my exact thought process was, but it doesn't bother me enough to want to change it, so I'm leaving it be. The relevant fact is that, while Ctor is exposed through the adt public module, it's defined in an internal module, and that internal module contains some significant statements, the nature of which I will temporarily conceal for the purpose of generating suspense.

Now, where this comes around to mypy plugins: the hooks, at least the ones I've looked at, work by passing the hook a "fullname", which is a string, and getting back either None or a callback that takes a specialized object. If there's a thorough explanation of the concepts involved, I've missed it, so I was experimenting, dropping in print statements and forcing failures to make pytest reveal the output. (Oh yeah, I'm using pytest-mypy-plugins to handle the testing. It took a little poking at the source code for me to figure out how to configure everything in a way that seems to work.)

In any case, this revealed that mypy was asking the hooks for callbacks using a fullname that included the module in which Ctor is actually defined. I didn't want to make my plugin rely on the internal layout of Structured Data, so I looked for a way to fool mypy. It seemed that my search had paid off: the typing module defines a TYPE_CHECKING constant that is false at runtime, but statically considered true. In other words, by using this constant in a conditional, we can replace functionality with variants that have more desirable type-checking characteristics, such as being defined in a particular module.

Now, another point of confusion: the UnboundType objects that my code was getting to work with have an args attribute, but when I went to retrieve it, it was always empty, even though my test cases so far all use subscripting. I had no idea what to expect mypy to do, no clear and strong mental model, so I assumed I was getting the annotations in some kind of multiple-passes-of-a-single-location workflow, which seemed inconvenient, but whatever.

Things started to get confusing for me when I realized that I was still getting the "real" fullname, even though I was trying to use TYPE_CHECKING to create a facade for mypy's benefit. I decided to try to future-proof this stuff by writing code that works on the aspects of Ctor that I consider unchanging, which in practice meant some kind of fuzzy matching on the fullname.

Anyway, still confused by the behavior of the plugin architecture, I added more and more diagnostic logging, until I realized something: I was eventually seeing calls that had non-empty args, and they differed from the initial calls in such a way that they were almost certainly unrelated. The later lines were the ones I wanted to analyze, and the earlier ones? They weren't originating in the test data at all, not directly. The test data was importing from structured_data because it had to, and that was causing mypy to run on structured_data, and process those significant, suspense-generating statements I mentioned earlier.
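The kind of logging that exposed this can be modelled in a few lines. This is a toy sketch, not mypy code: `FakeUnboundType` is a hypothetical stand-in for the object a callback receives, both fullnames are invented placeholders, and the calls are simulated by hand. The point is only that recording the fullname next to the args makes the two groups of calls easy to tell apart.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import DefaultDict, List, Tuple


@dataclass
class FakeUnboundType:
    """Hypothetical stand-in for the type object passed to a callback."""

    name: str
    args: Tuple[str, ...] = ()


# Maps each requested fullname to the args tuples seen for it.
seen: DefaultDict[str, List[Tuple[str, ...]]] = defaultdict(list)


def log_call(fullname: str, typ: FakeUnboundType) -> None:
    """Record what was asked for, so the calls can be grouped afterwards."""
    seen[fullname].append(typ.args)


# Simulated: a bare annotation inside the library's own internal module...
log_call("structured_data._internal.Ctor", FakeUnboundType("Ctor"))
# ...followed by a subscripted annotation coming from the test data.
log_call("structured_data.adt.Ctor", FakeUnboundType("adt.Ctor", ("int", "str")))

for fullname, arg_lists in seen.items():
    print(fullname, arg_lists)
```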

Those statements are... structured_data has a few internal caches that relate to Ctor objects, and those caches are defined alongside Ctor, in the same module, and have to be annotated with it. As such, because I had the facade in the public module, the private module didn't know about it, and its annotations, the annotations that have to be processed before the test data, included the fullname that corresponded to the runtime type.

So, if you're still following (this is kind of a mess...), the solution to my problems was to trust the code a little more: have the conditional import/dummy definition, and match the dummy fullname exactly, because mypy has no problem with the internal usage of Ctor, and therefore I don't want to analyze it myself. I missed that this was working in the first place, first because the internal usages were kicking up the internal name, and then because I thought I had to devise a fuzzy matching scheme, which concealed the fact that it was, in fact, matching two different strings.
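As a rough sketch of where that leaves the hook: return a callback only for the one exact facade fullname, and None for everything else. To keep it runnable without mypy installed, this is a plain function rather than a method on a `mypy.plugin.Plugin` subclass (I believe the corresponding method there is `get_type_analyze_hook`), the callback is a placeholder, and both fullname strings are assumptions for illustration rather than Structured Data's real module paths.

```python
from typing import Callable, Optional

# Assumed fullnames, for illustration only.
FACADE_FULLNAME = "structured_data.adt.Ctor"
INTERNAL_FULLNAME = "structured_data._internal.Ctor"


def analyze_ctor(ctx: object) -> str:
    """Placeholder callback; a real one would build and return a mypy type."""
    return "analyzed"


def get_hook(fullname: str) -> Optional[Callable[[object], str]]:
    """Return the callback only on an exact match with the facade name."""
    if fullname == FACADE_FULLNAME:
        return analyze_ctor
    # Everything else, including the internal name, is left to mypy.
    return None
```

With an exact comparison, the internal usages of Ctor never reach the callback, which is the behavior a fuzzy match was hiding.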

Coding is hard, and as I said, I'm pretty sure this stuff doesn't really do anything yet. Good luck in your own coding endeavors, and I hope this breakdown of my confusion has given you some ideas for how to approach whatever confusion you may be dealing with.