Coding 2025-05-23

By Max Woerner Chase

Okay, here's a little more detail on the state of things.

My goal was to be able to represent, say, both the simply-typed lambda calculus, and System F, with as much code sharing as I can. In particular, there is a lot of common structure between them, and a lot of common evaluation logic. But the differences between them are problematic if they're represented in "normal-looking" code. Because System F has more kinds of node than the simply-typed lambda calculus, not all valid System F expressions are valid expressions in the simply-typed lambda calculus. If we figure that the expressions of simply-typed lambda calculus would make up a subset of the expressions of System F, then that means that the representation of System F must be a superclass of the representation of the simply-typed lambda calculus? But removing node types by subclassing is counter-intuitive, and if the evaluation logic is represented by methods (normal and obvious object-oriented practice), then subclassing in either direction introduces Liskov/variance issues.

So, we're not going to be able to reuse that common evaluation logic or structure through subclassing. What then?

I'll be honest, this next part's a bit of a haze, and I went down some weird dead ends, so let's skip ahead a little.

Due to various paper-cuts, I ended up not trusting the individual nodes with any logic outside of, like, alternative constructors, attrs validators, and stringification. Everything else, I converted from properties and methods to a bunch of singledispatch functions that all call each other recursively and interact with some global constants and functions. So, this delivers the same functionality that I had before, and has made all of the node types lean, at least. (Although, note that I've lost some type-checking robustness because it's not checking that all necessary types are registered. It would be good to find some way to guarantee that...) But how to allow for trees made up of different types?

Well, here's my plan, which I haven't tried yet, but I'm pretty sure will work:

I can define recursive type aliases, so what if I replace every union of node types in a node type definition, with a type variable? Then, to restore the original tree structure, I create type aliases to the filled-in versions of the nodes, then union those aliases together to get the parameter to fill into the generic version. To add more nodes, I simply add more types to the union.

But now let's consider the evaluation code. Sure, some evaluation code is now reusable, like determining the type information associated with a constant node, but most of these singledispatch functions refer back to each other in a recursive manner, that makes them unsuitable for reuse. If only we could somehow defer the reference, via some form of late binding...

It's a class. I'm taking all of these singledispatch functions and global constants, and sticking them in a class and turning them into singledispatchmethod classmethods.

Now, it'd be really neat if subclassing were practical now, but it is, for a few reasons at different theoretical levels, not. Instead, my current plan is to rewrite these functions to be generic, and then have the singledispatch machinery fill in the concrete types. (I mean, assuming Mypy cares. At runtime, this stuff is just all going to vibe.) Then, I can just call the register methods inline instead of as decorators. (I feel like this whole thing should work, but I'm also acutely aware that Mypy's singledispatchmethod support has (or, hopefully, had) some weird and confusing sharp edges and holes.)

Anyway, the basic idea for reuse is to define Protocols that lay out the methods and attributes that must exist on the concrete classes, parameterized of course. Then, at long last, we actually have a situation in this project where subclassing is appropriate.

Hopefully, I'll have the energy tomorrow to properly try this, but until then, I've at least got this stuff outlined.

Good night.