Three Dollar Quill

Coding 2025-04-28

Mon 28 April 2025

By Max Woerner Chase

Okay, here we go.

First off, my prototype for the grammar source does appear to be syntactically valid TOML. Good for me. Secondly, the layout of what's there does look about right. As to whether this all works, that remains to be seen, because I'm still working on some choices about the right way to model this stuff with custom classes.

The big question that I have is whether it suffices to make the runtime representation closer to the output format (in particular, whether I "need" to have different classes for tables that have different representations in TOML), or whether I want the representation to be close enough to the file that it's possible to round-trip between tomlkit classes and my own classes.
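To make the "closer to the output format" option concrete, here is one hypothetical way it could look: one class per TOML shape, each knowing how to turn itself into plain data. The `DenseTable`/`SparseTable` split, the field names, and the `to_toml_data` method are all illustrative assumptions, not the actual classes in this project, and plain dataclasses stand in for whatever the real classes use.

```python
from dataclasses import dataclass, field


@dataclass
class DenseTable:
    """Hypothetical table where every cell is filled in.

    Serialized as a grid: one list of values per row."""
    columns: list[str]
    rows: list[str]
    values: list[list[str]]

    def to_toml_data(self) -> dict:
        return {"columns": self.columns, "rows": self.rows, "values": self.values}


@dataclass
class SparseTable:
    """Hypothetical table where most cells are empty.

    Serialized as a list of {row, column, value} entries."""
    columns: list[str]
    rows: list[str]
    cells: dict[tuple[str, str], str] = field(default_factory=dict)

    def to_toml_data(self) -> dict:
        return {
            "columns": self.columns,
            "rows": self.rows,
            "cells": [
                {"row": row, "column": column, "value": value}
                for (row, column), value in self.cells.items()
            ],
        }


sparse = SparseTable(columns=["sg", "pl"], rows=["nom", "acc"], cells={("acc", "pl"): "-s"})
print(sparse.to_toml_data()["cells"])  # [{'row': 'acc', 'column': 'pl', 'value': '-s'}]
```

The trade-off is that each class mirrors its output shape, so converting to TOML is trivial, but nothing here would let you recover tomlkit formatting details on the way back in.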

And whether I care about round-tripping like that depends on whether the canonicalization logic I have in mind for the configuration file can be written purely in terms of the tomlkit structures, or whether that represents a duplication of effort. And I suppose that depends on whether I assume that I'm always unstructuring canonicalized data, or if it makes more sense to incorporate the canonicalization into the structuring process. At the same time, I only need to worry about unstructuring logic in the case that I am canonicalizing via round-trip. Either way, talking through this has gotten me to figure out a slightly better representation for some of this, so I'm going to go implement that quickly.

...

Okay, back to "where do we canonicalize?". Thinking about this a little further, I don't want to have to be responsible for tomlkit classes inside hypothetical unstructuring code, so let's see if pre-cattrs canonicalization is feasible. Canonicalization is partly a question of handling a default value for one content type, and getting the sorting of various parts of the file correct:

- The sections list should match the ordering of the table of contents.
- The lexicon should be alphabetized by stem.
- A variety of things should have field orders matching their runtime definitions, except that whenever the spec field appears, it should be first.
- The cells of sparse tables should be sorted with the rows as the most significant radix, and the columns as the least. (Technically, I guess this means that canonicalization has to involve some level of structuring, even if it's not returned anywhere, because sometimes the canonical order is specified elsewhere in the document.)
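As a rough illustration of the four rules above, here is a sketch that applies them to plain Python data (lists and dicts) before anything is handed to cattrs. The key names (`name`, `row`, `column`) and the `Entry` class are hypothetical; `stem` and `spec` come from the rules themselves. Note that sorting by stem here uses Python's default code-point string ordering, which is an assumption about what "alphabetized" should mean. The section and cell sorts take their ordering from elsewhere in the document (the table of contents, the row and column lists), which is exactly why some structuring creeps in.

```python
from dataclasses import dataclass, fields
from typing import Any


def sort_sections(sections: list[dict[str, Any]], toc: list[str]) -> list[dict[str, Any]]:
    """Rule 1: order sections to match the table of contents."""
    position = {name: i for i, name in enumerate(toc)}
    return sorted(sections, key=lambda section: position[section["name"]])


def sort_lexicon(entries: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Rule 2: sort lexicon entries by stem (code-point order)."""
    return sorted(entries, key=lambda entry: entry["stem"])


def order_fields(record: dict[str, Any], cls: type) -> dict[str, Any]:
    """Rule 3: follow the dataclass field order, but put "spec" first.

    Keys that aren't fields of `cls` are dropped."""
    names = [f.name for f in fields(cls)]
    if "spec" in names:
        names.remove("spec")
        names.insert(0, "spec")
    return {name: record[name] for name in names if name in record}


def sort_cells(
    cells: list[dict[str, Any]], rows: list[str], columns: list[str]
) -> list[dict[str, Any]]:
    """Rule 4: row position is the most significant key, column position the least."""
    row_pos = {name: i for i, name in enumerate(rows)}
    col_pos = {name: i for i, name in enumerate(columns)}
    return sorted(cells, key=lambda cell: (row_pos[cell["row"]], col_pos[cell["column"]]))


@dataclass
class Entry:  # hypothetical record type; field order is the "runtime definition"
    stem: str
    gloss: str
    spec: str


print(list(order_fields({"gloss": "cat", "stem": "kat", "spec": "noun"}, Entry)))
# ['spec', 'stem', 'gloss']
cells = [{"row": "acc", "column": "sg"}, {"row": "nom", "column": "pl"}, {"row": "nom", "column": "sg"}]
print([(c["row"], c["column"]) for c in sort_cells(cells, ["nom", "acc"], ["sg", "pl"])])
# [('nom', 'sg'), ('nom', 'pl'), ('acc', 'sg')]
```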

Because that third bullet point seems pretty obnoxious to handle in the context of generated unstructuring hooks, let's assume that we're going with pre-unstructuring canonicalization.

Actually, now that I look at this with slightly fresher eyes, I see that I need to reconsider one of my table definitions, and either change one that currently exists, or add a new one.

...

I think I came to a reasonable decision, which I can always revisit as I tweak this stuff further.

For now, I want to wind down for a bit.

(Actually, I was thinking a bit more, and now I'm not totally sure how I want to handle some of these tables, so that's going to be, interesting.)

Good night.