Coding 2025-11-23

By Max Woerner Chase

All right, I touched up the romanization ideas a little, and now I need to go over how I plan to update definitions. There are two basic questions I need to answer here: how do I do this (what does the code look like?) and when do I do this (where does this fit into the build specification?)

It feels bad to me to construct a lexicon that has outdated glosses, and then a separate one with updated glosses, so I guess I need to add an option to the sound evolution command to take a file of glosses to update. Then I need to add another bunch of functions to my build file generator, and ponder whether I'm doing things wrong in some way. Like, right now, I'm adding new rules for every command-line flag, which is not sustainable; I must be getting something wrong there. After this, I need to consider changing the lexicon format, but let's stay on the build file for a bit. When I stored a filename in a variable, I needed to make sure it was quoted to stop it from expanding into multiple files. So, I can store multiple things in a variable, space-separated. This means I should be able to store the command-line arguments in one variable using shlex, which is a better idea all around anyway. So, that's a solid plan for updating the build file generation. If it's stupid, and it works, then it's not stupid. However, it is about to stop working. So, I'll have to work on that early tomorrow.
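As a rough sketch of the shlex idea: the flag names and the filename below are made up for illustration, and it assumes the build tool hands the variable's contents to a POSIX-style shell. The point is only that `shlex.join` quotes each argument as needed, so a whole argument list can live in one space-separated string and still split back into the same pieces.

```python
import shlex

# Hypothetical arguments for the sound evolution command; the real flags may differ.
args = ["--glosses", "gloss updates.tsv", "--verbose"]

# Quote each argument as needed and join with spaces, so the result
# can be stored in a single build-file variable.
joined = shlex.join(args)
print(joined)  # --glosses 'gloss updates.tsv' --verbose

# Splitting with shell rules recovers the original list, spaces and all.
assert shlex.split(joined) == args
```

With this, one generic rule could take an arbitrary argument string instead of needing a new rule per flag, though whether that fits the actual generator is something to check against the build file syntax.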

Now, let's consider the lexicon format. I had this idea that it would be cool to work with the lexicon in tabular format. It's frankly kind of painful, and I'm currently considering switching it to TOML.

:)

Yeah, at least you wouldn't be storing coordinates in nested key names. What was that?

Well, it worked, so it wasn't stupid.

Anyway, the current format is a tab-separated file alphabetized by the IPA representation of the words, with columns for "short gloss", "long gloss", "part of speech", and "tags". Homophones get multiple entries. Now, it wouldn't be too hard to have a similar set of tables in an array, to more closely match the run-time representation. One of the things I need to consider, with a more flexible format, is whether I want to try fitting derivational tables into the lexicon. The issue there is that it would be harder to make sure the data gets serialized and deserialized properly when applying sound changes, so right now I'd like to think about how the data gets from the file that serves as both input and output for sound changes into a nicely formatted document.

(Let's also consider the possibility that I put all of the gloss changes in a separate file just to get things done tomorrow, and actually integrate them later.) Anyway, got to sleep on this.

Good night.