Coding 2025-12-18
Oh boy. After I published the last entry, I realized there's way more that I'm doing wrong. In particular, I currently have a separate lexicon entry for each specific gloss, and I'm applying semantic drift, which changes the glosses. That is going to get extremely messy once I start generating declensions: I'm not going to make four or five versions of every word in the drift lexicon.
The way that I think makes sense to deal with this is to create a trie-like structure representing derivational morphology.
- Given an uninflected gloss, you have the pronunciations, the long gloss, the part of speech, and the tags.
- Additionally, you have a map of gloss inflections; these can minimally just contain pronunciations.
- However, if we have a derivation that makes it "sort of a new word", then under that inflection, we have the pronunciations, the long gloss (important for clearing up subtleties in the derivation), the part of speech, and the tags, as well as the appropriate derivations for that part of speech.
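Here is a minimal sketch of the shape those three bullets describe, written in Python. Every name in it (`Entry`, `inflections`, `lookup`, the field names) is a placeholder chosen for illustration, and the words themselves are invented sample data, not entries from the real lexicon.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Entry:
    """One node in the lexicon tree. All field names are hypothetical."""

    pronunciations: list[str]
    # The remaining fields are only filled in for the uninflected gloss,
    # or for a derivation that is "sort of a new word".
    long_gloss: str | None = None
    part_of_speech: str | None = None
    tags: list[str] = field(default_factory=list)
    # Maps an inflection or derivation label to the entry for that form.
    inflections: dict[str, Entry] = field(default_factory=dict)

    def is_minimal(self) -> bool:
        """True when the entry carries nothing beyond pronunciations."""
        return (
            self.long_gloss is None
            and self.part_of_speech is None
            and not self.tags
            and not self.inflections
        )


def lookup(lexicon: dict[str, Entry], gloss: str, *path: str) -> Entry:
    """Walk from an uninflected gloss down through inflection labels."""
    node = lexicon[gloss]
    for label in path:
        node = node.inflections[label]
    return node


# Invented sample data: one full entry, one minimal inflection,
# and one derivation that has its own gloss and its own inflections.
lexicon = {
    "stone": Entry(
        pronunciations=["kata"],
        long_gloss="a rock small enough to carry",
        part_of_speech="noun",
        tags=["nature"],
        inflections={
            "plural": Entry(pronunciations=["katai"]),
            "diminutive": Entry(
                pronunciations=["katalu"],
                long_gloss="pebble; also a game token",
                part_of_speech="noun",
                inflections={"plural": Entry(pronunciations=["katalui"])},
            ),
        },
    )
}

pebbles = lookup(lexicon, "stone", "diminutive", "plural")
```

Using one class for both the minimal and the full case keeps the lookup code uniform; the cost is that "minimal" becomes a runtime property rather than something the type checker enforces.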
One obvious question is where to stop, and I think that has to be decided on a case-by-case basis. If a particular derivation is not used much, then it can be omitted from the lexicon proper and derived via regular processes when needed. One of the processes needed here is to "re-parent" a sub-trie to the top level, if the derived form takes on an independent identity. Thinking about the right way to use these capabilities, my sense is that even one extra level of nesting should be used sparingly, and that I'll be able to tell when a word needs it. At the same time, the prompts aren't covering these kinds of derivations yet, so it may be a sensible use of my time to only implement enough to handle declension and conjugation. That said, I want to make sure I don't somehow make the rest hard to implement later.
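The re-parenting step can be sketched with plain dictionaries. This assumes each entry keeps its sub-forms under an "inflections" key; that key, the function name `reparent`, and the sample words are all made up for the example.

```python
def reparent(lexicon, gloss, path, new_gloss):
    """Detach the sub-trie at `path` under `gloss` and make it a
    top-level entry called `new_gloss`. Returns the moved sub-trie."""
    if not path:
        raise ValueError("path must name at least one inflection")
    if new_gloss in lexicon:
        raise KeyError(f"{new_gloss!r} is already a top-level entry")

    # Walk down to the parent of the node being promoted.
    parent = lexicon[gloss]
    for label in path[:-1]:
        parent = parent["inflections"][label]

    # Remove the node from its old parent and install it at the top level.
    subtree = parent["inflections"].pop(path[-1])
    lexicon[new_gloss] = subtree
    return subtree


# Invented sample data.
lexicon = {
    "stone": {
        "pronunciations": ["kata"],
        "inflections": {
            "plural": {"pronunciations": ["katai"]},
            "diminutive": {
                "pronunciations": ["katalu"],
                "long_gloss": "pebble; also a game token",
                "inflections": {"plural": {"pronunciations": ["katalui"]}},
            },
        },
    }
}

reparent(lexicon, "stone", ["diminutive"], "pebble")
```

Two things this sketch leaves open: whether the old parent should keep a pointer to the promoted word, and what to do when the promoted node is a minimal one that has no part of speech or long gloss of its own yet.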
Fundamentally what we have here is a structure of nodes and leaves, where nodes and leaves contain data, but different data types. For serializing the data, we can have a field specifically for containing the children, which maps representations of inflections to the required data. I can implement a simple object layout that serializes identically to the more advanced possibilities. So, I've got a plan. What I don't have is any more time tonight. I'm going to wind down now.
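As a rough illustration of the serialization idea, here is one way a single "children" field could carry both the simple and the advanced cases. The class name `Node`, the key names, and the use of JSON are assumptions made for the example, and the words are invented.

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical lexicon node; leaves just leave the extras empty."""

    pronunciations: list[str]
    long_gloss: str = ""
    part_of_speech: str = ""
    tags: list[str] = field(default_factory=list)
    children: dict[str, Node] = field(default_factory=dict)

    def to_dict(self) -> dict:
        """Emit only the fields that are set, so a leaf stays tiny."""
        out: dict = {"pronunciations": list(self.pronunciations)}
        if self.long_gloss:
            out["long_gloss"] = self.long_gloss
        if self.part_of_speech:
            out["part_of_speech"] = self.part_of_speech
        if self.tags:
            out["tags"] = list(self.tags)
        if self.children:
            out["children"] = {
                label: child.to_dict() for label, child in self.children.items()
            }
        return out

    @classmethod
    def from_dict(cls, data: dict) -> Node:
        """Rebuild a node, treating every missing key as empty."""
        return cls(
            pronunciations=list(data["pronunciations"]),
            long_gloss=data.get("long_gloss", ""),
            part_of_speech=data.get("part_of_speech", ""),
            tags=list(data.get("tags", [])),
            children={
                label: cls.from_dict(child)
                for label, child in data.get("children", {}).items()
            },
        )


# Invented sample data: a full node with one minimal child.
leaf = Node(pronunciations=["katai"])
root = Node(
    pronunciations=["kata"],
    long_gloss="a rock small enough to carry",
    part_of_speech="noun",
    children={"plural": leaf},
)

text = json.dumps(root.to_dict(), indent=2)
restored = Node.from_dict(json.loads(text))
```

Because absent keys are read back as empty, a file written by a declension-only version would still load if deeper nesting were added later.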
Good night.