Today, I did the match logic for the "target" section of SCA² rules, which is one of the easiest parts, I think.
The logic for handling the replacement section is almost as easy; it just needs some subclasses or protocols or something.
Environment matching is the big hole so far. It'll need some mild special-casing to handle degemination, and I think the main logic will have to operate on a sliding window of trigraphs or something. I might need to just lay out all of the logic imperatively and see how much I can simplify it.
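For what it's worth, the sliding-window idea could look something like the sketch below. It assumes "trigraphs" means runs of three consecutive phonemes and that phonemes are plain strings; the name `windows` is made up for illustration.

```python
from collections import deque
from itertools import islice


def windows(phonemes, size=3):
    """Yield each run of `size` consecutive phonemes as a tuple.

    Works on any iterable, so it can sit on top of a stream.
    Yields nothing if the input is shorter than `size`.
    """
    it = iter(phonemes)
    # Prime the window with the first `size` items.
    window = deque(islice(it, size), maxlen=size)
    if len(window) == size:
        yield tuple(window)
    for p in it:
        window.append(p)  # maxlen drops the oldest phoneme automatically
        yield tuple(window)


print(list(windows("atta")))  # [('a', 't', 't'), ('t', 't', 'a')]
```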
I mean, what I want to do there is convert the matching logic to a nondeterministic finite automaton, then convert that to a deterministic finite automaton, then make sure that runs properly off of a stream of phonemes.
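A rough sketch of that pipeline, using the textbook subset construction. The data layout (dicts keyed by state and symbol) and all function names here are assumptions for illustration, not anything settled in the actual project.

```python
from collections import deque


def epsilon_closure(states, eps):
    """All states reachable from `states` using only epsilon moves."""
    stack, seen = list(states), set(states)
    while stack:
        s = stack.pop()
        for t in eps.get(s, ()):
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return frozenset(seen)


def nfa_to_dfa(start, accepting, delta, eps, alphabet):
    """Subset construction.

    delta: {(state, symbol): set of states}, eps: {state: set of states}.
    Each DFA state is a frozenset of NFA states.
    """
    start_set = epsilon_closure({start}, eps)
    dfa_delta = {}
    seen = {start_set}
    queue = deque([start_set])
    while queue:
        current = queue.popleft()
        for sym in alphabet:
            moved = set()
            for s in current:
                moved |= delta.get((s, sym), set())
            if not moved:
                continue  # no transition: the DFA rejects here
            target = epsilon_closure(moved, eps)
            dfa_delta[(current, sym)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    dfa_accepting = {s for s in seen if s & accepting}
    return start_set, dfa_accepting, dfa_delta


def dfa_accepts(dfa, phonemes):
    """Run the DFA over a stream, consuming one phoneme at a time."""
    state, accepting, delta = dfa
    for p in phonemes:
        state = delta.get((state, p))
        if state is None:
            return False
    return state in accepting


# Toy pattern: "s", then an optional "t", then "a".
delta = {(0, "s"): {1}, (1, "t"): {2}, (2, "a"): {3}}
eps = {1: {2}}  # skipping the optional "t" is an epsilon move
dfa = nfa_to_dfa(0, {3}, delta, eps, {"s", "t", "a"})
print(dfa_accepts(dfa, iter("sta")), dfa_accepts(dfa, iter("sa")))  # True True
```

The epsilon closure is also what lets the automaton pass through several NFA states in a single step of the stream.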
The key weirdness in that is getting the matches right. There's "literal phoneme", "list of phonemes", "arbitrary-length sequence of anything", "the phoneme at the previous index in the input word", "word break (do not advance the stream)", and "optional sub-sequence".
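To pin down what each of those six kinds means, here is a naive backtracking matcher. This is deliberately not the automaton approach, just a reference for the semantics. It assumes a single word held as an indexable sequence of phoneme strings, that word breaks only occur at the two ends, and that the "previous index" case means "same phoneme as the one just before it". All class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:       # one specific phoneme
    phoneme: str

@dataclass(frozen=True)
class OneOf:         # any phoneme from a list
    phonemes: frozenset

@dataclass(frozen=True)
class Wildcard:      # arbitrary-length sequence of anything
    pass

@dataclass(frozen=True)
class Geminate:      # same phoneme as the previous index
    pass

@dataclass(frozen=True)
class WordBreak:     # word edge; consumes no input
    pass

@dataclass(frozen=True)
class OptionalSeq:   # optional sub-sequence
    items: tuple


def match_at(pattern, word, i):
    """True if `pattern` (a tuple of elements) matches word starting at index i."""
    if not pattern:
        return True
    head, rest = pattern[0], pattern[1:]
    if isinstance(head, WordBreak):
        return i in (0, len(word)) and match_at(rest, word, i)
    if isinstance(head, OptionalSeq):
        # Try with the sub-sequence first, then without it.
        return match_at(head.items + rest, word, i) or match_at(rest, word, i)
    if isinstance(head, Wildcard):
        # Let the wildcard swallow 0..n phonemes.
        return any(match_at(rest, word, j) for j in range(i, len(word) + 1))
    if i >= len(word):
        return False  # remaining kinds all need a phoneme to consume
    if isinstance(head, Literal):
        ok = word[i] == head.phoneme
    elif isinstance(head, OneOf):
        ok = word[i] in head.phonemes
    elif isinstance(head, Geminate):
        ok = i > 0 and word[i] == word[i - 1]
    else:
        raise TypeError(f"unknown pattern element: {head!r}")
    return ok and match_at(rest, word, i + 1)


word = list("atta")
print(match_at((Literal("t"), Geminate(), Literal("a"), WordBreak()), word, 1))  # True
```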
This might sound like absurd premature optimization, but my intention here is to construct the logic from simple pieces in a principled fashion, rather than trying to enumerate every high-level special case.
Thoughts for how to do some of this:
- Include a "geminated" flag in the iterator implementation so the gemination test doesn't have to know about indices
- Make sure it's possible to move through multiple states in a single iteration step
- Oh, wait, my plan for gemination was to fold it into the category data, since a non-category gemination can be statically converted to a specific phoneme
- Hm. Either one probably works, but having an explicit flag sounds like less work, I think.
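The explicit-flag option might be as small as the sketch below, assuming phonemes are plain strings. `Step` and `with_gemination` are made-up names.

```python
from typing import Iterable, Iterator, NamedTuple


class Step(NamedTuple):
    phoneme: str
    geminated: bool  # True when this phoneme repeats the one right before it


def with_gemination(phonemes: Iterable[str]) -> Iterator[Step]:
    """Wrap a phoneme stream so each item carries its own gemination flag."""
    previous = None
    for p in phonemes:
        yield Step(p, p == previous)
        previous = p


for step in with_gemination("atta"):
    print(step.phoneme, step.geminated)
```

With this, the gemination test only looks at `step.geminated` and never touches an index.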
We'll see what works in a few days, hopefully.