Three Dollar Quill

Coding 2023-07-06

Tags:

Thu 06 July 2023

By Max Woerner Chase

Okay, I'm trying to move on to parsing now, and I need some time to figure out the right way to handle it.

I'm still on the idea I had previously of "let's learn something besides recursive descent", in part because I think trying that in OCaml would lead to an utterly monstrous let rec that would only get bigger as I go.

So, I'm back looking at Earley parsers, and I think I've got an idea of how to put the types together. However, I want to consider error correction possibilities. From doing some research, I think I want to do something like this:

First, have a grammar definition, including semantic actions.

Then, the following:

A transformation of the semantic actions to a form that includes an error score.
New rules for matching "junk" where each token matched increases the error score.
Terminal matchers need the ability to create "synthetic" tokens in the absence of proper input, which signal somehow that they're not present in the source.
Create rules in which the terminal matchers are replaced with matching zero or more junk, and their argument to the semantic action creates the synthetic token.
The junk rules need to create something that the modified matches know to discard.
Some combinations of this stuff.
Probably the additional data should include error details so those can get reported.
Bias advancing in the parse without errors (maybe using effects to create continuations or something?) so that working code doesn't end up generating a massive error forest just in case.

Like prioritize by number of errors, and then by index.

Okay, this sounds workable, but I'm not sure. I'm going to have to sleep on it.

Good night.