So, I tried a few things today, and they mostly worked. I don't feel like going into more detail.
Anyway, let's see what I can do as far as making some progress or plans on the Earley parser.
So, in my understanding of things, the initial state should include some states. To expand on these states:
- We perform any possible completions until no more completions are possible
- We perform any possible predictions, doing newly-allowed completions as needed, until no new predictions are possible
- Once only scans are possible, we scan into the next state
A completion looks like:
- Given an item that has no symbol after the dot
- Find any items in the set indexed by the item's start_index that have the item's left-hand-side after the dot.
- Place the advanced versions of those items in the current set
- This may require an optimization to avoid quadratic behavior on some grammars.
A prediction looks like:
- Given an item that has a non-terminal symbol after the dot
- Add all rules that produce that symbol to the current set
- Mark any newly added rules for processing
- If the newly added rules are eligible for completion, recursively carry out completions until they're done
A scan looks like:
- Given an item that has a terminal symbol after the dot
- If the terminal symbol matches the current token
- Add the current item to the next set, advanced by one
If we maintain multiple stacks/deques/sets/whatever/maybe dicts that we popitem from, and clear them before the scan step, then we can have a uniform setup of "here is a set and sets to process, add it to both, and to the correct set to process if it's new". If I do it like that, then the full prediction step looks like "predict one, complete again".
I should note at this point that it's very likely that I'll have very different ideas about how this should be done once I've prototyped this, because I only sort of understand the documentation that I'm reading.
For now, I'm going to write the helper for this as a nested function inside the method.
Several rewrites and fill-ins later, I have a prototype that is almost certainly not quite correct, and not enough time in the night to feel like testing.
Basically, the core of the logic looks... fine, but I have some assertions after the outer loop that are only sometimes valid, and can probably be expected to not be valid most of of the time. From writing this out, I fixed up the assertions, and now I need some new ones to make sure the data is actually usable.
A few more tweaks later, and I'm about done for the night.
I elected to return a frozen version of the state set from the parse method, so that "find the correct parses" is decoupled from "produce the state set". I'll work on producing the parses next, and come back to optimizing the parse method once I've proved that anything works.
For now, I should wrap up and take the remainder of the night as easy as I can, which is not as easy as last night.