I'm going to sketch out some thoughts on implementing an Earley parser. The tutorial I'm reading is talking in terms of loops, but I'm trying to think about this in terms of reducing a stream of tokens. I'm pretty sure this will work, and it means I can punt on figuring out the outer loop for a bit.
Here's the basic idea: I have a series of rules, and that's bundled up in an object that defines a method like "given a state and a token, return a new state". Then I can just generate the initial state, functools.reduce, and bam, done. ... Probably.
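To make that concrete, here's a minimal sketch of the reduce-based driver I have in mind. The `Grammar` class, its `step` method, and the placeholder state logic are all my own invention for illustration, not anything from the tutorial:

```python
# Hypothetical sketch: a grammar object with a "given a state and a
# token, return a new state" method, driven by functools.reduce.
import functools

class Grammar:
    def __init__(self, rules):
        self.rules = rules

    def initial_state(self):
        # A state is a list of Earley state sets; details TBD.
        return [set()]

    def step(self, state, token):
        # The real Earley scan/predict/complete logic would go here;
        # for now, just append an empty set per token consumed.
        return state + [set()]

def recognize(grammar, tokens):
    return functools.reduce(grammar.step, tokens, grammar.initial_state())
```

The nice property is that `recognize` doesn't care where the tokens come from: any iterable works, so the outer loop really is punted on.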
So, because this is an Earley parser, the state is going to be something like "a list of state sets", where a state set is "a set of Earley items", where an Earley item is "a rule, the index of the state set where the rule started, and partial completion data". For a recognizer, the completion data is just how far into the rule you've read, but if you want to parse, I assume it needs to accumulate the previously-parsed items instead. I should... probably read further in the tutorial, I think.
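Sketching that structure out (the field names here are mine, following the usual rule/dot/origin triple from the Earley literature, not the tutorial's code):

```python
# One possible shape for an Earley item: the rule being matched, how
# far into it we've read (the "dot"), and the index of the state set
# where this attempt started (the "origin").
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    head: str
    body: tuple  # tuple of symbol names

@dataclass(frozen=True)
class Item:
    rule: Rule
    dot: int     # how many symbols of rule.body we've matched so far
    origin: int  # index of the state set where this item started

    @property
    def complete(self):
        return self.dot == len(self.rule.body)

# The overall state is then: list[set[Item]]
```

Items are frozen so they can live in sets, which matters later for deduplication.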
All right, I read over it, and here's where my understanding is:
- Loup is talking, at least at first, about extracting the parse data from the information gathered by the recognizer.
- What I had in mind would have been very inefficient, because it would have essentially evaluated everything eagerly, building up partial parses that were never going to succeed.
So, basically, I want to say:
- Given a state, can we extract a valid parse tree from it?
- Given a parse tree, we can materialize an AST (which would presumably include information like text positions, or anything else helpful). But this means we can't carry out the actions until we have a parse tree, which means my ideas about discarding the token stream as we process it don't work.
- Like, it's possible to write the code to consume a stream, but that's only worth it if it makes the code easier to understand, because it doesn't save any memory. I mean, I guess it would allow early termination if the token stream is bad and it results in an empty state set.
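That early-termination point can be sketched pretty directly: consume the stream, and bail the moment a token leaves us with an empty state set. Here `advance` is a hypothetical stand-in for the real scan/predict/complete step:

```python
# Sketch of early termination on a bad token stream: stop as soon as
# some token yields an empty state set.
class ParseError(Exception):
    pass

def recognize_stream(advance, initial_set, tokens):
    sets = [initial_set]
    for i, token in enumerate(tokens):
        next_set = advance(sets, token)
        if not next_set:
            # No Earley item survived this token; the input can't parse.
            raise ParseError(f"no viable items after token {i}: {token!r}")
        sets.append(next_set)
    return sets
```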
Ooh, I've got an idea that could be really good or really weird. Suppose we have some kind of stream-recording data structure, so we can then make our functions operate on streams. If we organize the generation of items according to Jeffrey Kegler's open letter (scans, completions, predictions), then I think it should be possible to say "okay, we can view a set as consuming one iterable and producing two, by taking an initial iterator from the previous set and expanding on it, and also producing an iterator of scans for the next set". By interleaving the order of iteration, it "should" be possible to make sure that all possible parses are reached, even if some rules recurse indefinitely (which, why would you do that, but whatever). (Although, if there's an indefinitely recursive rule and a syntax error, then this idea would presumably keep on trying the same stream of nullable rules in the hope that that will somehow repair the parse downstream... So it's almost like, indefinitely recursive rules should trigger once-ish for the initial parse pass, and then they can kind of do whatever when it comes time to perform actions on a tree.)
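One way to keep recursive rules from looping forever is the standard worklist-with-dedup pattern: since there are only finitely many possible Earley items per state set, deduplicating as you expand means the closure has to terminate. A sketch, where `expand` is a hypothetical stand-in for generating an item's predictions and completions:

```python
# Close a state set under an expansion function, using a worklist and
# a seen-set so recursive rules can't cause infinite loops.
from collections import deque

def close_set(initial_items, expand):
    seen = set()
    queue = deque(initial_items)
    while queue:
        item = queue.popleft()
        if item in seen:
            continue  # already processed; this is what guarantees termination
        seen.add(item)
        queue.extend(expand(item, seen))
    return seen
```

This doesn't answer the interleaving question, but it does mean the "trigger once-ish" behavior falls out for free within a single set.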
In any case, I'm going to need to do a bunch of sketching of this stuff on paper, because these concepts don't all fit in my head.
And, I've got some other stuff I want to take care of right now, so I'm going to wrap this up here.