Coding 2023-07-07
All right, let's see what it takes to handle Earley parsing, in terms of types.
type 't terminal_matcher = { match_ : 't -> bool ; produce : () -> 't }
Might change the input to that second function. And I think it should not be required that the produce function only produces values that satisfy the predicate.
Anyway.
type n; module NMap = Map.Make(struct type t = n; let compare = compare end); type ('t, 'o) grammar = n * (((n, 't terminal_matcher) Either.t list * (('o, 't) Either.t list -> 'o)) NMap.t)
Without the language tools, I don't know which parentheses are needed, or even if I got that all right. So, let's see if I can explain all of that.
With the terminal_matcher, I'm maybe kind of bending the rules for "what is a grammar", because the usual description is in terms of terminal symbols, which are more-or-less associated with a particular field on the token type in this context. So, the "sensible" thing in my mind is to have a type that relates to the token type, and write helpers that express "match this terminal symbol" in terms of what structure the token has.
Now, to handle the structure of the grammar, I need to have the nonterminal symbol type as a module-level type instead of a variable, so I can use it to instantiate a Map. This map is used to hold the majority of the information in the grammar, with the remainder being the initial symbol next to it. Within the grammar, each value is a pair of a list of nonterminals and terminal matchers, along with the semantic actions, which are functions that combine lists of their output type and terminal symbols.
The expected definition of the action functions is a little dynamic, because I couldn't see a way to encode all of the type information in an existential type so that it doesn't interfere with the ability to put all of these different signatures into one type of slot.
Oh yeah, I defined a record type to hold two functions for the matcher, instead of just aliasing to one of the types, because I'm trying to plan ahead to having a functor that replaces n with n option and provides the ability to convert from the wrapped module type to the wrapped type for this module, but with extra output stuff.
A missing piece of the puzzle is that the grammar modules need to define a parse function that takes a grammar and a list (or something) of tokens, and the error-correcting version almost certainly needs a custom parse function to try to avoid all of the pathological parse behavior that I'm pretty sure adding the error correcting would introduce, at least the way I'm thinking of doing it. (Other people may be built different.)
I think I should be done for now, and think about this more over the weekend.
Good night.