Coding 2023-02-10
Well, I didn't come up with the greatest interface for the class-enum thing, but it works for now, and maybe I can tweak it later.
Anyway, let's get into what it was for. It looks like I'm getting into things near the beginning of the tutorial I'm looking at. So, let's see...
Earley grammars can be expressed as a list of rules, which have a left-hand side (a single non-terminal symbol), and a right-hand side (a sequence of terminal and non-terminal symbols). Later on, it should also have stuff like a constructor for the non-terminal given on the left, but let's stick to the recognizer for now. (One obvious attempt at optimization would be to store the rules in a table that indexes them by left-hand side.)
Being very careful about reading the example in the tutorial, we see that there's also "groups of (specifically terminal?) symbols".
Anyway, as we move on through the tutorial, we see that there is a concept of "items", which derive from individual rules. It seems to me most natural to build them up by stages, starting with a rule, and then adding the starting index in the parse, and then the progress through the right-hand side. This is because the natural derivation of an item over time is to advance the progress through the right-hand side, but this has no effect on the starting position.
Next, we get to state sets, which I believe I wrote the otable module for. Here it is:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 | local m = {} local function non_numeric(k) if type(k) == "number" then error "Indirect index cannot be numeric" end return k end local function get_by_index(t, i) local k = non_numeric(t[i]) return k, t[k] end local function get_by_key(t, k) local v = t[non_numeric(k)] if v then return k, v end end local function o_it(state, control) control = control + 1 local k, v = get_by_index(state, control) if k then return control, k, v end end local function opairs(t) return o_it, t, 0 end m.opairs = opairs function m.get(t, i_or_k) if type(i_or_k) == "number" then return get_by_index(t, i_or_k) end return get_by_key(t, i_or_k) end local function add(t, k, v) if v == nil then return t end local _, v_existing = get_by_key(t, k) if v_existing == nil then table.insert(t, k) t[k] = v elseif v_existing ~= v then error "Cannot change associated value" end return t end m.add = add local function move(src, dest) dest = dest or {} for _, k, v in opairs(src) do add(dest, k, v) end return dest end m.move = move function m.merge(...) local dest = {} for _, src in ipairs {...} do move(src, dest) end return dest end return require "readonly"(m) |
One thing I realize as I look over this code is, I'm going to have to make sure that everything handles __eq, or that I pull some shenanigans to make sure that specific functions produce the exact same values every time.
Another possibility is to just ignore that module for now, and accept some probably-quadratic behavior in the prototype: just implement __eq, and make sure to always scan the target table before inserting.
Okay, I had a proper look over the rest of the tutorial, and realized that I can probably adapt the code from the tutorial into the framework I'm putting together. Here are the big differences:
- Instead of text characters for terminal symbols, I'm going to use the existing token class.
- I'm going to have to decide how to represent non-terminals, since the actual classes being generated won't correspond neatly to the rules. Probably just using strings would make sense. I want to take some time to consider whether there's anything that would strongly recommend an alternative, so I'm going to take some time to think about this.
Anyway, that's enough for right now.
Good night.