Coding 2023-03-19
Okay, so, I'm writing this late, so this is kind of "Hey, I should work on this more tomorrow," but...
I'm thinking more about the Parametric rework in MOTR, and the more I think about it, the more I realize that I need to nail down some of the terminology.
Right now I'm leaning towards four different kinds of data that can go into a command:
- Arguments, which have an optional prefix. These have values like --fail-under=100, --show-contexts, -d reports/coverage.
- Non-path environment variables. This isn't a great name. It refers to stuff like {"TERM": "xterm"}.
- Path environment variables. As above. This is the newest concept in this space, and refers to stuff like {"PYTHONPATH": "/a/b/c:/d/e/f"}, in which the code, not yet written, should synthesize the combined path list from the component paths.
- Implicit input and output. Basically, sometimes a command's input or output is not present in the command's text, but MOTR still needs to be able to reason about it. This is stuff like "the command accepts an output directory, but from downstream commands' perspectives, the outcome is that now the file index.html in that directory exists".
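To make the four kinds concrete, here is a rough sketch of one way they could sit together. Everything in it is an assumption for illustration: the CommandData name, its fields, and the example implicit output path are hypothetical and not MOTR's actual API. It joins path components with ":" to match the PYTHONPATH example above.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass(frozen=True)
class CommandData:
    """Hypothetical container for the four kinds of data (not MOTR's API)."""

    # Arguments, with or without a prefix, in command-line order.
    arguments: Tuple[str, ...] = ()
    # Non-path environment variables: plain name -> value.
    env: Dict[str, str] = field(default_factory=dict)
    # Path environment variables: name -> component paths, joined later.
    path_env: Dict[str, Tuple[str, ...]] = field(default_factory=dict)
    # Implicit IO: files the command touches that its text never mentions.
    implicit_inputs: Tuple[str, ...] = ()
    implicit_outputs: Tuple[str, ...] = ()

    def environment(self) -> Dict[str, str]:
        """Synthesize the final environment, joining path lists with ':'."""
        combined = dict(self.env)
        for name, components in self.path_env.items():
            combined[name] = ":".join(components)
        return combined


example = CommandData(
    arguments=("--fail-under=100", "--show-contexts", "-d", "reports/coverage"),
    env={"TERM": "xterm"},
    path_env={"PYTHONPATH": ("/a/b/c", "/d/e/f")},
    # Assumed for illustration: the output directory gains an index.html.
    implicit_outputs=("reports/coverage/index.html",),
)
print(example.environment())
```

Keeping the path components as a tuple until the last moment is what would let a later step combine several component lists before joining them.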
These are all collected into a ParametricCommand, and the usual case for ParametricCommand is that every combination of parameter values corresponds to a single command. However, this is not the case for commands that are supposed to aggregate data from several different versions of some other command.
In order to convert the relevant Parametric values in this situation, the values have to undergo a process currently called "reduction". This selectively overrides the default behavior of combining Parametric values. Under reduction, some variables no longer duplicate and alter the command invocations; instead, their values are combined within a single invocation. The way combination is accomplished depends on the type of value.
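A small sketch of the difference between the default behavior and reduction, under the assumption that selections can be modeled as a dict from label to a list of values. The expand function and the "python" and "os" labels are made up for illustration; combination here is just "collect into a tuple", whereas the real thing would depend on the type of value.

```python
from itertools import product


def expand(selections, reduced=()):
    """Return one dict per command invocation.

    Labels not in `reduced` multiply the invocations (the default behavior).
    Labels in `reduced` are combined inside each invocation instead.
    """
    iterated = [label for label in selections if label not in reduced]
    invocations = []
    for combo in product(*(selections[label] for label in iterated)):
        invocation = dict(zip(iterated, combo))
        for label in reduced:
            # Placeholder combination: keep every value together in a tuple.
            invocation[label] = tuple(selections[label])
        invocations.append(invocation)
    return invocations


selections = {"python": ["3.10", "3.11"], "os": ["linux"]}

# Default: every combination of values is its own command.
print(expand(selections))
# Reduction over "python": one command that sees both versions.
print(expand(selections, reduced=("python",)))
```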
Thinking about this, we can get the following vocabulary concepts:
- reduction makes it possible for a Parametric to produce values for some subset of its selections, instead of the full set
- combination is the process that allows reduction to occur
- some selections must be singleton for various reasons, including in order to implement combination by, um, not combining anything
- box labels are all of the labels that must be passed to a Parametric to instantiate its values
- selection labels are a subset of box labels, and something will iterate over them at some point
- iterated labels are a subset of selection labels: these labels contribute a dimension to the final output matrix
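The subset relationships in this vocabulary can be written down as a quick sanity check. This is only a sketch: LabelSets and its field names are hypothetical, and it checks nothing beyond "iterated is a subset of selection, which is a subset of box".

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class LabelSets:
    """Hypothetical grouping of the three label sets (not MOTR's API)."""

    box: FrozenSet[str]        # every label needed to instantiate values
    selection: FrozenSet[str]  # labels something will iterate over
    iterated: FrozenSet[str]   # labels that add a dimension to the output

    def __post_init__(self):
        # iterated labels must be a subset of selection labels
        if not self.iterated <= self.selection:
            raise ValueError("iterated labels must be selection labels")
        # selection labels must be a subset of box labels
        if not self.selection <= self.box:
            raise ValueError("selection labels must be box labels")


labels = LabelSets(
    box=frozenset({"python", "os", "report_dir"}),
    selection=frozenset({"python", "os"}),
    iterated=frozenset({"os"}),
)
# Selection labels that are not iterated are the ones that get reduced
# or have to be singleton.
print(sorted(labels.selection - labels.iterated))
```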
The overall desired behavior for Parametric labels is found by looking at the ranges of behavior for:
- arguments, non-path environment variables, path environment variables, implicit IO
- static data, inputs, outputs
- multiadic maps, reductions
There are two main things here that are relatively less obvious:
- What controls which labels need to be singleton, and what's the right way to represent this?
- What are all of the requirements around implicit IO?
The trick to implicit IO is that it doesn't distinguish the command that is actually run, so any selection label used by implicit IO either must be singleton, or must be an iterated label. Otherwise, the same command will get added to the compendium multiple times with different edge data, which is bad.
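The rule for implicit IO could be checked with something like the following. This is a sketch under assumptions: the function name, the idea of passing selection sizes as a dict, and the example labels are all hypothetical.

```python
def colliding_labels(implicit_io_labels, iterated_labels, selection_sizes):
    """Return the implicit-IO labels that would make a command collide.

    A label is safe if it is iterated (it separates the commands) or if its
    selection is singleton (there is only one value, so nothing to separate).
    """
    return sorted(
        label
        for label in implicit_io_labels
        if label not in iterated_labels and selection_sizes[label] != 1
    )


sizes = {"python": 2, "os": 1, "arch": 3}

# "python" has two values and is not iterated, so the same command would be
# added twice with different edge data.
print(colliding_labels({"python", "os"}, iterated_labels={"arch"}, selection_sizes=sizes))
# Iterating over "python" separates the two commands, so nothing collides.
print(colliding_labels({"python", "os"}, iterated_labels={"python"}, selection_sizes=sizes))
```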
It's like, there's something that will collide with itself unless separated by something else. I don't think ghost labels is a great name for this concept, but it's better than the weird patchwork I have currently.
So, I think the last thing I need to get parity with what the current system models is something that can handle maps over output values. The thing about output values is that any label that an output value doesn't iterate over can't be a non-singleton iterated label.
I'm not happy with iteration/iterated/whatever yet, but I think this all provides a basis to work with. I'm going to wrap up for tonight.
Good night.