Coding 2021-03-11
I now know more about how Jinja works. It's not helping with the implementation yet. Basically, Jinja templates are converted to Python source code and compiled to functions. This means that I can't do some kind of side-channel tagging, because any metadata attached to the template would be discarded by the time compilation is done. So, if I want to have subsequence metadata, I have to make it explicit in the output. And I probably want to include the actual data alongside it to make sure I don't lose it. At the very least, this is necessary for tagging expressions. But I want to make sure that the generated text can't masquerade as metadata, so there needs to be some unambiguous way of distinguishing them: escaping, or perhaps length-prefixing the data. I'm a little worried about stuff like call blocks, insofar as I don't want the template expansion machinery to see the metadata.
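To make the length-prefixing idea concrete for myself, here is a rough sketch in plain Python. The record layout (`tag:length:payload`) and the names `frame` and `parse` are made up for illustration, not anything Jinja provides. The point is only that the parser skips over the payload by its declared length, so payload text that happens to look like a header is never read as one.

```python
def frame(tag: str, text: str) -> str:
    """Emit one record as 'tag:length:payload' (hypothetical layout)."""
    if ":" in tag:
        raise ValueError("tag must not contain ':'")
    # Length is counted in characters, not bytes.
    return f"{tag}:{len(text)}:{text}"


def parse(stream: str) -> list[tuple[str, str]]:
    """Split a stream made only of framed records into (tag, payload) pairs."""
    records = []
    pos = 0
    while pos < len(stream):
        tag_end = stream.index(":", pos)
        len_end = stream.index(":", tag_end + 1)
        length = int(stream[tag_end + 1:len_end])
        start = len_end + 1
        # Take exactly `length` characters; never scan the payload for markers.
        records.append((stream[pos:tag_end], stream[start:start + length]))
        pos = start + length
    return records


# A payload that imitates a record header is still treated as plain data.
tricky = "v2:3:abc"
stream = frame("v1", tricky) + frame("v2", "ok")
records = parse(stream)
```

Escaping would do the same job, but then every consumer has to unescape correctly. With a length prefix the payload stays byte-for-byte what the template produced.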
But all of this does suggest a way forward to me: Define a macro that takes a "tag" argument. Wrap calls to the macro around each identifiable piece of output text in each version, tagging the calls with a "sequence id". In the macro itself, output metadata and data. In applying the macro, do not go inside expressions or other call statements. This does mean that some precision will be lost. This will be the biggest problem when dealing with call blocks that take large blocks of text and pass them back verbatim, because all of that text ends up under one tag, and that's data I could otherwise have reasoned about.
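A minimal sketch of what that macro could look like, assuming the `jinja2` package is installed and the default (non-autoescaping) environment. The macro name `tagged`, the sequence ids `s1` and `s2`, and the `tag:length:text` output layout are placeholders I picked for the sketch, not a settled design.

```python
from jinja2 import Environment

# Hypothetical tagging macro: render the wrapped text via caller(), then
# emit the tag, the text's length, and the text itself.
MACRO = (
    "{% macro tagged(tag) %}"
    "{% set body = caller() %}"
    "{{ tag }}:{{ body | length }}:{{ body }}"
    "{% endmacro %}"
)

# The "injected" template: each identifiable piece of output is wrapped in
# a call to the macro, tagged with a sequence id.
BODY = (
    '{% call tagged("s1") %}Hello, {% endcall %}'
    '{% call tagged("s2") %}{{ name }}!{% endcall %}'
)

rendered = Environment().from_string(MACRO + BODY).render(name="world")
print(rendered)  # s1:7:Hello, s2:6:world!
```

Note that the wrapping here is itself done with call blocks, so the injection step would need to tell its own wrappers apart from call blocks that were already in the template.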
My inclination currently is to wonder how much this case will be hit. Searching GitHub reveals that two cookiecutter repos use call blocks, over a total of 18 files. This is few enough results that there's a chance the numbers are being swamped by miscounting, but I'm just trying to get a ballpark figure. So, 18 cookiecutter files with call blocks, vs ~140,000 cookiecutter files. (I dropped 10,000 files from consideration because some of them are config files for cookiecutter itself, I think.) Anyway, at roughly 0.01% of all cookiecutter files, I am fine with considering emitting a warning for this case, but not actually bothering to handle it.
So, now I've got a plan for coordinating the metadata, and the next big step is to figure out how to inject it into the templates. Next step after that, derive the metadata from diff information. This is feeling good. I'll probably get serious about planning this stuff over the weekend. For now, I should wrap up.
Good night.