Three Dollar Quill

Coding 2021-03-10

Wed 10 March 2021

By Max Woerner Chase

I haven't thought too hard about this cruft-related project, but something is bothering me about my first iteration on the design. Basically, if changes overlap in just the right way, the current design will lose information, because it takes the changes as a whole rather than applying them one by one.

For a simple example, consider an if block:

{% if condition_a %}
    Non-trivial content
{% endif %}

And imagine changing it to:

{% if condition_b %}
    Content with a non-trivial diff
{% endif %}

Now, there's nothing stopping the source from being more elaborate in the conditions, but this is less distracting and was less effort to type. The situation here is that there's a non-trivial diff between the contents of the if blocks, and we'd like that diff to be preserved if both condition_a and condition_b are true. The more branches either version has, the more complicated this can potentially be.

The current design would see the diff inside the condition expression, and just emit each version's block completely separately. That loses the information about the diff between the bodies in exactly the case where both conditions hold and the bodies overlap.
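To make the lost information concrete, here's a minimal sketch in plain Python. It doesn't touch Jinja at all: `render`, `whole_block_changes`, and `fine_grained_changes` are hypothetical stand-ins made up for this illustration, and the character-level diff from `difflib` is just a convenient choice, not the project's actual design.

```python
import difflib

# Bodies of the two if blocks from the example above.
OLD_BODY = "    Non-trivial content\n"
NEW_BODY = "    Content with a non-trivial diff\n"


def render(body: str, condition: bool) -> str:
    """Hypothetical stand-in for rendering `{% if condition %}body{% endif %}`."""
    return body if condition else ""


def whole_block_changes(condition_a: bool, condition_b: bool) -> list[tuple[str, str]]:
    """Coarse view: each version's block is treated as completely separate."""
    old = render(OLD_BODY, condition_a)
    new = render(NEW_BODY, condition_b)
    changes = []
    if old:
        changes.append(("delete", old))
    if new:
        changes.append(("insert", new))
    return changes


def fine_grained_changes(condition_a: bool, condition_b: bool) -> list[tuple[str, str]]:
    """Diff the rendered bodies, so shared runs survive when both conditions hold."""
    old = render(OLD_BODY, condition_a)
    new = render(NEW_BODY, condition_b)
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            changes.append(("equal", old[i1:i2]))
            continue
        # A "replace" opcode is split into a delete followed by an insert.
        if i1 != i2:
            changes.append(("delete", old[i1:i2]))
        if j1 != j2:
            changes.append(("insert", new[j1:j2]))
    return changes
```

With only one condition true, both functions report a single delete or insert. With both true, `whole_block_changes` still reports one delete and one insert, while `fine_grained_changes` also reports `"equal"` runs. Those runs are the part of the diff that the coarse view throws away.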

This might seem like a bit of an edge case, but I cannot trust edge cases to not become critical functionality. So, I'm trying to figure out how to represent this.

One way I could think about this would be to try to bring the correspondences of the diffs into the higher-level representations. Do it with lexing, and you've got a token stream that sometimes bifurcates and rejoins. Do it with the AST, and I guess the result is some kind of double-tree. I've got this vague idea that maybe I could ignore how Jinja implements... everything, and try to do something with Huffman coding, but, like, not for compression. It's not a well-developed idea currently.
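As a rough picture of the "token stream that sometimes bifurcates and rejoins" idea, here's a small sketch using only the standard library. The regex lexer is a toy written for this example, not Jinja's lexer, and `tokenize`, `forked_stream`, and `rejoin` are hypothetical names. It assumes that aligning the two token lists with `difflib.SequenceMatcher` is good enough to find the shared stretches.

```python
import difflib
import re

OLD = "{% if condition_a %}\n    Non-trivial content\n{% endif %}\n"
NEW = "{% if condition_b %}\n    Content with a non-trivial diff\n{% endif %}\n"

# Toy lexer (an assumption, NOT Jinja's): block delimiters, words,
# whitespace runs, and any other single character.
TOKEN = re.compile(r"\{%|%\}|\w+|\s+|.", re.DOTALL)


def tokenize(source: str) -> list[str]:
    """Split the source into tokens whose concatenation is the source again."""
    return TOKEN.findall(source)


def forked_stream(old_tokens: list[str], new_tokens: list[str]):
    """Return (kind, old_tokens, new_tokens) segments.

    "shared" segments carry the same tokens on both sides; "fork" segments
    are where the two versions diverge before rejoining.
    """
    matcher = difflib.SequenceMatcher(None, old_tokens, new_tokens, autojunk=False)
    segments = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        kind = "shared" if tag == "equal" else "fork"
        segments.append((kind, old_tokens[i1:i2], new_tokens[j1:j2]))
    return segments


def rejoin(segments, side: str) -> str:
    """Rebuild one version's source by following its side of every segment."""
    index = 1 if side == "old" else 2
    return "".join("".join(segment[index]) for segment in segments)
```

Calling `forked_stream(tokenize(OLD), tokenize(NEW))` gives a stream that starts shared for `{% if `, forks on the condition name, and rejoins afterwards. Following either side with `rejoin` reproduces that version's source exactly, so nothing is lost by storing the two versions this way.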

Taking another swing at this... The structure I care about involves the sequences that are the same in each version of the template. In that sense, looking at what changed is kind of the complement to what I want to accomplish. To figure out my priorities and what the implementation should do, I need a concept of "the same" that handles stuff like for loops.

It's kind of daunting to imagine carrying this out for everything that a template could do. Ideally, I'd like to annotate specific sequences within the source and see where they end up. If it were possible to get that data from running Jinja more-or-less as normal, then I could apply established diff algorithms to handle any alignment issues that get introduced (somehow...), and then the diff would be basically ready. I don't know if it's possible to fake out Jinja's internals that extensively, but I'll look into it tomorrow.

Good night.