I spent today being sleepy and trying to square some stuff away, so I'm going to try describing some old stuff I alluded to yesterday.
In my efforts to get a handle on neural nets, I rolled my own library for manipulating them. Since I've never really approached optimization in a principled manner, it was pretty slow no matter what crazy rewrites I tried. I believe part of the problem was that I had too much going on in the hot loops, largely because I included dynamic capabilities that weren't really necessary in that context.
To be a little more concrete:
- The library consumer first constructs a directed acyclic graph where each node is some kind of mathematical operation.
- A separate visitor class defines functions to evaluate arbitrary nodes within the DAG with respect to an invariant state. Note that for nodes that don't just retrieve a value from the state, but depend on other nodes, the visitor has to be re-consulted to evaluate those dependencies.
- As such, I now believe the (cls, node, state) signature is one of the factors holding this code back, because it necessarily pulls a number of attribute lookups and method calls into the hottest loops in the code. Across training iterations, the network structure is constant and the visitor classes should be irrelevant, leaving the state as the only thing that actually varies. Ideally, from there, we'd have some way to bundle a bunch of training iterations into a single call, in case it's still slow.
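To make the shape of that design concrete, here's a minimal sketch of a DAG of operation nodes plus a visitor with the (cls, node, state) signature. All the names (Node, Input, Add, Mul, Evaluator) are hypothetical stand-ins of my own, not the library's actual classes, but they show where the per-node dispatch and attribute lookups pile up in the hot path:

```python
class Node:
    """Base class for operations in the DAG."""


class Input(Node):
    """Leaf node: fetches a value from the state by key."""
    def __init__(self, key):
        self.key = key


class Add(Node):
    def __init__(self, left, right):
        self.left, self.right = left, right


class Mul(Node):
    def __init__(self, left, right):
        self.left, self.right = left, right


class Evaluator:
    # The (cls, node, state) signature: every recursive step pays for a
    # type dispatch plus attribute lookups, even though the graph's
    # structure never changes between iterations.
    @classmethod
    def evaluate(cls, node, state):
        if isinstance(node, Input):
            return state[node.key]  # leaf: just read from the state
        if isinstance(node, Add):
            return cls.evaluate(node.left, state) + cls.evaluate(node.right, state)
        if isinstance(node, Mul):
            return cls.evaluate(node.left, state) * cls.evaluate(node.right, state)
        raise TypeError(f"unknown node type: {type(node).__name__}")


# Usage: build y = (a + b) * a once, then evaluate it against a state.
graph = Mul(Add(Input("a"), Input("b")), Input("a"))
print(Evaluator.evaluate(graph, {"a": 2.0, "b": 3.0}))  # → 10.0
```

Since only the state varies between calls, everything else in that recursion is work being redone for no reason, which is what the last bullet is getting at.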
Well, I've got all these ideas, but I should probably be collecting statistics on the code to see whether they're worth pursuing at all. In any case, the stuff I was talking about in the last bullet point sounds nice and all, but you may wonder how I intend to attempt it.
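For the statistics-collecting part, the standard library already covers the basics: cProfile plus pstats will say where the time actually goes before committing to any rewrite. A minimal sketch (hot_loop is a stand-in for whatever evaluation loop is being measured, not code from my library):

```python
import cProfile
import io
import pstats


def hot_loop():
    # Placeholder for the real workload, e.g. repeated graph evaluation.
    total = 0
    for i in range(100_000):
        total += i * i
    return total


profiler = cProfile.Profile()
profiler.enable()
hot_loop()
profiler.disable()

# Dump the top entries by cumulative time into a string.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream).sort_stats("cumulative")
stats.print_stats(10)
print(stream.getvalue())
```

If the attribute lookups and method calls really dominate, they'll show up in the per-call counts here, and that would settle whether the redesign is worth the trouble.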
Gosh, it's late. Good night.