Interim blog setup, rough notes being dumped here at the moment - I'll tidy once I get to a milestone. And sorry about the layout, that'll probably take even longer to get around to... #TODO

Neural Net Nuts and Bolts Speculation

Published on 2023-09-05 by @danja

braindump for future me

Chain of Thought

I've had dataflows on my mind this week, tracing through some code and running into some coincidentally related material, e.g. this tweet from Mark Watson:

Wowza! I asked ChatGPT with Diagrams Plugin to generate a UML sequence diagram for a SPARQL query to DBPedia: A+ grade

The other day I had a skim through some AI books I got in the 1990s, mostly curious about which ideas from back then have been forgotten and could maybe be revitalized. One notable takeaway was how ReLU (the rectified linear unit, essentially an ideal diode characteristic) has since unseated tanh/sigmoid as the activation function of choice.

Seed of an Idea

Looking at a vanilla neural network, single numeric values flow through, getting modified along the way (and error signals are backpropagated, changing the weights). In medical diagnosis, radioactive tracers are used to track flows and highlight structures.

Could something comparable be used with NNs?

At the input layer a value x is given to each node; subsequently each node receives a bunch of values from connected nodes in the previous layer. What if instead a pair (x, C) were passed, where C is a marker constant? What should come out of a node, and what side effects might there be?

First pass, how about this: the treatment of the x values stays exactly the same as in the vanilla case - but a C is hitching a ride. A selection function at the node picks the C from whichever of its input pairs has the highest x value. That C is the one passed along from this node to nodes in the next layer.

The side effect I have in mind is similar to the way weights are adjusted in backprop: the node itself takes on the value of the C it selected. The same could also occur during the backprop phase, so each node ends up holding a pair (Cf, Cb).

Are there any implementation issues I haven't seen? Might this be any use for anything?

To investigate, I guess trying it in toy network code would be the next step.
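
Here's roughly how such a toy might look in Python. Everything in it is a placeholder of my own - the tanh activation, the class and function names, and the way Cb is picked from the largest error signal in a mock backward pass - rather than a worked-out design.

```python
# A minimal sketch of the (x, C) tracer idea: a plain fully-connected net
# where the x values flow exactly as in the vanilla case, while each node
# also records Cf (the tag riding on its largest forward input) and Cb
# (the tag riding on its largest backward signal).

import math
import random

class TracerNode:
    def __init__(self, n_inputs):
        self.weights = [random.uniform(-1, 1) for _ in range(n_inputs)]
        self.bias = 0.0
        self.Cf = None   # tag captured on the forward pass
        self.Cb = None   # tag captured on the (mock) backward pass

    def forward(self, tagged_inputs):
        """tagged_inputs: list of (x, C) pairs from the previous layer."""
        xs = [x for x, _ in tagged_inputs]
        # vanilla behaviour: weighted sum + activation on the x values
        z = sum(w * x for w, x in zip(self.weights, xs)) + self.bias
        y = math.tanh(z)
        # tracer behaviour: keep the tag from the input with the largest x
        _, c_max = max(tagged_inputs, key=lambda pair: pair[0])
        self.Cf = c_max
        return (y, c_max)   # the selected tag hitches a ride to the next layer

    def backward(self, tagged_errors):
        """tagged_errors: list of (error, C) pairs from the next layer.
        Weight updates are omitted; this only shows the tag side effect."""
        _, c_max = max(tagged_errors, key=lambda pair: abs(pair[0]))
        self.Cb = c_max
        return c_max

def forward_pass(layers, tagged_inputs):
    """layers: list of lists of TracerNode. Returns the output (x, C) pairs."""
    activations = tagged_inputs
    for layer in layers:
        activations = [node.forward(activations) for node in layer]
    return activations

if __name__ == "__main__":
    random.seed(0)
    # two inputs, each tagged with a distinct marker constant
    inputs = [(0.3, "C0"), (0.9, "C1")]
    net = [[TracerNode(2) for _ in range(3)],   # hidden layer
           [TracerNode(3)]]                     # output layer
    outputs = forward_pass(net, inputs)
    print("output:", outputs)
    print("hidden Cf tags:", [n.Cf for n in net[0]])
    # mock backward phase: pretend a tagged error arrives at the output node
    net[1][0].backward([(0.5, "E0")])
    print("output node (Cf, Cb):", (net[1][0].Cf, net[1][0].Cb))
```

Running it just shows the tag from the larger input (C1 here) propagating all the way through, which is about as much as a first smoke test needs.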

There's a kind of bigger-picture/generalization aspect to this. What if the values passed around, rather than the usual single values or the strange pairs above, are arbitrary data structures? What if the transfer functions are arbitrary too? I'm pretty sure there are net designs which pass matrices of real numbers around, and I've a feeling there might be calculation-performance optimization potential somewhere around there. But I haven't a clue what activation functions would be appropriate...
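
For what it's worth, here's a rough sketch of that generalization: a node reduced to a combine function plus an activation, with the value type left up to whatever those two functions agree on. The scalar case reproduces the vanilla behaviour; the second case shuffles 2x2 matrices around instead. All the names and choices here are invented for illustration, not a claim about how matrix-valued nets are actually built.

```python
# A "node" with no opinion about what type its values are:
# it's just activate(combine(weights, inputs)).

import math

def make_node(weights, combine, activate):
    """Return a node function: inputs -> activate(combine(weights, inputs))."""
    return lambda inputs: activate(combine(weights, inputs))

# --- scalar case: ordinary weighted sum + tanh ---------------------------
def weighted_sum(weights, xs):
    return sum(w * x for w, x in zip(weights, xs))

scalar_node = make_node([0.5, -0.3], weighted_sum, math.tanh)
print(scalar_node([1.0, 2.0]))          # a single real number comes out

# --- matrix case: values are 2x2 matrices, combine uses matrix multiply --
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def mat_add(a, b):
    return [[a[i][j] + b[i][j] for j in range(2)] for i in range(2)]

def weighted_matrix_sum(weight_mats, input_mats):
    total = [[0.0, 0.0], [0.0, 0.0]]
    for w, x in zip(weight_mats, input_mats):
        total = mat_add(total, matmul(w, x))
    return total

def elementwise_tanh(m):
    return [[math.tanh(v) for v in row] for row in m]

matrix_node = make_node(
    [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 0.5], [0.5, 0.0]]],  # two weight matrices
    weighted_matrix_sum,
    elementwise_tanh,
)
print(matrix_node([[[0.2, 0.1], [0.0, 0.3]],
                   [[0.4, 0.0], [0.1, 0.2]]]))  # a 2x2 matrix comes out
```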

On the function question, usually differentiability is canon. But in a view from 1km, this is a special case: optimization done by gradient descent, a kind of hill-climbing over continuous number spaces. Other optimization techniques exist, e.g. combinatorial optimization, integer programming.
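
As a trivial illustration that differentiability isn't strictly required, here's a tiny net with a hard step activation (not differentiable at all) fitted to XOR by derivative-free random hill-climbing. The problem, the topology and the search budget are arbitrary placeholders; the point is just that the usual gradient machinery is one choice among several.

```python
# Training without derivatives: random hill-climbing over the weights of a
# 2-2-1 network whose activation is a hard step function.

import random

def step(z):
    return 1.0 if z > 0 else 0.0   # non-differentiable activation

def forward(w, x):
    # 2-2-1 network; w is a flat list of 9 numbers (weights and biases)
    h0 = step(w[0] * x[0] + w[1] * x[1] + w[2])
    h1 = step(w[3] * x[0] + w[4] * x[1] + w[5])
    return step(w[6] * h0 + w[7] * h1 + w[8])

DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]   # XOR

def loss(w):
    return sum((forward(w, x) - y) ** 2 for x, y in DATA)

random.seed(1)
best = [random.uniform(-1, 1) for _ in range(9)]
best_loss = loss(best)
for _ in range(20000):
    candidate = [w + random.gauss(0, 0.3) for w in best]
    c_loss = loss(candidate)
    if c_loss <= best_loss:          # hill-climb: keep any non-worse candidate
        best, best_loss = candidate, c_loss
print("final loss:", best_loss)
print("outputs:", [forward(best, x) for x, _ in DATA])
```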

I've not read much about optimization techniques, apart from bits in papers along the lines of "We found 5 layers gave accuracy within 2%; 6 or more only gave fractional improvement." The relative benefits of different activation functions were looked at a lot in the early days of Deep Learning. But nowadays the experiments I've seen tend to look more at large-scale topologies, with the subunits chosen from known-good black boxes (from a box of black boxes?).

I don't know, but perhaps the space of possible network functionality could be explored with a bit of meta-programming, trying out different setups like the ones above. It feels like it should be automatable.
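
Something like this, maybe: enumerate combinations of components, give each the same small budget on the same toy problem, and see what falls out. Every concrete choice below (the particular activations and combine functions, XOR again, best-of-N random search as the stand-in optimizer) is just an assumption to show the shape of the outer loop.

```python
# Meta-programming sketch: sweep over (activation, combine) combinations
# and give each one an identical, very cheap search budget on a toy task.

import itertools
import math
import random

ACTIVATIONS = {
    "tanh": math.tanh,
    "relu": lambda z: max(0.0, z),
    "step": lambda z: 1.0 if z > 0 else 0.0,
}
COMBINES = {
    "sum": lambda ws, xs: sum(w * x for w, x in zip(ws, xs)),
    "max": lambda ws, xs: max(w * x for w, x in zip(ws, xs)),
}

DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]   # XOR again

def forward(weights, combine, act, x):
    # same 2-2-1 shape as before, but with pluggable combine/activation
    h0 = act(combine(weights[0:2], x) + weights[2])
    h1 = act(combine(weights[3:5], x) + weights[5])
    return act(combine(weights[6:8], (h0, h1)) + weights[8])

def evaluate(combine, act, budget=3000):
    """Cheapest possible search: keep the best of `budget` random weight draws."""
    best = float("inf")
    for _ in range(budget):
        w = [random.uniform(-2, 2) for _ in range(9)]
        trial = sum((forward(w, combine, act, x) - y) ** 2 for x, y in DATA)
        best = min(best, trial)
    return best

random.seed(2)
for (a_name, act), (c_name, comb) in itertools.product(ACTIVATIONS.items(),
                                                       COMBINES.items()):
    print(f"activation={a_name:5s} combine={c_name:3s} best loss={evaluate(comb, act):.2f}")
```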

To borrow from The Outer Limits:

There is nothing wrong with your neural network. We will control the datatypes. We will control the activation functions.