Leer in with an initial skepticism: the jump from trad RNNs to attention-only was not — could not be — that dramatic, right? A tired argument, I know, but I'm setting the stage.
Continue on with a more formal (but still limited) criticism: induction does not produce knowledge. The pig that gets fed every day thinks nothing of the axe replacing the pail. Ninety-eight, 99, 100 (°C), and yet the temperature in the kettle does not keep climbing.
It's the persistence and rigidity of a guess that's the measure of knowledge. No matter how sophisticated the fit of the function, it's the data outside of the sample that matters, right?
There's a lot of low-hanging fruit embedded in the domain of all the data there is (±). Doesn't it become harder to withhold verification and test when you're training on everything? What's left after you've eaten the full fridge? Four parameters get you an elephant and a ~trillion get you the world's greatest chatbot, but can either cure Alzheimer's or tell me who lives on Mt. Athos?
And so, if that (and e.g. addition with arbitrary precision) is still out of reach, then I doubt Gwern is strictly right, for now. But the longer I follow the rabbit, the more I wonder: what is going to be possible, sooner rather than later? What is at stake? If this isn't the thing, surely it's part of the thing. It isn't one road to Oz, the rest to the lemming cliff. It's Atlantis at every turn, doomed or otherwise.
Then I read Wolfram's hunch, nodding along: that there are laws of knowledge being stumbled upon, a meta-structure of language itself, parroted outside of its typical framing. Hofstadter's Aunt Hillary. Symbol vs signal. And so if this moment isn't "it", it could be the microscope that reveals "it" and its components.
Surely that is vital and awesome and dangerous. Surely one must work on it. Adventure, duty, etc. But mostly primal curiosity, tempered by discipline, justified by said higher virtues. Easy does it, ace...
So, that's it. Continue on. Progress has just begun. The more I read and listen, the more I realize I'm way behind the cavalry, but I'm also not chasing the last bus out of town.
If you're still interested, you can follow my precise progress here: github.com/georgi0u/ml_experiments. Rudimentary, but I've built out the basics. Worked through the loss functions and their derivatives. Got my MNIST and Shakespeare merit badges. Onwards.
And, in the meantime, I'm already an engineer with a lot of stubborn grit and a decent amount of experience: scaling services, building interfaces, designing APIs...
I can be useful. Best of luck to all.