There is also a distinction in the brain between long-term and short-term memories. It is not beyond belief that the brain consolidates short-term memories into long-term ones in a separate process (perhaps during sleep, the lack of which we know is linked to memory recall problems).

Recurrent neural networks 'learn' continuously by updating their internal state based on the 'previous state', which in effect changes their weights. There are papers showing that the attention mechanism in transformers effectively provides a 'weight' update function, so that models can 'learn' to accomplish a task from a few examples given in the prompt. In other words, transformer networks 'learn' to train themselves on the examples they are given. It is not beyond belief that recurrent neural networks do the same thing: they learn to set up their internal states so that previous inputs and patterns can affect future tasks. In fact, if you play around with models like RWKV, you quickly notice that this is a fundamental part of the model. If you start talking to it in a certain way, it will start echoing that back. Clearly it's 'learning'.
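The 'weight update' view of attention can be made concrete for one simplified case: causal attention with the softmax removed (often called linear attention). Below is a minimal sketch, assuming an identity feature map and no normalization. Both are simplifications: standard transformers use softmax attention, and RWKV's recurrence differs in its details (for example, it decays older tokens over time). It computes the same outputs two ways: by attending over all previous tokens, and by an RNN-style loop whose 'weights' W receive an outer-product update v k^T at every step. The function names are hypothetical, chosen only for illustration.

```python
# Sketch only: unnormalized causal linear attention, identity feature map.
# Not softmax attention and not RWKV's exact formula.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attention_parallel(keys, values, queries):
    """Output at step t = sum over s <= t of (q_t . k_s) * v_s."""
    outputs = []
    for t, q in enumerate(queries):
        out = [0.0] * len(values[0])
        for s in range(t + 1):
            score = dot(q, keys[s])
            out = [o + score * v for o, v in zip(out, values[s])]
        outputs.append(out)
    return outputs

def attention_recurrent(keys, values, queries):
    """Same outputs, computed with a running 'fast weight' matrix W.
    Each step applies W <- W + v_t k_t^T, then reads out y_t = W q_t."""
    d_v, d_k = len(values[0]), len(keys[0])
    W = [[0.0] * d_k for _ in range(d_v)]
    outputs = []
    for k, v, q in zip(keys, values, queries):
        # Outer-product "weight update" driven purely by the current input.
        for i in range(d_v):
            for j in range(d_k):
                W[i][j] += v[i] * k[j]
        # Read out with the weights accumulated so far.
        outputs.append([dot(row, q) for row in W])
    return outputs

keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[2.0], [3.0], [5.0]]
queries = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(attention_parallel(keys, values, queries))   # [[2.0], [3.0], [15.0]]
print(attention_recurrent(keys, values, queries))  # [[2.0], [3.0], [15.0]]
```

The two functions agree because W after step t equals the sum of v_s k_s^T over s <= t, so W q_t expands to exactly the sum in the parallel version. Since W is built only from the inputs seen so far, anything fed in early keeps shaping later outputs, which is one way to picture the echoing behaviour described above.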