Excuse me, another follow-up question (can't edit on mobile): can you ELI5 how exploitation and exploration "emerge" naturally, instead of the tradeoff being explicitly coded as in RL?
As a general answer, the theory (the free-energy principle, FEP) suggests that organisms maximize a quantity known as model evidence, which is just a way of asking: "how much evidence do my observations provide for my model of the world?"
There are two complementary ways to maximize this: change your model (perception and learning) or change your world (action).
If we now grant that actions also maximize model evidence, then an action can be chosen either to sample data that make the model a better fit to the world (exploration) or to sample observations that are consistent with the current model (exploitation).
The equation for the free energy (the negative of the ELBO) has two terms: an energy and an entropy. You can rewrite the ELBO as "expected log-likelihood minus the KL divergence from the prior". If you write your model in a certain way, you can then read it either as "fit to the data, minus cost" (the second formulation) or as "accuracy + exploitation + exploration" (the first formulation).
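To make the two readings concrete, here is a small sketch assuming a made-up discrete model with two hidden states and one observed outcome (the names `prior`, `likelihood`, `q` and all the numbers are hypothetical). It computes the ELBO as energy plus entropy, E_q[log p(o,s)] + H[q(s)], and as accuracy minus complexity, E_q[log p(o|s)] - KL[q(s) || p(s)], then compares both against the log model evidence log p(o).

```python
import numpy as np

# Hypothetical toy model: 2 hidden states, and we have observed outcome o = 0.
prior = np.array([0.7, 0.3])           # p(s)
likelihood = np.array([[0.9, 0.2],     # p(o=0 | s)
                       [0.1, 0.8]])    # p(o=1 | s)
o = 0

def elbo_energy_entropy(q):
    # E_q[log p(o, s)] + H[q(s)]
    log_joint = np.log(likelihood[o] * prior)
    return float(q @ log_joint - q @ np.log(q))

def elbo_accuracy_complexity(q):
    # E_q[log p(o | s)] - KL[q(s) || p(s)]
    accuracy = q @ np.log(likelihood[o])
    complexity = q @ (np.log(q) - np.log(prior))
    return float(accuracy - complexity)

log_evidence = float(np.log(likelihood[o] @ prior))       # log p(o), the model evidence
posterior = likelihood[o] * prior / np.exp(log_evidence)   # exact p(s | o)

for q in (np.array([0.5, 0.5]), posterior):
    # Both decompositions agree; they reach log p(o) only when q is the true posterior.
    print(elbo_energy_entropy(q), elbo_accuracy_complexity(q), log_evidence)
```

The two functions are algebraically the same quantity, which is the point: "energy + entropy" and "accuracy - complexity" are two bookkeepings of one bound on model evidence.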
In active-inference formulations of the FEP, the quantity minimised when choosing actions (the expected free energy) has two terms: cost (also called risk) and ambiguity. Minimisation of this combined term is Bayes-optimal, so you don't have to explicitly code weights that trade exploration off against exploitation.
What you do have to code is the prior preferences, and since they form a distribution, you implicitly code how sharp (or broad) those preferences are. But once you do that, the FEP algorithm figures out when to collect more data to build a better model and when to use the existing model to get close to the prior preferences.
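Here is a minimal one-step sketch of scoring actions by cost (risk) + ambiguity, assuming a hypothetical task: the reward is behind arm 0 or arm 1 (two hidden states), `probe` reads a cue that reveals the state exactly, and `arm0`/`arm1` pay off 80% of the time when the reward is there. Prior preferences are log-probabilities over outcomes with a single sharpness knob `c`. All names, numbers, the one-step horizon and picking the lowest score (rather than sampling from a softmax over its negative) are illustrative assumptions, not a specific published implementation.

```python
import numpy as np

# Outcomes: [reward, no_reward, cue_0, cue_1]
# Hidden states: [reward behind arm 0, reward behind arm 1]
# Each matrix is p(o | s, action): rows are outcomes, columns are states.
ACTIONS = {
    "probe": np.array([[0.0, 0.0], [0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]),
    "arm0": np.array([[0.8, 0.2], [0.2, 0.8], [0.0, 0.0], [0.0, 0.0]]),
    "arm1": np.array([[0.2, 0.8], [0.8, 0.2], [0.0, 0.0], [0.0, 0.0]]),
}

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def kl(q, p):
    # KL[q || p], skipping outcomes with q = 0 (0 * log 0 = 0)
    m = q > 0
    return float(np.sum(q[m] * (np.log(q[m]) - np.log(p[m]))))

def column_entropies(A):
    # H[p(o | s)] for each hidden state s
    logA = np.log(A, out=np.zeros_like(A), where=A > 0)
    return -(A * logA).sum(axis=0)

def expected_free_energy(A, q_s, log_pref):
    p_o = softmax(log_pref)                       # prior preferences p(o)
    q_o = A @ q_s                                 # predicted outcomes under this action
    risk = kl(q_o, p_o)                           # "cost": mismatch with preferred outcomes
    ambiguity = float(column_entropies(A) @ q_s)  # expected outcome noise given the state
    return risk + ambiguity

def choose(q_s, c):
    # c sets how sharply "reward" is preferred over every other outcome
    log_pref = np.array([c, 0.0, 0.0, 0.0])
    G = {name: expected_free_energy(A, np.asarray(q_s), log_pref)
         for name, A in ACTIONS.items()}
    return min(G, key=G.get), G

print(choose([0.99, 0.01], c=0.5)[0])  # confident, weak preferences -> arm0
print(choose([0.5, 0.5], c=0.5)[0])    # uncertain, weak preferences -> probe
print(choose([0.5, 0.5], c=3.0)[0])    # uncertain, sharp preferences -> arm0 or arm1 (tie)
```

Nothing in the score is an explicit exploration bonus: the agent probes when it is unsure and preferences are mild, because probing removes ambiguity, and it pulls an arm once it is confident. Making the preference distribution sharper (larger `c`) lets cost dominate, so it pulls an arm even while uncertain. This one-step version cannot represent "probe first, then pull"; full active-inference schemes score multi-step policies.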