
Excuse me, another follow-up question (can't edit on mobile): can you ELI5 how exploitation and exploration "emerge" naturally instead of the tradeoff being explicitly coded as in RL?


As a general answer, the theory suggests that organisms maximize a quantity known as model evidence, which is just a way of saying 'how much evidence does some data provide for my model of the world?'

There are two complementary ways to maximize this - change your model or change your world.

If we now grant that actions also maximize model evidence, then actions can either be conducted to sample data that improve the model's fit to the world (exploration), or they can be conducted to sample observations that are consistent with the current model (exploitation).


And the optimization process itself would determine whether updating the model or changing the world is optimal, I guess. Thanks.


The equation for free-energy/ELBO has two terms, an energy and an entropy. You can rewrite it as "log-likelihood minus KL from prior". If you write your model in a certain way, you can then read it as "fit to the data, minus cost" (the second formulation) or "accuracy + exploitation + exploration" (the first formulation).
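For anyone who wants the two forms side by side, here is a sketch in standard variational notation. The symbols are my assumptions, not from the comment above: x is the observed data, z the latent states, q(z) the approximate posterior, and p(z) the prior. Variational free energy is the negative of this quantity, so maximizing the ELBO and minimizing free energy are the same thing.

```latex
\begin{aligned}
\mathrm{ELBO}(q)
  &= \underbrace{\mathbb{E}_{q(z)}\big[\log p(x, z)\big]}_{\text{(negative) energy}}
   + \underbrace{\mathrm{H}\big[q(z)\big]}_{\text{entropy}} \\
  &= \underbrace{\mathbb{E}_{q(z)}\big[\log p(x \mid z)\big]}_{\text{expected log-likelihood}}
   - \underbrace{\mathrm{KL}\big[\,q(z) \,\|\, p(z)\,\big]}_{\text{KL from prior}}
\end{aligned}
```

The second line follows from the first by splitting log p(x, z) into log p(x | z) + log p(z) and folding the log p(z) term together with the entropy into the KL divergence.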


In formulations of the FEP, there are two terms: cost and ambiguity. Minimisation of this combined term happens in a Bayes-optimal way, so you don't have to explicitly code weights for exploration and exploitation.

Although what you do have to code is prior preferences, and since they form a distribution, you implicitly code the range of those preferences. But once you do that, the FEP algorithm figures out when to collect more data to build a better model and when to use the existing model to get near the prior preferences.
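To make the cost + ambiguity point concrete, here is a toy sketch, not anyone's reference implementation. I'm assuming "cost" means the KL divergence between predicted and preferred observations (often called risk) and "ambiguity" means the expected entropy of the likelihood, as in the usual expected free energy decomposition. The two policies, their numbers, and the names `POLICIES`, `expected_free_energy`, and `best_policy` are all made up for illustration, and giving each policy its own likelihood matrix is a simplification.

```python
import math

def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(x * math.log(x) for x in p if x > 0)

def kl(p, q):
    """KL[p || q] in nats; assumes q > 0 wherever p > 0."""
    return sum(x * math.log(x / y) for x, y in zip(p, q) if x > 0)

def expected_free_energy(qs, A, prefs):
    """Return (cost, ambiguity) for one policy.

    qs    : predicted distribution over hidden states under the policy
    A     : likelihood under the policy, A[o][s] = p(o | s)
    prefs : prior preferences, a distribution over observations
    """
    n_obs, n_states = len(A), len(qs)
    # Predicted observations: q(o) = sum_s p(o | s) q(s)
    qo = [sum(A[o][s] * qs[s] for s in range(n_states)) for o in range(n_obs)]
    cost = kl(qo, prefs)  # divergence of predicted outcomes from preferred ones
    ambiguity = sum(
        qs[s] * entropy([A[o][s] for o in range(n_obs)]) for s in range(n_states)
    )
    return cost, ambiguity

# Hypothetical policies: both start from the same uncertain belief over two states.
POLICIES = {
    # Observations reveal the hidden state fairly reliably.
    "explore": {"qs": [0.5, 0.5], "A": [[0.9, 0.1], [0.1, 0.9]]},
    # Observation 1 is likely whatever the state is, so nothing is learned.
    "exploit": {"qs": [0.5, 0.5], "A": [[0.2, 0.2], [0.8, 0.8]]},
}

def best_policy(prefs):
    """Pick the policy with the lowest cost + ambiguity; no trade-off weight."""
    scores = {
        name: sum(expected_free_energy(p["qs"], p["A"], prefs))
        for name, p in POLICIES.items()
    }
    return min(scores, key=scores.get)

if __name__ == "__main__":
    for label, prefs in [("flat", [0.5, 0.5]), ("sharp", [0.05, 0.95])]:
        print(label, "preferences ->", best_policy(prefs))
```

The only thing that changes between the two runs is how peaked the preference distribution is. With flat preferences the informative policy has the lower score, and with a strong preference for observation 1 the other policy does, which is the "range of preferences" effect described above.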


I see. Much more elegant than explicitly coding the trade-off, actually :)



