Minimizing surprise
Active Inference is a normative framework to characterize Bayes-optimal behavior and cognition in living organisms. Its normative character is evinced in the idea that all facets of behavior and cognition in living organisms follow a unique imperative: minimizing the surprise of their sensory observations. Surprise has to be interpreted in a technical sense: it measures how much an agent’s current sensory observations differ from its preferred sensory observations—that is, those that preserve its integrity (e.g., for a fish, being in the water).
[1, p.6]
In an earlier post about perceptual inference we introduced the generative model \(p(o, s) = p(o \mid s)p(s)\) used to infer mental state distributions from observations, \(o\), and expected mental states as indicated by high probabilities in the “prior” mental state distribution, \(p(s)\). We also hinted that complex organisms, at least, hold another generative model containing a second prior \(\tilde p(s)\) that represents the desired mental states. The higher the probability, the more desired the mental state. A desired mental state corresponds to a desired observation.
In AIF, desires can be expressed both in terms of desired mental states and in terms of desired observations. If the generative model is in some sense faithful to the generative process, a desired mental state corresponds to a desired real-world state and a desired observation1. When I write “desired mental states” below, it should be read as “desired mental states or desired observations”.
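To make the two priors concrete, here is a minimal sketch in Python (numpy). The fish scenario, the state and observation labels, and all numbers are my own made-up illustration, not a model taken from the literature; the point is only that the expected prior \(p(s)\) and the desired prior \(\tilde p(s)\) are two separate distributions over the same states.

```python
import numpy as np

# Toy discrete generative model (made-up numbers) for a "fish" with two
# mental states and two observations.
states = ["in water", "on land"]
observations = ["wet", "dry"]

likelihood = np.array([[0.9, 0.1],   # p(wet | in water), p(wet | on land)
                       [0.1, 0.9]])  # p(dry | in water), p(dry | on land)

p_s = np.array([0.95, 0.05])          # expected prior p(s)
p_tilde_s = np.array([0.999, 0.001])  # desired prior p_tilde(s): "in water" is highly desired

p_os = likelihood * p_s               # joint generative model p(o, s) = p(o | s) p(s)
print(p_os.sum())                     # 1.0: a proper joint distribution
```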
The ultimate objective of the organism is, according to AIF, to minimize surprise over time. Surprise measures the discrepancy between the organism’s desired observation and its actual observation at some level of processing in the brain. Surprise indicates that the organism is outside its comfort zone and probably wants to take some action to get back to a desired state of being, i.e., back to a desired mental state and thus back to observing desired observations. While an organism may state its desires in terms of mental states, the discrepancy between the current situation and the desired situation is in AIF usually quantified in terms of observations. In the model there is no clear line between an observation and a state, as the brain creates a hierarchy of representations of input observations. This type of hierarchy has been fairly well mapped in, for instance, the visual system. The higher levels in this hierarchy are more “state-like” than the lower levels.
In sensory processing, higher-order brain regions constantly attempt to predict the input they receive from lower levels. When predictions are accurate, feedback “silences” these inputs. When there is a mismatch, an error signal is generated, which propagates upwards to adjust the predictions. This mechanism is called predictive coding [8].
The organism wants to move through life with as few surprises as possible, basically taking the “path of least resistance” with resistance defined as surprise. It is a useful baseline approximation to think of an organism’s life as an attempt to minimize the integral of surprise over its lifetime. This is similar to the way nature minimizes the integral of the difference between kinetic and potential energy over time, in accordance with the principle of least action [6].
Simple organisms that live in predictable environments usually only need to, or can, take one of a few reflex actions to get back to a desired mental state. The only option a beached fish has is to try to flap and hope the flapping will take it back to the water. Bacteria follow the nutrient gradient. Simple organisms may not have the cognitive capacity to hold explicit, long-term desires, only expectations for the immediate future 2.
Intelligent animals, most notably (at least some) humans, can do more than flap or follow the nutrient gradient. They can imagine a large repertoire of possible actions for getting to a desired mental state or desired observations.
Surprise is quantified as the negative logarithm of the probability of an observation according to the generative model, \(-\log p(o)\). As explained in an earlier post, surprise cannot in the general case be calculated analytically. Instead, the organism uses a quantity called variational free energy (VFE) as a proxy (an upper bound) for surprise in perceptual inference and expected free energy (EFE) as a proxy for (future) surprise in action inference 3.
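As a toy illustration of surprise as \(-\log p(o)\), here is a small example with made-up numbers. In a model this tiny the marginal \(p(o)\) can be computed exactly, which is precisely what becomes intractable in realistic models and motivates the free-energy proxies.

```python
import numpy as np

# Surprise of an observation is -log p(o); p(o) is the marginal of the
# generative model p(o, s) = p(o | s) p(s). Numbers are made up.
likelihood = np.array([[0.9, 0.1],   # p(wet | s) for s = in water, on land
                       [0.1, 0.9]])  # p(dry | s)
prior = np.array([0.95, 0.05])       # p(s)

p_o = likelihood @ prior             # p(o) = sum_s p(o | s) p(s)
surprise = -np.log(p_o)
print(dict(zip(["wet", "dry"], surprise.round(2))))  # "dry" is far more surprising for a fish
```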
Variational free energy was introduced in an earlier post. We will dig deeper into both variational free energy and expected free energy in this post.
Free energy is thus, according to AIF, the loss function that the organism seeks to minimize at all times, both to arrive at the mental state most consistent with the observations and to stay within its homeostatic and allostatic states.
Now and in the future
To decide what to do next, the organism evaluates different sequences of actions, policies, to determine which actions would most likely lead to a series of desired observations or desired mental states over a planning horizon, minimizing free energy and thus surprise. It selects a policy that leads to a low surprise. As a basis for the policy evaluation the organism uses its past statistical experience of what actions usually lead to what observations. Depending on the complexity of the organism and perhaps its sensing range [7], the policy may be anything from moving with the nutrient gradient to a plan for a professional career.
If the policy space is continuous (like when negotiating a single track on a mountain bike) the brain can do the evaluation using gradient search in the policy space. The loss function for the gradient search would be the expected free energy. If there are only a few discrete policies to choose from, like turning left or turning right at a T-intersection, the brain may, instead of gradient search, run a mental simulation of each policy to find the one that minimizes EFE and thus future probable surprise. Exactly what strategies the human brain uses for minimizing EFE of future actions is not known. I will here use the simulation approach as an example because I find it rather intuitive. Also, in many cases we don’t have many policies to choose from, so a simulation might be the most feasible approach.
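As a minimal sketch of the simulation strategy for a discrete policy space (the T-intersection case), the function below is a hypothetical placeholder with made-up scores; a concrete way to actually compute expected free energy is derived later in the post.

```python
# Hedged sketch: evaluate each discrete candidate policy with a (hypothetical)
# expected-free-energy function and pick the one with the lowest value.
def expected_free_energy(policy: str) -> float:
    # Placeholder values; in a real agent these would be computed from the
    # generative model, e.g. via equations (8) or (9) later in the post.
    made_up_scores = {"turn left": 2.3, "turn right": 0.7}
    return made_up_scores[policy]

policies = ["turn left", "turn right"]
best_policy = min(policies, key=expected_free_energy)
print(best_policy)  # "turn right": lowest expected free energy, i.e. least expected surprise
```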
In a stable world the desired observations should be the same as the expected observations. The future body temperature should be in line with the historical body temperature, give or take. The fish both expects and desires to be in water rather than on land (or at least it wants to believe it is in water). If this is the case, then one and the same generative model can guide both perceptual inference and the “simulation” in action inference. This means that the priors \(p(s)\) can in stable situations represent both expected mental states and desired mental states (from which expected and desired observations can be inferred). If this is not the case, then the organism must hold two different generative models in its “head”: one for perceptual inference and one for action inference (planning actions). Intuitively this seems reasonable at least for humans as we often want to be in a state that is different from the state we expect to be in (sometimes to the detriment of our mental health).
Planning for the future
I predict myself therefore I am.
Anil Seth. Being You.
A sequence of future actions is in AIF called a policy and is denoted \(\pi\)4: \(\pi = [a_0, a_1, \ldots, a_{T-1}] = a_{0:T-1}\). The purpose of each action is to transition the organism to a new observation and mental state (and therefore, if all goes well, to a new real-world state).
Each policy will lead to a unique sequence of observations \(o_{1:T}\) and associated mental states \(s_{1:T}\). In the following we will, for convenience and brevity, drop the subscripts in most equations and use the notations \(a\), \(o\), and \(s\) to also represent sequences. As mentioned above, the organism is in AIF assumed to have information about the probabilistic relationship between actions and the mental states and observations they cause.
The purpose of action inference is to infer, when action needs to be taken, the policy \(\hat \pi\) that minimizes free energy in the future.
Generative model including action
In preparation for understanding action inference, we start by making two enhancements to the generative model for perceptual inference introduced in this post.
First, we don’t just infer the state distribution at the current moment in time (\(t = 0\)) but estimate the distribution of the whole sequence of states leading up to \(t = 0\), \(s_{-T+1:0}\), based on the corresponding sequence of observations \(o_{-T+1:0}\).
We will also include in the model the actions that cause the organism to go from one mental state to the next, \(a_{-T:-1}\). Note that the actions are shifted one step in time; actions are assumed to come before the observations they cause. The sequence of actions is the policy, \(\pi\). Again, we drop the subscripts for brevity below.
The above additions mean that the \(p(o, s)\) of the model becomes \(p(o, s, \pi)\).
For any probability distribution \(p(o, s, \pi)\), it is true that:
$$p(o) = \sum_{s, \pi} p(o, s, \pi)$$
This is the marginal probability distribution of \(o\). Expressed in terms of surprise this becomes:
$$- \log p(o) = - \log \sum_{s, \pi} p(o, s, \pi) = - \log \mathbb E_{q(s, \pi)} \left[\frac{p(o, s, \pi)}{q(s, \pi)}\right] \ \ \ \ (1)$$
In the last expression we have multiplied both the numerator and the denominator by the probability distribution \(q(s, \pi)\), which here represents the variational posterior of perceptual inference. (The expression would of course be true for any well-behaved probability distribution.)
Jensen’s inequality gives:
$$- \log p(o) = - \log \mathbb E_{q(s, \pi)} \left[\frac{p(o, s, \pi)}{q(s, \pi)}\right] \leq - \mathbb E_{q(s, \pi)} \left[\log \frac{p(o, s, \pi)}{q(s, \pi)}\right] = \mathcal F[q; o]$$
The surprise \(- \log p(o)\) is thus never larger than \(\mathcal F\), which means that minimizing \(\mathcal F\) is a way to (approximately) minimize the surprise.
The right hand side of the inequality can be written:
$$\mathcal F[q; o] = \mathbb{E}_{q(s, \pi)} [\log q(s, \pi) - \log p(o, s, \pi)] \ \ \ \ (2)$$
\(\mathcal F\) is the variational free energy at time \(0\), introduced in an earlier post and further explained here, but now based not only on the current observation but on a sequence of observations and a sequence of actions.
Formally \(\mathcal F\) is a functional (function of a function) of the variational distribution \(q\), parametrized by \(o\). The variational distribution is the “variable” that is modified to find the minimum value of \(\mathcal F\) in perceptual inference.
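A quick numerical sanity check of the bound, using a small made-up joint \(p(o, s, \pi)\) and an arbitrary \(q(s, \pi)\); the dimensions and random distributions are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up joint p(o, s, pi): 2 observations x 3 states x 2 policies.
p = rng.random((2, 3, 2))
p /= p.sum()

# An arbitrary variational distribution q(s, pi).
q = rng.random((3, 2))
q /= q.sum()

o = 0                                           # the observation that occurred
surprise = -np.log(p[o].sum())                  # -log p(o), equation (1)
F = np.sum(q * (np.log(q) - np.log(p[o])))      # F[q; o], equation (2)

print(surprise, F)
assert F >= surprise                            # Jensen's inequality: F upper-bounds surprise
```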
For a given policy \(\pi\), \((2)\) yields:
$$\mathcal F[q; o, \pi] = \mathbb{E}_{q(s \mid \pi)} [\log q(s \mid \pi) - \log p(o, s \mid \pi)] = \ \ \ \ (3a)$$
$$\mathbb{E}_{q(s \mid \pi)} [\log q(s \mid \pi) - \log p(o \mid s) - \log p(s \mid \pi)] = \ \ \ \ (3b)$$
$$\sum_s q(s \mid \pi) \log q(s \mid \pi) - \sum_s q(s \mid \pi) \log p(o \mid s) - \sum_s q(s \mid \pi) \log p(s \mid \pi) =$$
$$D_{KL}[q(s \mid \pi) \mid \mid p(s \mid \pi)] - \mathbb E_{q(s \mid \pi)}[\log p(o \mid s)] \ \ \ \ (3c)$$
Above we have taken into account that the likelihood \(p(o \mid s)\) does not depend on \(\pi\); it describes the sensory system and is assumed to stay the same across policies. Expression \((3c)\) above corresponds to equation \((3b)\) in this post, augmented with policy and with all variables interpreted as time sequences (the time indices are not shown above).
Another way to slice variational free energy is (see also this post):
$$\mathcal F[q; o, \pi] = \mathbb{E}_{q(s \mid \pi)} [\log q(s \mid \pi) - \log p(o, s \mid \pi)] =$$
$$\mathbb{E}_{q(s \mid \pi)} [\log q(s \mid \pi) - \log p(s \mid o, \pi) - \log p(o \mid \pi)] =$$
$$\sum_s q(s \mid \pi) \log q(s \mid \pi) - \sum_s q(s \mid \pi) \log p(s \mid o, \pi) - \sum_s q(s \mid \pi) \log p(o \mid \pi) =$$
$$D_{KL}[q(s \mid \pi) \mid \mid p(s \mid o, \pi)] - \log p(o \mid \pi)$$
This expression corresponds to expression \((2b)\) of this post augmented with policy and with all variables interpreted as time sequences.
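The two slicings above are algebraically identical. A small numerical check with made-up distributions (a single policy, a single time step, and a fixed observation) shows that the definition \((3a)\), the decomposition \((3c)\), and the posterior-plus-surprise form all give the same value:

```python
import numpy as np

rng = np.random.default_rng(1)

# Made-up quantities for one policy pi and one observed o.
likelihood = rng.dirichlet(np.ones(4), size=3)   # p(o | s): 3 states x 4 observations
prior = rng.dirichlet(np.ones(3))                # p(s | pi)
q = rng.dirichlet(np.ones(3))                    # variational q(s | pi)
o = 2                                            # index of the actual observation

lik_o = likelihood[:, o]                         # p(o | s) as a function of s
p_o = lik_o @ prior                              # p(o | pi)
posterior = lik_o * prior / p_o                  # p(s | o, pi) via Bayes' theorem

def kl(a, b):
    return np.sum(a * (np.log(a) - np.log(b)))

F_def = np.sum(q * (np.log(q) - np.log(lik_o * prior)))   # definition (3a)
F_3c = kl(q, prior) - np.sum(q * np.log(lik_o))           # KL to prior minus accuracy, (3c)
F_alt = kl(q, posterior) - np.log(p_o)                     # KL to posterior plus surprise

print(F_def, F_3c, F_alt)   # identical up to floating point rounding
```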
The dual purpose of free energy
Let’s pause for a minute to remind ourselves of what kind of work the expression for free energy does:
$$\mathcal F[q; o, \pi] = D_{KL}[q(s \mid \pi) \mid \mid p(s \mid o, \pi)] - \log p(o \mid \pi)$$
The basic tenet of AIF is that the organism navigates the world by continuously trying to minimize free energy. Free energy consists of two rather unrelated terms, one for optimizing perception and one for evaluating the valence of the current and future observations and associated mental states:
The first term, \(D_{KL}[q(s \mid \pi) \mid \mid p(s \mid o, \pi)]\), quantifies how good the organism’s approximate posterior distribution of mental states \(q(s \mid \pi)\) is, i.e., how close the approximate distribution is to the true posterior \(p(s \mid o, \pi)\). This term tells us how accurate the organism’s interpretation of its observations is. It can be minimized using variational inference, introduced in an earlier post.
The second term is the surprise, \(- \log p(o \mid \pi)\). It quantifies how expected or desired the observation (and thus its associated mental state) is. A high surprise is usually a signal to do something about the situation: if you are a fish on land, you are surprised and eager to get back into water. An organism always strives to choose its actions so as to minimize current and expected future surprise. AIF posits that the organism’s expected and desired observations are encoded in its generative models.
As hinted above, the organism may need two different models, one for what it expects and one for what it desires because they are not always the same. In both cases the expectation or desire is encoded in the probability of the observation or mental state. Highly expected or desired observations and states have high probabilities in the model.
Into the lake
In this enhanced model the prior distribution of mental states in the current moment also depends on the policy that was executed leading up to the current moment.
Let’s speculate how the model might have worked in my own brain during an incident a few years ago. One evening I decided to go ice skating on the local lake. The ice was thick and smooth, perfect for long distance skating, a popular activity in my native Sweden. The moon was shining and I picked up a good speed with a bit of tailwind.
I knew I needed to avoid the mouth of the river flowing into the lake because the ice is thin where the water flows. It turned out that the thin ice extended much further from the river mouth into the lake than I thought and suddenly I found myself in a hole in the ice, swimming in sorbet. It took a second or two for my brain to accept the mental state corresponding to the fact that I was swimming in freezing water. My first reaction was: this is not happening because my generative model had assigned a very low probability to all possible observations from a hole in the ice! Then my brain started to minimize free energy (while spending some physical energy to boost my body heat).
There were several observations pointing at me actually being in water: I was obviously wet, I started to feel the cold, I felt pain in my sartorius muscle which had hit the edge of the ice on the other side of the hole at high speed, and I was definitely swimming. Add to that the recent actions: I had been skating on ice, fast, not very far from a river mouth. Whirr, click! After a second or two of free energy minimization I had a posterior mental state distribution peaking sharply for the mental state “in a hole in the ice”. The remaining surprise was, after the variational inference was done and the mental state was established, still very high because of the low prior probability of the observation and mental state.
My desired mental state was of course not to be swimming with my skates on in a freezing lake but to be back on the ice. I started to evaluate some policies with respect to expected free energy. More on that below.
Into the future
Let’s pause in the present for a little bit more before we head into the future. The distribution \(p(o, s, \pi)\) is the generative model that is valid up until the present moment. It’s a historic record of what has happened in the past; it is our experience base. It tells us what actions typically led to what observations and mental states. The marginal distribution \(p(s) = \sum_{o, \pi} p(o, s, \pi)\), the prior distribution, holds the probabilities of the expected mental states.
When planning future actions we replace our “expectation model” \(p(o, s, \pi)\) with a “desire model” \(\tilde p(o, s, \pi)\). \(\tilde p(s) = \sum_{o, \pi} \tilde p(o, s, \pi)\) yields the probabilities of the desired mental states. Again, the desirability of a mental state is encoded in the probability of that mental state in \(\tilde p(s)\).
The \(o\), \(s\), and \(\pi\) of \(\tilde p(o, s, \pi)\) are all in the future. When the organism desires something different from what it expects, then \(\tilde p(s) \neq p(s)\).
Action inference, described in detail below, yields a policy \(\hat \pi\) that, when executed, takes us to the desired mental states that have high probabilities in \(\tilde p(s)\).
When simulating the future we have to replace actual observations with a probability distribution of observations which is conditioned on the policy. This means that equation \((2)\) is transformed into an expectation over both \(s\) and \(o\). Equation \((3a)\), the “retrospective” variational free energy:
$$ \mathcal F[q; o, \pi] = \mathbb{E}_{q(s \mid \pi)} [\log q(s \mid \pi) - \log p(o, s \mid \pi)]$$
is replaced with the “prospective” expected free energy:5
$$\mathcal G[q; \pi] = \mathbb{E}_{q(s, o \mid \pi)}\left[\log q(s \mid \pi) - \log \tilde p(o, s \mid \pi) \right]$$
Shapes of \(\mathcal G[q; \pi]\)
To find the optimal policy \(\hat \pi\) we thus have to find the expected free energy \(\mathcal G[q; \pi]\) for each candidate policy and pick the policy that produces the lowest expected free energy. As mentioned above, this can be done with e.g., gradient search for continuous policy spaces and with discrete simulations for discrete policy spaces.
Desires in terms of mental states
This section outlines a derivation of \(\mathcal G[q; \pi]\), the expected free energy for a certain policy, when the organism’s desires are expressed in terms of desired mental states. (We will below look at the case when the desires are expressed in terms of desired observations.) From the derivation above we have:
$$\mathcal G[q; \pi] = \mathbb E_{q(o, s \mid \pi)}[\log q(s \mid \pi) - \log \tilde p(o, s \mid \pi)] =$$
$$\mathbb E_{q(o, s \mid \pi)}[\log q(s \mid \pi) - \log p(o \mid s, \pi) - \log \tilde p(s \mid \pi)] \ \ \ \ (6)$$
\(\log p(o \mid s, \pi)\), again, characterizes the organism’s sensory system and therefore does not depend on \(\pi\), which can thus be dropped from the conditioning. The sensory system is also assumed to be invariant over the course of the planned actions. These two assertions imply that \(\tilde p(o, s \mid \pi) = p(o \mid s) \tilde p(s \mid \pi)\).
\(\tilde p(s \mid \pi)\) represents the organism’s desired mental states. The organism’s desires are assumed to be independent of the chosen policy \(\pi\), so we can set \(\tilde p(s \mid \pi) = \tilde p(s)\). This means that \(\tilde p(o, s \mid \pi) = p(o \mid s) \tilde p(s)\) [3, p.453].
\(q(o, s \mid \pi) = p(o \mid s)q(s \mid \pi)\), where \(q(s \mid \pi)\) represents the probabilities the organism assigns to attaining specific mental states given specific actions. These probabilities are estimated based on experience and reasoning. If the organism jumps into water (action), then the organism is fairly certain to experience a mental state including wetness, for instance.
\((6)\) now becomes:
$$\mathcal G[q; \pi] = \mathbb E_{p(o \mid s)q(s \mid \pi)}[\log q(s \mid \pi) - \log p(o \mid s) - \log \tilde p(s)] \ \ \ \ (7)$$
Note that the expression for \(\mathcal G[q; \pi]\) in \((7)\) is similar to the expression for \(\mathcal F[q; o, \pi]\) in \((3)\). The difference lies in what is known and what is sought in the respective cases:
In \((7)\) everything on the right-hand side is known (mathematically speaking). The organism seeks the value of \(\mathcal G[q; \pi]\) for each \(\pi\) it can think of in order to construct \(\hat q(\pi) = \sigma (- \mathcal G[q; \pi])\). It then selects and executes a high-probability policy, one that is likely to yield a low free energy (a minimal sketch of this selection step follows below).
In \((3)\) the distribution of mental states \(q(s \mid \pi)\) is the unknown and is estimated using, e.g., variational inference. The variational free energy serves as the loss function for this variational inference and is implicitly minimized when the variational distribution \(q(s \mid \pi)\) gets as close as possible to the true posterior distribution of mental states \(p(s \mid o, \pi)\).
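Here is the promised sketch of the selection step, with made-up \(\mathcal G\) values: the per-policy expected free energies are turned into a policy distribution by a softmax of their negatives, and a high-probability policy is then chosen.

```python
import numpy as np

# Made-up expected free energies for three candidate policies.
G = np.array([3.1, 1.4, 2.2])

def softmax(x):
    e = np.exp(x - x.max())        # subtract the max for numerical stability
    return e / e.sum()

q_pi = softmax(-G)                 # \hat q(\pi) = sigma(-G): low EFE -> high probability
chosen = int(np.argmax(q_pi))      # or sample a policy from q_pi instead
print(q_pi.round(3), chosen)       # policy 1 has the lowest EFE and the highest probability
```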
Equation \((7)\) can now be rewritten:
$$\mathcal G[q; \pi] = \sum_{o, s} p(o \mid s)q(s \mid \pi)[\log q(s \mid \pi) - \log p(o \mid s) - \log \tilde p(s)] =$$
$$\sum_s \left(\sum_o p(o \mid s)q(s \mid \pi)[\log q(s \mid \pi) - \log \tilde p(s)] - \sum_o q(s \mid \pi)p(o \mid s)\log p(o \mid s)\right) =$$
$$\sum_s \left(q(s \mid \pi)[\log q(s \mid \pi) - \log \tilde p(s)]\sum_o p(o \mid s) - q(s \mid \pi)\sum_o p(o \mid s)\log p(o \mid s)\right)$$
$$\sum_o p(o \mid s) = 1 \Rightarrow$$
$$\mathcal G[q; \pi] = \sum_s \left(q(s \mid \pi)[\log q(s \mid \pi) - \log \tilde p(s)] - q(s \mid \pi)\sum_o p(o \mid s)\log p(o \mid s)\right) =$$
$$D_{KL}[q(s \mid \pi) \mid \mid \tilde p(s)] + \mathbb E_{q(s \mid \pi)}[\mathbb H[p(o \mid s)]] \ \ \ \ (8)$$
The first term (often called risk) indicates how close the distribution of states given by the evaluated policy is to the desired distribution of states. The second term (often called ambiguity) is the expected entropy of the observations given the states, i.e., the expected surprise associated with the evaluated policy. The organism wants to both find a policy that takes it as close to its desired states as possible and to minimize the associated surprise.
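A small numerical check, with made-up likelihood, predicted-state, and desired-state distributions, that the definition in \((7)\) and the risk-plus-ambiguity decomposition in \((8)\) agree:

```python
import numpy as np

rng = np.random.default_rng(2)

likelihood = rng.dirichlet(np.ones(4), size=3)   # p(o | s): 3 states x 4 observations
q_s = rng.dirichlet(np.ones(3))                  # q(s | pi): predicted states under the policy
p_tilde_s = rng.dirichlet(np.ones(3))            # desired state distribution p_tilde(s)

# Equation (7): expectation over p(o | s) q(s | pi).
joint = likelihood * q_s[:, None]                # q(o, s | pi), shape (states, observations)
G_7 = np.sum(joint * (np.log(q_s)[:, None]
                      - np.log(likelihood)
                      - np.log(p_tilde_s)[:, None]))

# Equation (8): KL to the desired states plus expected entropy of the likelihood.
risk = np.sum(q_s * (np.log(q_s) - np.log(p_tilde_s)))
ambiguity = np.sum(q_s * -np.sum(likelihood * np.log(likelihood), axis=1))
G_8 = risk + ambiguity

print(G_7, G_8)   # equal up to floating point rounding
```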
Desires in terms of observations
This section outlines a derivation of \(\mathcal G[q; \pi]\) when the organism’s desires are expressed in terms of desired observations.
$$\mathcal G[q; \pi] = \mathbb E_{q(o, s \mid \pi)}[\log q(s \mid \pi) - \log \tilde p(o, s \mid \pi)] =$$
$$\mathbb E_{q(o, s \mid \pi)}[\log q(s \mid \pi) - \log p(s \mid o, \pi) - \log \tilde p(o)] =$$
$$\sum_{o, s} q(o, s \mid \pi)[\log q(s \mid \pi) - \log p(s \mid o, \pi) - \log \tilde p(o)]$$
Bayes’ theorem gives:
$$p(s \mid o, \pi) = \frac{p(o \mid s, \pi)p(s \mid \pi)}{p(o \mid \pi)}$$
Also, as stated above, the likelihood \(p(o \mid s)\) doesn’t depend on \(\pi\). We get:
$$p(s \mid o, \pi) = \frac{p(o \mid s)p(s \mid \pi)}{p(o \mid \pi)}$$
Using the two factorizations \(q(o, s \mid \pi) = p(o \mid s)q(s \mid \pi) = q(o \mid \pi)p(s \mid o, \pi)\), we have \(\log q(s \mid \pi) - \log p(s \mid o, \pi) = \log q(o \mid \pi) - \log p(o \mid s)\), so:
$$\mathcal G[q; \pi] = \sum_{o, s} q(o, s \mid \pi)[\log q(o \mid \pi) - \log p(o \mid s) - \log \tilde p(o)] =$$
$$\sum_{o, s} q(o, s \mid \pi)[\log q(o \mid \pi) - \log \tilde p(o)] - \sum_{o, s} q(o, s \mid \pi)[\log p(o \mid s)]$$
$$ q(o, s \mid \pi) = q(o \mid \pi)p(s \mid o, \pi) \Rightarrow$$
$$\mathcal G[q; \pi] = \sum_o \left(\sum_s p(s \mid o, \pi)q(o \mid \pi) [\log q(o \mid \pi) - \log \tilde p(o)]\right) -$$
$$\sum_s \left(\sum_o p(o \mid s)q(s \mid \pi) \log p(o \mid s)\right) =$$
$$\sum_o \left(q(o \mid \pi) [\log q(o \mid \pi) - \log \tilde p(o)] \sum_s p(s \mid o, \pi)\right) - \sum_s q(s \mid \pi) \left(\sum_o p(o \mid s) \log p(o \mid s)\right)$$
$$ \sum_s p(s \mid o, \pi) = 1 \Rightarrow$$
$$\mathcal G[q; \pi] = D_{KL}[q(o \mid \pi) \mid \mid \tilde p(o)] + \mathbb E_{q(s \mid \pi)} \mathbb H[p(o \mid s)] \ \ \ \ (9)$$
Note the similarity of equations \((8)\) and \((9)\). The two expressions don’t necessarily yield exactly the same value as they start from different premises. It can in fact be proven that [1, p.251]:
$$D_{KL}[q(s \mid \pi) \mid \mid \tilde p(s)] + \mathbb E_{q(s \mid \pi)}[\mathbb H[p(o \mid s)]] \geq$$
$$D_{KL}[q(o \mid \pi) \mid \mid \tilde p(o)] + \mathbb E_{q(s \mid \pi)}[\mathbb H[p(o \mid s)]]$$
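As a numerical illustration with made-up numbers: when the desired observation distribution \(\tilde p(o)\) is induced from the desired states through the same likelihood, \(\tilde p(o) = \sum_s p(o \mid s)\tilde p(s)\), the state-based form \((8)\) is never smaller than the observation-based form \((9)\), since mapping both distributions through \(p(o \mid s)\) can only shrink the KL term.

```python
import numpy as np

rng = np.random.default_rng(3)

likelihood = rng.dirichlet(np.ones(4), size=3)   # p(o | s)
q_s = rng.dirichlet(np.ones(3))                  # q(s | pi)
p_tilde_s = rng.dirichlet(np.ones(3))            # desired states
p_tilde_o = likelihood.T @ p_tilde_s             # induced desired observations
q_o = likelihood.T @ q_s                         # q(o | pi)

def kl(a, b):
    return np.sum(a * (np.log(a) - np.log(b)))

ambiguity = np.sum(q_s * -np.sum(likelihood * np.log(likelihood), axis=1))
G_states = kl(q_s, p_tilde_s) + ambiguity        # equation (8)
G_obs = kl(q_o, p_tilde_o) + ambiguity           # equation (9)

print(G_states, G_obs)
assert G_states >= G_obs - 1e-12                 # the state-based EFE upper-bounds the observation-based EFE
```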
Whence \(\mathcal G[q; \pi]\)
We are not done yet though. We still need to calculate each \(\mathcal G[q; \pi]\). Remember that \(o = o_{1:T}\) and \(s = s_{1:T}\) are sequences of observations and corresponding mental states. To simplify the inference of \(\mathcal G[q; \pi]\) we will make a couple of assumptions [3, p.453]:
$$q(s_{1:T} | \pi) \approx \prod_{t=1}^{T} q(s_t | \pi)$$
And
$$p(o_{1:T}, s_{1:T} | \pi) \approx \prod_{t=1}^{T} p(o_t, s_t | \pi)$$
These simplifications rely on a mean-field approximation which assumes that all temporal dependencies in the sequences of observations and mental states are captured by the parameter \(\pi\), so that the distributions can be expressed as products of independent per-time-step factors. This assumption yields:
$$\mathcal G[q; \pi] = \mathbb{E}_{q(s, o \mid \pi)} \left[\log q(s \mid \pi) - \log \tilde p(o, s \mid \pi)\right] =$$
$$\mathbb{E}_{q(s, o \mid \pi)} \left[\sum_{t=1}^T \left(\log q(s_t \mid \pi) - \log \tilde p(o_t, s_t \mid \pi)\right) \right] =$$
$$\sum_{t=1}^T \mathbb{E}_{q(s, o \mid \pi)} \left[\log q(s_t \mid \pi) - \log \tilde p(o_t, s_t \mid \pi) \right] =$$
$$\sum_{t=1}^T \mathcal G[q; \pi, t]$$
Where
$$\mathcal G[q; \pi, t] = \mathbb{E}_{q(s_t, o_t \mid \pi)} \left[\log q(s_t \mid \pi) - \log \tilde p(o_t, s_t \mid \pi)\right]$$
We can thus find \(\mathcal G[q; \pi]\) by calculating \(\mathcal G[q; \pi, t]\) for each time step in the planning horizon and taking the sum of all values.
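A minimal sketch of this final bookkeeping, with made-up per-step values: sum the per-time-step \(\mathcal G[q; \pi, t]\) over the planning horizon for each candidate policy and pick the policy with the lowest total.

```python
import numpy as np

# Rows: candidate policies; columns: time steps in the planning horizon.
# The per-step expected free energies G[q; pi, t] are made up for illustration.
efe_per_step = np.array([[1.2, 0.8, 0.9, 1.1],
                         [0.4, 0.5, 1.8, 2.0],
                         [0.6, 0.6, 0.7, 0.7]])

G_total = efe_per_step.sum(axis=1)     # G[q; pi] = sum_t G[q; pi, t]
best_policy = int(np.argmin(G_total))
print(G_total, best_policy)            # policy 2 has the lowest cumulative EFE
```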
Out of the lake
After a few seconds in the zero Celsius water, my brain started to dry-run (no pun intended) a few policies to determine which one would get me to my desired mental states with a high probability. The first desired mental state in that sequence, and pretty much the limit of my planning horizon at that time, was of course to get out of the water, back onto the ice. The policy I chose (among not too many alternatives) involved using my ice prods (always bring ice prods!) to get out of the water.
Links
[1] Thomas Parr, Giovanni Pezzulo, Karl J. Friston. Active Inference. MIT Press Direct.
[2] Ryan Smith, Karl J. Friston, Christopher J. Whyte. A step-by-step tutorial on active inference and its application to empirical data. Journal of Mathematical Psychology. Volume 107. 2022.
[3] Beren Millidge, Alexander Tschantz, Christopher L. Buckley. Whence the Expected Free Energy. Neural Computation 33, 447–482 (2021).
[4] Stephen Francis Mann, Ross Pain, Michael D. Kirchhoff. Free energy: a user’s guide. Biology & Philosophy (2022) 37: 33.
[5] Carol Tavris on mistakes, justification, and cognitive dissonance. Sean Carroll’s Mindscape: science, society, philosophy, culture, arts, and ideas. Podcast (Spotify link).
[6] The principle of least action. Feynman Lectures.
[7] Sean Carroll’s Mindscape podcast. Episode 39: Malcolm MacIver on Sensing, Consciousness, and Imagination.
[8] Caucheteux, C., Gramfort, A. & King, JR. Evidence of a predictive coding hierarchy in the human brain listening to speech. Nat Hum Behav 7, 430–441 (2023). https://doi.org/10.1038/s41562-022-01516-2
- Intuitively (and speculatively) I believe I think in terms of states when I plan actions consciously, like when I (a long time ago) planned my education. I wanted to end up in a state of competence and knowledge, which is a very abstract state that is not easy to characterize with an observation. When I jump my horse, on the other hand, I see myself on the other side of the fence after the jump (as opposed to on the ground in front of the fence). A physical and to a degree unconscious want like that may be more observation-based. There is most likely a hierarchy of representations from e.g., raw retinal input to abstract states, so maybe there is a continuum between what we call observations and what we call states. ↩︎
- One hypothesis is that imagination, and thus the ability to hold preferences, arose when fish climbed up on land and could see much further than under water. This made it possible, and necessary, to plan further ahead by imagining and evaluating different possible courses of actions [7]. ↩︎
- The term active inference is used to describe both the whole framework and the action-oriented part of the framework. I find this confusing. Active inference seems to refer to the fact that the policy (sequence of actions) can be inferred through an inference algorithm resembling the one used to infer the posterior in perceptual inference. To avoid overloading the term active inference I will call the action-oriented part of active inference action inference for now. ↩︎
- It is possible to build a continuous model of active inference. I start with introducing the discrete variant as it is somewhat more intuitive. ↩︎
- We can derive \(\mathcal G[q]\) from \(\mathcal F[q; o]\) by taking the expectation of \(\mathcal F[q; o]\) over \(q(o \mid s, \pi) = p(o \mid s)\). (We cannot take the expectation over \(q(o \mid \pi)\) as this would ignore the fact that the observations and the states are correlated.) \(\mathcal G[q] = \mathbb{E}_{p(o \mid s)q(s, \pi)} [\log q(s, \pi) - \log p(o, s, \pi)]\). ↩︎