As you read the words on this page, you might also notice a growing feeling of confidence that you understand their meaning. Every day we make decisions based on ambiguous information and in response to factors over which we have little or no control. Yet rather than being constantly paralysed by doubt, we generally feel reasonably confident about our choices. So where does this feeling of confidence come from?
Computational models of human decision-making assume that our confidence depends on the quality of the information available to us: the less ambiguous this information, the more confident we should feel. According to this idea, the information on which we base our decisions is also the information that determines how confident we are that those decisions are correct. However, recent experiments suggest that this is not the whole story. Instead, our internal states — specifically how our heart is beating and how alert we are — may influence our confidence in our decisions without affecting the decisions themselves.
To test this possibility, Micah Allen and co-workers asked volunteers to decide whether dots on a screen were moving to the left or to the right, and to indicate how confident they were in their choice. As the task became objectively more difficult, the volunteers became less confident about their decisions. However, increasing the volunteers’ alertness or “arousal” levels immediately before a trial countered this effect, showing that task difficulty is not the only factor that determines confidence. Measures of arousal — specifically heart rate and pupil dilation — were also related to how confident the volunteers felt on each trial. These results suggest that unconscious processes might exert a subtle influence on our conscious, reflective decisions, independently of the accuracy of the decisions themselves.
The next step will be to develop more refined mathematical models of perception and decision-making to quantify the exact impact of arousal and other bodily sensations on confidence. The results may also be relevant to understanding clinical disorders, such as anxiety and depression, where changes in arousal might lock sufferers into an unrealistically certain or uncertain world.
The PNAS journal club also published a useful summary, including some great quotes from Phil Corlett and Rebecca Todd:
… Allen’s findings are “relevant to anyone whose job is to make difficult perceptual judgments trying to see signal in a lot of noise,” such as radiologists or baggage inspectors, says cognitive neuroscientist Rebecca Todd at the University of British Columbia in Vancouver, who did not take part in the research. Todd suggests that people who apply decision-making models to real world problems need to better account for the influence of internal or emotional states on confidence.
The fact that bodily states can influence confidence may even shed light on mental disorders, which often involve blunted or heightened signals from the body. Symptoms could result from how changes in sensory input affect perceptual decision-making, says cognitive neuroscientist and schizophrenia researcher Phil Corlett at Yale University, who did not participate in this study.
Corlett notes that some of the same ion channels involved in regulating heart rate are implicated in schizophrenia as well. “Maybe boosting heart rate might lead people with schizophrenia to see or hear things that aren’t present,” he speculates, adding that future work could analyze how people with mental disorders perform on these tasks…
I also wrote a blog post summarizing the article for The Conversation:
How do we become aware of our own thoughts and feelings? And what enables us to know when we’ve made a good or bad decision? Every day we are confronted with ambiguous situations. If we want to learn from our mistakes, it is important that we sometimes reflect on our decisions. Did I make the right choice when I leveraged my house mortgage against the market? Was that stop light green or red? Did I really hear a footstep in the attic, or was it just the wind?
When events are more uncertain, for example if our windscreen fogs up while driving, we are typically less confident in what we’ve seen or decided. This ability to consciously examine our own experiences, sometimes called introspection, is thought to depend on the brain appraising how reliable or “noisy” the information driving those experiences is. Some scientists and philosophers believe that this capacity for introspection is a necessary feature of consciousness itself, forging the crucial link between sensation and awareness.
One important theory is that the brain acts as a kind of statistician, weighting options by their reliability, to produce a feeling of confidence more or less in line with what we’ve actually seen, felt or done. And although this theory does a reasonably good job of explaining our confidence in a variety of settings, it neglects an important fact about our brains – they are situated within our bodies. Even now, as you read the words on this page, you might have some passing awareness of how your socks sit on your feet, how fast your heart is beating or if the room is the right temperature.
Even if you were not fully aware of these things, the body is always shaping how we experience ourselves and the world around us. That is to say experience is always from somewhere, embodied within a particular perspective. Indeed, recent research suggests that our conscious awareness of the world is very much dependent on exactly these kinds of internal bodily states. But what about confidence? Is it possible that when I reflect on what I’ve just seen or felt, my body is acting behind the scenes? …
New Scientist took an interesting angle that the other write-ups did not explore, and also included a good response from Ariel Zylberberg:
“We were tricking the brain and changing the body in a way that had nothing to do with the task,” Allen says. In doing so, they showed that a person’s sense of confidence relies on internal as well as external signals – and the balance can be shifted by increasing your alertness.
Allen thinks the reaction to disgust suppressed the “noise” created by the more varied movement of the dots during the more difficult versions of the task. “They’re taking their own confidence as a cue and ignoring the stimulus in the world.”
“It’s surprising that they show that confidence can be motivated by processes inside a person, instead of what we tend to believe, which is that confidence should be motivated by external things that affect a decision,” says Ariel Zylberberg at Columbia University in New York. “Disgust leads to aversion. If you try a food and it’s disgusting, you walk away from it,” says Zylberberg. “Here, if you induce disgust, high confidence becomes lower and low confidence becomes higher. It could be that disgust is generating this repulsion.”
It is not clear whether it is the feeling of disgust that changes a person’s confidence in this way, or whether inducing alertness with a different emotion, such as anger or fear, would have the same effect.
You can find all the coverage for our article using these excellent services, altmetric & ImpactStory.
Author’s note: this marks the first in a new series of journal-entry style posts in which I write freely about things I like to think about. The style is meant to be informal and off the cuff, building towards a sort of Socratic dialogue. Please feel free to argue or debate any point you like. These are meant to serve as exercises in writing and thinking, to improve the quality of both and lay groundwork for future papers.
My wife Francesca and I are spending the winter holidays vacationing in the north Italian countryside with her family. Today in our free time our discussions turned to how predictive coding and generative models can accomplish the multimodal perception that characterizes the brain. To this end Francesca asked a question we found particularly thought provoking: if the brain at all levels is only communicating forward what is not predicted (prediction error), how can you explain the functional specialization that characterizes the different senses? For example, if each sensory hierarchy is only communicating prediction errors, what explains their unique specialization in terms of e.g. the frequency, intensity, or quality of sensory inputs? Put another way, how can the different sensations be represented, if the entire brain is only communicating in one format?
We found this quite interesting, as it seems straightforward and yet the answer lies at the very basis of predictive coding schemes. To arrive at an answer we first had to lay a little groundwork in terms of information theory and basic neurobiology. What follows is a grossly oversimplified account of the basic neurobiology of perception, which serves only as a kind of philosopher’s toy example to consider the question. Please feel free to correct any gross misunderstandings.
To begin, it is clear, at least according to Shannon’s theory of information, that any sensory property can be encoded in a simple system of ones and zeros (or nerve impulses). Frequency, time, intensity, and so on can all be re-described in terms of a simplistic encoding scheme. If this were not the case then modern television wouldn’t work. Second, each sensory hierarchy presumably begins with a sensory receptor, which directly transduces physical fluctuations into a neuronal code. For example, in the auditory hierarchy the cochlea contains small hairs that each vibrate only to a particular frequency of sound wave. This vibration, through a complex neuromechanical relay, results in a tonotopic depolarization of first-order neurons in the spiral ganglion.
It is here at the first-order neuron where the hierarchy presumably begins, and also where functional specialization becomes possible. It seems to us that predictive coding should say that the first neuron is simply predicting a particular pattern of inputs, which correspond directly to an expected external physical property. To give a toy example, say we present the brain with a series of tones, which reliably increase in frequency in 1 Hz steps. At the lowest level the neuron will fire at a constant rate if the frequency at interval n is 1 Hz greater than at the previous interval, and will fire more or less if the frequency is greater or less than this basic expectation, creating a positive or negative prediction error (remember that the neuron should only alter its firing pattern if something unexpected happens). Since frequency here is being signaled directly by the mechanical vibration of the cochlear hairs, the first-order neuron is simply predicting which frequency will be signaled. More realistically, each sensory neuron is probably only predicting whether or not a particular frequency will be signaled – we know from neurobiology that low-level neurons are basically tuned to a particular sensory feature, whereas higher-level neurons encode receptive fields across multiple neurons or features. All this is to say that the first-order neuron is specialized for frequency because all it can predict is frequency; the only afferent input is the direct result of sensory transduction. The point here is that specialization in each sensory system arises in virtue of the fact that the inputs correspond directly to a physical property.
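The toy example above can be sketched in a few lines of Python. This is only an illustration of the idea, not a model of a real neuron: the function name prediction_errors and the expected_step parameter are our own inventions, and the "neuron" here simply expects each tone to be 1 Hz above the last and reports the signed difference when it is not.

```python
def prediction_errors(frequencies, expected_step=1.0):
    """Signed prediction errors for a sequence of tone frequencies (Hz).

    The toy unit expects each tone to be `expected_step` Hz above the
    previous one. Zero means "as expected" (constant firing); positive or
    negative values mean the tone was higher or lower than predicted.
    """
    errors = []
    for previous, current in zip(frequencies, frequencies[1:]):
        predicted = previous + expected_step
        errors.append(current - predicted)
    return errors


# Three tones follow the rule, then one jumps too far, then one fails to rise.
tones = [440.0, 441.0, 442.0, 445.0, 445.0]
print(prediction_errors(tones))  # [0.0, 0.0, 2.0, -1.0]
```

Note that the unit never represents anything except frequency: the only thing it can be surprised by is the one quantity its input carries.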
Now, as one ascends higher in the hierarchy, each subsequent level is predicting the activity of the previous. The first-order neuron predicts whether a given frequency is presented, the second perhaps predicts if a receptive field is activated across several similarly tuned neurons, the third predicts a particular temporal pattern across multiple receptive fields, and so on. Each subsequent level is predicting a “hyperprior” encoding a higher order feature of the previous level. Eventually we get to a level where the prediction is no longer bound to a single sensory domain, but instead has to do with complex, non-linear interactions between multiple features. A parietal neuron thus might predict that an object in the world is a bird if it sings at a particular frequency and has a particular bodily shape.
If this general scheme is correct, then according to hierarchical predictive coding functional specialization primarily arises in virtue of the fact that at the lowest level each hierarchy is receiving inputs that strictly correspond to a particular feature. The cochlea is picking up fluctuations in air vibration (sound), the retina is picking up fluctuations in light frequency (light), and the skin is picking up changes in thermal amplitude and tactile frequency (touch). The specialization of each system is due to the fact that each is attempting to predict higher and higher order properties of those low-level inputs, which are by definition particular to a given sensory domain. Any further specialization in the hierarchy must then arise from the fact that higher levels of the brain predict inputs from multiple sensory systems – we might find multimodal object-related areas simply because the best hyperprior governing nonlinear relationships between frequency and shape is an amodal or cross-modal object. The actual etiology of higher-level modules is a bit more complicated than this, and requires an appeal to evolution to explain in detail, but we felt this was a generally sufficient explanation of specialization.
Nonlinearity of the world and perception: prediction as integration
At this point, we felt like we had some insight into how predictive coding can explain functional specialization without needing to appeal to special classes of cortical neurons for each sensation. Beyond the sensory receptors, the function of each system can be realized simply by means of a canonical, hierarchical prediction of each layered input, right down to the point of neurons which predict which frequency will be signaled. However, something still was missing, prompting Francesca to ask – how can this scheme explain the coherent, multi-modal, integrated perception that characterizes conscious experience?
Indeed, we certainly do not experience perception as a series of nested predictions. All of the aforementioned machinery functions seamlessly beyond the point of awareness. In phenomenology a way to describe such influences is as being prenoetic (before knowing; see also prereflective); i.e. things that influence conscious experience without themselves appearing in experience. How then can predictive coding explain the transition from segregated, feature specific predictions to the unified percept we experience?
As you might guess, we already hinted at part of the answer. Instead of picturing each sensory hierarchy as an isolated pyramid, imagine arranging them such that each level is parallel to its equivalent in the ‘neighboring’ hierarchy. On this view, we can see that relatively early in each hierarchy you arrive at multi-sensory neurons that are predicting conjoint expectations over multiple sensory inputs. Conveniently, this observation matches what we actually know about the brain; audition, touch, and vision all converge in temporo-parietal association areas.
Perceptual integration is thus achieved as easily as specialization; it arises from the fact that each level predicts a hyperprior on the previous level. As one moves upwards through the hierarchy, this means that each level predicts more integrated, abstract, amodal entities. Association areas don’t predict just that a certain sight or sound will appear, but instead encode a joint expectation across both (or all) modalities. Just like the fusiform face area predicts complex, nonlinear conjunctions of lower-level visual features, multimodal areas predict nonlinear interactions between the senses.
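To make the idea of a joint expectation concrete, here is a small Python sketch. Everything in it is hypothetical: the three candidate objects, their predicted features, and the use of a plain sum of squared errors across modalities (which quietly treats kilohertz and metres as comparable) are assumptions chosen for illustration only.

```python
# Hypothetical objects and the features each one predicts in two modalities.
PREDICTIONS = {
    "bird": {"pitch_khz": 3.0, "size_m": 0.25},
    "whistle": {"pitch_khz": 3.0, "size_m": 0.0},  # a sound with no visible body
    "plane": {"pitch_khz": 0.2, "size_m": 30.0},
}


def total_error(observation, prediction, modalities):
    """Sum of squared prediction errors over the given modalities."""
    return sum((observation[m] - prediction[m]) ** 2 for m in modalities)


def best_hypotheses(observation, modalities):
    """Return every hypothesis tied for the smallest prediction error."""
    errors = {
        name: total_error(observation, prediction, modalities)
        for name, prediction in PREDICTIONS.items()
    }
    smallest = min(errors.values())
    return sorted(name for name, err in errors.items() if err == smallest)


observation = {"pitch_khz": 3.0, "size_m": 0.25}
print(best_hypotheses(observation, ["pitch_khz"]))            # ['bird', 'whistle']
print(best_hypotheses(observation, ["pitch_khz", "size_m"]))  # ['bird']
```

Hearing alone leaves two explanations equally good; only the unit that holds an expectation over both sound and shape settles on the bird.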
It is this nonlinearity that makes predictive schemes so powerful and attractive. To understand why, consider the task the brain must solve to be useful. Sensory impressions are not generated by simple linear inputs; certainly for perception to be useful to an organism it must process the world at a level that is relevant for that organism. This is the world of objects, persons, and things, not disjointed, individual sensory properties. When I watch a cat walk behind a fence, I don’t perceive it as two halves of a cat and a fence post, but rather as a cat hidden behind a fence. These kinds of nonlinear interactions between objects and properties of the world are ubiquitous in perception; the brain must solve not for the immediately available sensory inputs but rather the complex hidden causes underlying them. This is achieved in a similar manner to a deep convolutional network; each level performs the same canonical prediction, yet together the hierarchy will extract the best hidden features to explain the complex interactions that produce physical sensations. In this way the predictive brain somersaults over the binding problem of perception; perception is integrated precisely because conjoint hypotheses are better, more useful explanations than discrete ones. As long as the network has sufficient hierarchical depth, it will always arrive at these complex representations. It’s worth noting we can observe the flip-side of this process in common visual illusions, where the higher-order percept or prior “fills in” our actual sensory experience (e.g. when we perceive a convex circle as being lit from above).
Beating the homunculus: the dynamic, enactive Bayesian brain
Feeling satisfied with this, Francesca and I concluded our fun holiday discussion by thinking about some common misunderstandings this scheme might lead one into. For example, the notion of hierarchical prediction explored above might lead one to expect that there has to be a “top” level, a kind of super-homunculus who sits in the prefrontal cortex, predicting the entire sensorium. This would be an impossible solution; how could any subsystem of the brain possibly predict the entire activity of the rest? And wouldn’t that level itself need to be predicted, to be realised in perception, leading to infinite regress? Luckily the intuition that these myriad hypotheses must “come together” fundamentally misunderstands the Bayesian brain.
Remember that each level is only predicting the activity of that before it. The integrative parietal neuron is not predicting the exact sensory input at the retina; rather it is only predicting what pattern of inputs it should receive if the sensory input is an apple, or a bat, or whatever. The entire scheme is linked up this way; the individual units are just stupid predictors of immediate input. It is only when you link them all up together in a deep network, that the brain can recapitulate the complex web of causal interactions that make up the world.
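A minimal sketch of this "stupid local predictor" idea, in Python. It assumes a deliberately tiny hierarchy of two scalar units with fixed linear weights (w1 and w2 are made-up values), and it settles by plain gradient descent on the summed squared errors. Real predictive coding schemes are far richer; the function name settle and all parameters are our own.

```python
def settle(x, w1=2.0, w2=1.0, rate=0.1, steps=500):
    """Let a two-level chain of predictors settle on an input x.

    Level 1 holds r1 and predicts the input as w1 * r1.
    Level 2 holds r2 and predicts level 1 as w2 * r2.
    Each unit only ever sees the errors immediately next to it.
    """
    r1, r2 = 0.0, 0.0
    for _ in range(steps):
        e0 = x - w1 * r1   # error between the input and level 1's prediction
        e1 = r1 - w2 * r2  # error between level 1 and level 2's prediction
        # Level 1 explains the input while staying close to what level 2 expects.
        r1 += rate * (w1 * e0 - e1)
        # Level 2 only tries to explain level 1; it never sees the input itself.
        r2 += rate * (w2 * e1)
    return r1, r2, x - w1 * r1, r1 - w2 * r2


r1, r2, e0, e1 = settle(6.0)
print(round(r1, 3), round(r2, 3))  # both settle near 3.0, with both errors near zero
```

No unit here predicts "the whole sensorium"; the input is explained only because the two local error signals are driven down together.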
This point cannot be stressed enough: predictive coding is not a localizationist enterprise. Perception does not come about because a magical brain area inverts an entire world model. It comes about in virtue of the distributed, dynamic activity of the entire brain as it constantly attempts to minimize prediction error across all levels. Ultimately the “model” is not contained “anywhere” in the brain; the entire brain itself, and the full network of connection weights, is itself the model of the world. The power to predict complex nonlinear sensory causes arises because the best overall pattern of interactions will be that which most accurately (or usefully) explains sensory inputs and the complex web of interactions which causes them. You might rephrase the famous saying as “the brain is its own best model of the world”.
As a final consideration, it is worth noting some misconceptions may arise from the way we ourselves perform Bayesian statistics. As an experimenter, I formalize a discrete hypothesis (or set of hypotheses) about something and then invert that model to explain data in a single step. In the brain however the “inversion” is just the constant interplay of input and feedback across the nervous system at all levels. In fact, under this distributed view (at least according to the Free Energy Principle), neural computation is deeply embodied, as actions themselves complete the inferential flow to minimize error. Thus just like neural feedback, actions function as ‘predictions’, generated by the inferential mechanism to render the world more sensible to our predictions. This ultimately minimises prediction error just as internal model updates do, albeit in a different ‘direction of fit’ (world to model, instead of model to world). In this way the ‘model’ is distributed across the brain and body; actions themselves are as much a part of the computation as the brain itself and constitute a form of “active inference”. In fact, if one extends their view to evolution, the morphological shape of the organism is itself a kind of prior, predicting the kinds of sensations, environments, and actions the agent is likely to inhabit. This intriguing idea will be the subject of a future blog post.
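The two "directions of fit" can be caricatured in Python. This is a sketch under strong assumptions, not the Free Energy Principle itself: a single scalar sensation, a single scalar prediction, and an agent that can either revise its prediction or act so that the sensation moves. The function names and the rate parameter are hypothetical.

```python
def minimise_by_perception(sensation, prediction, rate=0.5, steps=20):
    """Model-to-world fit: revise the prediction until it matches the sensation."""
    for _ in range(steps):
        error = sensation - prediction
        prediction += rate * error
    return sensation, prediction


def minimise_by_action(sensation, prediction, rate=0.5, steps=20):
    """World-to-model fit: act so that the sensation comes to match the prediction."""
    for _ in range(steps):
        error = sensation - prediction
        sensation -= rate * error
    return sensation, prediction


# Say the room feels like 10 units of cold but the agent expects 4.
print(minimise_by_perception(10.0, 4.0))  # sensation stays at 10.0, prediction moves up to meet it
print(minimise_by_action(10.0, 4.0))      # prediction stays at 4.0, sensation moves down to meet it
```

Both routes drive the same error towards zero; they differ only in which side of the comparison is allowed to change.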
We feel this is an extremely exciting view of the brain. The idea that an organism can achieve complex intelligence simply by embedding a simple repetitive motif within a dynamical body seems to us to be a fundamentally novel approach to the mind. In future posts and papers, we hope to further explore the notions introduced here, considering questions about “where” these embodied priors come from and what they mean for the brain, as well as the role of precision in integration.
Questions? Comments? Feel like I’m an idiot? Sound off in the comments!