Some thoughts on writing ‘Bayes Glaze’ theoretical papers.

[This was a twitter navel-gazing thread someone ‘unrolled’. I was really surprised that it read basically like a blog post, so I thought why not post it here directly! I’ve made a few edits for readability. So consider this an experiment in micro-blogging ….]

In the past few years, I’ve started and stopped a paper on metacognition, self-inference, and expected precision about a dozen times. I just feel conflicted about the nature of these papers and want to make a very circumspect argument without too much hype. As many of you frequently note, we have way too many ‘Bayes glaze’ review papers in glam mags making a bunch of claims for which there is no clear relationship to data or actual computational mechanisms.

It has gotten so bad that I sometimes see papers or talks where it feels like the authors took totally unrelated concepts and plastered “prediction” or “prediction error” in random places. This is unfortunate, and it’s largely driven by the fact that these shallow reviews generate a bonkers amount of citations. It is a land rush: publish the same story over and over again, just changing the topic labels, planting a flag in an area, and then publishing some quasi-related empirical stuff. I know people are excited about predictive processing, and I totally share that excitement. And there is really excellent theoretical work being done, and I guess flag planting in some cases is not totally indefensible for early career researchers. But there is also a lot of cynical stuff, and I worry that this speaks so much more loudly than the good, careful stuff. The danger here is that we’re going to cause a blowback and be ultimately seen as ‘cargo cult computationalists’, which will drag all of our research down, good and otherwise.

In the past my theoretical papers in this area have been super dense and frankly a bit confusing in some aspects. I just wanted to try and really, really do due diligence and not overstate my case. But I do have some very specific theoretical proposals that I think are unique. I’m not sure why I’m sharing all this, but I think it is because it is always useful to remind people that we feel imposter syndrome and conflict at all career levels. And I want to try and be more transparent in my own thinking – I feel that the earlier I get feedback the better. And these papers have been living in my head like demons, simultaneously too ashamed to be written and jealous of everyone else getting on with their sexy high-impact review papers.

Specifically, I have some fairly straightforward ideas about how interoception and neural gain (precision) inter-relate, and I also have a model I’ve been working on for years about how metacognition relates to expected precision. If you’ve seen any of my recent talks, you get the gist of these ideas.

Now, I’m *really* going to force myself to finally write these. I don’t really care where they are published; it doesn’t need to be a glamour review journal (as many have suggested I should aim for), although at my career stage I guess that is the thing to do. I think I will probably preprint them on my blog, or at least muse openly about them here, although I’m not sure if this is a great idea for theoretical work.

Further, I will try and hold to three key promises:

  1. Keep it simple. One key hypothesis/proposal per paper. Nothing grandiose.
  2. Specific, falsifiable predictions about behavioral & neurophysiological phenomena, with no (or minimal) hand-waving
  3. Consider alternative models/views – it really gets my goat when someone slaps ‘prediction error’ on their otherwise straightforward story and then acts like it’s the only game in town. ‘Predictive processing’ tells you almost *nothing* about specific computational architectures, neurobiological mechanisms, or general process theories. I’ve said this until I’m blue in the face: there can be many, many competing models of any phenomenon, all of which utilize prediction errors.

These papers *won’t* be explicitly computational – although we have that work in preparation as well – but will just try to make a single key point that I want to build on. If I achieve my other three aims, it should be reasonably straightforward to build computational models from these papers.

That is the idea. Now I need to go lock myself in a cabin in the woods for a few weeks and finally get these papers off my plate. Otherwise these Bayesian demons are just gonna keep screaming.

So, where to submit? Don’t say Frontiers…

For whom the bell tolls? A potential death-knell for the heartbeat counting task.

Interoception – the perception of signals arising from the visceral body – is a hot topic in cognitive neuroscience and psychology. And rightly so; a growing body of evidence suggests that brain-body interaction is closely linked to mood1, memory2, and mental health3. In terms of basic science, many theorists argue that the integration of bodily and exteroceptive (e.g., visual) signals underlies the genesis of a subjective, embodied point of view4–6. However, noninvasively measuring (and even better, manipulating) interoception is inherently difficult. Unlike visual or tactile awareness, where an experimenter can carefully control stimulus strength and detection difficulty, interoceptive signals are spontaneous, uncontrolled processes. As such, prevailing methods for measuring interoception typically involve subjects attending to their heartbeats and reporting how many heartbeats they counted in a given interval. This is known as the heartbeat counting task (or Schandry task, named after its creator)7. Now a new study has cast extreme doubt on what this task actually measures.

The study, published by Zamariola et al in Biological Psychology8, begins by detailing what we already largely know: the heartbeat counting task is inherently problematic. For example, the task is easily confounded by prior knowledge or beliefs about one’s average heart rate. Zamariola et al write:

“Since the original task instruction requires participants to estimate the number of heartbeats, individuals may provide an answer based on beliefs without actually attempting to perceive their heartbeats. Consistent with this view, one study (Windmann, Schonecke, Fröhlig, & Maldener, 1999) showed that changing the heart rate in patients with cardiac pacemaker, setting them to low (50 beats per minute, bpm), medium (75 bpm), or high (110 bpm) heart rate, did not influence their reported number of heartbeats. This suggests that these patients performed the task by relying on previous knowledge instead of perception of their bodily states.”

This raises the question of what exactly the task is measuring. The essence of heartbeat counting tasks is that one must silently count the number of perceived heartbeats over multiple temporal intervals. From this, an “interoceptive accuracy score” (IAcc) is computed using the formula:

IAcc = 1/3 ∑ (1 – |actual heartbeats – reported heartbeats| / actual heartbeats)

This formula is meant to render over-counting (counting heartbeats that don’t occur) and under-counting (missing actual heartbeats) equivalent, yielding a score bounded between 0 and 1. Zamariola et al argue that these scores lack fundamental construct validity on the basis of four core arguments. I summarize each argument below; see the full article for the detailed explanations:

  1. [interoceptive] abilities involved in not missing true heartbeats may differ from abilities involved in not over-interpreting heartbeats-unrelated signals. [this assumption] would be questioned by evidence showing that IAcc scores largely depend on one error type only.
  2. IAcc scores should validly distinguish between respondents. If IAcc scores reflect people’s ability to accurately perceive their inner states, a correlation between actual and reported heartbeats should be observed, and this correlation should linearly increase with higher IAcc scores (i.e., better IAcc scorers should better map actual and reported heartbeats).
  3. a valid measure of interoception accuracy should not be structurally tied to heart condition. This is because heart condition (i.e. actual heartbeats) is not inherent to the definition of the interoceptive accuracy construct. In other words, it is essential for construct validity that people’s accuracy at perceiving their inner life is not structurally bound to their cardiac condition.
  4. The counting interval [i.e., 10, 15, 30 seconds] should not impact IAcc; a wide range of scores are in fact used and these should be independent of the resultant measure.
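For concreteness, the scoring rule can be sketched in a few lines of Python (my own illustrative implementation with made-up counts, not code from the paper):

```python
# Hypothetical sketch of the IAcc score: mean over counting intervals of
# 1 - |actual - reported| / actual.

def iacc(actual, reported):
    """Interoceptive accuracy across counting intervals."""
    scores = [1 - abs(a - r) / a for a, r in zip(actual, reported)]
    return sum(scores) / len(scores)

# Three intervals with mild under-counting (made-up numbers):
print(iacc([25, 35, 45], [20, 30, 40]))  # -> ~0.85
```

Note that a participant who simply reports a plausible guess based on their believed heart rate can score highly here without perceiving a single heartbeat – exactly the pacemaker concern raised in the passage quoted above.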

Zamariola et al then go on to show that in a sample of 572 healthy individuals (386 female), each of these assumptions is strongly violated: IAcc scores depend largely on under-reporting heartbeats (Fig. 1); the correlation of actual and perceived heartbeats is extremely low, and higher at average than at high IAcc levels (Fig. 2); IAcc is systematically increased at slower heart rates (Fig. 3); and longer time intervals lead to substantially worse IAcc (not shown):

Fig. 1

  1. IAcc scores are mainly driven by under-reporting; “less than 5% of participants showed overestimation… Hence, IAcc scores essentially inform us of how (un)willing participants are to report they perceived a heartbeat”

Fig. 2

2. Low overall correlation (grey-dashed line) of heartbeats counted and actual heartbeats (r = 0.16, 2.56% shared variance). Further, the correlation varied non-linearly across bins of IAcc scores, which in the authors’ words demonstrates that “IAcc scores fail to validly differentiate individuals in their ability to accurately perceive their inner states within the top 60% IAcc scorers.”

Fig. 3

3. IAcc scores depend negatively on the number of actual heartbeats, suggesting that individuals with lower overall heart rate will be erroneously characterized as having ‘good’ interoceptive accuracy.

Overall, the authors draw the conclusion that the heartbeat counting task is nigh useless, lacking both face and construct validity. What should we measure instead? The authors offer that, if one can run very many trials, then the mere correlation of counted and actual heartbeats may be a (slightly) better measure. However, given the massive bias present in under-reporting heartbeats, they suggest that the task measures only the willingness to report a heartbeat at all. As such, they highlight the need for true psychophysical tasks which can distinguish participant reporting bias (i.e., criterion) from the true sensitivity to heartbeats. A potentially robust alternative may be the multi-interval heartbeat discrimination task9, in which a method of constant stimuli is used to compare heartbeats to multiple intervals of temporal stimuli. However, this task is substantially more difficult to administer; it requires some knowledge of psychophysics and as much as 45 minutes to complete. As many (myself included) are interested in measuring interoception in sensitive patient populations, it’s not a given that this task will be widely adopted.

I’m curious what my readers think. For me, this paper provides the final nail in the coffin of heartbeat counting tasks. Nearly every interoception researcher I’ve spoken to has expressed concerns about what the task actually measures. Worse, large intrasubject variance and the fact that many subjects perform incredibly poorly on the task seem to undermine the idea that it is anything like a measure of cardiac perception. At best, it seems to be a measure of interoceptive attention and report bias. The study by Zamariola and colleagues is well-powered, sensibly conducted, and seems to provide unambiguous evidence against the task’s basic validity. Heartbeat counting: the bell tolls for thee.

References

  1. Foster, J. A. & McVey Neufeld, K.-A. Gut–brain axis: how the microbiome influences anxiety and depression. Trends Neurosci. 36, 305–312 (2013).
  2. Zelano, C. et al. Nasal Respiration Entrains Human Limbic Oscillations and Modulates Cognitive Function. J. Neurosci. 36, 12448–12467 (2016).
  3. Khalsa, S. S. et al. Interoception and Mental Health: a Roadmap. Biol. Psychiatry Cogn. Neurosci. Neuroimaging (2017). doi:10.1016/j.bpsc.2017.12.004
  4. Park, H.-D. & Tallon-Baudry, C. The neural subjective frame: from bodily signals to perceptual consciousness. Philos. Trans. R. Soc. Lond. B Biol. Sci. 369, 20130208 (2014).
  5. Seth, A. K. Interoceptive inference, emotion, and the embodied self. Trends Cogn. Sci. 17, 565–573 (2013).
  6. Barrett, L. F. & Simmons, W. K. Interoceptive predictions in the brain. Nat. Rev. Neurosci. 16, 419–429 (2015).
  7. Schandry, R., Sparrer, B. & Weitkunat, R. From the heart to the brain: A study of heartbeat contingent scalp potentials. Int. J. Neurosci. 30, 261–275 (1986).
  8. Zamariola, G., Maurage, P., Luminet, O. & Corneille, O. Interoceptive Accuracy Scores from the Heartbeat Counting Task are Problematic: Evidence from Simple Bivariate Correlations. Biol. Psychol. (2018). doi:10.1016/j.biopsycho.2018.06.006
  9. Brener, J. & Ring, C. Towards a psychophysics of interoceptive processes: the measurement of heartbeat detection. Philos. Trans. R. Soc. B Biol. Sci. 371, 20160015 (2016).

 

Active-controlled, brief body-scan meditation improves somatic signal discrimination.

Here in the science blog-o-sphere we often like to run to the presses whenever a laughably bad study comes along, pointing out all the incredible feats of ignorance and sloth. However, this can lead to science-sucks cynicism syndrome (a common ailment amongst graduate students), where one begins to feel a bit like all the literature is rubbish and it just isn’t worth your time to try and do something truly proper and interesting. If you are lucky, it is at this moment that a truly excellent paper will come along at just the right time to pick up your spirits and re-invigorate your work. Today I found myself at one such low point, struggling to figure out why my data suck, when just such a beauty of a paper appeared in my RSS reader.

The paper, “Brief body-scan meditation practice improves somatosensory perceptual decision making”, appeared in this month’s issue of Consciousness and Cognition. Laura Mirams et al set out to answer a very simple question regarding the impact of meditation training (MT) on a “somatic signal detection task” (SSDT). The study is well designed; after randomization, both groups received audio CDs with 15 minutes of daily body-scan meditation or excerpts from The Lord of The Rings. For the SSDT, participants simply report when they feel a vibration stimulus on the finger, where the baseline vibration intensity is first individually calibrated to a 50% detection rate. The authors then apply a signal-detection analysis framework to estimate the sensitivity (d’) and decision criterion (c).
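For readers unfamiliar with the framework, the two signal-detection indices can be sketched as follows (a minimal illustration using made-up hit and false-alarm rates, not the paper’s data):

```python
# Standard equal-variance signal-detection indices from hit and
# false-alarm rates: d' = z(H) - z(FA), c = -(z(H) + z(FA)) / 2.
from statistics import NormalDist

def sdt_indices(hit_rate, fa_rate):
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    d_prime = z(hit_rate) - z(fa_rate)
    criterion = -(z(hit_rate) + z(fa_rate)) / 2
    return d_prime, criterion

# Same hit rate, fewer false alarms -> higher d' and a more
# conservative criterion (hypothetical numbers):
print(sdt_indices(0.70, 0.30))
print(sdt_indices(0.70, 0.15))
```

This illustrates why a drop in false alarms alone can raise d’: sensitivity reflects the separation between the z-transformed hit and false-alarm rates, not the hit rate per se.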

Mirams et al found that, even when controlling for a host of baseline factors including trait mindfulness and baseline somatic attention, MT led to a greater increase in d’, driven by significantly reduced false alarms. Although many theorists and practitioners of MT suggest a key role for interoceptive & somatic attention in related alterations of health, brain, and behavior, there are almost no data addressing this prediction, making these findings extremely interesting. The idea that MT should impact interoception and somatosensation is very sensible – in most (novice) meditation practices it is common to focus attention on bodily sensations of, for example, the breath entering the nostril. Further, MT involves a particular kind of open, non-judgemental awareness of bodily sensations, and in general is often described to novice students as strengthening the relationship between the mind and sensations of the body. However, most existing studies on MT investigate traditional exteroceptive, top-down elements of attention such as conflict resolution and the ability to maintain attentional fixation for long periods of time.

While MT certainly does involve these features, it is arguable that the interoceptive elements are more specific to the precise mechanisms of interest (they are what you actually train), whereas the attentional benefits may be more of a side effect, reflecting an early emphasis in MT on establishing attention. Thus in a traditional meditation class, you might first learn some techniques to fixate your attention, and then later learn to deploy your attention to specific bodily targets (i.e., the breath) in a particular way (non-judgementally). The goal is not necessarily to develop a super-human ability to filter distractions, but rather to change the way in which interoceptive responses to the world (i.e., emotional reactions) are perceived and responded to. This hypothesis is well reflected in the elegant study by Mirams et al; they postulate specifically that MT will lead to greater sensitivity (d’), driven by reduced false alarms rather than an increased hit rate, reflecting a greater ability to discriminate an interoceptive signal from noise (note: see comments for clarification on this point by Steve Fleming – there is some ambiguity in interpreting the informational role of hit rates and false alarms in d’). This hypothesis not only reflects the theoretically specific contribution of MT (beyond attention training, which might be better trained by video games, for example), but also postulates a mechanistically specific hypothesis to test this idea, namely that MT leads to a shift specifically in the quality of interoceptive signal processing, rather than raw attentional control.

At this point, you might ask: if everyone is so sure that MT involves training interoception, why is there so little data on the topic? The authors do a great job reviewing findings (even including currently in-press papers) on interoception and MT. Currently there is one major null finding using the canonical heartbeat detection task, where advanced practitioners self-reported improved heartbeat detection but in reality performed at chance. Those authors speculated that the heartbeat task might not accurately reflect the modality of interoception engaged by practitioners. In addition, a recent study investigated somatic discrimination thresholds in a cross-section of advanced practitioners and found that the ability to make meta-cognitive assessments of one’s threshold sensitivity correlated with years of practice. A third recent study showed greater tactile sensation acuity in practitioners of Tai Chi. One longitudinal study [PDF], a wait-list controlled fMRI investigation by Farb et al, found that a mindfulness-based stress reduction course altered BOLD responses during an attention-to-breath paradigm. Collectively these studies do suggest a role of MT in training interoception. However, as I have complained of endlessly, cross-sections cannot tell us anything about the underlying causality of the observed effects, and longitudinal studies must be active-controlled (not waitlisted) to discern mechanisms of action. Thus active-controlled longitudinal designs are desperately needed, both to determine the causality of a treatment on some observed effect, and to rule out confounds associated with motivation, demand characteristics, and expectation. Without such a design, it is very difficult to conclude anything about the mechanisms of interest in an MT intervention.

In this regard, Mirams went above and beyond the call of duty as defined by the average paper. The choice of delivering the intervention via CD is excellent, as we can rule out instructor enthusiasm/ability confounds. Further, the intervention chosen is extremely simple and well described; it is just a basic body-scan meditation without additional fluff or fanfare, lending mechanistic specificity. Both groups were even instructed to close their eyes and sit when listening, balancing these often overlooked structural factors. In this sense, Mirams et al have controlled for instruction, motivation, intervention context, and baseline trait mindfulness, and even isolated the variable of interest – only the MT group worked with interoception, though both exerted a prolonged period of sustained attention. Armed with these controls we can actually say that MT led to an alteration in interoceptive d’, through a mechanism dependent upon the specific kind of interoceptive awareness trained in the intervention.

It is here that I have one minor nit-pick of the paper. Although the use of Lord of the Rings audiotapes has precedent, and is likely a great control for attention and motivation, you could be slightly worried that listening to stories about Elves and Orcs is not an ideal control for listening to hours of tapes instructing you to focus on your bodily sensations, if the measure of interest involves attending to the body. A purer active control might have been a book describing anatomy or body parts; then we could more conclusively establish that not only is interoception driving the findings, but the particular form of interoceptive attention deployed by meditation training. As it is, a conservative person might speculate that the observed differences reflect demand characteristics: MT participants deploy more attention to the body due to a kind of priming mechanism in the teaching. However, this is an extreme nitpick and does not detract from the fact that Mirams and co-authors have made an extremely useful contribution to the literature. In the future it would be interesting to repeat the paradigm with a more body-oriented control, and perhaps also in advanced practitioners before and after an intensive retreat to see if the effect holds at later stages of training. Of course, given my interest in applying signal-detection theory to interoceptive meta-cognition, I also cannot help but wonder what the authors might have found if they’d applied a Fleming-style meta-d’ analysis to this study.

All in all, a clear study with tight methods, addressing a desperately under-developed research question, in an elegant fashion. The perfect motivation to return to my own mangled data ☺