update: for an excellent response to this post, see the comment by Anil Seth at the bottom of this article. Also don’t miss the extended debate regarding the general validity of causal methods for fMRI at Russ Poldrack’s blog that followed this post.
While the BOLD signal can be a useful measure of brain function when used properly, the fact that it indexes blood flow rather than neural activity itself raises more than a few significant concerns. That is, when we make inferences on BOLD, we want to be sure the observed effects are causally downstream of actual neural activity, rather than the product of physiological noise such as fluctuations in breathing or heart rate. This is a problem for all fMRI analyses, but it is particularly tricky for resting-state fMRI, where we are interested in signal fluctuations that fall in the same frequency range as respiration and pulse. Now a new study has extended these troubles to Granger causality modelling (GCM), a lag-based method for estimating causal interactions between time series that is popular in the resting-state literature. Just how bad is the damage?
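A toy illustration of why that frequency overlap is so pernicious: a cardiac rhythm sampled at a typical TR aliases directly into the low-frequency band that resting-state analyses care about. The TR, heart rate, and scan duration below are illustrative numbers, not values from the paper:

```python
import numpy as np

tr = 2.0                                  # a typical fMRI repetition time (s)
t = np.arange(0, 600, tr)                 # ten minutes of volumes
cardiac = np.sin(2 * np.pi * 1.1 * t)     # ~66 bpm heartbeat, badly undersampled

# Sampling at 0.5 Hz folds the 1.1 Hz rhythm down to |1.1 - 2*0.5| = 0.1 Hz,
# squarely inside the 0.01-0.1 Hz resting-state band of interest.
freqs = np.fft.rfftfreq(len(t), d=tr)
power = np.abs(np.fft.rfft(cardiac)) ** 2
f_peak = freqs[np.argmax(power[1:]) + 1]  # skip the DC bin
```

The apparent 0.1 Hz "fluctuation" is pure aliased heartbeat, indistinguishable in frequency from genuine slow BOLD dynamics.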
In an article published this week in PLOS ONE, Webb and colleagues analysed over a thousand scans from the Human Connectome Project database, examining the reliability of GCM estimates and the proximity of the major ‘hubs’ identified by GCM to known major arteries and veins. The authors first found that GCM estimates were highly robust across participants:

They further report that “the largest [most robust] lags are for BOLD Granger causality differences for regions close to large veins and dural venous sinuses”. In other words, although the major ‘upstream’ and ‘downstream’ nodes estimated by GCM are highly robust across participants, regions primarily influencing other regions (i.e. sources of causal outflow) map onto major arteries, whereas regions primarily receiving ‘inputs’ (i.e. targets of causal inflow) map onto veins. This pattern of ‘causation’ is very difficult to explain as anything other than a non-neural artifact: the regions mostly ‘causing’ activity in others are exactly where fresh blood enters the brain, and the regions primarily being influenced by others are areas of major blood drainage. Check out the arteriogram and venogram provided by the authors:

Compare the above to their thresholded z-statistic map for significant Granger causality; white areas show significant G-causation overlapping with an arteriogram mask, green areas show significant overlap with a venogram mask:

“Figure 5. Mean Z-statistic for significant Granger causality differences to seed ROIs. Z-statistics were averaged for a given target ROI with the 264 seed ROIs to which it exhibited significantly asymmetric Granger causality relationship. Masks are overlaid for MRI arteriograms (white) and MRI venograms (green) for voxels with greater than 2 standard deviations signal intensity of in-brain voxels in averaged images from 33 (arteriogram) and 34 (venogram) subjects. Major arterial inflow and venous outflow distributions are labeled.”
It’s fairly obvious from the above that a significant proportion of the areas typically G-causing other areas overlap with arteries, whereas areas typically being G-caused by others overlap with veins. This is a serious problem for GCM of resting-state fMRI; worse, these effects were also observed across a comprehensive range of task-based fMRI data. The authors come to the grim conclusion that “Such arterial inflow and venous drainage has a highly reproducible pattern across individuals where major arterial and venous distributions are largely invariant across subjects, giving the illusion of reliable timing differences between brain regions that may be completely unrelated to actual differences in effective connectivity”. Importantly, this isn’t the first time GCM has been called into question. A related concern is the impact of spatial variation across the brain in the lag between neural activation and the BOLD response (the ‘hemodynamic response function’, HRF). Previous work using simultaneous intracranial and BOLD recordings has shown that, due to these lags, GCM can estimate a causal pattern of A then B when the actual neural activity was B then A.
This is because GCM acts in a relatively simple way: given two time series (A & B), if the future state of B can be better predicted from the past fluctuations of both A and B than from the past of B alone, then A is said to G-cause B. However, as we’ve already established, BOLD is a messy and complex signal, in which neural activity is filtered through slow blood fluctuations and must be carefully mapped back onto neural activity using deconvolution methods. Thus, what looks like A then B in BOLD can actually be due to differences in HRF lags between regions – GCM is blind to this, as it does not consider the underlying process producing the time series. Worse, while this particular problem can be mitigated by combining GCM (which is naïve to the underlying cause of the analysed time series) with an approach that deconvolves each voxel-wise time series with a canonical HRF, the authors point out that such an approach would not resolve the concern raised here: that Granger causality largely picks up macroscopic temporal patterns of blood in- and out-flow:
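To make that prediction logic concrete, here is a minimal sketch of bivariate Granger causality on raw time series — pure NumPy, order-1 models, and invented toy data; real analyses would add model-order selection and significance testing:

```python
import numpy as np

def granger(x, y, p=1):
    """Log-ratio Granger statistic for 'x G-causes y' at model order p.

    Compares an AR model of y using only y's own past against one that
    also includes x's past; returns ln(restricted_var / full_var),
    which is > 0 whenever x's history improves the prediction of y.
    """
    T = len(y)
    target = y[p:]
    ones = np.ones((T - p, 1))
    own = np.column_stack([y[p - k - 1:T - k - 1] for k in range(p)])
    other = np.column_stack([x[p - k - 1:T - k - 1] for k in range(p)])

    def resid_var(X):
        beta, *_ = np.linalg.lstsq(X, target, rcond=None)
        return np.mean((target - X @ beta) ** 2)

    restricted = resid_var(np.hstack([ones, own]))
    full = resid_var(np.hstack([ones, own, other]))
    return np.log(restricted / full)

# Toy data where x genuinely drives y with a one-sample lag
rng = np.random.default_rng(0)
x = rng.standard_normal(2000)
y = np.zeros(2000)
for t in range(1, 2000):
    y[t] = 0.8 * x[t - 1] + 0.1 * rng.standard_normal()

gc_xy = granger(x, y)  # large: x's past predicts y
gc_yx = granger(y, x)  # near zero: no predictive benefit
```

Note that nothing in the statistic refers to how the series were generated — which is exactly why HRF lags or blood transit times can masquerade as ‘causality’.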
“But even if an HRF were perfectly estimated at each voxel in the brain, the mechanism implied in our data is that similarly oxygenated blood arrives at variable time points in the brain independently of any neural activation and will affect lag-based directed functional connectivity measurements. Moreover, blood from one region may then propagate to other regions along the venous drainage pathways also independent of neural to vascular transduction. It is possible that the consistent asymmetries in Granger causality measured in our data may be related to differences in HRF latency in different brain regions, but we consider this less likely given the simpler explanation of blood moving from arteries to veins given the spatial distribution of our results.”
As for correcting these effects, the authors suggest that a nuisance-variable approach estimating vascular effects related to pulse, respiration, and breath-holding may be effective. However, they caution that the effects observed here (large-scale blood inflow and drainage) take place over a timescale an order of magnitude slower than actual neural differences, and that this approach would need extremely precise estimates of the associated nuisance waveforms to prevent confounded connectivity estimates. For now, I’d advise readers to be critical of what can actually be inferred from GCM until further research can be done, preferably using multi-modal methods capable of directly inferring the impact of vascular confounds on GCM estimates. Indeed, although I suppose I am a bit biased, I have to ask whether it wouldn’t be simpler to just use Dynamic Causal Modelling, a technique explicitly designed for estimating causal effects between BOLD time series, rather than a method originally designed to estimate influences between financial stocks.
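The nuisance-variable idea amounts to ordinary regression: estimate the vascular waveforms and project them out of each voxel’s time series before any connectivity analysis. A minimal sketch — the TR, frequency, and amplitude below are made-up toy values; in practice the waveforms would come from pulse-oximetry and respiration recordings (RETROICOR-style regressors):

```python
import numpy as np

def regress_out(bold, nuisance):
    """Residualize a time series against nuisance regressors by OLS.

    bold: (T,) array; nuisance: (T, k) array of estimated physiological
    waveforms (e.g. cardiac/respiratory traces and their lags).
    """
    X = np.column_stack([np.ones(len(bold)), nuisance])
    beta, *_ = np.linalg.lstsq(X, bold, rcond=None)
    return bold - X @ beta

# Toy check: 'neural' noise contaminated by a slow cardiac-like drift
rng = np.random.default_rng(1)
t = np.arange(600) * 0.72                 # 600 volumes at a 0.72 s TR
drift = np.sin(2 * np.pi * 0.05 * t)      # slow aliased physiological wave
bold = rng.standard_normal(600) + 3.0 * drift

cleaned = regress_out(bold, drift[:, None])
r_before = np.corrcoef(bold, drift)[0, 1]     # strong contamination
r_after = np.corrcoef(cleaned, drift)[0, 1]   # ~0 after residualization
```

The catch, as the authors note, is that this only works as well as the nuisance waveform estimates — an imprecise regressor leaves exactly the slow vascular structure that GCM then mistakes for causality.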
References for further reading:
Friston, K. (2009). Causal modelling and brain connectivity in functional magnetic resonance imaging. PLoS Biology, 7(2), e33. doi:10.1371/journal.pbio.1000033
Friston, K. (2011). Dynamic causal modeling and Granger causality. Comments on: The identification of interacting networks in the brain using fMRI: model selection, causality and deconvolution. NeuroImage, 58(2), 303–305; author reply 310–311. doi:10.1016/j.neuroimage.2009.09.031
Friston, K., Moran, R., & Seth, A. K. (2013). Analysing connectivity with Granger causality and dynamic causal modelling. Current Opinion in Neurobiology, 23(2), 172–178. doi:10.1016/j.conb.2012.11.010
Webb, J. T., Ferguson, M. A., Nielsen, J. A., & Anderson, J. S. (2013). BOLD Granger causality reflects vascular anatomy. PLoS ONE, 8(12), e84279. doi:10.1371/journal.pone.0084279
Chang, C., Cunningham, J. P., & Glover, G. H. (2009). Influence of heart rate on the BOLD signal: the cardiac response function. NeuroImage, 44(3), 857–869. doi:10.1016/j.neuroimage.2008.09.029
Chang, C., & Glover, G. H. (2009). Relationship between respiration, end-tidal CO2, and BOLD signals in resting-state fMRI. NeuroImage, 47(4), 1381–1393. doi:10.1016/j.neuroimage.2009.04.048
Lund, T. E., Madsen, K. H., Sidaros, K., Luo, W.-L., & Nichols, T. E. (2006). Non-white noise in fMRI: does modelling have an impact? NeuroImage, 29(1), 54–66.
David, O., Guillemain, I., Saillet, S., Reyt, S., Deransart, C., Segebarth, C., & Depaulis, A. (2008). Identifying neural drivers with functional MRI: an electrophysiological validation. PLoS Biology, 6(12), 2683–2697. doi:10.1371/journal.pbio.0060315
Update: This post continued into an extended debate on Russ Poldrack’s blog, where Anil Seth made the following (important) comment:
Hi, this is Anil Seth. What an excellent debate, and I hope I can add a few quick thoughts of my own, since this is an issue close to my heart (no pun intended re vascular confounds).
First, back to the Webb et al. paper. They indeed show that a vascular confound may affect GC-fMRI, but only in the resting state, and given suboptimal TR and averaging over diverse datasets. Indeed, I suspect that their autoregressive models may be poorly fit, so that the results rather reflect a sort of mental chronometry a la Menon, rather than GC per se.

In any case, the more successful applications of GC-fMRI are those that compare experimental conditions or correlate GC with some behavioural variable (see e.g. Wen et al., http://www.ncbi.nlm.nih.gov/pubmed/22279213). In these cases hemodynamic and vascular confounds may subtract out. Interpreting findings like these means remembering that GC is a description of the data (i.e. DIRECTED FUNCTIONAL connectivity) and is not a direct claim about the underlying causal mechanism (unlike DCM, which is a measure of EFFECTIVE connectivity). Therefore (model-light) GC and (model-heavy) DCM are to a large extent asking and answering different questions, and to set them in direct opposition is to misunderstand this basic point. Karl, Ros Moran, and I make these points in a recent review (http://www.ncbi.nlm.nih.gov/pubmed/23265964).

Of course both methods are complex and ‘garbage in, garbage out’ applies: naive application of either is likely to be misleading or worse. Indeed the indirect nature of fMRI BOLD means that causal inference will be very hard. But this doesn’t mean we shouldn’t try. We need to move to network descriptions in order to get beyond the neo-phrenology of functional localization. And so I am pleased to see recent developments in both DCM and GC for fMRI. For the latter, with Barnett and Chorley I have shown that GC-fMRI is INVARIANT to hemodynamic convolution given fast sampling and low noise (http://www.ncbi.nlm.nih.gov/pubmed/23036449). This counterintuitive finding defuses a major objection to GC-fMRI and has been established both in theory and in a range of simulations of increasing biophysical detail.
With the development of low-TR multiband sequences, this means there is renewed hope for GC-fMRI in practice, especially when executed in an appropriate experimental design. Barnett and I have also just released major new GC software which avoids separate estimation of full and reduced AR models, removing a serious source of bias afflicting previous approaches (http://www.ncbi.nlm.nih.gov/pubmed/24200508).

Overall I am hopeful that we can move beyond premature rejection of promising methods on the grounds that they fail when applied without appropriate data or sufficient care. This applies to both GC and fMRI. These are hard problems, but we will get there.
This is a real shame; I always liked the idea of using Granger causality on fMRI data. It appealed to me that it can generate maps of ‘polarity’ (i.e. areas being influenced, and areas influencing) based on a seed region. It seemed like a nice, exploratory method for looking at connectivity without the constraint of having to impose a formal model as in DCM. I know Tom Nichols (and others) were always of the opinion that the (relatively poor) temporal resolution of fMRI violated some of the assumptions inherent in Granger-style logic, and maybe this new finding is the final nail in the coffin!
Yes, at the beginning of my PhD I was quite positive about GCM, as it seemed like a great data-driven way to look at effective connectivity over large networks with relative ease. I was frustrated when peers told me not to trust it, and started delving into the big debate between Friston and Roebroeck et al. I came away from that feeling like there were some serious issues with GCM for fMRI. Proponents are always downplaying these issues or claiming that a new toolbox solving them is just around the corner, but it seems a bit like starting out with a leaky bathtub and trying to claim it’s a sailboat once you patch all those holes. This paper really pushed me over the edge. Sadly, GCM is still eminently more publishable than the more complex DCM – try getting a DCM study into PNAS!
I should say that on Facebook, Anil Seth (a creator of the original GCM toolbox) stressed that he wasn’t overly moved by this paper. He felt it was a weakness that they don’t directly compare different task conditions, which in theory would be less susceptible to the issues raised here. That still leaves open the entire case of GCM for resting state, which is the more popular application by far. I acknowledge that the findings here are not conclusive, but the authors do replicate their findings on 7 different tasks, so I’m also not moved by that critique. I want an easy to use exploratory method as much as anyone, but I just don’t feel we can trust GCM for fMRI. Further, with the new post-hoc model optimization, it’s incredibly easy to develop DCMs for a large number of nodes. PPI plus DCM can be further used to restrain model space.
You could also try some of the new transfer entropy methods as exploratory tools, i.e. multivariate and conditioned/partial variants. The authors of TRENTOOL (see below) recommend running transfer entropy first to explore the data set, and then a focused DCM based on those results. There are some elegant implementations, although as always you run up against data-length issues with fMRI data:
http://www.trentool.de/
http://lizier.me/joseph/software/#infodynamics
You should write a post that would explain (in layman’s words) how DCM manages to avoid vascular confounds when assessing causality. Especially the new DCM for resting state flavour.
That is a good idea. I’ll attend a few FIL methods seminars in January to make sure I’ve not mucked up the details and then write something up.
Nice study indeed. One should try it with arterial spin labelling to get an almost real-time effect.
Here are two approaches that tackle the issue for GCM differently:
Granger causality analysis of fMRI BOLD signals is invariant to hemodynamic convolution but not downsampling.
Seth AK, Chorley P, Barnett LC.
http://www.ncbi.nlm.nih.gov/pubmed/24200508
A blind deconvolution approach to recover effective connectivity brain networks from resting state fMRI data.
Wu GR, Liao W, Stramaglia S, Ding JR, Chen H, Marinazzo D.
http://www.ncbi.nlm.nih.gov/pubmed/23422254
I am perfectly OK with DCM (though I still like it better for EEG), but Olivier David’s study uses really abnormally long HRFs, and there is quite remarkable inter-subject (inter-rat) variability.
Sorry, I don’t mean to over-cite myself, but I find it interesting to compare the above maps with those in this other paper, in which we study GC at the voxel level after HRF deconvolution:
Mapping the Voxel-Wise Effective Connectome in Resting State fMRI
http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0073670
Interesting that the veins/vessels are exactly the DMN pattern. Is there really a default mode network, or is it simply the default blood flow of the brain?
Just deconvolve the HRF and use it in a GCM. What’s the fuss about? Even before this paper, everyone knew that the HRF has to be deconvolved before being used in any Granger analysis.
I quoted at length from the paper, but deconvolving the HRF actually doesn’t address this problem at all. Here is the relevant explanation:
“But even if an HRF were perfectly estimated at each voxel in the brain, the mechanism implied in our data is that similarly oxygenated blood arrives at variable time points in the brain independently of any neural activation and will affect lag-based directed functional connectivity measurements. Moreover, blood from one region may then propagate to other regions along the venous drainage pathways also independent of neural to vascular transduction. It is possible that the consistent asymmetries in Granger causality measured in our data may be related to differences in HRF latency in different brain regions, but we consider this less likely given the simpler explanation of blood moving from arteries to veins given the spatial distribution of our results.”
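For readers unfamiliar with the step being debated here, the deconvolution in question looks roughly like this in practice: a minimal Wiener-filter sketch using an SPM-style double-gamma HRF (the boxcar stimulus and noise level are invented for illustration). As the quoted passage argues, even a perfect version of this step would not remove the blood-transit confound:

```python
import numpy as np
from math import gamma

def canonical_hrf(tr=1.0, duration=32.0):
    """Double-gamma HRF sampled at the scan TR (SPM-like parameters)."""
    t = np.arange(0, duration, tr)
    h = (t ** 5 * np.exp(-t) / gamma(6)             # response peak near 5 s
         - t ** 15 * np.exp(-t) / (6 * gamma(16)))  # undershoot near 15 s
    return h / h.sum()

def wiener_deconvolve(y, h, noise_level=0.01):
    """Estimate the underlying signal x from y = conv(x, h) + noise."""
    n = len(y)
    H = np.fft.rfft(h, n)
    Y = np.fft.rfft(y)
    X = Y * np.conj(H) / (np.abs(H) ** 2 + noise_level)
    return np.fft.irfft(X, n)

# Toy check: a 10-sample boxcar of 'neural' activity, blurred by the HRF
neural = np.zeros(200)
neural[50:60] = 1.0
h = canonical_hrf()
bold = np.convolve(neural, h)[:200]
recovered = wiener_deconvolve(bold, h)  # peaks back near samples 50-60
```

The point of the quoted passage is that this only undoes the neural-to-vascular filter; blood that arrives and drains on its own schedule was never convolved with any neural event in the first place.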
The paper is undoubtedly interesting and points out several potentially important issues, but I have another small set of thoughts:
-in the study the authors used bivariate Granger Causality, which in particular for a dataset composed of many time series of short length, will likely result in a huge number of false positives due to mediated influences.
-in the paper they only look at regions that exhibit “significant GC asymmetry”. Why is that? GC is a directed measure, and you can have GC from A to B and GC from B to A, which is different from having zero GC in both directions. This introduces a bias in the study.
-voxel-wise Granger Causality with the partial conditioning technique, and with HRF deconvolution for each voxel, gives really different GC maps
-@Anil: your demonstration of invariance to the HRF is elegant and correct, but I guess it’s quite clear that the HRF becomes a problem when coupled with downsampling, so with the current state of things it is necessary to take the HRF shape into account.
-then, let’s admit that there really exists a bias in GC due to blood flow (I suspect there is, but not to the extent shown in the paper):
if you look at the world through pink glasses, the world looks pink. The glasses here are the BOLD signal; it’s like when people are surprised to see ‘slow oscillations’ in BOLD, when, well, that’s what it’s about. If I am allowed the analogy, I would say that BOLD is to neural signals in fMRI what gravity is to other physical forces. It does not invalidate them, it just sets a different baseline.
Here you can find the maps for HRF height, time-to-peak, and full width at half maximum from the NYU/Rockland dataset available via the 1000 Functional Connectomes project (TR = 645 ms):
http://figshare.com/preview/_url/886139/project/391
It looks like the “sink” GC areas (or the venous areas, if you like) are those with slower, higher, and wider HRFs – a quite textbook example of why it is necessary to perform deconvolution, in my opinion.
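That point can be shown with a toy simulation (the HRF shapes and all numbers here are invented for illustration): feed the identical ‘neural’ signal into two regions, give the second a later-peaking HRF, and any lag-based measure will see the fast-HRF region ‘leading’ the slow one:

```python
import numpy as np

def gamma_hrf(tr=0.5, peak=5.0, duration=30.0):
    """Gamma-shaped HRF with adjustable time-to-peak (illustrative shape)."""
    t = np.arange(0, duration, tr)
    k = 5.0
    h = t ** k * np.exp(-t * k / peak)  # maximum at t = peak seconds
    return h / h.sum()

rng = np.random.default_rng(2)
neural = rng.standard_normal(5000)  # the SAME driver enters both regions

bold_a = np.convolve(neural, gamma_hrf(peak=4.0))[:5000]  # fast 'arterial' HRF
bold_b = np.convolve(neural, gamma_hrf(peak=6.0))[:5000]  # slow 'venous' HRF

# Lag of the peak cross-correlation: a positive lag means region A appears
# to lead region B, despite identical neural timing in both regions.
lags = np.arange(-20, 21)
xc = [np.corrcoef(bold_a[20:4980], bold_b[20 + l:4980 + l])[0, 1]
      for l in lags]
best_lag = int(lags[int(np.argmax(xc))])  # positive: spurious A -> B 'flow'
```

Deconvolving each series with its own (correctly estimated) HRF before the lag analysis would remove this particular artifact, which is exactly the argument for voxel-wise deconvolution.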
I’ve seen papers that compare using MEG to using fMRI. What mechanism would confound MEG measurements in exactly the same way as fMRI’s?