
How Neuroscientists Observe Brains Watching Movies

Functional MRI can peer inside your brain and watch you watching a YouTube clip

UNLESS YOU HAVE been deaf and blind to the world over the past decade, you know that functional magnetic resonance brain imaging (fMRI) can look inside the skull of volunteers lying still inside the claustrophobic, coffinlike confines of a loud, banging magnetic scanner. The technique relies on a fortuitous property of the blood supply to reveal regional activity. Active synapses and neurons consume power and therefore need more oxygen, which is delivered by the hemoglobin molecules inside the circulating red blood cells. When these molecules give off their oxygen to the surrounding tissue, they not only change color—from arterial red to venous blue—but also turn slightly magnetic.

Activity in neural tissue causes an increase in the volume and flow of fresh blood. This change in the blood supply, called the hemodynamic signal, is tracked by sending radio waves into the skull and carefully listening to their return echoes. FMRI does not directly measure synaptic and neuronal activity, which occurs over the course of milliseconds; instead it uses a relatively sluggish proxy—changes in the blood supply—that rises and falls in seconds. The spatial resolution of fMRI is currently limited to a volume element (voxel) the size of a pea, encompassing about one million nerve cells.
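
To make the mismatch of time scales concrete, consider a minimal sketch in Python (not the scanner's actual signal model): a 200-millisecond burst of hypothetical neural activity is convolved with a commonly used double-gamma approximation of the hemodynamic response. All numbers are illustrative assumptions.

    import numpy as np
    from scipy.stats import gamma

    # Illustrative time axis: 30 seconds sampled every 100 ms
    dt = 0.1
    t = np.arange(0, 30, dt)

    # Hypothetical neural burst: 200 ms long, starting at t = 1 s
    neural = np.zeros_like(t)
    neural[(t >= 1.0) & (t < 1.2)] = 1.0

    # Common double-gamma approximation of the hemodynamic response function (HRF)
    hrf = gamma.pdf(t, 6) - gamma.pdf(t, 16) / 6
    hrf /= hrf.max()

    # The measured fMRI (BOLD) signal is roughly the neural activity convolved with the HRF
    bold = np.convolve(neural, hrf)[: len(t)] * dt

    print(f"Neural burst ends at 1.2 s; simulated BOLD signal peaks near {t[bold.argmax()]:.1f} s")

Although the simulated burst is over in a fifth of a second, the modeled blood-flow signal does not peak until roughly five seconds later, which is why fMRI can track only a sluggish proxy of neuronal events.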

Neuroscientists routinely exploit fMRI to infer what volunteers are seeing, imagining or intending to do. It is really a primitive form of mind reading. Now a team has taken that reading to a new, startling level.


A number of groups have deduced the identity of pictures that volunteers viewed while lying in the scanner, working from the slew of maplike representations found in primary, secondary and higher-order visual cortical regions underneath the bump on the back of the head.

Jack L. Gallant of the University of California, Berkeley, is the acknowledged master of these techniques, which proceed in two stages. First, a volunteer looks at a couple of thousand images while lying in a magnet. The response of a few hundred voxels in the visual cortex to each image is carefully registered. These data are then used to train an algorithm to predict the magnitude of the fMRI response for each voxel. Second, this procedure is inverted. That is, for a given magnitude of hemodynamic response, a probabilistic technique called Bayesian decoding infers the most likely image that gave rise to the observed response in that particular volunteer (human brains differ substantially, so it is difficult to use one brain to predict the responses of another).
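
The two stages can be caricatured in a few lines of Python. The sketch below uses made-up features and responses and a plain least-squares fit; it is not Gallant's actual pipeline, only the shape of the logic: fit one linear encoding model per voxel, then identify a new image by asking which candidate's predicted response pattern best matches the measured one (a crude maximum-a-posteriori rule under Gaussian noise and a flat prior over the candidates).

    import numpy as np

    rng = np.random.default_rng(0)
    n_train, n_features, n_voxels = 2000, 50, 300

    # Stage 1 (encoding): hypothetical image features and the voxel responses they evoke
    features = rng.standard_normal((n_train, n_features))
    true_weights = rng.standard_normal((n_features, n_voxels))
    responses = features @ true_weights + 0.5 * rng.standard_normal((n_train, n_voxels))

    # Fit one linear model per voxel by least squares
    weights_hat, *_ = np.linalg.lstsq(features, responses, rcond=None)

    # Stage 2 (decoding): given the response to an unknown image, score every candidate
    candidates = rng.standard_normal((500, n_features))   # features of candidate images
    true_index = 123
    observed = candidates[true_index] @ weights_hat + 0.5 * rng.standard_normal(n_voxels)

    predicted = candidates @ weights_hat                   # predicted response per candidate
    errors = ((predicted - observed) ** 2).sum(axis=1)     # mismatch with the measurement
    print("decoded image:", errors.argmin(), "(true image:", true_index, ")")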

The best of these techniques exploit preexisting, or prior, knowledge about pictures that could have been seen before. The number of mathematically possible images is vast, but the types of actual scenes that are encountered in a world populated by people, animals, trees, buildings and other objects encompass a tiny fraction of all possible images. Appropriately enough, the images that we usually encounter are called natural images. Using a database of six million natural images, Gallant's group showed in 2009 how photographs that volunteers had never encountered before could be reconstructed from their brain responses.
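
Written out, the decoding step amounts to maximizing a posterior over the restricted set of natural images (a schematic rendering, not the authors' exact notation):

    \hat{s} \;=\; \arg\max_{s \in \mathcal{S}_{\mathrm{nat}}} p(s \mid \mathbf{r})
            \;=\; \arg\max_{s \in \mathcal{S}_{\mathrm{nat}}} p(\mathbf{r} \mid s)\, p(s),

where \mathbf{r} is the measured pattern of voxel responses, the likelihood p(\mathbf{r} \mid s) is supplied by the fitted encoding models, and the prior p(s) is concentrated on the database of natural images \mathcal{S}_{\mathrm{nat}}. Restricting the prior to natural images is what makes the otherwise hopeless search over all mathematically possible pictures tractable.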

From Images to Movies
These reconstructions are surprisingly good, even though they are based on the smudged activity of hundreds of thousands of highly diverse nerve cells, each one firing to different aspects of the image—its local intensity, color, shading, texture, and so on. A further limitation I have already alluded to is the 1,000-fold mismatch between the celerity of neuronal signals and the sedate pace at which the fMRI signal rises and falls.

Yet Gallant’s group fearlessly pushed on and applied Bayesian reconstruction techniques to the conceptually and computationally much more demanding problem of spatiotemporal reconstruction.

Three members of the group each watched about two hours' worth of short takes from various Hollywood movies, and these data were used to train a separate encoding model for each voxel. The first part of the model consisted of a bank of neural filters, based on two decades of cumulative research into how nerve cells in the visual cortex of people and monkeys respond to visual stimuli of varying position, size, motion and speed. The second part coupled these neural filters to the blood vasculature, describing how the fast neuronal activity is reflected in the much slower fMRI signal.
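
A toy version of this two-part model might look as follows in Python. The random linear filter bank is a deliberately crude stand-in for the motion-energy filters of the actual study, and every number is an arbitrary assumption chosen only to make the structure visible.

    import numpy as np
    from scipy.stats import gamma

    rng = np.random.default_rng(1)
    fps, seconds = 15, 120
    n_frames = fps * seconds
    frames = rng.standard_normal((n_frames, 16 * 16))      # fake 16-by-16-pixel movie frames

    # Part 1: a bank of "neural" filters applied to every frame (rectified outputs)
    filter_bank = rng.standard_normal((16 * 16, 40))
    neural_activity = np.maximum(frames @ filter_bank, 0)

    # Part 2: hemodynamic coupling -- smear the fast filter outputs with a slow
    # hemodynamic response and keep one sample per second, as the scanner does
    t = np.arange(0, 30, 1 / fps)
    hrf = gamma.pdf(t, 6) - gamma.pdf(t, 16) / 6
    bold = np.apply_along_axis(lambda x: np.convolve(x, hrf)[:n_frames], 0, neural_activity)

    voxel_weights = rng.standard_normal(40)                # one voxel's tuning (fitted in practice)
    predicted_voxel = (bold @ voxel_weights)[::fps]        # one predicted fMRI sample per second
    print(predicted_voxel.shape)                           # (120,): one value per second of movie

Fitting, in this caricature, would amount to adjusting voxel_weights (and the filter parameters) so that predicted_voxel matches the responses recorded while the subject watched the training movies.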

Next, they applied the same Bayesian framework to decode fMRI signals. They used 5,000 hours' worth of short clips pulled at random from YouTube to establish a prior library of “natural movies.” The same three subjects were then tested in the magnet with movies they had not previously seen and that were not drawn from the natural-movies data set. The decoder estimates the most likely clip based on the responses of many voxels in the visual cortex of each volunteer. It is a very sophisticated form of hedging one's bets based on prior experience, widely used in a variety of applications, such as flagging that your credit card is being misused by somebody whose purchasing patterns differ markedly from your own.
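
In code, the decoding step could be sketched as follows: score every clip in the natural-movie prior by how well the encoding model's predicted responses match the measured ones, then blend the best-scoring clips into a reconstruction. The data here are random placeholders, and the unweighted average over the top 100 clips is a simplification of the published procedure.

    import numpy as np

    rng = np.random.default_rng(2)
    n_clips, n_voxels, frame_pixels = 10_000, 300, 64 * 64

    prior_clips = rng.random((n_clips, frame_pixels))      # stand-in frames from the prior set
    predicted = rng.standard_normal((n_clips, n_voxels))   # encoding-model predictions (hypothetical)
    measured = predicted[42] + 0.3 * rng.standard_normal(n_voxels)   # responses to the unseen movie

    # Score each prior clip by the correlation between its predicted and the measured responses
    scores = np.array([np.corrcoef(p, measured)[0, 1] for p in predicted])
    top = np.argsort(scores)[-100:]                        # the 100 best-matching clips

    reconstruction = prior_clips[top].mean(axis=0)         # a blurry blend of the top clips
    print("best clip:", scores.argmax(), "reconstruction shape:", reconstruction.shape)

In a sketch like this, blending many roughly matching clips inevitably produces a soft, detail-poor reconstruction.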

Reconstructing the movie in the head leads to some stunning results. (I urge the reader to visit Gallant’s Web site, where a movie highlights the side-by-side comparison between viewed and decoded movies.) The method is far from perfect—the reconstructed clips are slow and lack details. After all, the fMRI signal is read out only once every second, whereas the underlying movies are much more dynamic (with a 15-hertz frame rate). Yet the net result is astounding, even for an old hand like me.

What Does the Future Hold?
As our measurement tools become more precise and our algorithms more sophisticated, the quality of the reconstructed movies will improve. Indeed, it is not inconceivable that the kind of visual daydreaming we all engage in—sexual fantasies, the crux of the climb where I keep on falling, what I should have told my boss—will one day yield to these tools (provided that I engage in imagery while lying completely immobile in a magnetic scanner). And who’s to say that dreams might not also be accessible to Gallant’s reconstruction techniques?

Functional brain imaging is perfectly safe and requires nothing more than reclining on one’s back uncomfortably for a few hours in a tight metal cylinder. Yet the fundamental spatiotemporal limits of fMRI remain. It does not access the atoms of perception, individual neurons. At the moment, only intrusive microelectrodes that are implanted in the brains of some patients, as was described in my May/June 2011 column, can access the substrate out of which our most fleeting experiences, thoughts and conscious memories arise. For now these remain safe from prying eyes.

Further Reading

  • Bayesian Reconstruction of Natural Images from Human Brain Activity. Thomas Naselaris, Ryan J. Prenger, Kendrick N. Kay, Michael Oliver and Jack L. Gallant in Neuron, Vol. 63, No. 6, pages 902–915; September 24, 2009.

  • Reconstructing Visual Experiences from Brain Activity Evoked by Natural Movies. Shinji Nishimoto, An T. Vu, Thomas Naselaris, Yuval Benjamini, Bin Yu and Jack L. Gallant in Current Biology, published online September 22, 2011.

Christof Koch is a meritorious investigator at the Allen Institute in Seattle and the chief scientist at the Tiny Blue Dot Foundation in Santa Monica, as well as the author of the forthcoming Then I Am Myself the World: What Consciousness Is and How to Expand It. He serves on Scientific American's board of advisers.

This article was originally published with the title “Consciousness Redux: Movies in the Cortical Theater” in SA Mind Vol. 22 No. 6, p. 20.
doi:10.1038/scientificamericanmind0112-20