
Linking correlation to causation with power laws and scale free systems

Power laws can help identify correlations between two sets of data. But some …

A scale-free network's growth can be described using a power law. But does that tell us anything about the network?

An essential part of science involves finding correlations between two sets of measurements and seeking explanations for those correlations. However, relationships can be suggested by data even when they don't actually exist, and correlations may occur due to random fluctuations rather than a deep underlying principle (as the infamous "correlation does not equal causation" cliché suggests). These errors are easy to make, and the scientific literature is full of them.

So how can researchers establish if a correlation is both real and meaningful? In a Perspective in the February 10 issue of Science, Michael P.H. Stumpf and Mason A. Porter examine the type of correlation known as a power law, where one set of measurements is related to a second via an exponent. They argue that two things must be in place for a power law to be valid as a predictive model: it must hold over a wide range of data to eliminate chance associations, and it must have a plausible mechanism to explain why the correlation showed up in the data.

A power law associates two types of measurements through an exponent. So if your measurements are x and y, a power law states that y is proportional to x^p, where p is called the exponent of the power law. A simple example is the relationship between the surface area of a sphere and its radius: the area increases with the square of the radius, so y in this case is area, x is radius, and p is equal to 2.
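To make the sphere example concrete, here is a minimal Python sketch. It relies on the fact that if y is proportional to x^p, then log y is a straight line in log x with slope p, so any two points are enough to recover the exponent. The function names `sphere_area` and `exponent_between` are ours, chosen only for illustration.

```python
import math


def sphere_area(radius):
    """Surface area of a sphere: 4 * pi * r^2."""
    return 4.0 * math.pi * radius ** 2


def exponent_between(x1, y1, x2, y2):
    """Slope of the line through two points on log-log axes.

    For an exact power law y = c * x**p, this slope equals p.
    """
    return (math.log(y2) - math.log(y1)) / (math.log(x2) - math.log(x1))


# Compare a sphere of radius 1 with one of radius 10 (arbitrary units).
p = exponent_between(1.0, sphere_area(1.0), 10.0, sphere_area(10.0))
print(f"recovered exponent: {p:.3f}")
```

The constant of proportionality (4π here) drops out when the two logarithms are subtracted, which is why only the exponent survives.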

A more sophisticated case is Kepler's Third Law, in which the length of time for an object to orbit the Sun is proportional to its average distance from the Sun raised to the power p=3/2. This example passes the criteria laid out by Stumpf and Porter: from Mercury to Sedna, Kepler's Third Law holds over a wide range of orbital periods and sizes. It also has a powerful mechanistic explanation in the form of Newton's Law of Gravitation. 
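As a rough check of that exponent, the sketch below fits a straight line to the logarithms of orbital size and period for the eight planets. The orbital figures are rounded textbook values (distance in astronomical units, period in years) that we supply for illustration, and the helper `fit_power_law` is our own name for an ordinary least-squares fit on log-log axes; none of this comes from Stumpf and Porter's article.

```python
import math

# (average distance from the Sun in AU, orbital period in years).
# Rounded textbook values, supplied here as an assumption for illustration.
PLANETS = {
    "Mercury": (0.387, 0.241),
    "Venus": (0.723, 0.615),
    "Earth": (1.000, 1.000),
    "Mars": (1.524, 1.881),
    "Jupiter": (5.203, 11.862),
    "Saturn": (9.537, 29.457),
    "Uranus": (19.19, 84.01),
    "Neptune": (30.07, 164.8),
}


def fit_power_law(xs, ys):
    """Least-squares fit of log y = log c + p * log x.

    Returns the exponent p and the prefactor c.
    """
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    sxx = sum((a - mean_x) ** 2 for a in log_x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(log_x, log_y))
    p = sxy / sxx
    c = math.exp(mean_y - p * mean_x)
    return p, c


distances = [a for a, _ in PLANETS.values()]
periods = [t for _, t in PLANETS.values()]
exponent, prefactor = fit_power_law(distances, periods)
print(f"fitted exponent p = {exponent:.3f}, prefactor = {prefactor:.3f}")
```

With these inputs the fitted exponent should land very close to 3/2, and the data span roughly two orders of magnitude in distance, which is the kind of range the authors ask for.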

The model the authors discuss in the article is the relationship between the body size of an organism and its metabolic performance, a power law known as allometric scaling. Not only is there an obvious correlation across all organisms from microbes to the largest animals, but there are also clear reasons for the existence of such a relationship.

Power laws are scale-free by their mathematical nature, meaning they should hold equally true for large x and for small x. But sometimes it's not possible to obtain data across the entire range of scale. For Kepler's Third Law, there are obvious practical end points, both close to the Sun and sufficiently far away. Similarly, there is a lower limit to how small organisms can be. However, none of these limits destroys the overall effectiveness of the underlying models, because they still hold over a sufficiently wide range of data.
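One way to see what "scale-free" means is to stretch x by a fixed factor and ask how much y changes. For a power law the answer is the same at every scale; for most other functions it is not. The Python sketch below compares a power law against an exponential. The exponent 1.5, the rate 0.5, and the function names are arbitrary choices for illustration.

```python
import math


def power_law(x, p=1.5):
    """A pure power law, y = x**p (exponent chosen arbitrarily)."""
    return x ** p


def exponential(x, rate=0.5):
    """An exponential, used as a contrast that is not scale-free."""
    return math.exp(rate * x)


def rescaling_ratios(f, factor, xs):
    """How much f changes when x is multiplied by `factor`, at each x."""
    return [f(factor * x) / f(x) for x in xs]


xs = [0.1, 1.0, 10.0, 100.0]  # four very different scales
power_ratios = rescaling_ratios(power_law, 2.0, xs)
exp_ratios = rescaling_ratios(exponential, 2.0, xs)

print("power law:  ", [f"{r:.3f}" for r in power_ratios])
print("exponential:", [f"{r:.3g}" for r in exp_ratios])
```

Doubling x multiplies the power law by the same factor, 2^1.5, whether x is 0.1 or 100, while the exponential's ratio depends strongly on where you start.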

As Stumpf and Porter point out, the problem becomes complicated when the data does not cover a wide range. When there isn't enough data available (as with many biological systems) or a lot of noise is present in the system, more than one power law may be fitted to the same data. Another problem is the natural human propensity to see patterns whether they exist or not: a set of data may appear to lie along a line, but that doesn't mean a significant correlation actually exists. 

Various statistical techniques do exist to determine whether a relationship between variables is real, and these go beyond simple tests of whether a set of data points fits a line. After all, a line can be drawn through any set of points, but that is no guarantee that the line is meaningful.

Additional information is useful in determining whether a line should be drawn in the first place. This is why both a sophisticated mechanistic explanation (the "why" of the power-law connection) and rigorous mathematical tests are necessary. As Stumpf and Porter point out, power laws may arise in random data whenever the distribution is heavily skewed to one side (rather than symmetrical, as in a Gaussian "bell curve" distribution). Even when a correlation inarguably exists, a mechanistic explanation helps separate spurious correlations from those we need to consider more carefully.
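The point about skewed random data can be illustrated with a small simulation. The Python sketch below draws samples from a log-normal distribution, which is heavily skewed but is not a power law, and then does what a hasty analysis might do: plot the upper half of the data on log-log axes and fit a straight line. The choice of a log-normal, its parameters, the sample size, and the random seed are all our assumptions for illustration, not an analysis from Stumpf and Porter's article.

```python
import math
import random


def linear_fit_r2(xs, ys):
    """Ordinary least-squares line through (xs, ys); returns (slope, R^2)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx, (sxy * sxy) / (sxx * syy)


rng = random.Random(0)  # fixed seed so the run is repeatable
sample = sorted(rng.lognormvariate(0.0, 2.0) for _ in range(10_000))
n = len(sample)

# Empirical fraction of samples at or above each value, for the upper half.
tail = [(x, (n - i) / n) for i, x in enumerate(sample) if i >= n // 2]
log_x = [math.log(x) for x, _ in tail]
log_frac = [math.log(frac) for _, frac in tail]

slope, r2 = linear_fit_r2(log_x, log_frac)
print(f"apparent exponent: {slope:.2f}, R^2 of straight-line fit: {r2:.3f}")
```

The straight line fits well even though no power law generated the data, which is why a good-looking fit on log-log axes is not, on its own, evidence of a power law.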

With such stringent criteria, very few power laws are both real and predictive, according to the authors. Claims about "scale-free" networks (including the Internet) have been hyped, but are not statistically rigorous and do not hold up under mathematical scrutiny. Even when they're accurate, power-law treatments of complex systems may not be useful from a theoretical perspective.

All in all, discovery of a "universal" law within the data may be an artifact of imagination, so it's good to add a healthy dose of skepticism before such claims are presented to the public as revelations.

Science, 2012. DOI: 10.1126/science.1216142

