# Uncertainty and Fat Tails

A major challenge in science writing is how to effectively communicate real, scientific uncertainty.  Sometimes we just don’t have enough information to make accurate predictions.  This is particularly problematic in the case of rare events in which the potential range of outcomes is highly variable. Two topics that are close to my heart come to mind immediately as examples of this problem: (1) understanding the consequences of global warming and (2) predicting the outcome of the emerging A(H1N1) “swine flu” influenza-A virus.

Harvard economist Martin Weitzman has written about the economics of catastrophic climate change (something I have discussed before).  When you want to calculate the expected cost or benefit of some fundamentally uncertain event, you basically take the probabilities of the different outcomes, multiply each by its utility (or disutility), and then sum the products.  This gives you the expected value across your range of uncertainty.  Weitzman has noted that we have a profound amount of structural uncertainty (i.e., there is little we can do to become more certain on some of the central issues) regarding climate change.  He argues that this creates “fat-tailed” distributions of the climatic outcomes (i.e., the disutilities in question).  That is, extreme outcomes (read: end of the world as we know it) have a probability that, while low, isn’t as low as might make us comfortable.
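To make the expected-value arithmetic concrete, here is a minimal Python sketch. The three outcomes and their probabilities and costs are invented purely for illustration; they are not Weitzman's numbers or anyone's estimates.

```python
# Hypothetical outcomes as (probability, cost) pairs.
# Every number here is made up for illustration only.
outcomes = [
    (0.90, 1.0),     # mild outcome
    (0.09, 10.0),    # serious outcome
    (0.01, 1000.0),  # catastrophic, low-probability outcome
]


def expected_cost(outcomes):
    """Sum of probability times cost over all outcomes."""
    total_p = sum(p for p, _ in outcomes)
    if abs(total_p - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(p * cost for p, cost in outcomes)


total = expected_cost(outcomes)
# Share of the expected cost contributed by the catastrophic outcome alone.
tail_share = (0.01 * 1000.0) / total
print(f"expected cost: {total:.2f}, share from the tail: {tail_share:.0%}")
```

With these made-up numbers, the catastrophic outcome has only a 1% probability but accounts for roughly 85% of the expected cost, which is the sense in which the tail can dominate the calculation.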

A very similar set of circumstances besets predicting the severity of the current outbreak of swine flu.  There is a distribution of possible outcomes.  Some have high probability; some have low.  Some are really bad; some less so.  When we plan public health and other logistical responses we need to be prepared for the extreme events that are still not impossibly unlikely.

So we have some range of outcomes (e.g., the number of degrees C that the planet warms in the next 100 years or the number of people who become infected with swine flu in the next year) and we have a measure of probability associated with each possible value in this range. Some outcomes are more likely and some are less.  Rare events are, by definition, unlikely but they are not impossible.  In fact, given enough time, most rare events are inevitable.  From a predictive standpoint, the problem with rare events is that they’re, well, rare.  Since you don’t see rare events very often, it’s hard to say with any certainty how likely they actually are.  It is this uncertainty that fattens up the tails of our probability distributions.  Say there are two rare events.  One has a probability of $10^{-6}$ and the other has a probability of $10^{-9}$. The latter is certainly much rarer than the former. You are nonetheless very, very unlikely to ever witness either event, so how can you make any judgement that one is 1,000 times more likely than the other?
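A quick way to see both points (rare events become near-certain given enough opportunities, yet are almost never observed in a modest sample) is to compute the chance of seeing an event at least once in $n$ tries. The sketch below assumes the tries are independent with a fixed per-try probability $p$, which is a simplification I am introducing for illustration.

```python
import math


def prob_at_least_once(p, n):
    """P(event occurs at least once in n independent trials), each with probability p.

    Computes 1 - (1 - p)**n using log1p/expm1 so that tiny p is handled accurately.
    """
    return -math.expm1(n * math.log1p(-p))


for p in (1e-6, 1e-9):
    for n in (10**3, 10**6, 10**9):
        print(f"p={p:g}, n={n:g}: {prob_at_least_once(p, n):.6f}")
```

Under these assumptions, a thousand observations will almost never contain either event, so the data say next to nothing about whether the true probability is $10^{-6}$ or $10^{-9}$.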

Say we have a variable that is normally distributed.  This is the canonical and ubiquitous bell-shaped distribution that arises when many independent factors contribute to the outcome. It’s not necessarily the best distribution to model the type of outcomes we are interested in but it has the tremendous advantage of familiarity. The normal distribution has two parameters: the mean ($\mu$) and the standard deviation ($\sigma$).  If we know $\mu$ and $\sigma$ exactly, then we know lots of things about the value of the next observation.  For instance, we know that the most likely value is actually $\mu$ and we can be 95% certain that the value will fall between about $\mu - 1.96\sigma$ and $\mu + 1.96\sigma$ (that is, between -1.96 and 1.96 for a standard normal, where $\mu = 0$ and $\sigma = 1$).
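For readers who want to check the 95% figure, here is a small sketch using Python's standard-library `statistics.NormalDist`. The values of `mu` and `sigma` are arbitrary placeholders; for a general normal the central 95% interval is $\mu \pm 1.96\sigma$, which reduces to plus/minus 1.96 in the standard case.

```python
from statistics import NormalDist

# Arbitrary, hypothetical parameters chosen only for illustration.
mu, sigma = 10.0, 2.0
dist = NormalDist(mu, sigma)

# Central 95% interval: cut off 2.5% of the probability in each tail.
lower = dist.inv_cdf(0.025)
upper = dist.inv_cdf(0.975)

# The same cutoff for a standard normal (mean 0, sd 1).
z = NormalDist().inv_cdf(0.975)

print(f"z = {z:.3f}")
print(f"95% interval: ({lower:.3f}, {upper:.3f})")
print(f"mu +/- z*sigma: ({mu - z * sigma:.3f}, {mu + z * sigma:.3f})")
```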

Of course, in real scientific applications we almost never know the parameters of a distribution with certainty.  What happens to our prediction when we are uncertain about the parameters? Given some set of data that we have collected (call it $y$) and from which we can estimate our two normal parameters $\mu$ and $\sigma$, we want to predict the value of some as-yet unobserved data (which we call $\tilde{y}$).  We can predict the value of $\tilde{y}$ using a device known as the posterior predictive distribution.  Essentially, we average our predictions across all the uncertainty that we have about the parameters. We can write this as

$$p(\tilde{y}|y) = \int\int p(\tilde{y}|\mu,\sigma)\,p(\mu,\sigma|y)\,d\mu\,d\sigma$$

OK, what does that mean? $p(\tilde{y}|\mu,\sigma)$ is the probability of the new data, given the values of the two parameters.  It has the same form as the likelihood of the data. $p(\mu,\sigma|y)$ is the probability of the two parameters given the observed data (the posterior distribution).  The two integrals mean that we are averaging the product $p(\tilde{y}|\mu,\sigma)p(\mu,\sigma|y)$ across the range of uncertainty in our two parameters (loosely speaking, “integrating” here simply means averaging).
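The averaging can also be done by brute force: draw plausible parameter values given the data, then draw a new observation from each. The sketch below does this with the standard library only. It assumes the usual noninformative prior $p(\mu,\sigma^2) \propto 1/\sigma^2$ (my assumption, not something specified above), and the three data points are made up.

```python
import math
import random
import statistics


def posterior_predictive_draws(y, n_draws, rng):
    """Simulate new observations from a normal model with unknown mean and sd.

    Assumes the noninformative prior p(mu, sigma^2) proportional to 1/sigma^2.
    """
    n = len(y)
    ybar = statistics.fmean(y)
    s2 = statistics.variance(y)  # sample variance (n - 1 denominator)
    draws = []
    for _ in range(n_draws):
        # sigma^2 | y  ~  (n - 1) * s^2 / chi-square(n - 1);
        # a chi-square(k) draw is a gamma draw with shape k/2 and scale 2.
        chi2 = rng.gammavariate((n - 1) / 2.0, 2.0)
        sigma2 = (n - 1) * s2 / chi2
        # mu | sigma^2, y  ~  Normal(ybar, sigma^2 / n)
        mu = rng.gauss(ybar, math.sqrt(sigma2 / n))
        # new observation | mu, sigma^2  ~  Normal(mu, sigma^2)
        draws.append(rng.gauss(mu, math.sqrt(sigma2)))
    return draws


rng = random.Random(1)   # fixed seed so the run is repeatable
y = [-1.0, 0.0, 1.0]     # hypothetical data, n = 3
draws = posterior_predictive_draws(y, 20_000, rng)

# Scale of the predictive distribution: s * sqrt(1 + 1/n).
ybar = statistics.fmean(y)
scale = math.sqrt(statistics.variance(y) * (1 + 1 / len(y)))

# Fraction of simulated observations beyond 1.96 scale units of the mean.
frac_outside = sum(abs(d - ybar) > 1.96 * scale for d in draws) / len(draws)
print(f"fraction outside +/-1.96 scale units: {frac_outside:.3f}")
```

If the parameters were known exactly, about 5% of new observations would land outside that range. With only three data points, the predictive distribution under this prior is a t-distribution with 2 degrees of freedom, for which the corresponding figure is about 19%.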

If you’ve hummed your way through these last couple paragraphs, no worries.  What really matters are the consequences of this averaging.

When we do this for a normal distribution with unknown standard deviation, it turns out that we get a t-distribution.  t-distributions are characterized by “fat tails.” What this means is that the probabilities of unlikely events aren’t as unlikely as we might be comfortable with.  The probability in the tail(s) of the distribution approaches zero more slowly than an exponential decay.  This means that there is non-negligible probability on very extreme events. Here I plot a standard normal distribution in the solid line and a t-distribution with 2 (dashed) and 20 (dotted) degrees of freedom.

We can see that the dashed and dotted curves have much higher probabilities at the extreme values.  Remember that 95% of the normal observations will be between -1.96 and 1.96, whereas the dashed line is still pretty high for outcome values beyond 4.  In fact, for the dashed curve, 95% of the values fall between -4.3 and 4.3. In all fairness, this is a pretty uncertain distribution, but you can see the same thing, to a lesser degree, with the dotted line (where the central 95% interval is plus/minus 2.09).  Unfortunately, when we are faced with the types of structural uncertainty we have in events of interest like the outcome of global climate change or an emerging epidemic, our predictive distributions are going to be more like the very fat-tailed distribution represented by the dashed line.
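The interval widths quoted above can be checked numerically. The sketch below uses only the Python standard library: it writes out the Student's t density, integrates it with Simpson's rule, and finds the 95% cutoff by bisection. The helper names are my own, and a statistics library would normally do this in one call; this is just a transparent way to see where 4.3 and 2.09 come from.

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_central_prob(c, df, steps=2000):
    """P(-c <= T <= c): Simpson's rule on [0, c], doubled by symmetry."""
    h = c / steps
    total = t_pdf(0.0, df) + t_pdf(c, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 2 * total * h / 3


def t_central_halfwidth(level, df):
    """Half-width c such that P(|T| <= c) = level, found by bisection."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_central_prob(mid, df) < level:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


half_t2 = t_central_halfwidth(0.95, 2)
half_t20 = t_central_halfwidth(0.95, 20)

# Probability of landing beyond +/-4 under each distribution.
tail_normal = 2 * (1 - NormalDist().cdf(4.0))
tail_t2 = 1 - t_central_prob(4.0, 2)
tail_t20 = 1 - t_central_prob(4.0, 20)

print(f"95% half-width, t(2):  {half_t2:.2f}")
print(f"95% half-width, t(20): {half_t20:.2f}")
print(f"P(|x| > 4): normal {tail_normal:.2e}, t(20) {tail_t20:.2e}, t(2) {tail_t2:.2e}")
```

The comparison beyond 4 is the telling one: that region is all but empty under the normal curve, yet it holds several percent of the probability under the t-distribution with 2 degrees of freedom.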

As scientists with an interest in policy, how do we communicate this type of uncertainty? It is a very difficult question.  The good news about the current outbreak of swine flu is that it seems to be fizzling in the northern hemisphere. Despite the rapid spread of the novel flu strain, sustained person-to-person transmission is not occurring in most parts of the northern hemisphere. This is not surprising since we are already past flu season.  However, as I wrote yesterday, it seems well within the realm of possibility that the southern hemisphere will be slammed by this flu during the austral winter and that it will come right back here in the north with the start of our own flu season next winter.  What I worry about is that all the hype followed by a modest outbreak in the short term will cause people to become inured to public health warnings and predictions of potentially dire outcomes. I don’t suppose that it will occur to people that the public health measures undertaken to control this current outbreak actually worked (fingers crossed).  I think this might be a slightly different issue in the communication of science but it is clearly tied up in this fundamental problem of how to communicate uncertainty.  Lots to think about, but maybe I should get back to actually analyzing the volumes of data we have gathered from our survey!

# On Swine Flu

A lot has happened in the last week.  I was frantically preparing for a big talk that I had to give at the end of the week when the news about swine flu started heating up.  As of the most recent posting from the Pan-American Health Organization, there are 1118 confirmed cases and 27 deaths in 18 countries worldwide. The United States has had 279 laboratory-confirmed cases and one death.  As I write, it sounds like the epidemiological situation is much better than it could have been.

But last Monday things were looking like they were going to get very serious. While I should have been preparing for my talk, I spent the day working out the details for an internet-based survey on people’s knowledge, attitudes, and behavior regarding the emerging H1N1 (a.k.a. “swine”) flu. Marcel Salathé and I realized that we had an historic opportunity to field a survey and learn something about people’s responses as the public health emergency unfolded.  Our survey has been online since the morning of Wednesday, April 29th (and is available here for anyone interested in taking it; it’s only 16 questions long and takes less than five minutes to complete) and we have gotten well over 5,000 responses so far. Vijoy Abraham at the Institute for Research in the Social Sciences was amazingly helpful in getting the survey online in a hurry, and IRiSS very kindly hosted the survey.

Marcel posted a nice blog piece on the survey and it was later picked up by Carl Zimmer on his blog, and then the big-time: we got written about on boing boing.  This idea clearly resonates with people.  Stanford put out a press release and now Marcel and I have done a number of interviews for various local and international media outlets (links forthcoming).  All this before we’ve even done any analysis!

We will keep the survey online as long as it is relevant, though we will begin doing some exploratory data analysis shortly. The thing about flu is that, even though it seems like it is fizzling out already, it could actually kick around for months. I was speaking with a prominent disease ecologist this past week who predicted that this particular outbreak would fizzle in the northern hemisphere for the time being. You see, by May, we are pretty well past flu season in the north.  For whatever reason, flu shows marked seasonality in transmission.  Jeff Shaman at Oregon State has shown pretty convincingly that this seasonality is a matter of absolute humidity, which is lowest in the winter in temperate regions; low absolute humidity is presumably more conducive to influenza transmission and virus survival. The disease ecologist who made the prediction for northern-hemisphere fizzle also suggested that the southern hemisphere might be in for a hard flu season during the austral winter.  Extensive and sustained transmission there could be bad news for those of us who feel like we’ve dodged a bullet here in the north because when flu season rolls back around here, we might get slammed on the rebound.  A very interesting paper by Cécile Viboud and colleagues shows that it was the second influenza season that had the higher mortality rates during the last influenza pandemic of 1968.  The moderator on ProMED-mail wrote “Even if the present A/H1N1 has pandemic potential it is therefore highly likely that the outbreak will fade out within the next 2 to 3 weeks, but it will reappear in the autumn.”  Time to get cracking on getting this H1N1 strain incorporated into the next flu vaccine, I’d say.

This means that we will probably need to keep our survey up for a while to come.  It will be interesting — and hopefully informative — to see how people’s anxieties and knowledge about swine flu wax and wane as this system evolves.

That’s all for now though I suspect this won’t be the last post I write about swine flu. Oh, and the talk turned out fine; thanks for asking…