# On the Uses of an Interdisciplinary Ph.D.

Today, I participated in a panel — along with super-smart colleagues Alex Konings and Kabir Peay — for the first-year Ph.D. students in the E-IPER program, an interdisciplinary, graduate interdepartmental program (IDP) at Stanford. As is the idiom for any E-IPER event, we spent a lot of time fretting about interdisciplinarity: what it means, how you achieve it, what costs it entails for jobs, etc.

I expressed the slightly heretical opinion that we should not pursue interdisciplinarity for interdisciplinarity’s sake. What matters, both for the science and for more instrumental outcomes such as getting published, getting a job, and getting tenure, are questions. Yes, questions. One should ask important questions that people care about. Why are there so many species in the tropics? Where do pandemic diseases come from and how can we best control them? Do democracy and the rule of law provide the best approach to governance? How do people adapt to a changing climate?

The interdisciplinary Ph.D. program comes in by giving students the opportunity to pursue whatever tools and approaches are required to answer the question in the best way possible. You don’t need to use a particular approach because that’s what people in your field do. Sometimes the best thing to do will be totally interdisciplinary; sometimes it will look a bit more like what someone in a disciplinary program would do. Always lead with the question.

Answering important questions using the best tools available is probably the best route to managing the greatest risk of an interdisciplinary degree. This risk, of course, is the difficulty in getting a job when you don’t look like what any given department had in mind when they wrote a job ad. The best way to manage this risk is simply to be excellent. If your work is strong enough, the specific discipline of your Ph.D. doesn’t really matter. Now, there are certainly some disciplines that are more xenophobic than others (anthropology and economics come immediately to mind), but if your work is really outstanding, the excuse that you don’t have the right degree for a given job gets much more tenuous. Two people who come immediately to mind are my colleague David Lobell and my sometime collaborator and former Stanford post-doc Marcel Salathé.

Is David a geographer? Geologist? Economist? Doesn’t really matter because he’s generally recognized as being a smart guy doing important work. Similarly with Marcel: population geneticist? Epidemiologist? Computer scientist? Who cares? He has important things to say and gets recognized for it.

Now, alas, we can’t all be David and Marcel, but we can strive to ask important scientific questions and let these questions lead us to both the skills and the bodies of knowledge we need. These then form the foundation of our research careers. Interdisciplinarity, then, is about following the question. It is not an end in itself.

# Seriously, People, It's Selection, Not Mutation!

I just read an excellent piece at Slate.com this morning by Benjamin Hale. He notes that the scariest, most insidious thing about Ebola Virus Disease (EVD) is that the disease capitalizes on intimate contact for transmission. Diseases such as influenza or cholera are transmitted by casual contact, frequently to strangers, via aerosolized droplets (influenza) or fecally contaminated water (cholera). EVD, by contrast, hits caretakers, and especially women, hard. Hale writes,

…the mechanism Ebola exploits is far more insidious. This virus preys on care and love, piggybacking on the deepest, most distinctively human virtues. Affected parties are almost all medical professionals and family members, snared by Ebola while in the business of caring for their fellow humans. More strikingly, 75 percent of Ebola victims are women, people who do much of the care work throughout Africa and the rest of the world. In short, Ebola parasitizes our humanity.

True, and tragic, enough. But this article falls prey to one of my biggest frustrations with the reporting of science, one that I have written about recently in the context of the current EVD epidemic ravaging West Africa.

In the list Hale presents of the major concerns about EVD, he notes: “The threat of mutation,” citing concern that Ebola virus might become airborne in a news report in Nature and the New York Times article that got me so worked up 10 days ago. Earlier this week, there was yet another longish piece in Nature/Scientific American that mentions “mutation” seven times but never once mentions selection. Or in another Nature piece, UCSF infectious disease physician Charles Chiu is quoted: “The longer we allow the outbreak to continue, the greater the opportunity the virus has to mutate, and it’s possible that it will mutate into a form that would be an even greater threat than it is right now.” True, mutations accumulate over time. It is not true, however, that mutation alone will make Ebola virus a greater threat than it is now. That would require selection.

While the idea of airborne transmission of Ebola virus is terrifying, the development of the ability to be transmitted via droplet or aerosol would be an adaptation on the part of the virus. Adaptations arise from the action of selection on phenotypic variation. Phenotypes with higher fitness come to dominate the population of entities of which they are a part. In the case of a virus such as Ebola virus, this means that the virus must make sufficient copies of itself to ensure transmission to new susceptible hosts before killing the current host or being cleared by the host’s immune system. While efficient transmission of EVD by aerosol or droplet would be horrible, equally horrible would be an adaptation that allowed it to transmit more efficiently from a dead host. It’s not entirely clear how long Ebola virus can persist in its infectious state in the environment. In a study designed to maximize its persistence (indoors, in the dark, under laboratory conditions), Sagripanti and colleagues found that Ebola virus can persist for six days. Under field conditions, it’s probably much shorter, but CDC suggests that 24 hours is a reasonably conservative estimate.

The lack of a strong relationship between host survival and pathogen transmission is why cholera can be so devastatingly pathogenic. The cholera patient can produce 10-20 liters of diarrhea (known as “rice water stools”) per day. These stools contain billions of Vibrio cholerae bacteria, which enter the water supply and can infect other people at a distance well after the original host has died. The breaking of the trade-off between host mortality and the transmissibility of the pathogen means that the natural brake on virulence is removed and the case fatality ratio can exceed 50%. That’s high, kind of like the current round of EVD. Imagine if the trade-off between mortality and transmission in EVD were completely broken…

Changes in pathogen life histories like increased (or decreased) virulence or mode of transmission arise because of selection, not mutation, and this selection results from interactions with an environment that we are actively shaping. Sure, mutation matters because it provides raw material upon which selection can act, but the fact remains that we are talking primarily about selection here. Is this pervasive misunderstanding of the mechanisms of life the result of the war of misinformation being waged on science education in the US? I can’t help but think it must at least be a contributor, but if it’s true, it’s pretty depressing because this misunderstanding is finding its way to some of the world’s top news and opinion outlets.

# Selection is What Matters

This has to be a quick one, but I wanted to go on the record as noting my frustration at the current concern that Ebola might “mutate” into something far worse, like a pathogen that is efficiently transmitted by aerosol. For example, Michael Osterholm wrote in the New York Times yesterday, “The second possibility is one that virologists are loath to discuss openly but are definitely considering in private: that an Ebola virus could mutate to become transmissible through the air.” I heard Morning Edition host David Greene ask WHO Director Margaret Chan last week, “Is this virus mutating in a way that could be very dangerous, that could make it spread faster?”

I agree, Ebola Virus becoming more easily transmitted by casual contact would be a ‘nightmare scenario.’ However, what we need to worry about is not mutation per se, but selection! Yes, the virus is mutating. It’s a thing that viruses do. Ebola Virus is a Filovirus. It is composed of a single strand of negative-sense RNA. Like other viruses, and particularly RNA viruses, it is prone to high mutation rates. This is exacerbated by the fact that RNA polymerases lack the ability to correct mistakes. So mutations happen fast and they don’t get cleaned up. Viruses also have very short generation times and can produce prodigious numbers of copies of themselves. This means that there is lots of raw material on which selection can act, because variation is the foundation of selection. Add to that heritability, which pretty much goes without saying since we are talking about the raw material of genetic information here, and differential transmission success and voilà, selection!

And virulence certainly responds to selection. There is a large literature on experimental evolution of virulence. See, for example, the many citations to Ebert’s (1998) review in Science. There are lots of different specific factors that can favor the evolution of greater or lesser virulence and this is where theoretical biology can come in and make some sense of things. Steve Frank wrote a terrific review paper in 1996, available on his website, that describes many different models for the evolution of virulence.

Two interesting regularities in the evolution of virulence may be relevant to the current outbreak of EVD in West Africa. The first comes from a model developed by van Baalen & Sabelis (1995). Noting that there is an inherent trade-off between transmissibility of a pathogen and the extent of disease-induced mortality that it causes (a virus that makes more copies of itself is more likely to be transmitted but more viral copies means the host is sicker and might die), they demonstrate that when the relative transmissibility of a pathogen declines, its virulence will increase. They present a marginal value theorem (MVT) solution for optimal virulence, which we can represent graphically in the figure below. Equilibrium virulence occurs where a line, rooted at the origin, is tangent to the curve relating transmissibility to disease-induced mortality. When the curve is shifted down, the equilibrium mortality increases. EVD is a zoonosis and it’s reasonable to think that when it makes the episodic jump into human populations, it is leaving the reservoir species to whose biology it is adapted and entering a novel species to which it is not adapted. Transmission efficiency very plausibly would decrease in such a case and we would expect higher virulence.

The second generality that may be of interest for EVD is discussed by Paul Ewald in his book on the evolution of infectious disease and in his (1998) paper. Ewald notes that when pathogens are released from the constraint between transmissibility and mortality (that is, when being really sick, or even dead, does not necessarily detract from transmission of the pathogen), virulence can increase largely without bound. Ewald uses the difference in virulence between waterborne and directly-transmitted pathogens to demonstrate this effect. At first glance, this seems to contradict the van Baalen & Sabelis model, but it doesn’t really. The constraint is represented by the curve in the above figure. When that constraint is released, the downward-sloping curve becomes a straight line (or maybe even an upward-sloping curve) and transmissibility continues to increase with mortality. There is no intermediate optimum, as predicted by the MVT, so virulence increases to the point where host mortality is very high.
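Both regularities can be illustrated numerically. What follows is only a minimal sketch under assumptions that are mine, not those of van Baalen & Sabelis or Ewald: pathogen fitness is taken to be $R_0 = \beta(\alpha)/(\mu + \alpha)$, where $\alpha$ is disease-induced mortality and $\mu$ is a hypothetical background host mortality (so the tangent line starts at $-\mu$ on the mortality axis, approaching the origin as $\mu$ shrinks). The saturating curve $\beta(\alpha) = b\sqrt{\alpha} - d$, the linear curve $\beta(\alpha) = b\alpha$, and all parameter values are invented for illustration.

```python
import math

def optimal_virulence(beta, mu=0.1, alpha_max=5.0, steps=50_000):
    """Grid-search the mortality rate alpha in (0, alpha_max] that maximizes
    the assumed fitness R0 = beta(alpha) / (mu + alpha)."""
    best_alpha, best_r0 = None, -math.inf
    for i in range(1, steps + 1):
        alpha = alpha_max * i / steps
        r0 = beta(alpha) / (mu + alpha)
        if r0 > best_r0:
            best_alpha, best_r0 = alpha, r0
    return best_alpha

def saturating(b=1.0, d=0.0):
    """Hypothetical trade-off curve with diminishing returns; d shifts it down."""
    return lambda alpha: b * math.sqrt(alpha) - d

def linear(b=1.0):
    """Hypothetical curve with the constraint released: no diminishing returns."""
    return lambda alpha: b * alpha

if __name__ == "__main__":
    print("baseline curve:     ", optimal_virulence(saturating()))
    print("curve shifted down: ", optimal_virulence(saturating(d=0.1)))
    print("constraint released:", optimal_virulence(linear()))
```

Under these assumptions, shifting the saturating curve down moves the optimum to a higher mortality rate, while the linear curve has no interior optimum at all: the search simply runs to the upper edge of the grid.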

A hemorrhagic fever, EVD is highly transmissible in the secretions (i.e., blood, vomit, stool) of infected people. Because these fluids can be voluminous and because so many of the cases in any EVD outbreak are healthcare workers, family members, and attendants to the ill, we might imagine that the constraints between transmissibility and disease-induced mortality on the Ebola Virus could be released, at least early in an outbreak. As behavior changes over the course of an outbreak — both because of public health interventions and other autochthonous adaptations to the disease conditions — these constraints become reinforced and selection for high-virulence strains is reduced.

These are some theoretically-informed speculations about the relevance of selection on virulence in the context of EVD. The reality is that while the theoretical models are often supported by experimental evidence, the devil is always in the details, as noted by Ebert & Bull (2003). One thing is certain, however. We will not make progress in our understanding of this horrifying and rapidly changing epidemic if all we are worried about is the virus mutating.

Selection is overwhelmingly the most powerful force shaping evolution. The selective regimes that pathogens face are affected by the physical and biotic environments in which pathogens are embedded. Critically, they are also shaped by host behavior. In the case of the current West African epidemic of EVD, the host behavior in question is that of many millions of people at risk, their governments, aid organizations, and the global community. People have an enormous potential to shape the selective regime that will, in turn, shape the pathogen that will infect future victims. This is what we need to be worrying about, not whether the virus will mutate. It saddens and frustrates me that we live in a country where evolution is so profoundly misunderstood that even our most esteemed and otherwise outstanding sources of information and opinion don’t understand the way nature works and the way that human agency can change its workings for our benefit or detriment.

# My Erdős Number

Paul Erdős was the great peripatetic, and highly prolific, mathematician of the 20th century. A terrific web page run by Jerry Grossman at Oakland University provides details of the Erdős Project. Erdős was a pioneer in graph theory, which provides the formal tools for the analysis of social networks.  A collaboration graph is a special graph in which the nodes are authors and an edge connects authors if they co-author a publication. Erdős was such a prolific collaborator that he forms a major hub in the mathematics collaboration graph, linking many disparate authors in the different realms of pure and applied mathematics.

For whatever reason, today I used Grossman’s directions for finding one’s number. <drum roll> My Erdős number is 4.  The path that leads me to Erdős is pretty sweet, I have to say.  This past year, I published a paper in PNAS with Marc Feldman.  Marc wrote a number of papers (here’s one) with Sam Karlin (who, I’m proud to say, came and slept through at least one talk I gave at the Morrison Institute). Karlin wrote a paper with Gábor Szegő, who wrote a paper with Erdős.  Lots of Stanford greatness there that I feel privileged to be a part of. It turns out that I have independent (though longer) paths through my co-authors Marcel Salathé and Mark Handcock as well.
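An Erdős number is just the shortest-path distance to Erdős in the collaboration graph, so it can be found with a breadth-first search. The sketch below encodes only the co-authorship links named in this post; `"author"` is a placeholder node for me, and the function names are made up for illustration. It is not how Grossman's site computes anything, just the underlying idea.

```python
from collections import deque

# Co-authorship edges exactly as described in the post.
# "author" is a placeholder for the writer of this blog.
EDGES = [
    ("author", "Feldman"),
    ("Feldman", "Karlin"),
    ("Karlin", "Szegő"),
    ("Szegő", "Erdős"),
]

def build_graph(edges):
    """Turn a list of undirected edges into an adjacency dictionary."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def collaboration_distance(graph, source, target):
    """Breadth-first search: number of edges on a shortest path, or None."""
    if source not in graph or target not in graph:
        return None
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None

if __name__ == "__main__":
    print(collaboration_distance(build_graph(EDGES), "author", "Erdős"))
```

Breadth-first search visits nodes in order of distance from the source, so the first time it reaches Erdős it has found a shortest path, which is what the definition of the number requires.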

# Uncertainty and Fat Tails

A major challenge in science writing is how to effectively communicate real, scientific uncertainty. Sometimes we just don’t have enough information to make accurate predictions. This is particularly problematic in the case of rare events in which the potential range of outcomes is highly variable. Two topics that are close to my heart come to mind immediately as examples of this problem: (1) understanding the consequences of global warming and (2) predicting the outcome of the emerging A(H1N1) “swine flu” influenza-A virus.

Harvard economist Martin Weitzman has written about the economics of catastrophic climate change (something I have discussed before). When you want to calculate the expected cost or benefit of some fundamentally uncertain event, you basically take the probabilities of the different outcomes and multiply them by the utilities (or disutilities) and then sum them. This gives you the expected value across your range of uncertainty. Weitzman has noted that we have a profound amount of structural uncertainty (i.e., there is little we can do to become more certain on some of the central issues) regarding climate change. He argues that this creates “fat-tailed” distributions of the climatic outcomes (i.e., the disutilities in question). That is, extreme outcomes (read: end of the world as we know it) have a probability that, while it’s low, isn’t as low as might make us comfortable.
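To make the expected-value arithmetic concrete, here is a small sketch. The three outcomes, their probabilities, and their costs are entirely hypothetical numbers chosen for illustration; they are not taken from Weitzman's work. The only thing that differs between the two scenarios is how much probability sits on the catastrophic outcome.

```python
def expected_value(outcomes):
    """Sum of probability * (dis)utility over (probability, value) pairs."""
    total_p = sum(p for p, _ in outcomes)
    if abs(total_p - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(p * value for p, value in outcomes)

# Hypothetical (probability, cost) pairs: mild, severe, catastrophic.
THIN_TAIL = [(0.9, 1.0), (0.0999, 10.0), (0.0001, 1000.0)]
FAT_TAIL = [(0.9, 1.0), (0.09, 10.0), (0.01, 1000.0)]

if __name__ == "__main__":
    print("thin-tailed expected cost:", expected_value(THIN_TAIL))
    print("fat-tailed expected cost: ", expected_value(FAT_TAIL))
```

With these made-up numbers the expected cost goes from roughly 2 to roughly 11.8, even though the catastrophic outcome is still only a one-in-a-hundred event in the fat-tailed case. The tail dominates the sum.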

A very similar set of circumstances besets predicting the severity of the current outbreak of swine flu.  There is a distribution of possible outcomes.  Some have high probability; some have low.  Some are really bad; some less so.  When we plan public health and other logistical responses we need to be prepared for the extreme events that are still not impossibly unlikely.

So we have some range of outcomes (e.g., the number of degrees C that the planet warms in the next 100 years or the number of people who become infected with swine flu in the next year) and we have a measure of probability associated with each possible value in this range. Some outcomes are more likely and some are less. Rare events are, by definition, unlikely but they are not impossible. In fact, given enough time, most rare events are inevitable. From a predictive standpoint, the problem with rare events is that they’re, well, rare. Since you don’t see rare events very often, it’s hard to say with any certainty how likely they actually are. It is this uncertainty that fattens up the tails of our probability distributions. Say there are two rare events. One has a probability of $10^{-6}$ and the other has a probability of $10^{-9}$. The latter is certainly much more rare than the former. You are nonetheless very, very unlikely to ever witness either event, so how can you make any judgement that the one is 1,000 times more likely than the other?
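A quick calculation shows both halves of this argument. The sketch below assumes independent trials, each with the same probability $p$ (an assumption of mine that real climate or epidemic events certainly violate), so that the chance of seeing the event at least once in $n$ trials is $1 - (1 - p)^n$. The function name is hypothetical.

```python
import math

def prob_at_least_once(p, n):
    """P(event occurs at least once in n independent trials) = 1 - (1 - p)**n,
    computed with log1p/expm1 so that tiny p does not lose precision."""
    return -math.expm1(n * math.log1p(-p))

if __name__ == "__main__":
    for p in (1e-6, 1e-9):
        for n in (10**3, 10**6, 10**10):
            print(f"p={p:g}, n={n:g}: {prob_at_least_once(p, n):.6g}")
```

With a thousand observations you will almost surely see neither event, so the data cannot distinguish $10^{-6}$ from $10^{-9}$. With ten billion observations both are close to certain.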

Say we have a variable that is normally distributed. This is the canonical and ubiquitous bell-shaped distribution that arises when many independent factors contribute to the outcome. It’s not necessarily the best distribution to model the type of outcomes we are interested in but it has the tremendous advantage of familiarity. The normal distribution has two parameters: the mean ($\mu$) and the standard deviation ($\sigma$). If we know $\mu$ and $\sigma$ exactly, then we know lots of things about the value of the next observation. For instance, we know that the most likely value is actually $\mu$ and we can be 95% certain that the value will fall between about $\mu - 1.96\sigma$ and $\mu + 1.96\sigma$ (that is, between -1.96 and 1.96 for a standard normal with $\mu = 0$ and $\sigma = 1$).

Of course, in real scientific applications we almost never know the parameters of a distribution with certainty. What happens to our prediction when we are uncertain about the parameters? Given some set of data that we have collected (call it $y$) and from which we can estimate our two normal parameters $\mu$ and $\sigma$, we want to predict the value of some as-yet unobserved data (which we call $\tilde{y}$). We can predict the value of $\tilde{y}$ using a device known as the posterior predictive distribution. Essentially, we average our predictions across all the uncertainty that we have in our parameters. We can write this as

$$p(\tilde{y}|y) = \int \int p(\tilde{y}|\mu,\sigma)\, p(\mu,\sigma|y)\, d\mu\, d\sigma.$$

OK, what does that mean? $p(\tilde{y}|\mu,\sigma)$ is the probability of the new observation, given the values of the two parameters. It has the same form as the likelihood of the data. $p(\mu,\sigma|y)$ is the probability of the two parameters given the observed data. The two integrals mean that we are averaging the product $p(\tilde{y}|\mu,\sigma)p(\mu,\sigma|y)$ across the range of uncertainty in our two parameters (in statistical parlance, “integrating” here simply means averaging).
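The averaging can also be done by brute force: draw plausible parameter values given the data, then draw a new observation from a normal with those parameters, and repeat. The sketch below does this for three made-up data points. It assumes the standard noninformative prior $p(\mu,\sigma^2) \propto 1/\sigma^2$, which is my choice for illustration and is not specified in the text above; the function names are hypothetical too.

```python
import math
import random
from statistics import mean, stdev

def posterior_predictive_draws(y, n_draws=100_000, seed=0):
    """Simulate new observations from a normal model with unknown mean and
    variance, assuming the prior p(mu, sigma^2) proportional to 1/sigma^2."""
    rng = random.Random(seed)
    n = len(y)
    y_bar, s2 = mean(y), stdev(y) ** 2
    draws = []
    for _ in range(n_draws):
        # sigma^2 | y is scaled inverse chi-square with n - 1 degrees of freedom.
        chi2 = rng.gammavariate((n - 1) / 2, 2.0)
        sigma2 = (n - 1) * s2 / chi2
        # mu | sigma^2, y is normal around the sample mean.
        mu = rng.gauss(y_bar, math.sqrt(sigma2 / n))
        # A new observation given this particular (mu, sigma).
        draws.append(rng.gauss(mu, math.sqrt(sigma2)))
    return draws

def fraction_outside(y, draws, z=1.96):
    """Share of draws farther than z predictive standard errors from the mean."""
    scale = stdev(y) * math.sqrt(1 + 1 / len(y))
    centre = mean(y)
    return sum(abs(d - centre) > z * scale for d in draws) / len(draws)

if __name__ == "__main__":
    y = [-1.0, 0.0, 1.0]  # hypothetical data
    draws = posterior_predictive_draws(y)
    print("fraction beyond +/-1.96:", fraction_outside(y, draws))
```

If the parameters were known exactly, only 5% of new observations would land beyond 1.96 standard errors. With three data points and this prior, the simulated fraction is several times larger, which is the extra mass in the tails that comes from not knowing $\sigma$.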

If you’ve hummed your way through these last couple paragraphs, no worries.  What really matters are the consequences of this averaging.

When we do this for a normal distribution with unknown standard deviation, it turns out that we get a t-distribution. t-distributions are characterized by “fat tails.” What this means is that the probabilities of unlikely events aren’t as unlikely as we might be comfortable with. The probability in the tail(s) of the distribution approaches zero more slowly than an exponential decay, so very extreme events retain non-negligible probability. Here I plot a standard normal distribution in the solid line and a t-distribution with 2 (dashed) and 20 (dotted) degrees of freedom.

We can see that the dashed curve, and to a lesser extent the dotted curve, have higher probabilities at the extreme values. Remember that 95% of the normal observations will be between -1.96 and 1.96, whereas the dashed line is still pretty high for outcome values beyond 4. In fact, for the dashed curve, 95% of the values fall between -4.3 and 4.3. In all fairness, this is a pretty uncertain distribution, but you can see the same thing with the dotted line (where the 95% central interval is plus/minus 2.09). Unfortunately, when we are faced with the types of structural uncertainty we have in events of interest like the outcome of global climate change or an emerging epidemic, our predictive distributions are going to be more like the very fat-tailed distribution represented by the dashed line.
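The interval endpoints quoted above can be checked without any statistics package. This sketch integrates the Student t density numerically (Simpson's rule) and bisects for the half-width that encloses 95% of the probability; the normal value comes from Python's `statistics.NormalDist`. The helper names are my own, and the numerical settings (2,000 Simpson panels, a search bracket of 0 to 100) are arbitrary choices that are adequate for these three cases.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def central_mass(x, df, panels=2000):
    """P(-x < T < x): Simpson's rule on [0, x], doubled by symmetry."""
    h = x / panels
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, panels):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 2 * total * h / 3

def central_halfwidth(df, mass=0.95):
    """Bisect for the x such that the interval (-x, x) holds `mass`."""
    lo, hi = 0.0, 100.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if central_mass(mid, df) < mass:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

if __name__ == "__main__":
    print("normal:  ", NormalDist().inv_cdf(0.975))
    print("t, df=20:", central_halfwidth(20))
    print("t, df=2: ", central_halfwidth(2))
```

The same `central_mass` function also gives the probability of landing beyond any cutoff, for example `1 - central_mass(4, 2)` for the chance that a draw from the 2-degree-of-freedom curve falls outside plus/minus 4.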

As scientists with an interest in policy, how do we communicate this type of uncertainty? It is a very difficult question. The good news about the current outbreak of swine flu is that it seems to be fizzling in the northern hemisphere. Despite the rapid spread of the novel flu strain, sustained person-to-person transmission is not occurring in most parts of the northern hemisphere. This is not surprising since we are already past flu season. However, as I wrote yesterday, it seems well within the realm of possibility that the southern hemisphere will be slammed by this flu during the austral winter and that it will come right back here in the north with the start of our own flu season next winter. What I worry about is that all the hype followed by a modest outbreak in the short term will cause people to become inured to public health warnings and predictions of potentially dire outcomes. I don’t suppose that it will occur to people that the public health measures undertaken to control this current outbreak actually worked (fingers crossed). I think this might be a slightly different issue in the communication of science but it is clearly tied up in this fundamental problem of how to communicate uncertainty. Lots to think about, but maybe I should get back to actually analyzing the volumes of data we have gathered from our survey!