epidemic models | monkey's uncle

We have a new paper in the Early Edition of PNAS on the ecology of plague in prairie dogs. The Stanford News Service did a nice little write-up of the paper (and Mark Shwartz's full version is available on the Woods Institute site) and it has now been picked up by a number of media outlets including USA Today, ScienceDaily, The Register (UK), as well as a couple of radio news shows. This paper has been a real pleasure for me because of my incredible collaborators. Dan Salkeld, who has been a post-doctoral fellow with me and now splits his time between teaching in Human Biology at Stanford and working as an epidemiologist for the California Department of Health, is the lead author. Dan is clearly one of the leading young disease ecologists working today and his understanding of the field and willingness to do the sometimes unglamorous grunt work of ecology in pursuit of important research questions continually impresses me. The paper uses data that he collected while he worked for co-author Paul Stapp on Paul and collaborators' plague project in the Pawnee National Grasslands in Colorado. Dan and Paul had the idea that grasshopper mice (see below) might have something to do with the episodic plague outbreaks in prairie dog towns. Apparently, this idea was met with skepticism by their colleagues. When Dan came to Stanford, I suggested that we could probably put together a model to test the hypothesis. While we were waiting for our research permits to come through for a project in Indonesia (also dealing with plague; another long story), we decided to take up the challenge. What really made the whole project come together was the fortuitous office-pairing of Dan with Marcel Salathé, another post-doc with whom I have collaborated extensively on questions of social networks and infectious disease. In addition to being a brilliant theoretical biologist, Marcel is an ace Java programmer. 
Following a few white-board sessions in the studio near our offices, Dan and Marcel put together an amazing computer simulation that achieves that perfect balance between simplicity and realism that allows for scientific insight.

I don't think anyone would have predicted this particular collaboration and this particular outcome. The results described in this paper come from an incredibly interdisciplinary collaboration. I am really struck by how great science can come from a few simple ingredients: (1) long-term ecological data collection facilitated by a visionary program at the National Science Foundation, (2) a space where people from quite different disciplines and with different scientific sensibilities can get together and brain-storm, (3) flexible funding that permits researchers to explore the interesting – if offbeat – scientific questions that arise from such interactions. So, I have many debts to acknowledge for this one. The field data come from the project for which Mike Antolin at Colorado State is the PI (our co-author Paul Stapp is a Co-PI for that as well). The funding source for this project was the joint NSF/NIH Ecology of Infectious Disease program. This is a cross-cutting program that "supports the development of predictive models and the discovery of principles governing the transmission dynamics of infectious disease agents" (from the EID home page). The space – both physical and intellectual – that permitted this work to happen was provided by the Woods Institute for the Environment. This paper literally came into being in the project studio on the third floor of Y2E2 in the Land Use and Conservation area. Amazingly, this is exactly what these studios were designed to do. My office in Y2E2 has adjoining office space for grad-students and post-docs and this is where Marcel and Dan did most of their hashing. It was always amusing to pop my head in and see them both huddled around a computer, having animated discussions about how best to represent the complex ecology in a computational model that is simple enough to understand and flexible enough to allow us to test hypotheses.
Finally, funding. Dan was funded by a Woods Environmental Ventures Project grant for which I am the PI. Marcel was funded by the Branco Weiss Science in Society Fellowship. My own flexibility was assured by a career grant from the National Institutes of Health. Research funding is almost always important, but the requirements of research funding can sometimes be too constraining to permit exploration of really new ideas. All three of these mechanisms (Woods EVP, Branco Weiss, NIH K01) provide exactly the type of flexibility that fosters creativity. I wish there were more programs like these.

One of the fundamental questions in disease ecology is how extremely pathogenic infectious agents persist both through time and across landscapes. Plague is a bacterial disease that affects a wide range of rodents throughout the world and, in North America, particularly afflicts prairie dogs (Cynomys ludovicianus). Plague epizootics (the animal equivalent of epidemics in humans) are dramatic affairs with almost complete mortality of massive prairie dog ‘towns’ of thousands of animals. If plague is so deadly to prairie dogs, how does it persist? Is there another reservoir (i.e., another host species that can maintain an infection in the absence of prairie dogs)? Does plague get into the soil and persist in some sort of suspended state (the way that some Mycobacteria do, for example) waiting to reinfect a re-colonized prairie dog town? Or is plague really enzootic (i.e., when an infection persists at low levels in an animal population) and we just haven’t detected it? This question has wide applicability. Consider diseases of people such as Ebola Hemorrhagic Fever or SARS, or, going back a few hundred years in human history, that nastiest of bacterial diseases, bubonic plague. Yes, the same beastie. A disease that killed a third of the population of Europe in the fourteenth century exists in prairie dogs in North America today (and sometimes spills over to produce human infections).

Prairie dogs are a keystone species of the grasslands of the American West. They are threatened by various anthropogenic forces, including habitat destruction and human persecution. But most importantly, prairie dog viability is threatened by plague.

Plague, the disease caused by the bacterium Yersinia pestis and responsible for the Black Death, arrived in the USA via San Francisco ca. 1900 and still infects (or threatens to infect) people each year, including in California. Plague killed as many as 200 million people in Medieval Europe. It is still important in Africa and Asia. There have been sizable epidemics as recently as the middle twentieth century in India and China and a substantial outbreak in Surat, India in 1994 that, in addition to death, caused widespread panic and social disruption.

Previous modeling and ecological work tended to assume that die-offs occur very rapidly. But questions dogged this work (as it were): were the apparently rapid die-offs simply an artifact of finally seeing dead dogs dropping all over the place? Prairie dogs do live underground, after all, and they live in enormous towns. Who would miss a few dead dogs underground in a town of thousands? Our paper suggests that previous modeling efforts get the story wrong. They fail to account for observed patterns because they missed key elements of the picture. Previous models that could describe the phenomena lacked an actual explanation – it’s a magical reservoir? It’s a carnivore? Certainly it’s something somewhere?

While prairie dogs live in enormous towns, they are highly territorial within the towns. Towns form because of the benefits of predator defense. They live in small family groups known as coteries, and these coteries form a more-or-less regular grid of small defended territories within the towns. Because of this regular structure induced by their territoriality, a directly-transmitted infectious disease can only move so quickly through a town since it could only be transmitted to immediate neighbors and each coterie only has a couple of these. Plague is not directly transmitted though. It is carried by flea vectors, but if the dispersal distance of a flea is less than the diameter of a coterie’s territory, then the transmissibility of this vector-borne disease is similar to something that is directly transmitted. Prairie dogs are territorial and this territoriality limits the rate of disease propagation through prairie dog towns. However, prairie dogs are not alone on their eponymous prairies.

Grasshopper mice – smelly, carnivorous mice, happy to eat through prairie dog carcasses – get swamped by fleas that normally live on prairie dogs. And grasshopper mice have no respect for prairie dog territories. They spread fleas across prairie dog coteries. This is the critical piece of the puzzle provided by our analysis. Grasshopper mice are the key amplifying hosts for plague in prairie dogs. Grasshopper mice increase the spread of disease by moving fleas across the landscape, similar to the way that highly promiscuous people may spread HIV or so-called 'super-spreaders' transmitted SARS in the global outbreak of 2003. Of course, there are interesting differences between the plague model and these other diseases. Grasshopper mice are like super-spreaders in that they push the system over the percolation threshold. They are unlike super-spreaders in that they don’t have that many more contacts than the average – they just connect otherwise unconnected segments of a population already near the threshold of an epidemic.

Without grasshopper mice, plague still kills prairie dog families, one at a time, but it moves very slowly, and it is extremely hard to detect (who misses 5 dead prairie dogs in a colony that stretches for 200 hectares and has upwards of 5000 animals?). The grasshopper mice take a spatially-organized system that is on the verge of an epizootic and push it over the threshold. The term ‘percolation threshold’ in the title of our paper relates to a branch of theory from geophysics that explains how and when a fluid can pass through a porous random medium. This theory uses random graphs, which are the same mathematical structure that we use to model social networks, to understand when, for example, a medium will let water pass through it – i.e., to percolate. When the density of pores in, say, a layer of sandstone passes a critical density, water can pass from the surface through to recharge the aquifer. Similarly, when the density of susceptible prairie dog families crosses a critical threshold, plague can sweep through and wipe out a town of thousands of individuals. The spatial structure induced by prairie dog territoriality turns out, on average, to be not quite at the percolation threshold (though it’s close). What the grasshopper mice do is provide the critical connectivity that puts the system over the threshold and allows a slowly simmering enzootic infection to turn into a full-blown epizootic.
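
To make the percolation idea concrete, here is a toy sketch in Python. It is not the simulation from the paper (that was written in Java and is considerably more detailed); it only illustrates the general mechanism under some loud assumptions. Coteries sit on a square grid, plague passes between adjacent coteries with a fixed probability p, and each "grasshopper mouse" is represented as a single random long-range link between two coteries. The function name outbreak_size and all of its parameters are made up for illustration.

```python
import random
from collections import deque


def outbreak_size(n, p, n_shortcuts, seed=0):
    """Fraction of coteries reached from the centre of an n x n grid.

    Hypothetical toy model: p is the chance that plague can pass between
    two adjacent coteries; n_shortcuts is the number of random long-range
    links standing in for grasshopper mice.
    """
    lattice_rng = random.Random(seed)    # randomness for neighbour-to-neighbour links
    mouse_rng = random.Random(seed + 1)  # separate stream for the long-range links
    neighbours = {(r, c): [] for r in range(n) for c in range(n)}

    def link(a, b):
        neighbours[a].append(b)
        neighbours[b].append(a)

    # Territorial spread: each bond between adjacent coteries is open with probability p.
    for r in range(n):
        for c in range(n):
            if r + 1 < n and lattice_rng.random() < p:
                link((r, c), (r + 1, c))
            if c + 1 < n and lattice_rng.random() < p:
                link((r, c), (r, c + 1))

    # Mice ignore territories: join randomly chosen pairs of coteries.
    cells = list(neighbours)
    for _ in range(n_shortcuts):
        a, b = mouse_rng.sample(cells, 2)
        link(a, b)

    # Breadth-first search from the central coterie finds everything plague can reach.
    start = (n // 2, n // 2)
    seen = {start}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        for nb in neighbours[cell]:
            if nb not in seen:
                seen.add(nb)
                queue.append(nb)
    return len(seen) / (n * n)


if __name__ == "__main__":
    # Average over several random grids, with and without mice, just below
    # the square-lattice bond percolation threshold of 0.5.
    for mice in (0, 50):
        sizes = [outbreak_size(40, 0.45, mice, seed=s) for s in range(20)]
        print(mice, "shortcuts: mean fraction reached =", sum(sizes) / len(sizes))
```

Because the lattice and the shortcuts draw from separate random streams, runs with the same seed share the same grid, so any difference in outbreak size comes from the added long-range links alone.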

It is in thinking about percolation thresholds that we see how important the behavior of affected species is for understanding disease dynamics. Plague in Asian great gerbils, while effectively modeled using the same mathematical formalism, only requires one species in order to achieve the percolation threshold. Because great gerbils roam more widely and mix more, what matters for plague epizootics in this species is simply overall gerbil density.

It seems quite likely that this pattern of diseases smoldering at low levels below the detection threshold before some dramatic occurrence brings them to general attention is common, particularly with emerging infections. For example, there is evidence for extensive transmission of H1N1 ‘swine flu’ in Mexico before a large number of deaths appeared seemingly quite suddenly in April of 2009. A number of other diseases – both of people and wildlife – show this pattern of being seemingly completely lethal, burning through host communities, and disappearing only to reappear some years later. Important examples include Ebola in both humans and gorillas, hantavirus in people, anthrax in zebra, or chytrid fungi in frogs.

What are the key take-home messages of this paper? There are five, as far as I see it: (1) plague is enzootic in prairie dogs and there is no need to posit an alternate reservoir, (2) this said, the transition from enzootic to epizootic infection in prairie dogs is mediated by grasshopper mice, (3) understanding disease ecology – including species interactions – is a key to understanding (and predicting) dynamics, (4) behavior matters for disease dynamics, and (5) epidemiological surveillance is essential for controlling infectious disease – just because you don't see a disease, doesn't mean it's not there!

I'm sure I'll have more to say about this. I did want to note that the publication of this paper coincides with a personnel transition here in our group at Stanford. Marcel has moved on to a faculty position, joining the spectacular Center for Infectious Disease Dynamics at Penn State. Peter Hudson and his crew have assembled an amazing and eclectic group of scientists in Happy Valley and kudos to them for landing Marcel. I frequently think that only a total fool would pass up an offer to join this exciting and productive group, but that's another story. I expect Marcel to do great things there and look forward to continued collaborations.

Every once in a while someone asks me for advice on the platform to use for developing models of infectious disease. I typically make the same recommendations -- unless the person asking has something very specific in mind. This happened again today and I figured I would turn it into a blog post.

The answer depends largely on (1) what types of models you want to run, (2) how comfortable you are with programming, and (3) what local resources (or lack thereof) you might have to help you when you inevitably get stuck. If you are not comfortable with programming and you want to stick to fairly basic compartmental models, then something like STELLA or Berkeley Madonna would work just fine. There are a few books that provide guidance on developing models in these systems. I have a book by Hannon and Ruth that is ten years old now but, if memory serves me correctly, was a pretty good introduction both to STELLA and to ecological modeling. They have a slightly newer book as well. Models created in both systems appear in the primary scientific literature, which is always a good sign for the (scientific) utility of a piece of software. These graphical systems lack a great deal of flexibility and I personally find them cumbersome to use, but they match the cognitive style of many people quite nicely, I think, and probably serve as an excellent introduction to mathematical modeling.
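
For readers unsure what a "fairly basic compartmental model" looks like once it leaves the graphical tools, here is a minimal sketch of the textbook SIR model in plain Python. It is only an illustration: the function name simulate_sir, the fixed-step Euler integration, and every parameter value are assumptions chosen for brevity, not a recommendation. Any of the platforms discussed here would normally hand this job to a proper ODE solver.

```python
def simulate_sir(beta, gamma, s0, i0, r0=0.0, dt=0.01, t_max=100.0):
    """Integrate the SIR equations with simple fixed-step Euler updates.

    beta  : transmission rate (illustrative value supplied by the caller)
    gamma : recovery rate
    Returns a list of (time, S, I, R) tuples.
    """
    s, i, r = float(s0), float(i0), float(r0)
    n = s + i + r  # total population, constant in this closed model
    trajectory = [(0.0, s, i, r)]
    steps = int(round(t_max / dt))
    for k in range(1, steps + 1):
        new_infections = beta * s * i / n * dt  # flow S -> I during this step
        new_recoveries = gamma * i * dt         # flow I -> R during this step
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        trajectory.append((k * dt, s, i, r))
    return trajectory


if __name__ == "__main__":
    # Hypothetical parameters giving a basic reproduction number beta/gamma = 2.
    run = simulate_sir(beta=0.5, gamma=0.25, s0=999, i0=1)
    peak_time, _, peak_infected, _ = max(run, key=lambda row: row[2])
    print("peak infected:", peak_infected, "at time", peak_time)
```

The three flows in the loop are exactly the arrows one would draw between stocks in a graphical system; writing them out by hand is the price of the added flexibility.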

Moving on to more powerful, general-purpose numerical software...

Based on my unscientific convenience sample, I'd say that most mathematical epidemiologists use Maple. Maple is extremely powerful software for doing symbolic calculations. I've tried Maple a few times but for whatever reason, it never clicked for me. Because I am mostly self-taught, the big obstacle for me using Maple has always been the lack of resources either print or internet for doing ecological/epidemiological models in this system. Evolutionary anthropologist Alan Rogers does have some excellent notes for doing population biology in Maple.

Mathematica has lots of advantages but, for the beginner, I think these are heavily outweighed by the start-up costs (in terms of learning curve). I use Mathematica some and even took one of their courses (which was excellent if a little pricey), but I do think that Mathematica handles dynamic models in a rather clunky way. Linear algebra is worse. I would like Mathematica more if the notebook interface didn't seem so much like Microsoft Word. Other platforms (see below) either allow Emacs key bindings or can even be run through Emacs (this is not a great selling point for everyone, I realize, but given the likely audience for Mathematica, I have always been surprised by the interface). The real power of Mathematica comes from symbolic computation and some of the very neat and eclectic programming tools that are part of the Mathematica system. I suspect I will use Mathematica more as time goes on.

Matlab, for those comfortable with a degree of procedural-style programming, is probably the easiest platform to use to get into modeling. Again, based on my unscientific convenience sample, my sense is that most quantitative population biologists and demographers use Matlab. There are some excellent resources. For infectious disease modeling in particular, Keeling and Rohani have a relatively new book that contains extensive Matlab code. In population biology, the books by Caswell and by Morris and Doak both contain extensive Matlab code. Matlab's routines for linear algebra and solving systems of differential equations are highly optimized so code is typically pretty fast and these calculations are relatively simple to perform. There is an option in the preferences that allows you to set Emacs key bindings. In fact, there is code that allows you to run Matlab from Emacs as a minor mode. Matlab is notably bad at dealing with actual data. For instance, you can't mix and match data types in a data frame (spreadsheet-like structure) very easily and forget about labeling columns of a data frame or rows and columns of a matrix. While its matrix capabilities are unrivaled, there is surprisingly little development of network models, a real growth area in infectious disease modeling. It would be really nice to have some capabilities in Matlab to import and export various network formats, thereby leveraging Matlab's terrific implementation of sparse matrix methods.

Perhaps not surprisingly, the best general tool, I think, is R. This is where the best network tools can be found (outside of pure Java). R packages for dealing with social networks include the statnet suite (sna, network, ergm), igraph, graph, blockmodeling, RBGL, etc. (the list goes on). It handles compartmental models in a manner similar to Matlab using the deSolve package, though I think Matlab is generally a little easier for this. One of the great things about R is that it makes it very easy to incorporate C or Fortran code. Keeling and Rohani's book also contains C++ and Fortran code for running their models (and such code is often available). R and Matlab are about equally easy/difficult (depending on how you see it) to learn. Matlab is somewhat better with numerically solving systems of differential equations and R is much better at dealing with data and modeling networks. R can be run through Emacs using ESS (Emacs Speaks Statistics). This gives you all the text-editing benefits of a state-of-the-art text editor plus an effectively unlimited buffer size. It can be very frustrating indeed to lose your early commands in a Matlab session only to realize that you forgot to turn on the diary function. No such worries when you run R through Emacs using ESS.

One of the greatest benefits of R is its massive online (and, increasingly, print publishing) help community. I think that this is how R really trumps all the other platforms to be the obvious choice for the autodidacts out there.

I moved from doing nearly all my work in Matlab to doing most work in R, with some in Mathematica and a little still in Matlab. These are all amazingly powerful tools. Ultimately, it's really a matter of taste and the availability of help resources that push people to use one particular tool as much as anything else. This whole discussion has been predicated on the notion that one wants to use numerical software. There are, of course, compelling reasons to use totally general programming tools like C, Java, or Python, but this route is definitely not for everyone, even among those who are interested in developing mathematical models.
