Bill O'Reilly Discovers New Demographic Principle

So, Bill O'Reilly is either on to something profound or he simply doesn't understand averages.  Methinks it's the latter. It's actually a pretty common problem -- people not understanding life expectancy -- it's just that O'Reilly botches it in such spectacular form! Mathematical expectation is a fancy term for taking an average. You know, sum up the values and divide by the number of cases? In the case of life expectancy, the thing that we're averaging over is the number of years lived by members of a well-defined population. Mechanistically, when we calculate the life expectancy at age x we take all the person-years still to be lived by this group of people and divide by the number who started at age x.  The United States has ten times the number of deaths from accidents, etc. as Canada, but it also has ten times the number of people starting at age zero, so the factor of ten cancels when we take the average. Oops. I also love the fact that he seems to be attributing the differences in life expectancy between Canada and the United States not to differences in our health care systems but to societal pathologies like more accidents and crime in the United States!
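To make the averaging point concrete, here is a minimal sketch in Python. The cohort values are made up purely for illustration, and `life_expectancy` is a hypothetical helper, not a standard demographic routine. It shows that multiplying the size of a population by ten, with the same pattern of deaths, leaves the average untouched.

```python
def life_expectancy(years_lived):
    """Average years lived: total person-years divided by the number who started."""
    return sum(years_lived) / len(years_lived)


# Hypothetical toy cohort: years lived from age zero by five people.
small_country = [2, 45, 70, 81, 92]

# Ten times as many people (and so ten times as many deaths of every kind),
# with exactly the same pattern of ages at death.
large_country = small_country * 10

print(life_expectancy(small_country))
print(life_expectancy(large_country))
```

Both calls return the same number, because the numerator and the denominator grow by the same factor.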

Chimpanzees and Biomedical Research

I have now been asked several times a perfectly reasonable question that arises from our recent paper on chimpanzee "AIDS" (see previous entry). The question is, should we reinvigorate biomedical testing of SIV infection in chimpanzees as a model for HIV?  The simple answer is no. There are several compelling reasons for this.

First, there are the ethical considerations.  Given the genetic and phylogenetic closeness of chimpanzees to humans and their complex psychology and social behavior, the use of chimpanzees in experimental medical studies is not an ethically viable practice. Second, there is the legal fact that chimpanzees are an endangered species and therefore protected from such uses by international law. Third, there is the simple economic argument.  Chimpanzees are too expensive to maintain for the types of insights biomedical research on them is likely to yield.  Fourth, they are impractical as an animal model.  Chimpanzees live for 40 years or more in captivity and the time course of SIV pathology -- while still not entirely understood -- is certainly slow (remember the lenti- in lentivirus means slow).  Remember, the surprising thing about our paper is that no one had noticed AIDS-like pathology either in the wild or in experimentally infected chimps in captivity.  Finally, we have much better animal models.  SIVmac infection of rhesus monkeys provides an excellent model of infection and pathogenesis.  Rhesus macaques are much less long-lived, are not endangered, have a much shorter time-course of pathogenesis, and their social systems make them far easier to manage in captivity. Of course, ethical arguments parallel to those for chimpanzees can be made against biomedical research on any primate species, but I'm going to duck that for the time being since this is a post on chimpanzees.

That's the take on invasive, laboratory-based, biomedical research. Naturalistic studies of SIVcpz are another story altogether. We strongly believe that field studies of naturally infected ape populations (western lowland gorillas also have naturally acquired SIV infection) should be expanded. These studies would help us understand HIV pathogenesis by providing fecal, urine, and post-mortem samples for virological and immunological analysis. With regard to post-mortem samples, it is particularly important to have a constant field presence.  Even when the mortality hazard is 16 times the baseline, death is a relatively rare event and bodies disappear amazingly fast in tropical forests (perhaps I'll post some day about the maddening difficulty of trying to collect feces in a rainforest).

Our results open up an opportunity to compare pathogenesis in two closely related species. We hope that this will accelerate the identification of both viral and host factors responsible for disease progression.  This, in turn, could lead to the development of novel therapeutic or preventative measures that have the potential to benefit both chimpanzees and humans. I am particularly sanguine about the possibilities of finding host factors that are protective against infection.  With my former post-doc and current NCEAS scholar, Sadie Ryan, I have been doing some analysis of chimpanzee sexual networks. I don't want to spill the beans here before we submit the paper, but suffice it to say, there is a lot of exposure to SIVcpz in chimpanzee sexual networks.  This shouldn't be surprising to anyone who has even a passing familiarity with chimpanzee mating systems, but the formalization makes it particularly striking.

Why Use R?

An anthropologist colleague who did a post-doc in a population center has been trying to get a group of people at his university together to think about population issues.  This is something I'm all for, and I am happy to help in whatever little way I can, especially when it helps anthropologists develop their expertise in demography.  One of the activities they have planned for this population interest group is a workshop on the R statistical programming language. The other day he wrote me with the following very reasonable question that has been put to him by several of the people in his group: Sure, R is free, but other than that, why should someone bother to learn new software when there is perfectly acceptable commercial software out there?  This question is particularly relevant when one works for an institution like a university where there are typically site licenses and other mechanisms for subsidizing the expense of commercial software (which can be substantial).  What follows is, more or less, what I said to him.

I should start out by saying that there is a lot to be said for free. I pay several hundred dollars a year for commercial software that I don't actually use that often. Now, when I need it, it's certainly nice to know it's there but if I didn't have a research account paying for this software, I might let at least one or two of these licenses slide.  I very occasionally use Stata because the R package that does generalized linear mixed models has had a bug in the routine that fits logistic mixed models and this is something that Stata does quite well. So I regularly get mailings about updates and I am always just blown away at the expense involved in maintaining the most current version of this software, particularly when you use the Intercooled version.  It's relatively frequently updated (a good thing) but these updates are expensive (a bad thing for people without generous institutional subsidies). So, let me just start by saying that free is good.

This actually brings up a bit of a pet peeve of mine regarding training in US population centers.  We have these generous programs to train population scientists and policy-makers from the poor countries of the world.  We bring them into our American universities and train them in demographic and statistical methods on machines run by proprietary (and expensive!) operating systems and using extremely expensive proprietary software.  These future leaders will graduate and go back home to Africa, Asia, eastern Europe, or Latin America. There, they probably won't have access to computers with the latest hardware running the most recent software.  Most of their institutions can't afford expensive site licenses to the software that was on every lab machine back at Princeton or UCLA or Michigan or [fill in your school's name here]. This makes it all the more challenging to do the work that they were trained to do and leaves them just that much further behind scholars in advanced industrial nations.  If our population centers had labs with computers running Linux, taught statistics and numerical methods using R, and had students write LaTeX papers, lecture slides, and meeting posters using, say, Emacs rather than some bloated word-processor whose menu structure seems to change every release, then I think we would be doing a real service to the future population leaders of the developing world. But let's return to the question at hand: other than the fact that it's free -- which isn't such an issue for someone with a funded lab at an American university -- why should anyone take the trouble to learn R? I can think of seven reasons off the top of my head.

(1) R is what is used by the majority of academic statisticians.  This is where new developments are going to be implemented and, perhaps more importantly, when you seek help from a statistician or collaborate with one, you are in a much better position to benefit from the interaction if you share a common language.

(2) R is effectively platform independent.  If you live in an all-Windows environment, this may not be such a big deal but for those of us who use Linux/Mac and work with people who use Windows, it's a tremendous advantage.

(3) R has unrivaled help resources.  There is absolutely nothing like it.  First, the single best statistics book ever is written for R (Venables & Ripley, Modern Applied Statistics with S -- remember R is a dialect of S).  Second, there are all the many online help resources both from r-project.org and from many specific lists and interest groups. Third, there are proliferating publications of excellent quality. For example, there is the new Use R! series. The quantity and quality of help resources is not even close to matched by any other statistics application.  Part of the nature of R -- community constructed, free software -- means that the developers and power users are going to be more willing to provide help through lists, etc. than someone in a commercial software company. The quality and quantity of help for R is particularly relevant when one is trying to teach oneself a new technique or statistical method.

(4) R makes the best graphics. Full stop. I use R, Matlab, and Mathematica.  The latter two applications have a well-deserved reputation for making great graphics, but I think that R is best.  I quite regularly will do a calculation in Matlab and export the results to R to make the figure.  The level of fine control, the intuitiveness of the command syntax (cf. Matlab!), and the general quality of drivers, etc. make R the hands-down best.  And let's face it, cool graphics sell papers to reviewers, editors, etc.

(5) The command-line interface -- perhaps counterintuitively -- is much, much better for teaching.  You can post your code exactly and students can reproduce your work exactly.  Learning then comes from tinkering. Now, both Stata and SAS allow for doing everything from the command line with scripts like do-files.  But how many people really do that?  And SPSS...

(6) R is more than a statistics application.  It is a full programming language. It is designed to seamlessly incorporate compiled code (like C or Fortran), which gives you all the benefits of an interactive language while allowing you to capitalize on the speed of compiled code.

(7) The online distribution system beats anything out there.

Oh, and let's face it, all the cool kids use it...

New Publication: Chimpanzee "AIDS"

A long-anticipated paper (by me anyway!) has finally been published in this week's issue of Nature.  In this paper, we show that wild chimpanzees living in the Gombe National Park in western Tanzania on the shores of Lake Tanganyika appear to die from AIDS-like illness when infected with the Simian Immunodeficiency Virus (SIV).  Many African primates harbor their own species-specific strain of SIV and chimpanzees are no exception.  The host species for a particular SIV strain is indicated by a three letter abbreviation (all in lower-case) following the all-caps SIV. So, for chimpanzees, the strain is called SIVcpz. It turns out that there are two distinct HIVs, known as HIV-1 and HIV-2. HIV-1 is the virus that causes the majority of the world's AIDS deaths.  It is what we call the "pandemic strain." HIV-2 is less pathogenic and has a distinct geographic focus in West Africa.  The HIVs and the various SIVs belong to a larger group of viruses that infect a wide range of mammals known as the lentiviruses (lenti- meaning slow, referring to the slow time course of the pathology typically caused by these viruses). Collectively, we call the SIVs and HIVs "primate lentiviruses."  Both HIV-1 and HIV-2 have well-documented origins in nonhuman primate reservoirs.  HIV-2 is most closely related to SIVsmm, a virus that infects sooty mangabeys (a type of West-African monkey).  HIV-1, on the other hand, is most closely related to SIVcpz, the virus that infects central and east African chimpanzees.  We believe that both HIV-1 and HIV-2 entered human hosts when hunters were contaminated with the blood of infected monkeys (HIV-2) or chimpanzees (HIV-1). Note that this means that our terminology for the primate lentiviruses is polyphyletic.  SIVsmm and HIV-2 are sister species, while SIVcpz and HIV-1 are sister species.  Yet we call all the viruses that infect nonhuman primates simian immunodeficiency viruses and all the viruses that infect humans human immunodeficiency viruses.  It seems to me the best way to fix this would be to call the viruses that infect humans SIVhum1 and SIVhum2.  Of course, that will never happen, but I do think that it's important to clarify the evolutionary history of these viruses.

The conventional wisdom regarding primate lentiviruses is that, with the exception of HIV, they are not pathogenic in their natural host.  The reasoning for why HIV causes the devastating pathology that characterizes AIDS goes that HIV-1 is a relatively new infection of humans, having just spilled over into the human population recently.  Pathogens that have recently crossed species boundaries are frequently highly pathogenic because neither the new host nor the pathogen has a history of coevolution with its new partner.  While it is a pernicious myth (that just won't seem to die) that pathogens necessarily evolve toward a benign state, it is true that they frequently evolve a more intermediate level of virulence from their initial spillover virulence.  There are a number of problems with the idea that HIV causes AIDS because it is poorly adapted to human physiology.

The first of these is that HIV-1 is not that recent an infection of humans.  Sure, we didn't notice it until 1983 but careful molecular evolutionary analysis by Bette Korber of the Santa Fe Institute and my collaborator Beatrice Hahn and her group at the University of Alabama at Birmingham puts the most likely date for the emergence of HIV-1 in humans at 1931.  That means that HIV-1 was being transmitted from human to human for over fifty years before it was ever noticed by western science. Fifty years, while certainly brief in evolutionary terms, is still long enough to lead to some reduction in virulence or host evolution.

The real nail in the coffin, however, is our new result.  Specifically, we show that SIVcpz causes AIDS-like pathology in the Gombe chimpanzees. This result is surprising because (1) given its pathogenicity, one would expect someone to have noticed it before, and (2) chimpanzees infected in captivity do not show obvious AIDS-like illness. I have been collaborating with Anne Pusey, Mike Wilson and their colleagues at the University of Minnesota's Jane Goodall Institute Center for Primate Studies on the analysis of the demography of the Gombe chimps for a number of years now. Anne and Mike have, in turn, been collaborating with Beatrice Hahn on her project monitoring natural SIV infection in wild chimpanzees across Africa. Given my background in HIV epidemiology and statistics, it was only natural that we all join forces to look at the demographic implications of SIV infection among the Gombe chimps.  Jane Goodall famously started chimpanzee research at Gombe in 1960 and since 1964, researchers at Gombe have collected detailed demographic information, documenting all births, deaths, and migration events in the central community and eventually expanding to the peripheral ones in later years. As a result, we have an unmatched level of demographic detail (not to mention behavioral and ecological information) against which to assess the impact of SIV infection.  Using statistical methods known collectively as event-history analysis, we were able to show that the hazard ratio between SIV-infected and SIV-negative chimps is on the order of 10-16.  This essentially means that SIV+ chimps have mortality rates that are 10-16 times higher than uninfected chimps.  The analysis controls for the potentially confounding effects of age and sex on overall mortality. The reason why no one ever noticed this heightened mortality rate is really because no one has ever looked for it. Even when a mortality rate is 10 times higher for some segment of a population, when that segment is small and when mortality rates are quite low (chimps who survive infancy can live in excess of 40 years) it can be hard to detect even a seemingly large difference.  This is why we do science: because things that seem obvious once we know they are there can be remarkably subtle when we don't know they're there.  Science gives us the framework and the tools for studying nature's subtleties.
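A back-of-the-envelope sketch in Python shows why a tenfold difference can hide in plain sight. Every number here is hypothetical, chosen only to illustrate the arithmetic, and none comes from the Gombe data. `expected_deaths` is a made-up helper that multiplies animals at risk by an annual hazard, a rough approximation that ignores the shrinking of each group as animals die.

```python
def expected_deaths(group_size, annual_hazard, years=1.0):
    """Rough expected death count: animals at risk x hazard x time.

    Assumes a constant hazard and ignores depletion of the group,
    which is reasonable only when the hazard is small.
    """
    return group_size * annual_hazard * years


# Hypothetical population: 90 uninfected and 10 infected animals.
baseline_hazard = 0.01   # assumed annual mortality hazard for uninfected animals
hazard_ratio = 10        # lower end of the range quoted in the text

uninfected = expected_deaths(90, baseline_hazard)
infected = expected_deaths(10, baseline_hazard * hazard_ratio)

print(uninfected, infected)
```

Under these assumptions, both groups produce about one death a year. Someone tallying carcasses, rather than rates, would see nothing remarkable, which is why the comparison needs person-time in the denominator and a formal event-history analysis.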

This project was absurdly interdisciplinary.  The paper has 22 co-authors, each contributing his or her own particular analytical expertise or providing access to crucial data necessary for the larger narrative.  There are papers in the literature in which people are made co-authors for pretty thin contributions.  This paper has none of that.  It was an extremely complicated story to tell and it really required the collaboration of this large team. Such work is not easy to manage and it's not at all easy to do well.  I think that Beatrice should be commended for orchestrating all the various major contributions, keeping us in line and on schedule (more or less). It's really gratifying to see the excellent blog piece by Carl Zimmer in which he notes the virtues -- and the difficulty -- of combining various scientific styles in pursuit of an important question. The title of Carl's piece is "AIDS and the Virtues of Slow-Cooked Science." In addition, there is a nice companion piece in this week's Nature written by Robin Weiss and Jonathan Heeney.  They too note the strength of the interdisciplinary approach to this problem.

The paper isn't even officially published until tomorrow and it has already been covered on Carl Zimmer's blog for Discover Magazine, The New Scientist, The Guardian, The Scientist, The New York Times, and MSNBC. Wow.  Weiss & Heeney note a number of questions that are raised by our analysis.  Specifically, they ask "why was the progression to AIDS-like illness not more apparent in chimpanzees in captivity?" My co-author Paul Sharp notes "We need to know much more about whether there are any genetic differences among the chimpanzees, or differences in co-infections with other viruses, bacteria or parasites, which influence whether or not SIV infection leads to illness or death. This presents a unique opportunity to compare and contrast the disease-causing mechanisms of two closely related viruses in two closely related hosts."  Then, of course, there are the conservation questions that this paper raises.  Chimpanzees in the wild have birth rates that are very nearly balanced out by their death rates.  The difference between the two, called the intrinsic rate of increase, largely determines the probability of extinction of a small population.  When the rate of increase of a population is negative, it is certain to go extinct (assuming the rate remains negative).  However, even if the intrinsic rate of increase is greater than zero, the randomness that besets small populations still means that a population can go extinct.  So, because their average birth and death rates are so close, individual chimp populations are certainly in potential jeopardy of going extinct, and Gombe is no exception to this rule. Now we add to a population something that increases mortality rates 10-16 times.  This is bound to have negative consequences for the persistence of affected chimp populations.  This is a topic that we are exploring even as I write...
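The extinction argument can be sketched with the simplest textbook model, a linear birth-death process with constant per-capita birth and death rates. This is only an illustration in Python, not the analysis we are doing: the rates, the population size, and the SIV prevalence are all hypothetical, and the model ignores age structure, density dependence, and the dynamics of infection. For this model the probability of eventual extinction starting from n0 individuals is (d/b)^n0 when the birth rate b exceeds the death rate d, and 1 otherwise.

```python
def intrinsic_rate(birth_rate, death_rate):
    """Intrinsic rate of increase: per-capita births minus deaths."""
    return birth_rate - death_rate


def extinction_probability(birth_rate, death_rate, n0):
    """Eventual extinction probability for a linear birth-death process."""
    if birth_rate <= death_rate:
        return 1.0  # non-positive growth: extinction is certain in this model
    return (death_rate / birth_rate) ** n0


# Hypothetical per-capita annual rates, nearly balanced as described in the text.
b, d = 0.05, 0.045
n0 = 10  # hypothetical small population

print(intrinsic_rate(b, d), extinction_probability(b, d, n0))

# Now suppose a hypothetical 10% of animals carry a tenfold mortality hazard.
prevalence, hazard_ratio = 0.10, 10
d_mixed = d * ((1 - prevalence) + prevalence * hazard_ratio)

print(intrinsic_rate(b, d_mixed), extinction_probability(b, d_mixed, n0))
```

With these made-up numbers, the growth rate starts out slightly positive yet the small population still has a substantial chance of dying out by bad luck alone. Adding the infected fraction pushes the average death rate above the birth rate, so the growth rate turns negative and extinction becomes certain for as long as it stays that way.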