Tag Archives: Statistics

An Alternate Course Load for the Game of Life

In a recent editorial in the New York Times, Harvard economist and former chairman of the Council of Economic Advisers N. Gregory Mankiw provides some answers to the question “what kind of foundation is needed to understand and be prepared for the modern economy?”  Presumably, what he means by “modern economy” is life after college.  Professor Mankiw suggests that students of all ages learn something about the following subjects: economics, statistics, finance, and psychology.  I read this with interest, and doing so made me think of my own list, which is rather different from the one offered by Mankiw. I will take up the instrumental challenge, making a list of subjects that I think will be useful in an instrumental sense, i.e., in helping graduates become successful in the world of the twenty-first century. In no way do I mean to suggest that students cannot be successful if they don’t follow this plan; like Mankiw, I think that students should ignore advice as they see fit. Education is about discovery as much as anything, and there is much to one’s education that transcends instrumentality: going to college is not simply about preparing people to enter “the modern economy,” even if it is a necessary predicate for success in it.

People should probably know something about economics.  However, I’m not convinced that what most undergraduate students are taught in their introductory economics classes is the most useful thing to learn. Contemporary economics is taught as an axiomatic discipline.  That is, a few foundational axioms (i.e., a set of primitive assumptions that are not proved but considered self-evident and necessary) are presented and from these, theorems can be derived.  Theorems can then be logically proven by recourse to axioms or other already-proven theorems. Note that this is not about explaining the world around us.  It is really an exercise in rigorously defining normative rules for how people should behave and what the consequences of such behavior would be, even if actual people don’t follow such prescriptions. Professor Mankiw has written a widely used textbook in Introductory Economics. In the first chapter of this book, we see this axiomatic approach on full display.  We are told not unreasonable things like “People Face Trade-Offs” or “The Cost of Something is What You Give Up to Get It” or “Rational People Think at the Margin.” I couldn’t agree more with the idea that people face trade-offs, but I nonetheless think there are an awful lot of problematic aspects to these axioms.  Consider the following paragraph (p. 5)

Another trade-off society faces is between efficiency and equality. Efficiency means that society is getting the maximum benefits from its scarce resources. Equality means that those benefits are distributed uniformly among society’s members. In other words, efficiency refers to the size of the economic pie, and equality refers to how the pie is divided into individual slices.

Terms like “efficiency” and “maximum benefits” are presented as unproblematic, as is the idea that there is a necessary trade-off between efficiency and equality.  Because it is an axiom, apparently contemporary economic theory allows no possibility for equality in efficient systems. Inequality is naturalized and thereby legitimized. It seems to me that this should be an empirical question, not an axiom. In his recent book, The Bounds of Reason: Game Theory and the Unification of the Behavioral Sciences, Herb Gintis provides a very interesting discussion of the differences between two highly formalized (i.e., mathematical) disciplines, physics and economics.  Gintis notes, “By contrast [to the graduate text in quantum mechanics], the microeconomics text, despite its beauty, did not contain a single fact in the whole thousand page volume. Rather, the authors build economic theory in axiomatic fashion, making assumptions on the basis of their intuitive plausibility, their incorporation of the ‘stylized facts’ of everyday life, or their appeal to the principles of rational thought.”

If one is going to learn economics, “the study of how society manages its scarce resources” — and I do believe people should — I think one should (1) learn about how  resources are actually managed by real people and real institutions and (2) learn some theory that focuses on strategic interaction.  A strategic interaction occurs when the best choice a person can make depends upon what others are doing (and vice-versa). The formal analysis of strategic interactions is done with game theory, a field typically taught in economics classes but also found in political science, biology, and, yes, even anthropology. Alas, this is generally considered an advanced topic, so you’ll have to go through all the axiomatic nonsense to get to the really interesting stuff.

OK, that was a bit longer than I anticipated. Whew.  On to the other things to learn…

Learn something about sociology. Everyone could benefit by understanding how social structures, power relations, and human stocks and flows shape the socially possible. Understanding that social structure and power asymmetries constrain (or enable) what we can do and even what we think is powerful and lets us ask important questions not only about our society but also about those of the people with whom we sign international treaties, or engage in trade, or wage war. Some of the critical questions that sociology helps us ask include: who benefits by making inequality axiomatic? Does the best qualified person always get the job? Is teen pregnancy necessarily irrational? Do your economic prospects depend on how many people were born the same year as you were? How does taste reflect on one’s position in society?

People should definitely learn some statistics. Here, Professor Mankiw and I are in complete agreement.

Learn about people other than those just like you. The fact that we live in an increasingly global world is rapidly becoming the trite fodder of welcome-to-college speeches by presidents, deans, and other dignitaries. Of course, just because it’s trite doesn’t make it any less true, and despite the best efforts of homogenizing American popular and consumer culture, not everyone thinks or speaks like us or has the same customs or same religion or system of laws or healing or politics. I know; it’s strange. One might learn about other people in an anthropology class, say, but there are certainly other options. If anthropology is the chosen route, I would recommend that one choose carefully, making certain that the readings for any candidate anthropology class be made up of ethnographies and not books on continental philosophy. Come to grips with some of the spectacular diversity that characterizes our species. You will be better prepared to live in the world of the twenty-first century.

Take a biology class. If the twentieth century was the century of physics, the twenty-first century is going to be the century of biology.  We have already witnessed a revolution in molecular biology that began around the middle of the twentieth century and continued to accelerate throughout its last decades and into the twenty-first. Genetics is creeping into lots of things our parents would not have even imagined: criminology, law, ethics. Our decisions about our own health and that of our loved ones will increasingly be informed by molecular genetic information. People should probably know a thing or two about DNA. I shudder at popular representations of forensic science and worry about a society that believes what it sees on CSI somehow represents reality. I happen to think that when one takes biology, one should also learn something about organisms, but this isn’t always an option if one is going to also learn about DNA.

Finally, learn to write.  Talk about comparative advantage! I am continually blown away by the poor preparation that even elite students receive in written English. If you can express ideas in writing clearly and engagingly, you have a skill that will carry you far. Write as much as you possibly can.  Learn to edit. I think editing is half the problem with elite students: they write things at the last minute and expect them to be brilliant.  Doesn’t work that way. Writing is hard work and well-written texts are always well edited.

The Igon Value Problem

Priceless. Steve Pinker wrote a spectacular review of Malcolm Gladwell’s latest book, What the Dog Saw and Other Adventures, in the New York Times today. I regularly read and enjoy Gladwell’s essays in the New Yorker, but I find his style sometimes problematic, verging on anti-intellectual, and I’m thrilled to see a scientist of Pinker’s stature calling him out.

Pinker coins a term for the problem with Gladwell’s latest book and his work more generally.  Pinker’s term, “The Igon Value Problem,” is a clever play on the Eigenvalue Problem in mathematics.  You see, Gladwell apparently quotes someone referring to an “igon value.” This is clearly a concept he never dealt with himself even though it is a ubiquitous tool in the statistics and decision science about which Gladwell is frequently so critical.  According to Pinker, the Igon Value Problem occurs “when a writer’s education on a topic consists in interviewing an expert,” leading him or her to offer “generalizations that are banal, obtuse or flat wrong.”  In other words, the Igon Value Problem is one of dilettantism.  Now, this is clearly a constant concern for any science writer, who has the unenviable task of rendering extremely complex and frequently quite technical information down to something that is simultaneously accurate, understandable, and interesting. However, when the bread and butter of one’s work involves criticizing scientific orthodoxy, it seems like one needs to be extremely vigilant to get the scientific orthodoxy right.

Pinker raises the extremely important point that the decisions we make using the formal tools of decision science (and cognate fields) represent solutions to the inevitable trade-offs between information and cost.  This cost can take the form of financial cost, time spent on the problem, or computational resources, to name a few. Pinker writes:

Improving the ability of your detection technology to discriminate signals from noise is always a good thing, because it lowers the chance you’ll mistake a target for a distractor or vice versa. But given the technology you have, there is an optimal threshold for a decision, which depends on the relative costs of missing a target and issuing a false alarm. By failing to identify this trade-off, Gladwell bamboozles his readers with pseudoparadoxes about the limitations of pictures and the downside of precise information.
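The threshold trade-off in that passage can be made concrete with a toy model. What follows is only a sketch under assumptions that are not in Pinker's review: noise scores are standard normal, target scores are normal with mean d and the same variance, and the function names, costs, and prior probability are all made up for illustration.

```r
# Toy equal-variance signal detection model (every number here is hypothetical).
# Noise scores ~ N(0, 1); target scores ~ N(d, 1); we say "target" when score > t.
expected_cost <- function(t, d, p_target, cost_miss, cost_fa) {
  p_miss <- pnorm(t, mean = d)        # target present, score falls below threshold
  p_fa   <- 1 - pnorm(t, mean = 0)    # target absent, score exceeds threshold
  p_target * cost_miss * p_miss + (1 - p_target) * cost_fa * p_fa
}

# Numerically find the threshold that minimizes expected cost
best_threshold <- function(d, p_target, cost_miss, cost_fa) {
  optimize(expected_cost, interval = c(-5, d + 5), d = d, p_target = p_target,
           cost_miss = cost_miss, cost_fa = cost_fa, tol = 1e-8)$minimum
}

# Same detector (d = 1.5), different relative costs of the two kinds of error
t_equal       <- best_threshold(d = 1.5, p_target = 0.5, cost_miss = 1,  cost_fa = 1)
t_costly_miss <- best_threshold(d = 1.5, p_target = 0.5, cost_miss = 10, cost_fa = 1)
c(equal_costs = t_equal, costly_miss = t_costly_miss)
```

With the detector held fixed, making misses more expensive pushes the best threshold lower, so more false alarms are tolerated. Under these assumptions the numerical answer should agree with the closed form d/2 + log(beta)/d, where beta = (1 - p_target) * cost_fa / (p_target * cost_miss).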

Pinker is particularly critical of an analogy Gladwell draws in one of his essays between predicting the success of future teachers and future professional quarterbacks.  Both are difficult decision tasks fraught with uncertainty.  Predicting whether an individual will be a quality teacher based on his or her performance on standardized tests or the presence or absence of teaching credentials is an imperfect process, just as predicting the success of a quarterback in the N.F.L. based on his performance at the collegiate level is.  Gladwell argues that anyone with a college degree should be allowed to teach and that the determination of the qualification for the job beyond the college degree should only be made after they have taught. This solution, he argues, is better than the standard practice of credentialing, evaluating, and “going back and looking for better predictors.” You know, science? Pinker doesn’t hold back in his evaluation of this logic:

But this “solution” misses the whole point of assessment, which is not clairvoyance but cost-effectiveness. To hire teachers indiscriminately and judge them on the job is an example of “going back and looking for better predictors”: the first year of a career is being used to predict the remainder. It’s simply the predictor that’s most expensive (in dollars and poorly taught students) along the accuracy-cost trade-off. Nor does the absurdity of this solution for professional athletics (should every college quarterback play in the N.F.L.?) give Gladwell doubts about his misleading analogy between hiring teachers (where the goal is to weed out the bottom 15 percent) and drafting quarterbacks (where the goal is to discover the sliver of a percentage point at the top).

This evaluation is spot-on. As a bit of an aside, the discussion of predicting the quality of prospective quarterbacks also reminds me of one of the great masterpieces of statistical science, and the approach described in that paper certainly has a bearing on the types of predictive problems on which Gladwell ruminates.  In a 1975 paper, Brad Efron and Carl Morris present a method for predicting 18 major league baseball players’ 1970 season batting averages based on their first 45 at-bats. The naïve method for predicting (no doubt, the approach Gladwell’s straw “we” would take) is simply to use the average after the first 45 at-bats. Turns out, there is a better way to solve the problem, in the sense that you can make more precise predictions (though hardly clairvoyant).  The method turns on what a Bayesian would call “exchangeability.”  Basically, the idea is that being a major league baseball player buys you a certain base prediction for the batting average.  So if we combine the average across the 18 players with each individual’s average in a weighted manner, we can make a prediction that has less variation in it.  A player’s average after a small number of at-bats is a reflection of his abilities but also of lots of forces that are out of his control (i.e., are due to chance).  Thus, the uncertainty we have in a player’s batting based on this small record is partly due to the inherent variability in his performance but also due to sampling error.  By pooling across players, we borrow strength and remove some of this sampling error, allowing us to make more precise predictions. This approach is lucidly discussed in great detail in my colleague Simon Jackman’s new book, draft chapters of which we used when we taught our course on Bayesian statistical methods for the social sciences.
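To give a feel for the pooling idea, here is a rough sketch in R. It is not Efron and Morris's actual analysis: the data are simulated rather than their 1970 batting records, the "true" averages are invented, and the shrinkage is a plain positive-part James-Stein estimator toward the grand mean with a binomial approximation to the sampling variance.

```r
set.seed(1)
n_players <- 18
n_ab      <- 45   # early-season at-bats per player

# Hypothetical "true" abilities; simulated, not real batting data
true_avg <- rnorm(n_players, mean = 0.265, sd = 0.03)
hits     <- rbinom(n_players, size = n_ab, prob = true_avg)
obs      <- hits / n_ab        # naive prediction: the early-season average

grand  <- mean(obs)                       # what being a major leaguer "buys" you
sigma2 <- grand * (1 - grand) / n_ab      # approximate sampling variance of obs

# Positive-part James-Stein shrinkage factor (0 = pool completely, 1 = no pooling)
shrink <- max(0, 1 - (n_players - 3) * sigma2 / sum((obs - grand)^2))
js     <- grand + shrink * (obs - grand)  # weighted combination of the two averages

# Total squared error of each prediction against the simulated truth
c(naive = sum((obs - true_avg)^2), pooled = sum((js - true_avg)^2))
```

Each pooled prediction sits between the player's own early average and the group average, and how far it is pulled depends on how much of the spread among players looks like sampling noise.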

Teacher training and credentialing can be thought of as strategies for ensuring exchangeability in teachers, aiding the prediction of teacher performance.  I am not an expert, but it seems like we have a long way to go before we can make good predictions about who will become an effective teacher and who will not.  This doesn’t mean that we should stop trying.

Janet Maslin, in her review of What the Dog Saw, waxes about Gladwell’s scientific approach to his essays. She writes that the dispassionate tone of his essays “tames visceral events by approaching them scientifically.” I fear that this sentiment, like the statements made in so many Gladwell works, reflects the great gulf between most educated Americans and the realities of scientific practice (we won’t even talk about the gulf between less educated Americans and science).  Science is actually a passionate, messy endeavor and sometimes we really do get better by going back and finding better predictors.

Predicting Catastrophe?

There is an extremely cool paper in this week’s Nature by Scheffer and colleagues. I’m too busy right now to write much about it, but I wanted to mention it, even if only briefly.  The thing that I find so remarkable about this paper is that it’s really not the sort of thing that I usually like.  The paper essentially argues that there are certain generic features of many systems as they move toward catastrophic change.  The paper discusses epileptic seizures, asthma attacks, market collapses, abrupt shifts in oceanic circulation and climate, and ecological catastrophes such as sudden shifts in rangelands, or crashes of fish or wildlife populations. At first, it sounds like the vaguely mystical ideas about transcendent complexity, financial physics, etc.  But really, there are a number of very sensible observations about dynamical systems and a convincing argument that these features will be commonly seen in real complex systems.

The basic idea is that there are a number of harbingers of catastrophic change in time series of certain complex systems.  The model the authors use is the fold catastrophe model, where there is an attractor that folds back on itself like a sideways “N”.  As one gets close to a catastrophic bifurcation, a very straightforward analysis shows that the rate of return to the attractor decreases (I have some notes that describe the stability of the equilibria of simple population models here. The tools discussed in Scheffer et al. (2009) are really just generalizations of these methods).  As the authors note, one rarely has the luxury of measuring rates of return to equilibria in real systems but, fortunately, there are relatively easily measured consequences of this slow-down of rates of return to the attractor. They show in what I think is an especially lucid manner how the correlations between consecutive observations in a time series will increase as one approaches one of these catastrophic bifurcation points. This increased correlation has the effect of increasing the variance.
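The chain from slower recovery to higher autocorrelation to higher variance can be sketched with the simplest possible stand-in, a linear AR(1) process. This is a sketch under assumptions, not the authors' model: I assume linearized dynamics near the equilibrium, a recovery rate lambda, a sampling interval dt, and Gaussian noise with standard deviation sigma, all hypothetical.

```r
# Assumed linearization near equilibrium, sampled every dt time units:
#   x[t + 1] = alpha * x[t] + noise,  alpha = exp(-lambda * dt),  noise ~ N(0, sigma^2)
# The stationary variance of such an AR(1) process is sigma^2 / (1 - alpha^2).
ar1_summary <- function(lambda, sigma = 1, dt = 1) {
  alpha <- exp(-lambda * dt)   # lag-1 autocorrelation implied by the recovery rate
  data.frame(lambda          = lambda,
             autocorrelation = alpha,
             variance        = sigma^2 / (1 - alpha^2))
}

# Recovery rate shrinking toward zero, as it would near the bifurcation
ar1_summary(lambda = c(1, 0.5, 0.1, 0.01))
```

As lambda falls toward zero, alpha approaches one and the variance grows without bound, which is the linear caricature of critical slowing down.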

So, two ways to diagnose an impending catastrophe in a system that is characterized by the fold bifurcation model are: (1) an increase in variance of the observations in the series and (2) an increase in the lag-1 autocorrelation.  A third feature of impending catastrophes does not have quite as intuitive an explanation (at least for me), but is also relatively straightforward.  Dynamical systems approaching a catastrophic bifurcation will exhibit increased skewness to the fluctuations as well as flickering.  The skewness means that the distribution of period-to-period fluctuations will become increasingly asymmetric.  This has to do with the shape of the underlying attractor and how the values of the system are forced across it. Flickering means that the values will bounce back and forth between two different regimes (say, high and low) rapidly for a period before the catastrophe.  This happens when the system is being forced with sufficient strength that it is bounced between two basins of attraction before getting sucked into a new one for good (or at least a long time).

In summary, there are four generic indicators of impending catastrophe in the fold bifurcation model:

  1. Increased variance in the series
  2. Increased autocorrelation
  3. Increased skewness in the distribution of fluctuations
  4. Flickering between two states
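For anyone who wants to look for the first three indicators in a time series, here is a minimal rolling-window sketch. The series is simulated with an autocorrelation that I force to drift upward, so it is an assumption-laden toy and not a fold-bifurcation model; the helper names (`rolling`, `lag1`, `skewness`) and the window width are my own choices.

```r
set.seed(42)
n   <- 2000
phi <- seq(0.2, 0.97, length.out = n)  # hypothetical slow-down: autocorrelation drifts up
x   <- numeric(n)
for (t in 2:n) x[t] <- phi[t] * x[t - 1] + rnorm(1)

# Sample skewness (third standardized moment)
skewness <- function(v) { m <- mean(v); mean((v - m)^3) / mean((v - m)^2)^1.5 }
# Lag-1 autocorrelation within a window
lag1 <- function(v) cor(v[-1], v[-length(v)])
# Apply f to every window of the given width
rolling <- function(x, width, f) {
  vapply(seq_len(length(x) - width + 1),
         function(i) f(x[i:(i + width - 1)]), numeric(1))
}

width <- 400
ind <- data.frame(variance = rolling(x, width, var),
                  lag1     = rolling(x, width, lag1),
                  skew     = rolling(x, width, skewness))
ind[c(1, nrow(ind)), ]   # first window versus last window
```

Because this toy process is linear with symmetric noise, only the variance and lag-1 autocorrelation should trend upward here. Skewness and flickering depend on the nonlinear shape of the attractor, which the toy does not have.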

There are all sorts of worrisome implications in these types of models for climate change, production systems, disease ecology, and the dynamics of endangered species.  What I hope is that by really getting a handle on these generic systems, we will develop tools that will help us identify catastrophes soon enough that we might actually be able to do something about some of them.  The real challenge, of course, is developing tools that give us the political will to tackle serious problems subject to structural uncertainty. I won’t hold my breath…

Plotting Error Bars in R

One common frustration that I have heard expressed about R is that there is no automatic way to plot error bars (whiskers really) on bar plots.  I just encountered this issue revising a paper for submission and figured I'd share my code.  The following simple function will plot reasonable error bars on a bar plot.

R:
  # Draw error bars (whiskers) at positions x around bar heights y.
  error.bar <- function(x, y, upper, lower = upper, length = 0.1, ...) {
    if (length(x) != length(y) || length(y) != length(lower) || length(lower) != length(upper))
      stop("vectors must be same length")
    arrows(x, y + upper, x, y - lower, angle = 90, code = 3, length = length, ...)
  }

Now let's use it.  First, I'll create 5 means drawn from a Gaussian random variable with unit mean and variance.  I want to point out another mild annoyance with the way that R handles bar plots, and how to fix it.  By default, barplot() suppresses the X-axis.  Not sure why.  If you want the axis to show up with the same line style as the Y-axis, include the argument axis.lty=1, as below. By creating an object to hold your bar plot, you capture the midpoints of the bars along the abscissa that can later be used to plot the error bars.

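R:
  y <- rnorm(500, mean = 1)                # 500 draws from a Gaussian with mean 1, sd 1
  y <- matrix(y, 100, 5)                   # 5 replicates (columns) of 100 draws each
  y.means <- apply(y, 2, mean)
  y.sd <- apply(y, 2, sd)
  # barplot() returns the bar midpoints, which error.bar() needs as x positions
  barx <- barplot(y.means, names.arg = 1:5, ylim = c(0, 1.5), col = "blue", axis.lty = 1, xlab = "Replicates", ylab = "Value (arbitrary units)")
  error.bar(barx, y.means, 1.96 * y.sd / 10)   # 1.96 * SE, where SE = sd / sqrt(100)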

[Figure: error-bars]

Now let's say we want to create the very common plot in reporting the results of scientific experiments: adjacent bars representing the treatment and the control with 95% confidence intervals on the estimates of the means.  The trick here is to create a 2 x n matrix of your bar values, where each row holds the values to be compared (e.g., treatment vs. control, male vs. female, etc.). Let's look at our same Gaussian means but now compare them to a Gaussian r.v. with mean 1.1 and unit variance.

R:
  y1 <- rnorm(500, mean = 1.1)
  y1 <- matrix(y1, 100, 5)
  y1.means <- apply(y1, 2, mean)
  y1.sd <- apply(y1, 2, sd)

  # Row 1 holds the first group's values, row 2 the comparison group's
  yy <- matrix(c(y.means, y1.means), 2, 5, byrow = TRUE)
  ee <- matrix(c(y.sd, y1.sd), 2, 5, byrow = TRUE) * 1.96 / 10   # 1.96 * sd / sqrt(100)
  barx <- barplot(yy, beside = TRUE, col = c("blue", "magenta"), ylim = c(0, 1.5), names.arg = 1:5, axis.lty = 1, xlab = "Replicates", ylab = "Value (arbitrary units)")
  error.bar(barx, yy, ee)

[Figure: means-comparison]

Clearly, a sample size of 100 is too small to show that the means are significantly different. The effect size is very small for the variability in these r.v.'s.  Try 10000.

R:
  # Same comparison, now with 10000 draws per replicate
  y <- rnorm(50000, mean = 1)
  y <- matrix(y, 10000, 5)
  y.means <- apply(y, 2, mean)
  y.sd <- apply(y, 2, sd)
  y1 <- rnorm(50000, mean = 1.1)
  y1 <- matrix(y1, 10000, 5)
  y1.means <- apply(y1, 2, mean)
  y1.sd <- apply(y1, 2, sd)
  yy <- matrix(c(y.means, y1.means), 2, 5, byrow = TRUE)
  ee <- matrix(c(y.sd, y1.sd), 2, 5, byrow = TRUE) * 1.96 / sqrt(10000)
  barx <- barplot(yy, beside = TRUE, col = c("blue", "magenta"), ylim = c(0, 1.5), names.arg = 1:5, axis.lty = 1, xlab = "Replicates", ylab = "Value (arbitrary units)")
  error.bar(barx, yy, ee)

[Figure: means-comparison1]

That works. Maybe I'll show some code for doing power calculations next time...
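In the meantime, a quick sketch of what such a power calculation might look like, using `power.t.test()` from the stats package that ships with R. I am assuming a two-sample t-test at the 5% level with the same effect as above (means of 1 and 1.1, unit standard deviation); the variable names are just for illustration.

```r
# Power of a two-sample t-test to detect a difference of 0.1 when sd = 1
power_100   <- power.t.test(n = 100,   delta = 0.1, sd = 1, sig.level = 0.05)$power
power_10000 <- power.t.test(n = 10000, delta = 0.1, sd = 1, sig.level = 0.05)$power

# Sample size per group needed for 80% power at the same effect size
n_for_80 <- power.t.test(delta = 0.1, sd = 1, sig.level = 0.05, power = 0.8)$n

c(power_n100 = power_100, power_n10000 = power_10000, n_per_group_80 = ceiling(n_for_80))
```

Note that `n` here is the number of observations in each group, not the total.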

Follow-Up to the Reversal in Fertility Decline

In my last post, I wrote about a new paper by Myrskylä and colleagues in this past week's issue of Nature.  Craig Hadley sent me a link to a criticism of this paper (and really more of the science reporting of it in the Economist) written by Edward Hugh on the blog A Fistful of Euros within a couple of hours of my writing.  Hugh levels three criticisms against the Myrskylä et al. (2009) paper:

  1. The authors use total fertility rate (TFR) as their measure of fertility, even though TFR has known defects.
  2. The reference year (2005) was a peculiar year and so results based on comparisons of other years to it are suspect.
  3. Even if fertility increases from its nadir in highly developed countries, median age of the population could increase.

The first two of these are criticisms of the Myrskylä et al. (2009) Nature paper and it is these that I will address here. The third is really a criticism of the Economist's coverage of the paper.

TFR is a measure of fertility and in demographic studies like these, what we care about is people's fertility behavior.  In a seminal (1998) paper, John Bongaarts and Griffith Feeney pointed out that as a measure of fertility TFR actually confounds two distinct phenomena: (1) the quantum of reproduction (i.e., how many babies) and (2) the tempo of reproduction (i.e., when women have them).  Say we have two populations: A and B.  In both populations, women have the same number of children on average. However, in population B, women delay their reproduction until later ages perhaps by getting married at older ages.  In both populations, women have the same number of offspring but we would find that population A had the higher TFR. How is that possible? It is a result of the classic period-cohort problem in demography.   As social scientists, demographers care about what actual people actually do. The problem is that measuring what actual people actually do over their entire lifetimes introduces some onerous data burdens and when you actually manage to get data for individual lifetimes, it is typically horribly out-of-date. For example, if you want to look at completed fertility, you need to look at women who are 50 years old or older at the time.  This means that most of their childbearing happened between 20 and 30 years ago. Not necessarily that informative about current trends in fertility.

To overcome this problem, demographers frequently employ period measures of fertility, mortality, marriage, migration, etc.  A period measure is essentially a cross-sectional measure of the population taken at a particular point in time.  Rather than measuring the fertility of women throughout their lifetimes (i.e., looking at the fertility of a cohort of women when they are age 20, 30, 40, etc.), we measure the fertility of 20 year-olds, 30 year-olds, 40 year-olds, and so on at one particular point in time. We then deploy one of those demographers' fictions.  We say that our cross-section of ages is a reflection of how people act over their life course.  TFR is a period measure.  We take the fertility rates measured for women ages 15-50 at a particular point in time (say, 2005) and sum them to yield the number of children ever born to a woman surviving to the end of her reproductive span if she reproduced at the average rate of the aggregate population.

Here is a simple (highly artificial) example of how this works.  (Demographic purists will have to forgive me for reversing the axes of a Lexis diagram, as I think that having period along the rows of the table is more intuitive to the average person for this example.)  The cells contain annual age specific fertility rates for each period. We calculate the period TFR by multiplying these values by the number of years in the age-class (which I assume is 5 for classes 10 and 40 and 10 for the others).  In 1940, we see the beginning of a trend in delayed fertility -- no women 15-20 (i.e., the "10 year-old" age class) have children.  This foregone early fertility is made up for by greater fertility of 20-30 year-olds in 1940.  Eventually, overall fertility declines -- at least in the periods for which we have full observations since the 1950, 1960, and 1970 cohorts have not completed their childbearing when the observations stop.

[Figure: TFR-tempo-example]

When we measure the TFR in 1930, we see that it is higher than the TFR in 1940 (3 vs. 2.5).  Nonetheless, when we follow the two cohorts through to the end of their childbearing years (in blue for 1930 and red for 1940), we see that they eventually have the same cohort TFRs. That is, women in both cohorts have the same total number of children on average; it's just that the women in 1940 begin childbearing later.  The behavior change is in tempo and not quantum and the period measure of fertility -- which is ostensibly a quantum measure since it is the total number of children born to a woman who survives to the end of her childbearing years -- is consequently distorted.
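The period-versus-cohort bookkeeping is easy to reproduce in R. The rates below are hypothetical, picked only so that the period TFRs for 1930 and 1940 come out at 3 and 2.5 as in the text; they are not the values in the table, and the cohort labeling convention is my own assumption.

```r
# Hypothetical annual age-specific fertility rates (births per woman per year).
# Illustrative assumptions only, not the values from the table in the post.
widths  <- c("10" = 5, "20" = 10, "30" = 10, "40" = 5)  # years in each age class
periods <- seq(1930, 1970, by = 10)
base    <- c(0.10, 0.15, 0.08, 0.04)  # schedule followed by cohorts before 1940
delayed <- c(0.00, 0.20, 0.08, 0.04)  # cohorts from 1940 on: later start, same total

# A cohort is labeled by the period in which it occupies age class "10"
asfr <- outer(periods, seq_along(widths), function(p, j) {
  cohort <- p - 10 * (j - 1)
  ifelse(cohort >= 1940, delayed[j], base[j])
})
dimnames(asfr) <- list(as.character(periods), names(widths))

# Period TFR: sum across a row (one calendar period), weighted by class width
period_tfr <- drop(asfr %*% widths)

# Cohort TFR: sum along the diagonal that a single cohort traces as it ages
cohort_tfr <- function(cohort) {
  j    <- seq_along(widths)
  rows <- as.character(cohort + 10 * (j - 1))
  sum(asfr[cbind(rows, names(widths))] * widths)
}

period_tfr[c("1930", "1940")]
c(cohort_1930 = cohort_tfr(1930), cohort_1940 = cohort_tfr(1940))
```

By construction both cohorts end up with three children on average, yet the 1940 period TFR dips to 2.5 because that row mixes the delayed start of the new cohort with the unchanged later-age rates of the older ones.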

Bongaarts and Feeney (1998) introduced a correction to TFR that uses measures of birth order to remove the distortions.  Myrskylä et al. (2009) were able to apply the Bongaarts/Feeney correction to a sub-sample (41) of their 2005 data.  Of these 41 countries, they were able to calculate the tempo-adjusted TFR for 28 of the 37 countries with an HDI of 0.85 or greater in 2005. The countries with adjusted TFRs are plotted in black in their online supplement figure S2, reproduced here with permission.

[Figure: Myrskylä et al. (2009), online supplement figure S2]

As one can easily see, the general trend of increasing TFR with HDI remains when the corrected TFRs are used.  This graphical result is confirmed by a formal statistical test: following the coincident TFR minimum/HDI in the 0.86-0.9 window, the slope of the best-fit line through the scatter is positive.

Hugh notes repeatedly that Myrskylä et al. (2009) anticipated various criticisms that he levels.  For example, he writes "And you don’t have to rely on me for the suggestion that the Tfr is hardly the most desireable [sic] measure for what they want to do, since the authors themselves point this very fact out in the supplementary information." This seems like good honest social science research to me. I'm not entirely comfortable with the following paraphrasing, but here it goes.  We do science with the data we have, not the data we wish we had.  TFR is a widely available measure of fertility that allowed the authors to look at the relationship between fertility and human development over a large range of the HDI. Now, of course, having written a paper with the data that are available, we should endeavor to collect the data that we would ideally want.  The problem with demographic research though is that we are typically at the whim of the government and non-government (like the UN) organizations that collect official statistics.  It's not like we can go out and perform a controlled experiment with fixed treatments of human development and observe the resulting fertility patterns. So this paper seems like a good-faith attempt to uncover a pattern between human development and fertility.  When Hugh writes "the only thing which surprises me is that nobody else who has reviewed the research seems to have twigged the implications of this" (i.e., the use of  TFR as a measure of fertility), I think he is being rather unfair.  I don't know who reviewed this paper, but I'm certain that they had both a draft of the paper that eventually appeared in the print edition of Nature and the online Supplemental material in which Myrskylä and colleagues discuss the potential weaknesses of their measures and evaluate the robustness of their conclusions. That's what happens when you submit a paper and it undergoes peer review.  
The pages of Nature are highly over-subscribed (as Nature is happy to tell you whenever it sends you a rejection letter).  Space is at a premium and the type of careful sensitivity analysis that would be de rigueur in the main text of a specialist journal such as Demography, Population Studies, or Demographic Research ends up in the online supplement in Nature, Science, or PNAS.

On a related note, Hugh complains that the reference year in which the curvilinear relationship between TFR and HDI is shown is a bad year to pick:

Also, it should be remembered, as I mention, we need to think about base years. 2005 was the mid point of a massive and unsustainable asset and construction boom. I think there is little doubt that if we took 2010 or 2011, the results would be rather different.

The problem with this is that the year is currently 2009, so we can't use data from 2010 or 2011.  It seems entirely possible that the results would be different if we used 2011 data and I look forward to the paper in 2015 in which the Myrskylä hypothesis is re-evaluated using the latest demographic data.  This is sort of the nature of social science research.  There are very few Eureka! moments in social science.  As I note above, we can't typically do the critical experiment that allows us to test a scientific hypothesis.  Sometimes we can get clever with historical accidents (known in the biz as "natural experiments"). Sometimes we can use fancy statistical methods to approximate experimental control (such as the fixed effects estimation Myrskylä et al. use or the propensity score stratification used by Felton Earls and colleagues in their study of firearm exposure and violent behavior).  If we waited until we had the perfect data to test a social science hypothesis, there would never be any social science.  Perhaps things will indeed be different in 2011.  If so, we may even get lucky and by comparing why things were different in 2005 and 2011, gain new insight into the relationships between human development and fertility. Until then, I am going to credit Myrskylä and colleagues for opening a new chapter on our understanding of fertility transitions.

Oh, and I plan to cite the paper, as I'm sure many other demographers will too...

Why Use R?

An anthropologist colleague who did a post-doc in a population center has been trying to get a group of people at his university together to think about population issues.  This is something I'm all for, and I am happy to help in whatever little way I can, especially when it comes to anthropologists developing their expertise in demography.  One of the activities they have planned for this population interest group is a workshop on the R statistical programming language. The other day he wrote me with the following very reasonable question that has been put to him by several of the people in his group: Sure, R is free, but other than that, why should someone bother to learn new software when there is perfectly acceptable commercial software out there?  This question is particularly relevant when one works for an institution like a university where there are typically site licenses and other mechanisms for subsidizing the expense of commercial software (which can be substantial).  What follows is, more or less, what I said to him.

I should start out by saying that there is a lot to be said for free. I pay several hundred dollars a year for commercial software that I don't actually use that often. Now, when I need it, it's certainly nice to know it's there, but if I didn't have a research account paying for this software, I might let at least one or two of these licenses slide.  I very occasionally use Stata because the R package that does generalized linear mixed models has had a bug in the routine that fits logistic mixed models and this is something that Stata does quite well. So I regularly get mailings about updates and I am always just blown away at the expense involved in maintaining the most current version of this software, particularly when you use the Intercooled version.  It's relatively frequently updated (a good thing) but these updates are expensive (a bad thing for people without generous institutional subsidies). So, let me just start by saying that free is good.

This actually brings up a bit of a pet peeve of mine regarding training in US population centers.  We have these generous programs to train population scientists and policy-makers from the poor countries of the world.  We bring them into our American universities and train them in demographic and statistical methods on machines run by proprietary (and expensive!) operating systems and using extremely expensive proprietary software.  These future leaders will graduate and go back home to Africa, Asia, eastern Europe, or Latin America. There, they probably won't have access to computers with the latest hardware running the most recent software.  Most of their institutions can't afford expensive site licenses to the software that was on every lab machine back at Princeton or UCLA or Michigan or [fill in your school's name here]. This makes it all the more challenging to do the work that they were trained to do and leaves them just that much more behind scholars in advanced industrial nations.  If our population centers had labs with computers running Linux, taught statistics and numerical methods using R, and had students write LaTeX papers, lecture slides, and meeting posters using, say, Emacs rather than some bloated word-processor whose menu structure seems to change every release, then I think we would be doing a real service to the future population leaders of the developing world. But let's return to the question at hand: other than the fact that it's free -- which isn't such an issue for someone with a funded lab at an American university -- why should anyone take the trouble to learn R? I can think of seven reasons off the top of my head.

(1) R is what is used by the majority of academic statisticians.  This is where new developments are going to be implemented and, perhaps more importantly, when you seek help from a statistician or collaborate with one, you are in a much better position to benefit from the interaction if you share a common language.

(2) R is effectively platform independent.  If you live in an all-Windows environment, this may not be such a big deal but for those of us who use Linux/Mac and work with people who use Windows, it's a tremendous advantage.

(3) R has unrivaled help resources.  There is absolutely nothing like it.  First, the single best statistics book ever is written for R (Venables & Ripley, Modern Applied Statistics with S -- remember R is a dialect of S).  Second, there are all the many online help resources both from r-project.org and from many specific lists and interest groups. Third, there are proliferating publications of excellent quality. For example, there is the new Use R! series. The quantity and quality of help resources is not even close to matched by any other statistics application.  Part of the nature of R -- community constructed, free software -- means that the developers and power users are going to be more willing to provide help through lists, etc. than someone in a commercial software company. The quality and quantity of help for R is particularly relevant when one is trying to teach oneself a new technique or statistical method.

(4) R makes the best graphics. Full stop. I use R, Matlab, and Mathematica.  The latter two applications have a well-deserved reputation for making great graphics, but I think that R is best.  I quite regularly will do a calculation in Matlab and export the results to R to make the figure.  The level of fine control, the intuitiveness of the command syntax (cf. Matlab!), and the general quality of drivers, etc. make R the hands-down best.  And let's face it, cool graphics sell papers to reviewers, editors, etc.

(5) The command-line interface -- perhaps counterintuitively -- is much, much better for teaching.  You can post your code exactly and students can reproduce your work exactly.  Learning then comes from tinkering. Now, both Stata and SAS allow for doing everything from the command line with scripts like do-files.  But how many people really do that?  And SPSS...

(6) R is more than a statistics application.  It is a full programming language. It is designed to seamlessly incorporate compiled code (like C or Fortran), which gives you all the benefits of an interactive language while allowing you to capitalize on the speed of compiled code.

(7) The online distribution system beats anything out there.

Oh, and let's face it, all the cool kids use it...

New York Times Discovers R

A recent article in the New York Times extols the virtues of the R statistical programming language.  Better late than never, I suppose.  I first discovered R in 1999, just as I began writing my dissertation. At the time, I used Matlab for all my computational needs.  I still occasionally use Matlab when doing hardcore matrix algebra or numerically solving differential equations.  I also sometimes use Mathematica to check my algebra or to solve equations when I'm feeling lazy (I think there are actually lots more possibilities but exploring these hasn't been a priority), but mostly I now use R. When looking for a post-doc, one of my training goals was learning R. I certainly scored in that department by landing in the Center for Statistics in the Social Sciences at the University of Washington. Working with Mark Handcock, sharing an office with Steve Goodreau, and interacting with people like Adrian Raftery, Peter Hoff, and Kevin Quinn, I learned a lot about R. I'd like to think that I saw the writing on the wall.  Mostly though, I think I liked the idea of free, open-source, state-of-the-art numerical software.

I use R in many of the classes I teach, including Demography and Life History Theory, Applied Bayesian Methods in the Social Sciences, Data Analysis in the Anthropological Sciences, and our NICHD-funded Workshop in Formal Demography. While I don't expect the students to learn it, I also use R to make most of the figures I show in slides in other classes like Evolutionary Theory, Environmental Change and Emerging Infectious Disease, and even The Evolution of Human Diet. My colleague Ian Robertson also teaches his quantitative classes in R.  Anthropology is also very lucky to have an academic technology specialist, Claudia Engel, with a strong interest in supporting both faculty and student use of R. The Human Spatial Dynamics Laboratory has a growing list of R resources for students' (and others') use.  My lab site will soon host all of our R material for the summer workshops as well as my R package demogR.

I sometimes wonder if other anthropologists are learning R.  I'm sure Steve's students get some R up at UW.  But is there anyone else out there?  Perhaps this is one of the great comparative advantages we can give our students here at Stanford.  Since the New York Times says it's cool, it must be true.

Statistics and Election Forecasting

With election day past us now, I have a moment to reflect upon how uncanny Nate Silver and crew's predictions of the election were.  I became quite a FiveThirtyEight.com junkie as the election approached and I think that the stunning success that they demonstrated in predicting all sorts of elections yesterday holds lessons for the way we do social science more generally.

The predictions at FiveThirtyEight.com start with the basic premise that all polls are wrong but, when taken in aggregate, they provide a great deal of very useful information.  Basically, they aggregated information from a large number of polls and weighted the contributions of the different polls based on their reliability scores.  These reliability scores are based on three things: (1) the pollster's accuracy in predicting recent election outcomes, (2) the poll's sample size, and (3) the recentness of the poll.  Pollsters who have done well in the past typically do well in the present.  Polls of many potential voters are more precise than polls of a small number of voters.  Recent polls are more salient than polls taken a while ago.  The site provides a very detailed account of how it calculates its reliability scores, particularly in terms of pollster accuracy.  The weighted polling data were then further adjusted for trends in polls. For their projections, they then took the adjusted polling data and ran regressions on a variety of social and demographic variables for the different polled populations. Using these regressions, they were able to calculate a snapshot in time of how each state would likely vote if the vote were held on that day.  These snapshots were then projected to the November election.  Finally, they ran simulations over their projections (10,000 at a time) to understand how the various forms of uncertainty were likely to affect the outcomes.
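The weighting and simulation steps described above can be sketched in a few lines of Python. This is only an illustration of the general idea, not FiveThirtyEight.com's actual model: the weight formula, the 30-day half-life, the 4-point error, the state names, and every poll number below are invented for the example, and the trend adjustment and regression steps are left out entirely.

```python
import math
import random


def poll_weight(rating, sample_size, days_old, half_life_days=30.0):
    """Hypothetical reliability weight combining the three ingredients:
    pollster track record, sample size, and recency."""
    recency = 0.5 ** (days_old / half_life_days)  # older polls count for less
    return rating * math.sqrt(sample_size) * recency


def weighted_margin(polls):
    """Reliability-weighted average of poll margins (candidate A minus B, in points)."""
    weights = [poll_weight(p["rating"], p["n"], p["days_old"]) for p in polls]
    return sum(w * p["margin"] for w, p in zip(weights, polls)) / sum(weights)


def simulate_elections(states, n_sims=10_000, error_sd=4.0, seed=538):
    """Monte Carlo over state outcomes; returns candidate A's electoral votes per run.
    Each state's result is drawn independently around its aggregated margin,
    which is a simplifying assumption of this sketch."""
    rng = random.Random(seed)
    totals = []
    for _ in range(n_sims):
        votes = 0
        for state in states:
            if rng.gauss(state["margin"], error_sd) > 0:
                votes += state["electoral_votes"]
        totals.append(votes)
    return totals


# Invented data: (electoral votes, list of polls) for three made-up states.
polls_by_state = {
    "State A": (10, [{"margin": 6.0, "rating": 1.0, "n": 1000, "days_old": 2},
                     {"margin": 2.0, "rating": 0.5, "n": 400, "days_old": 20}]),
    "State B": (7, [{"margin": -1.0, "rating": 0.8, "n": 800, "days_old": 5}]),
    "State C": (21, [{"margin": 0.5, "rating": 0.9, "n": 600, "days_old": 1},
                     {"margin": 3.0, "rating": 0.6, "n": 1200, "days_old": 10}]),
}

states = [
    {"name": name, "electoral_votes": ev, "margin": weighted_margin(polls)}
    for name, (ev, polls) in polls_by_state.items()
]
totals = simulate_elections(states)
print("mean electoral votes for candidate A:", sum(totals) / len(totals))
```

Because the headline number is an average over many simulated elections, it will generally not be a whole number, which is presumably how a projection with a fractional electoral vote count can arise.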

FiveThirtyEight.com projected that Obama would win with 348.6 electoral votes.  The current count is (provisionally) 364.  Pretty darn good, given the manifold uncertainties.  What is even more stunning is a comparison of the projected/realized election maps.  Here is the actual electoral map (as of today, 5 November at 21:00 PST):

Electoral Map on 6 November 2008

 

Here is their final projection:

Final fivethirtyeight.com Projection

Hard to imagine it being much righter...

In the social sciences -- especially in anthropology -- we gather crappy data all the time. There is generally much more that we can do with these data than is usually done.  I find that the fivethirtyeight.com methodology has a lot to offer social science.  In particular, I really like the focus on prediction of observable quantities. Too often, we get caught up in the cult of the p-value and focus on things that are ultimately unknowable and unmeasurable (the "true" value of a test statistic, for example).  Predicting measurables and then adjusting the weight one gives particular predictions based on their past performance seems like a very reasonable tool for other types of social (and natural) science applications.

I need to think more about specific anthropological applications, but I am intrigued at least by the idea that one could use the clearly biased results of some assay of behavior to nonetheless make accurate predictions of some outcome of interest. In the case of elections, the assay is polling.  In the anthropological case, it might be the report of an informant in one's ethnographic investigation.  We know that informants (like pollsters, or ethnographers for that matter) may have a particular agenda. But if we could compare the predictions based on an ethnographic interview with a measurable outcome, adjust the predictions based on predictive performance and then aggregate the predictions of many informants, we might have a powerful, scientific approach to some ethnographic questions that acknowledges the inherent bias and subjectivity of the subject matter but nonetheless makes meaningful scientific predictions. 
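To make the idea a little more concrete, here is a minimal sketch in Python of one way it could work. Everything in it is an assumption made for illustration: the informant names, the numbers, the choice to correct each informant by their average past error, and the choice to weight by the inverse of their remaining error variance (plus a hypothetical smoothing constant so that no single informant dominates).

```python
from statistics import mean


def calibrate(past_predictions, past_outcomes, smoothing=1.0):
    """Estimate an informant's systematic bias and a reliability weight
    from how their earlier predictions compared with measured outcomes."""
    errors = [p - o for p, o in zip(past_predictions, past_outcomes)]
    bias = mean(errors)  # consistent over- or under-statement
    spread = mean([(e - bias) ** 2 for e in errors])  # error left after removing bias
    weight = 1.0 / (spread + smoothing)
    return bias, weight


def aggregate(history, new_reports):
    """Bias-correct each new report, then take the reliability-weighted average."""
    numerator = 0.0
    denominator = 0.0
    for name, report in new_reports.items():
        bias, weight = calibrate(*history[name])
        numerator += weight * (report - bias)
        denominator += weight
    return numerator / denominator


# Invented example: past (predictions, measured outcomes) for two informants.
history = {
    "informant_1": ([12, 7, 9], [10, 5, 7]),   # always 2 too high, but consistent
    "informant_2": ([8, 9, 4], [10, 5, 7]),    # little average bias, but erratic
}
new_reports = {"informant_1": 14, "informant_2": 10}
print("aggregated prediction:", aggregate(history, new_reports))
```

In this toy case the consistently biased informant ends up being the more useful one: once the bias is subtracted, their reports track the outcome closely, so they receive most of the weight.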

I'm just thinking out loud here.  Clearly, I need to add some more specifics to have this make sense. Perhaps I will take up this thread again in the future.  For now, I just want to pass along kudos once more to FiveThirtyEight.com for a job very well done.

Truly Excellent Statistical Graphic

The figure that appeared on MediaCurves.com (the link to which I found here) following the second presidential debate last night was a truly outstanding example of communicating complex information using simple, effective graphical presentation.

The figure shows the responses of 1004 respondents to the question of who won the debate.  The graphic summarizes quite a bit of information in a readily understandable manner.  What I find particularly striking is (1) 20% of self-reported Republicans think that Barack Obama won and (2) only 68% of self-reported Republicans think that John McCain won.

Not necessarily related to statistical graphics, it will be interesting to see if Nate Silver is as good at predicting presidential elections as he is at predicting baseball outcomes.

Arrgh...

I never did get around to writing about International Talk Like a Pirate Day yesterday.  Carl Boe, from Berkeley, and I have a long-running joke about pirate-speak stemming from our teaching computing for formal demography using that old swashbuckler standby software -- you guessed it -- R.  We wanted to reduce the anxiety generated in students who needed to simultaneously learn both the methods of formal demography and steep-learning-curve software by dressing -- and talking -- as pirates.  We never did do it, but there are always future workshops.  This year, Carl sent me the following amusing picture related to Talk Like a Pirate Day.

Pirate Keyboard