social structure | monkey's uncle

There is a very interesting piece in the New York Times today by David Leonhardt on the apparent backlash against prediction markets such as Intrade and Betfair. In principle, these markets make predictions by aggregating the disparate information of many independent bettors who offer prices for a particular outcome. Prediction markets have enjoyed a fair amount of success in recent elections, and the University of Iowa has even set up an influenza prediction market. But prediction markets are hardly perfect and have had some pretty big recent failures. Intrade, for example, failed in spectacular fashion to predict the outcome of the recent Supreme Court ruling on the constitutionality of the Affordable Care Act. Leonhardt suggests that some of the failures of online prediction markets are attributable to the relatively small number of people who actually trade on them:

But the crowd was not everywhere wise. For one thing, many of the betting pools on Intrade and Betfair attract relatively few traders, in part because using them legally is cumbersome. (No, I do not know from experience.) The thinness of these markets can cause them to adjust too slowly to new information.

This may have been an issue with the ACA decision, but the primary problem with the incorrect prediction is that the crowd doesn't actually know much about the workings of the very closed social network that is the United States Supreme Court. Leonhardt writes:

And there is this: If the circle of people who possess information is small enough -- as with the selection of a vice president or pope or, arguably, a decision by the Supreme Court -- the crowds may not have much wisdom to impart. 'There is a class of markets that I think are basically pointless,' says Justin Wolfers, an economist whose research on prediction markets, much of it with Eric Zitzewitz of Dartmouth, has made him mostly a fan of them. 'There is no widely available public information.'

This point gets at a larger critique of market-based solutions to problems, suggested by my Stanford colleague Mark Granovetter over 25 years ago (Granovetter 1985): the problem of embeddedness. The idea of embeddedness was anticipated by the work of the substantivist economist Karl Polanyi, but Granovetter really laid out the details. Granovetter writes (1985: 487): "A fruitful analysis of human action requires us to avoid the atomization implicit in the theoretical extremes of under- and oversocialized conceptions [of human action]. Actors do not behave or decide as atoms outside a social context, nor do they adhere slavishly to a script written for them by the particular intersection of social categories that they happen to occupy. Their attempts at purposive action are instead embedded in concrete, ongoing systems of social relations." Independent bettors, each deciding alone what price they are willing to pay for a certain outcome, are exactly the atomized actors Granovetter has in mind.

The argument for embeddedness emerges in Granovetter's paper from the problem of trust in markets. Where does trust come from in competitive markets? The fundamental problem here concerns the micro-foundations of markets, where "the alleged discipline of competitive markets cannot be called on to mitigate deceit, so the classical problem of how it can be that daily economic life is not riddled with mistrust and malfeasance has resurfaced." (p. 488). The obvious solution is that actors choose to deal with alters whom they trust, and that the most effective way to develop trust is to have had prior dealings with an alter.

Granovetter's embeddedness theory is a modest one. He notes that, unlike the alternative models, his "makes no sweeping (and thus unlikely) predictions of universal order or disorder but rather assumes that the details of social structure will determine which is found." (p. 493)

These ideas about the careful analysis of social structure and networks of interlocking relationships are fundamental for understanding when the crowd will be wise and when it will not. They are also essential for developing effective development interventions and, for that matter, for making markets work for the public good in general. The theory of embeddedness allows for the possibility that markets can work, but if we are to understand when they work and when they don't, we need to think about social structure as more than just a bit of friction in an ideal market and take its measurement more seriously. People are not ideal gases. (Dirty little secret: most gases are not ideal gases.) This gets at some problems that I have been thinking about a lot recently relating to additive observational noise vs. process noise and their implications for predicting multi-species epidemics, but that must wait for another post...

Marcel Salathé and I have a brand new paper out in today's issue of PLoS Computational Biology, published by the Public Library of Science. There is also a news piece by Adam Gorlick in the Stanford Report this morning. This is an idea I've been bouncing around for a few years now, and I was very fortunate to have Marcel, and his programming wizardry, show up with an interest in the very same topic at just the right time. It's not every day that one of the most talented young theoretical biologists in the world shows up at your office wanting to collaborate. If it ever happens to you, I suggest you act!

The fundamental question is: Does social structure affect the course of epidemics? The answer seems obvious, particularly for infectious diseases that are transmitted by direct person-to-person contact. However, specific work demonstrating the effects of social structure on epidemics can be hard to find. Part of the problem, of course, is that you can hardly do experiments in which you change social structure and then subject populations to an infectious disease. To overcome this ethical and practical barrier to research, epidemiologists, biologists, and social scientists interested in disease and human behavior use mathematical and computational models to study how changes in host behavior affect the outcome of simulated epidemics.

Two specific topics that clearly bear on social structure have been investigated extensively: individual heterogeneity in contact number and individual assortativeness. In all but the simplest models, epidemic behavior has been seen as driven by heterogeneity. When there is a lot of variance in the number of potentially infectious contacts that individuals in a population have, epidemics are more likely, they infect large segments of the population more quickly, and they ultimately infect a larger fraction of the total population. Consider the extreme case in which one person is in contact with everyone else and every other member of the population has exactly one contact: that central person. If we were to draw a picture of such a contact network, it would resemble a star, or a wheel with a central hub and spokes.

Infect any random individual on this star and everyone else is at risk for infection. At the opposite extreme, if everyone has exactly one contact, then a randomly infected person can infect, at most, one other individual.
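To make the two extremes concrete, here is a minimal sketch in plain Python (no network library assumed) that builds a six-person star and a six-person population of isolated pairs, then counts how many others a single infected person could possibly reach, i.e. the size of his or her connected component minus the person. The helper names `star`, `pairs`, `reachable_others`, and `degree_variance` are hypothetical, chosen only for illustration.

```python
from collections import deque

def star(n):
    """Hub 0 is linked to every other node; each leaf has exactly one contact."""
    adj = {v: set() for v in range(n)}
    for leaf in range(1, n):
        adj[0].add(leaf)
        adj[leaf].add(0)
    return adj

def pairs(n):
    """Disjoint couples (0,1), (2,3), ...: everyone has exactly one contact."""
    adj = {v: set() for v in range(n)}
    for u in range(0, n - 1, 2):
        adj[u].add(u + 1)
        adj[u + 1].add(u)
    return adj

def reachable_others(adj, start):
    """Upper bound on onward infections: people reachable from `start` by breadth-first search."""
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen) - 1

def degree_variance(adj):
    """Population variance of contact numbers (the heterogeneity discussed above)."""
    degrees = [len(nbrs) for nbrs in adj.values()]
    mean = sum(degrees) / len(degrees)
    return sum((d - mean) ** 2 for d in degrees) / len(degrees)

for name, net in [("star", star(6)), ("pairs", pairs(6))]:
    print(name, "degree variance:", round(degree_variance(net), 3),
          "reachable from person 1:", reachable_others(net, 1))
```

Both populations have almost the same number of links; what differs is how they are distributed, and that alone moves the worst case from one onward infection to everyone.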

Assortativeness, the tendency for individuals to associate with others like themselves, can either aid or hinder the spread of infections. People in contemporary nation states like the United States show an incredible capacity to form associations with like individuals. We form social relationships, particularly intimate relationships, with people who are similar to us in age, socioeconomic status, sexual orientation, ethnicity, education, religion, forms of deviant behavior such as drug use or criminal activity, etc. Frequently, this assortativeness has the effect of localizing and concentrating epidemiologically important contacts. When this happens, individuals who act as bridges between different communities take on central epidemiological importance. For example, married men who visit commercial sex workers can serve as a critical bridge connecting high-risk populations of sex workers and injection drug users with the general population. Similarly, health care workers can bridge hospital populations with the general population, a phenomenon important in the emergence of SARS in 2002. (Note that for epidemiological applications, we call such individuals “bridges” but in other applications we might call them “brokers” or “entrepreneurs,” highlighting the general importance of such ideas for understanding society.) The existence of such social bridges highlights the fact that people can also assort on characteristics that are not visible attributes, and this type of assortative behavior can increase connectivity. In particular, if people with few contacts tend to be connected to people with many contacts (as in the case of the star), then such disassortativeness can increase the epidemic potential in a population.
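The star's disassortativeness can be quantified with a degree assortativity coefficient: the Pearson correlation between the contact numbers at the two ends of every link (Newman's standard definition; the post itself does not name a metric, so this choice is ours). A negative value means low-degree people tend to be linked to high-degree people. The sketch below is plain Python, and `degree_assortativity` is a hypothetical helper rather than a library call. Assortativeness on numeric attributes such as age works the same way, with the attribute substituted for degree.

```python
def degree_assortativity(edges):
    """Pearson correlation of degrees at either end of each undirected edge.

    Each edge is counted in both directions so the measure is symmetric.
    Returns a value in [-1, 1]; raises ValueError if every endpoint has the same degree.
    """
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    xs, ys = [], []
    for u, v in edges:
        xs += [degree[u], degree[v]]
        ys += [degree[v], degree[u]]
    n = len(xs)
    # xs and ys hold the same values, so they share a mean and a variance.
    mean = sum(xs) / n
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys)) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    if var == 0:
        raise ValueError("assortativity is undefined when all degrees are equal")
    return cov / var

# A star: every link joins the hub (4 contacts) to a leaf (1 contact).
star_edges = [(0, leaf) for leaf in range(1, 5)]
print(degree_assortativity(star_edges))  # -1.0
```

Note the guard for equal degrees: on a network where everyone has the same number of contacts, degree assortativity is undefined, so any remaining structure has to be captured by something else.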

Individual behavioral decisions can, in aggregate, profoundly shape the structure and composition of human populations, but there is more to human populations than simply individual behavior. For one thing, human populations are characterized by a hierarchical structure: individuals typically belong to households and households are aggregated into communities, which are, in turn, aggregated into towns, states, nations, etc. Naturally, there are cross-cutting ties in such hierarchical organization (much like bridges in individual contact networks). Freudian fantasies of primitive hordes aside, even the largely egalitarian societies of hunter-gatherers are characterized by a hierarchical structuring of families, bands, and tribes. Hierarchical structuring is clearly important for understanding social processes in human societies.

So what effect does such community structure have on epidemics? To address this question, Marcel and I combined the formalisms of social network analysis and computational models of epidemics. We already know that heterogeneity in contact number can have profound effects on the outcomes of epidemics and that such heterogeneity can change aggregate social structure in complex ways. To avoid such complications, we generated networks in which every individual had exactly the same number of contacts. The only thing that varied in these toy networks was the likelihood that any randomly chosen connection between two individuals would fall within or between more or less cohesive subgroups (a.k.a. “communities”). Using metrics derived from graph theory, the branch of discrete mathematics that provides the basic tools for social network analysis, we were able to characterize the degree of community structure and relate it to the outcome of epidemics simulated on the resulting networks.
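The post does not spell out how the toy networks were built, so what follows is our own minimal stand-in, not the paper's procedure. It starts from communities that are ring lattices (every person has exactly `k` contacts, all inside his or her own community) and then applies degree-preserving double-edge swaps: two within-community links (a, b) and (c, d) from different communities are rewired to (a, d) and (c, b). Each swap leaves every person's contact count unchanged while turning two within-community links into two between-community links. Community structure is scored with Newman-Girvan modularity Q, one standard graph-theoretic measure; we assume it is representative of the metrics mentioned above. All function names and parameter values are hypothetical.

```python
import random

def ring_communities(n_comm, size, k):
    """Each community is a ring lattice: node i links to its k/2 nearest neighbours on each side.

    Requires k even and size > k so that every node ends up with exactly k contacts.
    """
    community, edges = {}, set()
    for c in range(n_comm):
        base = c * size
        for i in range(size):
            community[base + i] = c
            for j in range(1, k // 2 + 1):
                u, v = base + i, base + (i + j) % size
                edges.add((min(u, v), max(u, v)))
    return edges, community

def mix_communities(edges, community, between_fraction, rng, max_tries=100_000):
    """Degree-preserving swaps until at least `between_fraction` of links cross communities."""
    edges = set(edges)
    n_between = sum(community[u] != community[v] for u, v in edges)
    tries = 0
    while n_between < between_fraction * len(edges) and tries < max_tries:
        tries += 1
        (a, b), (c, d) = rng.sample(sorted(edges), 2)
        # Only swap two within-community links that sit in different communities.
        if not (community[a] == community[b] and community[c] == community[d]
                and community[a] != community[c]):
            continue
        new1, new2 = (min(a, d), max(a, d)), (min(c, b), max(c, b))
        if new1 in edges or new2 in edges:
            continue  # the swap would create a duplicate link
        edges -= {(a, b), (c, d)}
        edges |= {new1, new2}
        n_between += 2
    return edges

def modularity(edges, community):
    """Newman-Girvan Q: for each community, (share of links inside) - (expected share by chance)."""
    m = len(edges)
    inside, degree_sum = {}, {}
    for u, v in edges:
        for x in (u, v):
            degree_sum[community[x]] = degree_sum.get(community[x], 0) + 1
        if community[u] == community[v]:
            inside[community[u]] = inside.get(community[u], 0) + 1
    return sum(inside.get(c, 0) / m - (d / (2 * m)) ** 2 for c, d in degree_sum.items())

rng = random.Random(1)
edges, community = ring_communities(n_comm=4, size=25, k=4)
print("Q before mixing:", round(modularity(edges, community), 3))
mixed = mix_communities(edges, community, between_fraction=0.2, rng=rng)
print("Q after mixing: ", round(modularity(mixed, community), 3))
```

Because every degree is fixed, sweeping `between_fraction` changes only the community structure, which is the single knob the paragraph above describes.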

It turns out that community structure has an enormous effect on epidemic outcome. In particular, we found that there is a remarkably abrupt transition from small outbreaks to very large outbreaks as we moved from the most structured populations to more moderately structured ones. Populations characterized by extreme community structure have smaller outbreaks because the infection has a hard time getting out of a community before dying out. As more connections to other communities are made – i.e., the community structure is lessened – there are more opportunities for the infection to escape and affect a larger fraction of the total population. While the result sounds intuitively satisfying after the fact, there was little precedent for expecting such an outcome in the mathematical theory of epidemics. This is because none of the standard metrics of an infectious disease – the basic reproduction ratio, in particular – changed as the populations’ community structure changed.
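Epidemics simulated on such networks can be sketched with a discrete-time SIR model: each infected person is infectious for a single time step and independently transmits to each susceptible contact with probability `beta`. This chain-binomial, Reed-Frost-style setup is our assumption for illustration; the post does not give the paper's transmission or recovery settings. The hypothetical `immune` argument lets the same function score vaccination strategies.

```python
import random

def sir_final_size(adj, index_case, beta, rng, immune=frozenset()):
    """Discrete-time SIR on a contact network; returns the number of people ever infected.

    Each infected person is infectious for exactly one step and independently infects
    each susceptible, non-immune contact with probability beta, then recovers.
    """
    if index_case in immune:
        return 0
    infected, recovered = {index_case}, set()
    while infected:
        newly_infected = set()
        for u in infected:
            for v in adj[u]:
                if v in infected or v in recovered or v in immune or v in newly_infected:
                    continue  # only susceptible contacts can be infected
                if rng.random() < beta:
                    newly_infected.add(v)
        recovered |= infected
        infected = newly_infected
    return len(recovered)

# Two triangles {0, 1, 2} and {3, 4, 5} joined by a single link between 2 and 3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
rng = random.Random(42)
sizes = [sir_final_size(adj, 0, beta=0.5, rng=rng) for _ in range(1000)]
print("mean final size:", sum(sizes) / len(sizes))
print("with person 3 immune:", sir_final_size(adj, 0, beta=1.0, rng=rng, immune={3}))  # 3
```

Running many such simulations from random index cases on networks with different levels of community structure, and averaging the final sizes, is one way to look for the abrupt small-to-large transition described above.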

When we investigated other structural correlates of epidemic size, we found that one measure in particular predicted epidemic behavior quite well. This measure, known as “betweenness centrality,” harkens back to earlier epidemiological interest in bridging individuals. A person with high betweenness lies on many of the shortest paths connecting pairs of individuals in a network. When a person bridges two otherwise distinct subpopulations, he or she typically has high betweenness, because paths from individuals in one cluster have to pass through this person to reach the other cluster (all of them, if he or she is the only link), and vice versa. As a population moves from very high community structure to a more moderate level, the number of people with high betweenness increases. This highlights a particularly interesting contrast with previous models: epidemics are more likely and larger in populations with highly unequal distributions of contacts, but also in populations with more equal distributions of betweenness.
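Betweenness is usually computed with Brandes' algorithm. The plain-Python version below is a minimal sketch for small, unweighted, undirected networks (libraries such as NetworkX ship tuned implementations); the function name is our own. On two triangles joined by a single link, only the two people at either end of that link score above zero, which is exactly the bridging intuition described above.

```python
from collections import deque

def betweenness(adj):
    """Unnormalized betweenness centrality for an undirected, unweighted graph (Brandes' algorithm).

    For each node, sums over pairs (s, t) the fraction of shortest s-t paths passing through it.
    """
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma) and recording predecessors.
        order, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        dist = dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating each node's dependency on s.
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # In an undirected graph every pair was counted once from each endpoint.
    return {v: b / 2 for v, b in bc.items()}

# Two triangles {0, 1, 2} and {3, 4, 5} joined by a single link between 2 and 3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(betweenness(adj))
```

Here persons 2 and 3 each lie on all shortest paths for six pairs, while everyone else lies on none.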

Knowing that betweenness predicts the extent of epidemic spread in populations with community structure, we sought a way to use that information to design intelligent control measures. How do you find people who have high betweenness? As abstract as the concept of betweenness may seem, it turns out not to be that difficult. We start with an infected person and do standard contact tracing; that is, we ask the index case about his or her contacts. (Contact tracing is one of the most important tools in the toolkit of the gumshoe epidemiologist.) From the index case’s contacts, we pick a random individual and trace his or her contacts. Picking a random individual from this second generation of contact traces, we simply ask, "Do you know the index case?" If so, we keep going: trace the contacts of a random contact, and again ask whether this person knows the index case. When we come to an individual who does not know the index case, we have found our bridge. It is the penultimate person in the chain, the person who links the index case to someone he or she doesn’t know. Basically, we do a “random walk” on the social network looking for people who link otherwise unconnected individuals. When we find the bridge, we vaccinate all of his or her contacts. We call our vaccination algorithm the “Community Bridge Finder” (CBF).
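The walk can be sketched as follows. This follows the prose description above literally and may differ in detail from the published CBF algorithm; the rule that the walk never steps back onto the index case and the `max_steps` cap are our own additions so the sketch always terminates. Function and parameter names are hypothetical.

```python
import random

def community_bridge_finder(adj, index_case, rng, max_steps=1000):
    """Random walk from the index case, following the post's description of CBF.

    At each step we trace a random contact of the current person (never stepping back
    onto the index case) and ask whether that contact knows the index case. The first
    contact who does not marks the previous person in the chain as the bridge.
    Returns (bridge, contacts_to_vaccinate), or (None, set()) if no bridge is found.
    """
    current = index_case
    for _ in range(max_steps):
        options = sorted(v for v in adj[current] if v != index_case)
        if not options:
            return None, set()
        nxt = rng.choice(options)
        if index_case not in adj[nxt]:
            # nxt does not know the index case, so `current` links otherwise unconnected people.
            return current, set(adj[current])
        current = nxt
    return None, set()

# Two triangles {0, 1, 2} and {3, 4, 5} joined by a single link between 2 and 3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
bridge, to_vaccinate = community_bridge_finder(adj, index_case=0, rng=random.Random(7))
print("bridge:", bridge, "vaccinate:", sorted(to_vaccinate))
```

Only local questions are needed at each step (who are your contacts, do you know the index case), which is why the approach is compatible with ordinary contact tracing.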

When we vaccinate according to this algorithm, we reduce the final size of the epidemic far more than by randomly vaccinating the same fraction of people. More interestingly, CBF also does better than another vaccination algorithm that, like CBF, uses only the local network information typically available to epidemiological investigators. This algorithm, known as the “Acquaintance Method,” vaccinates a randomly selected contact of an index case. The idea behind the acquaintance method is that, in a population with heterogeneous contacts, the contacts of a case are more likely than chance to be highly connected individuals themselves. That is, given that you have a contact, you’re on average more likely to be connected to a hub than to someone with few connections, because hubs simply have more connections.
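The logic of the acquaintance method can be checked with a short calculation: the expected contact number of "a random contact of a random person" versus the average contact number. The sketch below is plain Python with hypothetical helper names; randomly chosen people stand in for index cases. On a star the acquaintance is far better connected than average, while on a network where everyone has the same number of contacts the two numbers coincide, so there is no hub for the method to find.

```python
import random

def mean_degree(adj):
    """Average number of contacts per person."""
    return sum(len(nbrs) for nbrs in adj.values()) / len(adj)

def expected_acquaintance_degree(adj):
    """Exact mean degree of a random contact of a random person (people with no contacts skipped)."""
    people = [v for v in adj if adj[v]]
    return sum(sum(len(adj[u]) for u in adj[v]) / len(adj[v]) for v in people) / len(people)

def acquaintance_targets(adj, n_draws, rng):
    """Acquaintance method: repeatedly pick a random person and vaccinate one random contact."""
    people = sorted(v for v in adj if adj[v])
    targets = set()
    for _ in range(n_draws):
        person = rng.choice(people)
        targets.add(rng.choice(sorted(adj[person])))
    return targets

star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
ring = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}
for name, net in [("star", star), ("ring", ring)]:
    print(name, mean_degree(net), expected_acquaintance_degree(net))
print(acquaintance_targets(star, 3, random.Random(0)))
```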

Of course, by constructing our contact networks this way, we stacked the deck against the acquaintance method. Remember, everyone has the same number of contacts; what varies is how many of those contacts are within versus between communities. One of the great limiting factors for progress in social network analysis, and network epidemiology in particular, is the paucity of detailed network data from well-defined human populations. A domain that has garnered a lot of interest recently is the analysis of networks created by social media such as Facebook and Twitter. We used Facebook data dating from when its use was still restricted to particular college campuses to provide networks on which infections could pass. Facebook users typically have many contacts, probably far more than people have in epidemiologically relevant networks. However, because the data come from college acquaintance networks, we were able to prune them down toward something hopefully more epidemiologically appropriate: we kept a contact in the network only if the two individuals shared one of several key attributes, such as dorm or major. This yielded a series of networks with heterogeneous contact structure and quite a bit of community structure (the measure of community structure hovered near the values where epidemics transitioned from small to large in our simulated networks). Once again, CBF outperformed the acquaintance method. This provided very strong evidence that community structure really matters for epidemic behavior and that exploiting information on community structure allows us to better control outbreaks of infectious disease.
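The pruning rule (keep a link only if the two people share at least one key attribute) is simple to express. In the sketch below, the attribute names `dorm` and `major`, the dictionary layout, and the toy people are all hypothetical; the original Facebook data are not reproduced here.

```python
def prune_by_shared_attributes(edges, attributes, keys=("dorm", "major")):
    """Keep a friendship link only if both people share at least one of the given attributes.

    `attributes` maps person -> {attribute name: value}; missing values never count as shared.
    """
    kept = []
    for u, v in edges:
        a, b = attributes.get(u, {}), attributes.get(v, {})
        if any(a.get(k) is not None and a.get(k) == b.get(k) for k in keys):
            kept.append((u, v))
    return kept

# Hypothetical toy data for illustration only.
attributes = {
    "ann": {"dorm": "A", "major": "bio"},
    "bo": {"dorm": "A", "major": "econ"},
    "cy": {"dorm": "B", "major": "bio"},
    "di": {"dorm": "C", "major": "math"},
}
friends = [("ann", "bo"), ("ann", "cy"), ("ann", "di"), ("bo", "di")]
print(prune_by_shared_attributes(friends, attributes))
```

Here ann and bo share a dorm and ann and cy share a major, so those links survive, while links with no shared attribute are dropped.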
