social networks | monkey's uncle

Marcel Salathé and I have a brand new paper out in today's issue of the Public Library of Science, Computational Biology. There is also a news piece by Adam Gorlick in the Stanford Report this morning. This is an idea I've been bouncing around for a few years now and I was very fortunate to have Marcel – and his programming wizardry – show up with an interest in the very same topic just at the right time. It's not every day that one of the most talented young theoretical biologists in the world shows up at your office wanting to collaborate. If it ever happens to you, I suggest you act!

The fundamental question is: Does social structure affect the course of epidemics? The answer seems obvious, particularly for infectious diseases that are transmitted by direct person-to-person contact. However, specific work demonstrating the effects of social structure on epidemics can be hard to find. Part of the problem, of course, is that you can hardly do experiments in which you change social structure and then subject populations to an infectious disease. To overcome this ethical and practical barrier to research, epidemiologists, biologists, and social scientists interested in disease and human behavior use mathematical and computational models to study how changes in host behavior affect the outcome of simulated epidemics.

Two specific topics that clearly have some bearing on social structure have been investigated extensively: individual heterogeneity in contact number and individual assortativeness. Epidemic behavior in all but the simplest models has been seen as being driven by heterogeneity. When there is a lot of variance in the number of potentially infectious contacts that individuals in a population have, epidemics are more likely, they infect large segments of the population more quickly, and ultimately infect a larger fraction of the total population. Consider the extreme case where all members of a population have one contact except for one person, who has a contact with everyone else. If we were to draw a picture of such a contact network, it would resemble a star or a wheel with a central hub and spokes.

Infect any random individual on this star and everyone else is at risk for infection. At the opposite extreme, if everyone has exactly one contact, then a randomly infected person can infect, at most, one other individual.
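To make the two extremes concrete, here is a minimal Python sketch. The ten-person population and the helper names (`star_network`, `paired_network`, `reachable_from`) are my own illustrative assumptions, not anything from the paper. It counts how many other people could eventually be reached from each possible index case.

```python
from collections import deque


def reachable_from(adjacency, start):
    """Return the set of people reachable from `start` (excluding `start`)."""
    seen = {start}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        for contact in adjacency[person]:
            if contact not in seen:
                seen.add(contact)
                queue.append(contact)
    seen.discard(start)
    return seen


def star_network(n_spokes):
    """Person 0 is the hub; everyone else has the hub as their only contact."""
    adjacency = {0: list(range(1, n_spokes + 1))}
    for spoke in range(1, n_spokes + 1):
        adjacency[spoke] = [0]
    return adjacency


def paired_network(n_pairs):
    """Everyone has exactly one contact: the population is a set of isolated pairs."""
    adjacency = {}
    for k in range(n_pairs):
        first, second = 2 * k, 2 * k + 1
        adjacency[first] = [second]
        adjacency[second] = [first]
    return adjacency


star = star_network(9)      # 10 people (illustrative size)
pairs = paired_network(5)   # 10 people (illustrative size)

star_at_risk = [len(reachable_from(star, person)) for person in star]
pairs_at_risk = [len(reachable_from(pairs, person)) for person in pairs]

print(star_at_risk)   # [9, 9, 9, 9, 9, 9, 9, 9, 9, 9]
print(pairs_at_risk)  # [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
```

Both populations have ten people, but on the star every index case puts the other nine at risk, while in the paired population each index case can reach only one other person.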

Assortativeness, the tendency for individuals to associate with others like themselves, can either aid or hinder the spread of infections. People in contemporary nation states like the United States show an incredible capacity to form associations with like individuals. We form social relationships, particularly intimate relationships, with people who are similar to us in age, socioeconomic status, sexual orientation, ethnicity, education, religion, forms of deviant behavior such as drug use or criminal activity, etc. Frequently, this assortativeness has the effect of localizing and concentrating epidemiologically important contacts. When this happens, individuals who act as bridges between different communities take on central epidemiological importance. For example, married men who visit commercial sex workers can serve as a critical bridge connecting high-risk populations of sex workers and injection drug users with the general population. Similarly, health care workers can bridge hospital populations with the general population, a phenomenon important for the emergence of SARS in 2002. (Note that for epidemiological applications, we call such individuals “bridges” but in other applications we might call them “brokers” or “entrepreneurs,” highlighting the general importance of such ideas for understanding society.) The existence of such social bridges highlights the fact that people can also assort on characteristics that are not visible attributes and this type of assortative behavior can increase connectivity. In particular, if people with few contacts tend to be connected to people with many contacts (as in the case of the star), then such disassortativeness can increase the epidemic potential in a population.

The aggregate effects of individual behavioral decisions can have a profound effect on the shape and composition of human populations, but there is more to human populations than simply individual behavior. For one thing, human populations are characterized by a hierarchical structure: individuals typically belong to households and households are aggregated into communities, which are, in turn, aggregated into towns, states, nations, etc. Naturally, there are cross-cutting ties in such hierarchical organization (much like bridges in individual contact networks). Freudian fantasies of primitive hordes aside, even the largely egalitarian societies of hunter-gatherers are characterized by a hierarchical structuring of families, bands, and tribes. Hierarchical structuring is clearly important for understanding social process in human societies.

So what effect does such community structure have on epidemics? To address this question, Marcel and I combined the formalisms of social network analysis and computational models of epidemics. We already know that heterogeneity in contact number can have profound effects on the outcomes of epidemics and that such heterogeneity can change aggregate social structure in complex ways. To avoid such complications, we generated networks where every individual had the exact same number of contacts. The only thing that varied in these toy networks was the likelihood that any randomly chosen connection between two individuals would be either within or between more or less cohesive subgroups (a.k.a., “communities”). Using metrics derived from Graph Theory, the branch of discrete mathematics that provides the basic tools for Social Network Analysis, we were able to characterize the degree of community structure and relate this to the outcome of epidemics simulated on the resulting networks.
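As a rough illustration of what "characterizing the degree of community structure" can look like, the sketch below computes Newman's modularity Q, one widely used graph-theoretic measure. Treat that choice as my assumption here, since the description above does not name the metric. The two four-person communities and the single degree-preserving edge swap are also hypothetical toy inputs: everyone keeps exactly three contacts, and only the split between within-community and between-community ties changes.

```python
from fractions import Fraction
from itertools import combinations


def modularity(edges, community_of):
    """Newman's modularity Q for an undirected, unweighted edge list."""
    n_edges = len(edges)
    internal_edges = {}   # community -> edges with both ends inside it
    degree_sum = {}       # community -> total number of edge ends inside it
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal_edges[cu] = internal_edges.get(cu, 0) + 1
    return sum(
        Fraction(internal_edges.get(c, 0), n_edges)
        - Fraction(degree_sum[c], 2 * n_edges) ** 2
        for c in degree_sum
    )


group_a = ["a0", "a1", "a2", "a3"]
group_b = ["b0", "b1", "b2", "b3"]
community_of = {person: "A" for person in group_a}
community_of.update({person: "B" for person in group_b})

# Extreme community structure: two cliques with no ties between them.
isolated = list(combinations(group_a, 2)) + list(combinations(group_b, 2))

# Swap two within-community ties for two between-community ties.
# Every person still has exactly three contacts.
removed = {("a0", "a1"), ("b0", "b1")}
rewired = [edge for edge in isolated if edge not in removed]
rewired += [("a0", "b0"), ("a1", "b1")]

print(modularity(isolated, community_of))  # 1/2
print(modularity(rewired, community_of))   # 1/3
```

Q falls as ties are moved from within communities to between them, even though no individual's contact number has changed.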

It turns out that community structure has an enormous effect on epidemic outcome. In particular, we found that there is a remarkably abrupt transition from small outbreaks to very large outbreaks as we moved from the most structured populations to more moderately structured ones. Populations characterized by extreme community structure have smaller outbreaks because the infection has a hard time getting out of a community before dying out. As more connections to other communities are made – i.e., the community structure is lessened – there are more opportunities for the infection to escape and affect a larger fraction of the total population. While the result sounds intuitively satisfying after the fact, there was little precedent for expecting such an outcome in the mathematical theory of epidemics. This is because none of the standard metrics of an infectious disease – the basic reproduction ratio, in particular – changed as the populations’ community structure changed.
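The "hard time getting out of a community" intuition can be sketched with a very simple simulation. What follows is a generic chain-binomial (Reed-Frost style) outbreak in which each infected person gets one chance to infect each susceptible contact. It is not the model from the paper, and the network, the function names and the `transmission_prob` parameter are all illustrative assumptions.

```python
import random
from itertools import combinations


def contacts_from_edges(edges):
    """Build an undirected contact list from a list of (person, person) ties."""
    contacts = {}
    for u, v in edges:
        contacts.setdefault(u, []).append(v)
        contacts.setdefault(v, []).append(u)
    return contacts


def outbreak_size(contacts, index_case, transmission_prob, rng):
    """Number of people ever infected, including the index case."""
    infectious = {index_case}
    ever_infected = {index_case}
    while infectious:
        newly_infected = set()
        # sorted() keeps runs reproducible for a given seed
        for person in sorted(infectious):
            for contact in contacts[person]:
                if contact in ever_infected or contact in newly_infected:
                    continue
                if rng.random() < transmission_prob:
                    newly_infected.add(contact)
        ever_infected |= newly_infected
        infectious = newly_infected   # the previous generation recovers
    return len(ever_infected)


group_a = ["a0", "a1", "a2", "a3"]
group_b = ["b0", "b1", "b2", "b3"]
within = list(combinations(group_a, 2)) + list(combinations(group_b, 2))

isolated_contacts = contacts_from_edges(within)
bridged_contacts = contacts_from_edges(
    [e for e in within if e not in {("a0", "a1"), ("b0", "b1")}]
    + [("a0", "b0"), ("a1", "b1")]
)

rng = random.Random(1)
# With certain transmission, the outbreak is limited only by network structure.
print(outbreak_size(isolated_contacts, "a0", 1.0, rng))  # 4
print(outbreak_size(bridged_contacts, "a0", 1.0, rng))   # 8
```

With a transmission probability below 1 the outcome is random, so comparing populations would mean averaging `outbreak_size` over many runs and many index cases.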

When we investigated the further structural network correlates of epidemic size, we found that one measure in particular predicted epidemic behavior quite well. This measure, known as “betweenness centrality,” harkens back to previous epidemiological interest in bridging individuals. A person with high betweenness lies on many of the shortest paths that connect all individuals in a network. When a person bridges two distinct subpopulations, he or she typically has high betweenness because all paths from individuals in one cluster have to pass through this person to get to the other cluster, and vice-versa. As a population moves from a condition of very high community structure to a more moderate level, the number of people with high betweenness increases. This highlights a particularly interesting contrast with previous models: epidemics are more likely and larger in populations with highly unequal distributions of contacts on the one hand, but also in populations with more equal betweenness.
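For readers who want to see betweenness computed, here is a small sketch using Brandes' algorithm for unweighted networks. The toy network (two triangles of friends joined through a single person `X`) and all names are hypothetical. The score counts, for each person, how many pairs of other people have their shortest path running through him or her, with a fractional share when there are several equally short paths.

```python
from collections import deque


def betweenness_centrality(contacts):
    """Unnormalized betweenness for an undirected, unweighted network."""
    betweenness = {person: 0.0 for person in contacts}
    for source in contacts:
        # Breadth-first search, counting shortest paths from `source`.
        visit_order = []
        predecessors = {person: [] for person in contacts}
        n_paths = {person: 0 for person in contacts}
        n_paths[source] = 1
        distance = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            visit_order.append(v)
            for w in contacts[v]:
                if w not in distance:
                    distance[w] = distance[v] + 1
                    queue.append(w)
                if distance[w] == distance[v] + 1:
                    n_paths[w] += n_paths[v]
                    predecessors[w].append(v)
        # Walk back from the farthest people, accumulating dependencies.
        dependency = {person: 0.0 for person in contacts}
        for w in reversed(visit_order):
            for v in predecessors[w]:
                dependency[v] += n_paths[v] / n_paths[w] * (1 + dependency[w])
            if w != source:
                betweenness[w] += dependency[w]
    # Each unordered pair was counted from both ends.
    return {person: score / 2 for person, score in betweenness.items()}


contacts = {
    "A0": ["A1", "A2", "X"],
    "A1": ["A0", "A2"],
    "A2": ["A0", "A1"],
    "X": ["A0", "B0"],
    "B0": ["B1", "B2", "X"],
    "B1": ["B0", "B2"],
    "B2": ["B0", "B1"],
}

scores = betweenness_centrality(contacts)
print(scores["X"])   # 9.0: all 3 x 3 cross-community pairs pass through X
print(scores["A0"])  # 8.0
print(scores["A1"])  # 0.0
```

`X` has only two contacts, fewer than `A0` or `B0`, yet has the highest betweenness. That is the sense in which betweenness captures something contact number alone does not.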

With the information that betweenness predicts the extent of epidemic spread in populations with community structure, we sought a means to use such information to design intelligent control measures. How do you find people who have high betweenness? As abstract as the concept of betweenness may seem, it turns out to not be that difficult. We start with an infected person and do standard contact tracing. That is, we ask the index case about his or her contacts. Contact tracing is one of the most important tools in the toolkit of the gumshoe epidemiologist. From the index case’s contacts, we pick a random individual and trace his or her contacts. Picking a random individual from this second generation of contact traces, we simply ask "do you know the index case?" If so, we keep going: trace the contacts of a random contact, ask again if this person knows the index case. When we come to an individual who does not know the index case, we have found our bridge. It is the penultimate person in the chain – the person who links the index case to someone he or she doesn’t know. Basically, we do a “random walk” on the social network looking for people who link otherwise unconnected individuals. When we find the bridge, we vaccinate all of his/her contacts. We call our vaccination algorithm the “Community Bridge Finder” (CBF).
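The walk just described can be sketched in a few lines of Python. This follows the verbal description above rather than the exact procedure in the paper, and the function name, the `max_steps` cutoff, the toy network and the choice to skip the already-infected index case when listing vaccination targets are all my assumptions for illustration.

```python
import random


def find_community_bridge(contacts, index_case, rng, max_steps=10_000):
    """Random walk from the index case; return the penultimate person
    once the walk reaches someone who does not know the index case."""
    current = rng.choice(contacts[index_case])   # a random first-generation contact
    for _ in range(max_steps):
        candidate = rng.choice(contacts[current])
        # "Do you know the index case?"
        knows_index = candidate == index_case or index_case in contacts[candidate]
        if not knows_index:
            return current        # `current` links the index case to a stranger
        current = candidate       # keep walking
    return None                   # gave up (assumed cutoff, e.g. no bridge reachable)


# Hypothetical network: two triangles of friends joined through person X.
contacts = {
    "A0": ["A1", "A2", "X"],
    "A1": ["A0", "A2"],
    "A2": ["A0", "A1"],
    "X": ["A0", "B0"],
    "B0": ["B1", "B2", "X"],
    "B1": ["B0", "B2"],
    "B2": ["B0", "B1"],
}

rng = random.Random(0)
bridge = find_community_bridge(contacts, "A1", rng)
print(bridge)  # A0: the only friend of A1 with a tie leading out of the triangle

# Vaccinate the bridge's contacts (skipping the index case is an assumption).
targets = [person for person in contacts[bridge] if person != "A1"]
print(sorted(targets))  # ['A2', 'X']
```

Note that the walk only ever asks people about their own contacts, so it needs no map of the whole network.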

When we vaccinate according to this algorithm, we reduce the final size of the epidemic far more than randomly vaccinating the same fraction of people. More interestingly, CBF also does better than the other vaccination algorithm that uses only local network information typically available to epidemiological investigators. This algorithm, known as the “Acquaintance Method,” vaccinates a randomly selected contact of an index case. The idea behind the acquaintance method is that the contacts of a case are more likely than chance to be highly connected individuals themselves in a population with heterogeneous contacts. That is, given that you have a contact, you’re on average more likely to be connected to a hub than to someone with few connections because hubs simply have more connections.
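A quick way to see why the acquaintance method works when contacts are heterogeneous, and why it has nothing to exploit when they are not, is to compare the average contact number of a random person with that of a random contact of a random person. The sketch below does this exactly (no sampling) on two hypothetical ten-person networks, a star and a ring. The function names are illustrative.

```python
def mean_degree(contacts):
    """Average number of contacts of a randomly chosen person."""
    return sum(len(c) for c in contacts.values()) / len(contacts)


def mean_contact_degree(contacts):
    """Average number of contacts of a random contact of a random person."""
    per_person = [
        sum(len(contacts[friend]) for friend in friends) / len(friends)
        for friends in contacts.values()
    ]
    return sum(per_person) / len(per_person)


# Star: person 0 is the hub, persons 1..9 each know only the hub.
star = {0: list(range(1, 10))}
star.update({spoke: [0] for spoke in range(1, 10)})

# Ring: everyone has exactly two contacts.
n = 10
ring = {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}

print(mean_degree(star), mean_contact_degree(star))  # 1.8 8.2
print(mean_degree(ring), mean_contact_degree(ring))  # 2.0 2.0
```

On the star, a random contact is almost always the hub. On the ring, where everyone has the same number of contacts, picking a contact is no better than picking at random.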

Of course, the way that we constructed our contact networks, we stacked the deck against the acquaintance method. Remember, everyone has the same number of contacts; what varies is how many contacts are within versus between communities. One of the great limiting factors for progress in social network analysis – and network epidemiology in particular – is the paucity of detailed network data from well-defined human populations. A domain that has garnered a lot of interest recently is the analysis of networks created by social media such as Facebook and Twitter. We used data from Facebook when its use was still restricted to particular college campuses to provide networks on which infections could pass. Facebook users typically have many contacts, probably way more than people have in epidemiologically relevant networks. However, because the data come from college acquaintance networks, we were able to prune the networks down toward something hopefully more epidemiologically appropriate. We kept contacts in the networks only if two individuals shared one of several key attributes such as shared dorm or major. What this yielded was a series of networks with heterogeneous contact structure and quite a bit of community structure (the measure of community structure hovered near the values where epidemics transitioned from small to large in our simulated networks). Once again, CBF outperformed the acquaintance method. This provided very strong evidence that community structure really matters for epidemic behavior and that exploiting information on community structure allows us to better control outbreaks of infectious disease.

I've been thinking some more about the issues that are raised by the debacle over Jared Diamond's 21 April 2008 New Yorker piece and the recent announcement of a lawsuit against him. There are many things to think about here. Probably foremost amongst these are the ethical concerns relating to preserving research subjects' privacy and informed consent. There are secondary concerns regarding scholarship, standards of research, and obligations to adequately describe research methodology.

I am troubled by a point raised by Alex Golub in the Savage Minds blog. Golub writes, "There is also a more serious problem with [Diamond's New Yorker] article which is also the most obvious thing about it: it contrasts ‘tribal societies’ with ‘modern state societies’." This is something that bothers me too, though I think that my response may be somewhat different than that of many contemporary cultural anthropologists. In general, I have sensibilities very much akin to Diamond's. I see tremendous value in comparative studies, and I think that there is something that we can call, for lack of a better term, a robust and fairly general Human Nature. Human beings are biological entities with material needs and (many) material motivations and we ignore these at our explanatory (and possibly literal) peril.

The Myth-of-Isolation criticism, which also arises in the Diamond debacle, is not new in Anthropology. I am reminded of the Kalahari Debate of Lee, Wilmsen and others. Globalization as a phenomenon of anthropological inquiry has certainly increased in currency of late and I think that this scholarship tends to make many of my colleagues skeptical of any research on, say, foraging decisions by hunting and gathering people. The answer to this criticism is that foraging people in a globalized world, like all people, still make decisions about what to eat, what not to eat, how to eat, etc. Their choices may be constrained by a hegemonic state or by extra-state organizations, but choices are still being made. Understanding how such choices are made in a globalized world strikes me as being at least as important as it was 50 or 100 years ago. This goes for hunter-gatherers as well as urban elites, agrarian peasants or just about anyone else.

Rather than taking labels such as "tribal" or "state" as sufficient descriptions of the differences between groups, I think that the science requires us to describe (and hopefully quantify) the dimensions of their difference. I have been thinking a lot about social networks lately. One dimension on which two societies might differ is the composition of ego networks. How many people does a given person know? What fraction of those are kin? What is the gender composition of the ego network? How socially similar are the members of ego's network to him/herself? How many would provide emotional/economic/agonistic support to ego in a crisis? Does an individual's ego network include socially important figures like government functionaries, doctors, lawyers or the equivalent? How much does ego's network overlap with his/her spouse's? Brother's? Neighbor's? Member of the next village/town? Gathering such data is clearly a major undertaking, but that's what science is about, no?

The fraught question of how to do ethical, meaningful anthropology in a globalized world that struggles with the legacy of colonial depredations has, in my view, driven too many anthropologists from science. Protecting human subjects and doing unto others what we would have done to us are important guiding principles for anthropological research, indeed, any research in the human sciences. Describing -- and, ultimately, understanding -- how societies differ and what the implications of these differences are for human behavior should, in my opinion, be another principle. Facile labels relating to social or economic complexity, ethnicity, religion, nationality, etc. do not help us understand the diversity of human behavior.
