Marcel Salathé and I have a brand new paper out in today's issue of PLoS Computational Biology. There is also a news piece by Adam Gorlick in the Stanford Report this morning. This is an idea I've been bouncing around for a few years now, and I was very fortunate to have Marcel, with his programming wizardry, show up with an interest in the very same topic at just the right time. It's not every day that one of the most talented young theoretical biologists in the world shows up at your office wanting to collaborate. If it ever happens to you, I suggest you act!

The fundamental question is: Does social structure affect the course of epidemics? The answer seems obvious, particularly for infectious diseases that are transmitted by direct person-to-person contact. However, specific work demonstrating the effects of social structure on epidemics can be hard to find. Part of the problem, of course, is that you can hardly do experiments in which you change social structure and then subject populations to an infectious disease. To overcome this ethical and practical barrier to research, epidemiologists, biologists, and social scientists interested in disease and human behavior use mathematical and computational models to study how changes in host behavior affect the outcome of simulated epidemics.

Two specific topics that clearly have some bearing on social structure have been investigated extensively: individual heterogeneity in contact number and individual assortativeness. In all but the simplest models, epidemic behavior has been seen as driven by heterogeneity. When there is a lot of variance in the number of potentially infectious contacts that individuals in a population have, epidemics are more likely, they infect large segments of the population more quickly, and they ultimately infect a larger fraction of the total population. Consider the extreme case where all members of a population have one contact except for one person, who has a contact with everyone else. If we were to draw a picture of such a contact network, it would resemble a star or a wheel with a central hub and spokes.

Infect any random individual on this star and everyone else is at risk for infection. At the opposite extreme, if everyone has exactly one contact, then a randomly infected person can infect, at most, one other individual.
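To make the contrast concrete, here is a minimal Python sketch that counts everyone reachable through chains of contacts from one infected person in each of the two extreme networks. The function names are hypothetical. Reachability is only an upper bound on who is at risk because it ignores transmission probabilities entirely.

```python
from collections import deque

def star(n):
    """Hub 0 is linked to everyone; everyone else has one contact (the hub)."""
    adj = {v: set() for v in range(n)}
    for v in range(1, n):
        adj[0].add(v)
        adj[v].add(0)
    return adj

def pairs(n):
    """Everyone has exactly one contact: 0-1, 2-3, ... (n must be even)."""
    adj = {v: set() for v in range(n)}
    for v in range(0, n, 2):
        adj[v].add(v + 1)
        adj[v + 1].add(v)
    return adj

def at_risk(adj, infected):
    """Everyone reachable from the infected person via chains of contacts."""
    seen = {infected}
    queue = deque([infected])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    seen.discard(infected)
    return seen

print(len(at_risk(star(10), 7)))   # 9: everyone else is reachable via the hub
print(len(at_risk(pairs(10), 7)))  # 1: only person 7's single partner
```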

The aggregate effects of individual behavioral decisions can profoundly shape the structure and composition of human populations, but there is more to human populations than individual behavior. For one thing, human populations are characterized by a hierarchical structure: individuals typically belong to households, and households are aggregated into communities, which are, in turn, aggregated into towns, states, nations, etc. Naturally, there are cross-cutting ties in such hierarchical organization (much like bridges in individual contact networks). Freudian fantasies of primitive hordes aside, even the largely egalitarian societies of hunter-gatherers are characterized by a hierarchical structuring of families, bands, and tribes. Hierarchical structuring is clearly important for understanding social processes in human societies.

So what effect does such community structure have on epidemics? To address this question, Marcel and I combined the formalisms of social network analysis and computational models of epidemics. We already know that heterogeneity in contact number can have profound effects on the outcomes of epidemics and that such heterogeneity can change aggregate social structure in complex ways. To avoid such complications, we generated networks in which every individual had exactly the same number of contacts. The only thing that varied in these toy networks was the probability that any randomly chosen connection between two individuals would fall within, rather than between, more or less cohesive subgroups (a.k.a. “communities”). Using metrics derived from graph theory, the branch of discrete mathematics that provides the basic tools for social network analysis, we were able to characterize the degree of community structure and relate it to the outcome of epidemics simulated on the resulting networks.
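The post does not spell out the network generator, so the following is a minimal sketch of one way to build toy networks of this kind, not necessarily the procedure used in the paper. It assumes each community starts as a ring lattice in which everyone has exactly `k` contacts. It then applies degree-preserving double-edge swaps that turn within-community links into between-community links. As an illustrative graph-theoretic measure of community structure it uses Newman-Girvan modularity Q. All function names and parameter values are hypothetical.

```python
import random

# Hypothetical generator: ring-lattice communities plus degree-preserving swaps.
def ring_communities(n_comm, size, k):
    """Each community is a ring lattice where everyone links to the k/2
    nearest neighbours on each side: all degrees equal k, all edges internal."""
    assert k % 2 == 0 and size > k
    adj = {v: set() for v in range(n_comm * size)}
    community = {}
    for c in range(n_comm):
        base = c * size
        for i in range(size):
            community[base + i] = c
            for j in range(1, k // 2 + 1):
                u, v = base + i, base + (i + j) % size
                adj[u].add(v)
                adj[v].add(u)
    return adj, community

def rewire_between(adj, community, n_swaps, rng):
    """Double-edge swaps: a-b (community X) and c-d (community Y) become
    a-c and b-d, so two within-community edges become two between-community
    edges and every degree is unchanged. Returns the number of swaps done."""
    within = [(u, v) for u in adj for v in adj[u]
              if u < v and community[u] == community[v]]
    done = 0
    for _ in range(100 * n_swaps):
        if done == n_swaps or len(within) < 2:
            break
        i, j = rng.sample(range(len(within)), 2)
        (a, b), (c, d) = within[i], within[j]
        if community[a] == community[c]:
            continue  # same community: the swap would not add a bridge
        if rng.random() < 0.5:
            c, d = d, c
        if c in adj[a] or d in adj[b]:
            continue  # would duplicate an existing edge
        for x, y in ((a, b), (c, d)):
            adj[x].discard(y)
            adj[y].discard(x)
        for x, y in ((a, c), (b, d)):
            adj[x].add(y)
            adj[y].add(x)
        for idx in sorted((i, j), reverse=True):
            within.pop(idx)
        done += 1
    return done

def modularity(adj, community):
    """Newman-Girvan modularity Q of the given community partition."""
    two_m = sum(len(nbrs) for nbrs in adj.values())
    inside, degree_sum = {}, {}
    for u, nbrs in adj.items():
        c = community[u]
        degree_sum[c] = degree_sum.get(c, 0) + len(nbrs)
        inside[c] = inside.get(c, 0) + sum(community[v] == c for v in nbrs)
    # inside[c] counts each within-community edge twice, as two_m does
    return sum(inside[c] / two_m - (degree_sum[c] / two_m) ** 2
               for c in degree_sum)

adj, community = ring_communities(n_comm=4, size=50, k=6)
print(modularity(adj, community))  # 0.75 before any rewiring
done = rewire_between(adj, community, 60, random.Random(42))
print(done, modularity(adj, community))
```

Each successful swap replaces two within-community edges with two between-community edges and leaves every degree unchanged. Q therefore drops by 2/M per swap (M is the total number of edges), while the contact distribution stays perfectly homogeneous.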

It turns out that community structure has an enormous effect on epidemic outcome. In particular, we found that there is a remarkably abrupt transition from small outbreaks to very large outbreaks as we moved from the most structured populations to more moderately structured ones. Populations characterized by extreme community structure have smaller outbreaks because the infection has a hard time getting out of a community before dying out. As more connections to other communities are made – i.e., the community structure is lessened – there are more opportunities for the infection to escape and affect a larger fraction of the total population. While the result sounds intuitively satisfying after the fact, there was little precedent for expecting such an outcome in the mathematical theory of epidemics. This is because none of the standard metrics of an infectious disease – the basic reproduction ratio, in particular – changed as the populations’ community structure changed.
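Below is a minimal sketch of the kind of epidemic simulation involved. It assumes a discrete-time SIR model in which each infectious person infects each susceptible contact with a hypothetical per-step probability `beta` and recovers with probability `gamma`. The `two_cliques` toy network is also hypothetical and, unlike the networks described above, does not hold everyone's number of contacts exactly equal. It simply lets you vary the number of links between two groups.

```python
import random

def sir_final_size(adj, beta, gamma, seed, rng):
    """Discrete-time SIR on a contact network. Each step, every infectious
    person infects each susceptible contact with probability beta, then
    recovers with probability gamma. Returns how many were ever infected."""
    assert 0 < gamma <= 1, "gamma must be positive or the loop never ends"
    state = {v: "S" for v in adj}
    state[seed] = "I"
    infectious = {seed}
    ever_infected = 1
    while infectious:
        new_cases = set()
        for u in infectious:
            for v in adj[u]:
                if state[v] == "S" and rng.random() < beta:
                    state[v] = "I"  # claimed now so v is counted only once
                    new_cases.add(v)
        recovered = {u for u in infectious if rng.random() < gamma}
        for u in recovered:
            state[u] = "R"
        infectious = (infectious - recovered) | new_cases
        ever_infected += len(new_cases)
    return ever_infected

def two_cliques(size, bridges):
    """Hypothetical toy network: two fully connected groups of `size`
    people joined by `bridges` one-to-one links (bridges <= size)."""
    assert bridges <= size
    adj = {v: set() for v in range(2 * size)}
    for offset in (0, size):
        for i in range(size):
            for j in range(i + 1, size):
                adj[offset + i].add(offset + j)
                adj[offset + j].add(offset + i)
    for b in range(bridges):
        adj[b].add(size + b)
        adj[size + b].add(b)
    return adj

rng = random.Random(0)
for bridges in (1, 5):
    net = two_cliques(20, bridges)
    sizes = [sir_final_size(net, 0.05, 0.3, 0, rng) for _ in range(500)]
    print(bridges, sum(sizes) / len(sizes))
```

Averaging the final size over many runs for different numbers of between-group links gives a crude version of the comparison described above. No particular numbers are implied here.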

When we investigated the further structural network correlates of epidemic size, we found that one measure in particular predicted epidemic behavior quite well. This measure, known as “betweenness centrality,” harkens back to previous epidemiological interest in bridging individuals. A person with high betweenness lies on many of the shortest paths connecting pairs of individuals in a network. When a person bridges two distinct subpopulations, he or she typically has high betweenness because all paths from individuals in one cluster to the other cluster have to pass through this person, and vice versa. As a population moves from a condition of very high community structure to a more moderate level, the number of people with high betweenness increases. This highlights a particularly interesting contrast with previous models: epidemics are more likely and larger in populations with *highly unequal* distributions of contacts on the one hand, but also in populations with *more equal* betweenness.
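Betweenness can be computed exactly for small networks with Brandes' algorithm for unweighted graphs, sketched below. The algorithm choice and the function name are illustrative assumptions, not taken from the paper. In the small two-group example, the two people joining the groups carry all of the between-group shortest paths.

```python
from collections import deque

def betweenness(adj):
    """Brandes' algorithm for unweighted, undirected graphs: for each node,
    the number of shortest paths between other pairs passing through it
    (split evenly when several shortest paths exist)."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        stack, preds = [], {v: [] for v in adj}
        sigma = {v: 0 for v in adj}    # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:                   # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        while stack:                   # accumulate dependencies backwards
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: b / 2 for v, b in bc.items()}  # each pair was counted twice

# Two groups of four (0-3 and 4-7), joined only by the link 3-4.
barbell = {v: set() for v in range(8)}
for group in (range(0, 4), range(4, 8)):
    for u in group:
        for v in group:
            if u != v:
                barbell[u].add(v)
barbell[3].add(4)
barbell[4].add(3)
bc = betweenness(barbell)
print(bc[3], bc[4], bc[0])  # 12.0 12.0 0.0
```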

With the information that betweenness predicts the extent of epidemic spread in populations with community structure, we sought a means to use such information to design intelligent control measures. How do you find people who have high betweenness? As abstract as the concept of betweenness may seem, it turns out not to be that difficult. We start with an infected person and do standard contact tracing. That is, we ask the index case about his or her contacts. Contact tracing is one of the most important tools in the toolkit of the gumshoe epidemiologist. From the index case’s contacts, we pick a random individual and trace his or her contacts. Picking a random individual from this second generation of contact traces, we simply ask "do you know the index case?" If so, we keep going: trace the contacts of a random contact and ask again whether this person knows the index case. When we come to an individual who does not know the index case, we have found our bridge. It is the penultimate person in the chain: the person who links the index case, through the chain, to someone who does not know the index case. Basically, we do a “random walk” on the social network looking for people who link otherwise unconnected individuals. When we find the bridge, we vaccinate all of his or her contacts. We call our vaccination algorithm the “Community Bridge Finder” (CBF).
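The walk described above can be sketched as follows. The sketch assumes the network is available as a dictionary mapping each person to the set of their contacts. It follows the simplified description in this post rather than any published implementation. The rule that the walk never steps straight back to the index case or the previous person, the `max_steps` cap, and all names are hypothetical choices.

```python
import random

def community_bridge_finder(adj, index_case, rng, max_steps=50):
    """Random walk from the index case. At each step a random contact of the
    current person is traced; if that contact does not know the index case,
    the current person (the penultimate one in the chain) is returned as the
    bridge. Returns None if no bridge is found within max_steps."""
    prev, current = None, index_case
    for _ in range(max_steps):
        # Assumption: never step back to the index case or the previous person.
        options = sorted(v for v in adj[current] if v not in (index_case, prev))
        if not options:
            return None
        nxt = rng.choice(options)
        # The first traced contact knows the index case by definition; ask
        # "do you know the index case?" from the second generation on.
        if current != index_case and index_case not in adj[nxt]:
            return current
        prev, current = current, nxt
    return None

def cbf_targets(adj, index_case, rng, max_steps=50):
    """People to vaccinate: all contacts of the bridge, if one is found."""
    bridge = community_bridge_finder(adj, index_case, rng, max_steps)
    return set() if bridge is None else set(adj[bridge])

# Two groups of five (0-4 and 5-9) joined only by the link 4-5.
adj = {v: set() for v in range(10)}
for group in (range(0, 5), range(5, 10)):
    for u in group:
        for v in group:
            if u != v:
                adj[u].add(v)
adj[4].add(5)
adj[5].add(4)
print(community_bridge_finder(adj, 0, random.Random(3), max_steps=1000))  # 4
```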

When we vaccinate according to this algorithm, we reduce the final size of the epidemic far more than by randomly vaccinating the same fraction of people. More interestingly, CBF also does better than the other vaccination algorithm that uses only the local network information typically available to epidemiological investigators. This algorithm, known as the “Acquaintance Method,” vaccinates a randomly selected contact of an index case. The idea behind the acquaintance method is that, in a population with heterogeneous contacts, the contacts of a case are more likely than chance to be highly connected individuals themselves. That is, if you pick one of your contacts at random, that person is on average more likely to be a hub than a randomly chosen member of the population, simply because hubs have more connections and therefore show up on more people's lists of contacts.
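The acquaintance method and the reasoning behind it can be illustrated with a short sketch (function names are hypothetical). Rather than sampling, it computes the exact expected number of contacts of a randomly chosen person. It then compares that with a randomly chosen contact of a randomly chosen person.

```python
import random

def acquaintance_target(adj, index_case, rng):
    """Acquaintance method as described above: pick one contact of the
    index case uniformly at random for vaccination."""
    return rng.choice(sorted(adj[index_case]))

def mean_degree(adj):
    """Average number of contacts of a person chosen uniformly at random."""
    return sum(len(nbrs) for nbrs in adj.values()) / len(adj)

def mean_degree_of_random_contact(adj):
    """Expected number of contacts of someone reached by choosing a person
    uniformly at random, then one of their contacts uniformly at random.
    Assumes nobody is isolated."""
    total = 0.0
    for nbrs in adj.values():
        total += sum(len(adj[v]) for v in nbrs) / len(nbrs)
    return total / len(adj)

# Star of five people: hub 0 plus four people whose only contact is the hub.
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
print(mean_degree(star))                               # 1.6
print(mean_degree_of_random_contact(star))             # 3.4
print(acquaintance_target(star, 1, random.Random(0)))  # 0, the hub
```

On a network where everyone has the same number of contacts, both averages equal that common number. A random contact is therefore no more likely to be well connected than anyone else.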

Of course, the way that we constructed our contact networks, we stacked the deck against the acquaintance method. Remember, everyone has the same number of contacts; what varies is how many contacts are within versus between communities. One of the great limiting factors for progress in social network analysis, and network epidemiology in particular, is the paucity of detailed network data from well-defined human populations. A domain that has garnered a lot of interest recently is the analysis of networks created by social media such as Facebook and Twitter. We used Facebook data from the period when its use was still restricted to particular college campuses to provide networks on which infections could pass. Facebook users typically have many contacts, probably far more than people have in epidemiologically relevant networks. However, because the data come from college acquaintance networks, we were able to prune the networks down toward something hopefully more epidemiologically appropriate. We kept a contact in the network only if the two individuals shared one of several key attributes, such as dorm or major. This yielded a series of networks with heterogeneous contact structure and quite a bit of community structure (the measure of community structure hovered near the values where epidemics transitioned from small to large in our simulated networks). Once again, CBF outperformed the acquaintance method. This provided very strong evidence that community structure really matters for epidemic behavior and that exploiting information on community structure allows us to better control outbreaks of infectious disease.
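The pruning rule can be sketched as a simple edge filter. The attribute names (`dorm`, `major`), the data layout, and the example students are hypothetical. The post mentions dorm and major only as examples of the key attributes used.

```python
def prune_by_shared_attributes(edges, attrs, keys=("dorm", "major")):
    """Keep a friendship edge only if the two people share at least one of
    the listed attributes. A missing value (None) never counts as a match."""
    kept = []
    for u, v in edges:
        shared = any(
            attrs[u].get(k) is not None and attrs[u].get(k) == attrs[v].get(k)
            for k in keys
        )
        if shared:
            kept.append((u, v))
    return kept

# Hypothetical students and friendship links.
attrs = {
    "ann": {"dorm": "A", "major": "bio"},
    "bo": {"dorm": "A", "major": "econ"},
    "cy": {"dorm": "B", "major": "bio"},
    "di": {"dorm": "C", "major": None},
}
edges = [("ann", "bo"), ("ann", "cy"), ("bo", "cy"), ("cy", "di")]
print(prune_by_shared_attributes(edges, attrs))
# [('ann', 'bo'), ('ann', 'cy')]
```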