How do we evaluate the quality of published work? This has become an issue for me recently for one general and two more specific reasons. The general reason is that as one approaches one's tenure decision, one tends to think about the impact of one's oeuvre. The first specific reason is that I have a paper that I know has been read (and used) by a substantial number of people but that was published in a journal (The Journal of Statistical Software) that is not indexed by Thomson Scientific, the keepers of the impact factor. Will this hurt me, or any of the other people who write useful and important software (and perform all the research entailed in creating such a product), when we are evaluated on the quality of our work? The second specific reason is that I am an Associate Editor of PLoS ONE, another journal that is not indexed by Thomson. One of my duties as an AE is to encourage people to submit high-quality papers to PLoS ONE. This can be tricky when people live and die by a journal's impact factor.
The thing that irks me about Thomson's impact factors is how opaque they are. Thomson doesn't have to answer to anyone, so they are free to do whatever they want (as long as people continue to consume their products). Why do some journals get listed and others don't? What constitutes a "substantive paper" (the denominator for the impact factor calculation)? What might the possible confounds be? What about biases? We actually know quite a bit about these last two questions. We know very little about the first two.
Moyses Szklo has a nice brief editorial in the journal Epidemiology, describing a paper in that same journal by Miguel Hernán criticizing the use of impact factors in epidemiology. The points clearly apply to science more generally. Three key issues affecting a journal's impact factor listed by Szklo are: (1) the frequency of self-citation, (2) the proportion of a journal's articles that are reviews (review papers get cited a lot), and (3) the size of the field being served by the journal. Hernán's paper is absolutely marvelous. He notes that the bibliographic impact factor (BIF) is flawed -- as a statistical measure, not because of the manipulations described by Szklo -- for three reasons: (1) a bad choice of denominator (total number of papers published), (2) the need to adjust for variables that are known to affect the measure, and (3) the questionability of the mean as a summary measure for highly skewed distributions (as we know the citation counts behind BIFs are). Hernán makes his case by presenting a parallel case of a fictional epidemiological study. To anyone trained in epidemiological methods, this study is clearly flawed. It is exactly analogous to the way that Thomson calculates BIFs, yet we continue to use them. The journal Epidemiology also published a number of interesting responses to Hernán's paper (Rich Rothenberg, social network epidemiologist-extraordinaire, has a nice counterpoint essay to these). The irony is that on the Epidemiology front page, they advertise the journal by touting its impact factor!
The rub, of course, is that formulating a less flawed metric of intellectual impact is clearly a very demanding task. Michael Jensen, of the National Academies, has written The New Metrics of Scholarly Authority. One of the key concepts is devising a metric that measures quality at the level of the paper rather than the level of the journal. We've all seen fundamentally important papers that, for whatever reason, get published in obscure journals. Similarly, we regularly see the crap that comes out in high-prestige journals like Science, Nature, and PNAS every week! Pete Binfield, the managing editor of PLoS ONE, notes that Jensen's ideas are very difficult to implement. Pete is leading the way for PLoS to think about alternative metrics like the number of downloads, the number of ping-backs from relevant (uh-oh, more subjectivity!) blogs, the number of bookmarks on social bookmarking pages, etc. Another way to handle Thomson's monopoly is to use alternative metrics such as those created by Scopus or Google Scholar. This last suggestion, while worth pursuing in the spirit of competition, is still not entirely satisfying: to whom in science do these organizations have to answer? I am particularly leery of Scopus because it is run by Elsevier, a big for-profit publishing house that clearly has its own agenda. PubMed is, at least, public and for the public benefit. Of course, it doesn't index all journals either -- not too many Anthropology journals indexed there!
Björn Brembs, another PLoS ONE AE, makes the very reasonable suggestion that an impact factor should, at the very least, be a multivariate measure (in accordance with the criticism in Hernán's essay of the lack of adjustment for confounders). Björn, in another blog posting, cites a paper published last year in PLoS ONE that I have not yet read. This paper shows that the BIF ranks journals inconsistently in terms of impact (largely because the mean is such a poor measure for citation distributions) and proposes a more consistent measure. I need to carve some time out of my schedule to read this one carefully.