plotting | monkey's uncle

One common frustration that I have heard expressed about R is that there is no automatic way to plot error bars (whiskers really) on bar plots. I just encountered this issue revising a paper for submission and figured I'd share my code. The following simple function will plot reasonable error bars on a bar plot.

[r]
error.bar <- function(x, y, upper, lower=upper, length=0.1,...){ if(length(x) != length(y) | length(y) !=length(lower) | length(lower) != length(upper)) stop("vectors must be same length") arrows(x,y+upper, x, y-lower, angle=90, code=3, length=length, ...) } [/r] Now let's use it. First, I'll create 5 means drawn from a Gaussian random variable with unit mean and variance. I want to point out another mild annoyance with the way that R handles bar plots, and how to fix it. By default, barplot() suppresses the X-axis. Not sure why. If you want the axis to show up with the same line style as the Y-axis, include the argument axis.lty=1, as below. By creating an object to hold your bar plot, you capture the midpoints of the bars along the abscissa that can later be used to plot the error bars. [r] y <- rnorm(500, mean=1) y <- matrix(y,100,5) y.means <- apply(y,2,mean) y.sd <- apply(y,2,sd) barx <- barplot(y.means, names.arg=1:5,ylim=c(0,1.5), col="blue", axis.lty=1, xlab="Replicates", ylab="Value (arbitrary units)") error.bar(barx,y.means, 1.96*y.sd/10) [/r]
Now let's say we want to create the very common plot in reporting the results of scientific experiments: adjacent bars representing the treatment and the control with 95% confidence intervals on the estimates of the means. The trick here is to create a 2 x n matrix of your bar values, where each row holds the values to be compared (e.g., treatment vs. control, male vs. female, etc.). Let's look at our same Gaussian means but now compare them to a Gaussian r.v. with mean 1.1 and unit variance.

[r]
y1 <- rnorm(500, mean=1.1) y1 <- matrix(y1,100,5) y1.means <- apply(y1,2,mean) y1.sd <- apply(y1,2,sd) yy <- matrix(c(y.means,y1.means),2,5,byrow=TRUE) ee <- matrix(c(y.sd,y1.sd),2,5,byrow=TRUE)*1.96/10 barx <- barplot(yy, beside=TRUE,col=c("blue","magenta"), ylim=c(0,1.5), names.arg=1:5, axis.lty=1, xlab="Replicates", ylab="Value (arbitrary units)") error.bar(barx,yy,ee) [/r]

Clearly, a sample size of 100 is too small to show that the means are significantly different. The effect size is very small for the variability in these r.v.'s. Try 10000.

[r]
y <- rnorm(50000, mean=1) y <- matrix(y,10000,5) y.means <- apply(y,2,mean) y.sd <- apply(y,2,sd) y1 <- rnorm(50000, mean=1.1) y1 <- matrix(y1,10000,5) y1.means <- apply(y1,2,mean) y1.sd <- apply(y1,2,sd) yy <- matrix(c(y.means,y1.means),2,5,byrow=TRUE) ee <- matrix(c(y.sd,y1.sd),2,5,byrow=TRUE)*1.96/sqrt(10000) barx <- barplot(yy, beside=TRUE,col=c("blue","magenta"), ylim=c(0,1.5), names.arg=1:5, axis.lty=1, xlab="Replicates", ylab="Value (arbitrary units)") error.bar(barx,yy,ee) [/r]

That works. Maybe I'll show some code for doing power calculations next time...

Using the network package, you can plot graphs in a flexible and powerful way. Often, when plotting a network, we want to vary the color, size, or shape of the vertices based on some attributes. Let's say that we have a freewheeling sexual network (easier to simulate) and we would like to color the vertices of the graph according to their HIV sero-status. Let's also say that we want to make the shape of each vertex reflect the sex of the individual. We use the following code:

[r]
# begin with randomly constructed edgelist
set.seed(12345)

n1 <- round(1+10*runif(50)) n2 <- round(1+10*runif(50)) eee <- cbind(n1,n2)[order(n1),] net <- network(eee,directed=FALSE) # this will be a dense network! hiv <- rbinom(50,size=1,prob=0.2) # random infections! sex <- rbinom(50,size=1,prob=0.5) # random sexes! set.vertex.attribute(net,"hiv",hiv+1) set.vertex.attribute(net,"sex",sex+3) ## now plot plot(net, vertex.col="hiv", vertex.sides="sex", vertex.cex=5, vertex.rot=-30, edge.lwd=1) [/r] I definitely wouldn't want to be part of that party.

monkey's uncle

Tag Archives: plotting

Plotting Error Bars in R

notes on human ecology, population, and infectious disease