Plotting with R

In writing my signal error correction post (which I had saved as a draft and have since lost), I was looking for a nice diagram of a probability distribution function. I found this one on wikipedia:

Wikipedia probability distribution diagram

(This diagram is made by Mwtoews and is licensed under CC BY 2.5).

This wasn’t quite what I wanted though. I was looking for a probability distribution for positive standard distributions; a probability distribution starting at zero. Luckily, I found some R code on the Wikimedia page, and set out to tweak it, ultimately producting this diagram:

Modified probability distribution diagram

I’m gonna give a breakdown of what the Wikipedia code did, and how I modified it to suit my needs. Let’s jump right in!


svg(filename = "diagram.svg", width = 7, height = 3.5)
par(mar = c(2, 2, 0, 0))

I changed the filename here, for brevity.

This section of code is simple but tells us something important about R: There’s a script global scope. We call some command here to create a SVG, and then later we add to it and eventually write to a file.

This is a nifty feature of R scripts (and more generally, a lot of scripting languages). It keeps things short and simple.

par(mar = c(2, 2, 0, 0))

The par function sets parameters of a plot, and mar represents the margins. Here we’re setting 2 “lines” of margin on the bottom and left.

Here there’s a c function at work. From the docs:

This is a generic function which combines its arguments. The default method combines its arguments to form a vector

This is another cornerstone of R. Vectors, or one dimensional arrays. This is used a lot throughout this code sample; we’ll definitely see more of it.

cols <- c("#2171B5", "#6BAED6", "#BDD7E7", "#EFF3FF")

Some shades of blue, light to dark, for us to use later. This also uses the c function to create a vector.

x <- seq(-4, 4, 0.1)

Can you guess the return type of this? That’s right, a vector! R really like their vectors.

plot(x, type="n", xaxs="i", yaxs="i", xlim=c(-4, 4), ylim=c(0, 0.4),
     bty="l", xaxt="n", xlab="", ylab="")

We can see another cool R feature here: The preference of named parameters. Named parameters are clean and concise. Reading this code we might not know yet what "i" and "n" and "l" stand for, but we can clearly tell that they are settings for the type of the plot, the axes, and some bty parameter.

Using some online documentation, we can piece together what these parameters mean:

# Function to plot each coloured portion of the curve, between "a" and "b" as a
# polygon; the function "dnorm" is the normal probability density function
polysection <- function(a, b, col, n=11){
    dx <- seq(a, b, length.out=n)
    polygon(c(a, dx, b), c(0, dnorm(dx), 0), col=col, border=NA)
    # draw a white vertical line on "inside" side to separate each section
    segments(a, 0, a, dnorm(a), col="white")
}

# Build the four left and right portions of this bell curve
for(i in 0:3){
    polysection(     i, i + 1, col=cols[i + 1])  # Right side of 0
    polysection(-i - 1,    -i, col=cols[i + 1])  # Left right of 0
}

# Black outline of bell curve
lines(x, dnorm(x))

# Bottom axis values, where sigma represents standard deviation and mu is the mean
axis(1, at=-3:3, labels=expression(mu - 3 * sigma, mu - 2 * sigma, mu - sigma, mu,
                                   mu + sigma, mu + 2 * sigma, mu + 3 * sigma))

# Add percent densities to each division (rounded to 1 decimal place), between x and x+1
text(c((0:3) + 0.5, (0:-3) - 0.5), c(0.16, 0.05, 0.04, 0.02),
     sprintf("%.1f%%", 100 * (pnorm(1:4) - pnorm(0:3))),
     col=c("white", "white", "black", "black"))
segments(c(-2.5, -3.5, 2.5, 3.5), dnorm(c(2.5, 3.5)),
         c(-2.5, -3.5, 2.5, 3.5), c(0.03, 0.01))

dev.off()