Log-log plots in R

Plotting data on log-log scale can be helpful when your data cover a large range of values and/or to detect power law behaviors in your data. Create such a plot in R is quite simple, you just have to add log = “xy” as parameter when you call the plot function. In the same way, you can use log = “x” or log = “y” to plot only the x-axis or the y-axis on log scale.

For example, we can plot the histogram of a sample X on a log-log scale. First, we use hist to compute the histogram of X with 500 bins. Then we extract the midpoint x and the density y of each histogram cell. Finally, we plot the result on a log-log scale (log = “xy”). I prefer to label the axis using exponents rather than scientific notation so I use the function eaxis (library sfsmisc). Here is the code:

h=hist(X,breaks=500,plot=FALSE)
x=h$mids
y=h$density

par(mar=c(5,6,2,2))
plot(x,y,log="xy",pch=16,type="p",cex=2,col="steelblue3",xlab="",ylab="",las=1,xaxt="n",yaxt="n"))
mtext(quote(X),1,line=3,cex=2)
mtext(quote(P(X)),2,line=4,cex=2)
eaxis(1,at=c(1,10,10^2,10^3,10^4),cex.axis=1.5)
eaxis(2,at=c(10^-9,10^-8,10^-7,10^-6,10^-5,10^-4,10^-3,10^-2,10^-1,10^0),cex.axis=1.5)

And the result:

The result is quite nice but the tail of the distribution is very noisy, there are probably few values in the last bins. To avoid that we need to bin the data into exponentially wider bins so that they appear equally spaced on the logarithmic scale. The idea is to compute the breakpoints between histogram cells with equally spaced values in log scale and then apply the exp function to obtain the breakpoints in the original scale. The updated code:

brea=seq(0.1,log(10000),0.6)
brea=c(0,exp(brea))
h=hist(X,breaks=brea,plot=FALSE)
x=h$mids
y=h$density

par(mar=c(5,6,2,2))
plot(x,y,log="xy",pch=16,type="p",cex=2,col="steelblue3",xlab="",ylab="",las=1,xaxt="n",yaxt="n"))
mtext(quote(X),1,line=3,cex=2)
mtext(quote(P(X)),2,line=4,cex=2)
eaxis(1,at=c(1,10,10^2,10^3,10^4),cex.axis=1.5)
eaxis(2,at=c(10^-9,10^-8,10^-7,10^-6,10^-5,10^-4,10^-3,10^-2,10^-1,10^0),cex.axis=1.5)

And the new plot:

Now the tail of the distribution is less messy and we can clearly identify a truncated power law distribution with probably an exponential cutoff. To check that we need to fit the truncated power law’s PDF to the data. The PDF of a truncated power law distribution with an exponential cutoff is:

So now we need to adjust the three parameters, \( C \) the normalizing constant, \( \mathbf{\beta} \) the power law exponent and \( x_o \) the exponential cutoff to the data. To do so we define a function TPL which computes the residual sum of square between the data y and f(x).

TPL=function(par,x,y){
    C=par[1]
    beta=par[2]
    xo=par[3]
    
    est=(C*x^(-beta))*exp(-x/xo)
    sum((log(y) - log(est))^2)    
}

Then, we use the function optim which returns the parameter values par minimizing TPL.

f=optim(par=c(0.1,0.9,700),TPL,x=x,y=y)

Finally, we can plot the data and the fitted curve. As you can see the fit looks reasonable. Of course to do things right it would be necessary to have a more rigorous process for estimating the parameters with the maximum likelihood method for example. It would also be important to test different distributions and compare them with a test and/or a goodness-of-fit. For those who are interested in fitting power laws to data I recommend the paper of Clauset et al.

x1=c(seq(0.1,1,0.1),seq(1,10000,1))
y1=(f$par[1]*x1^(-f$par[2]))*exp(-x1/f$par[3])

par(mar=c(5,6,2,2))
plot(x,y,log="xy",pch=16,type="p",cex=2,col="steelblue3",xlab="",ylab="",las=1,xaxt="n",yaxt="n",bty="n",xlim=c(min(x),max(x)),ylim=c(min(y),max(y)))
par(new=T)
plot(x1,y1,log="xy",type="l",lwd=3,col="#CC6666",xlab="",ylab="",las=1,xaxt="n",yaxt="n",xlim=c(min(x),max(x)),ylim=c(min(y),max(y)))
mtext(quote(X),1,line=3,cex=2)
mtext(quote(P(X)),2,line=4,cex=2)
eaxis(1,at=c(1,10,10^2,10^3,10^4),cex.axis=1.5)
eaxis(2,at=c(10^-9,10^-8,10^-7,10^-6,10^-5,10^-4,10^-3,10^-2,10^-1,10^0),cex.axis=1.5)
comments powered by Disqus