clonalExpansion

Scientific Computing

Dynamic simulation modelling and Bayesian inference for models of clonal Expansion

Comparing Simulation Speeds in Python, Julia and R

How quickly can we simulate using the Gillespie algorithm?

There currently exist a vast array of programming languages to choose from. Two important considerations when choosing a language are the speed of the language, i.e. how long will it take to run a certain chunk of code relative to other programming languages? The second one is its ease of use, if a language has an intuitive syntax it can be far quicker to write working code in that language. Commonly languages such as C are used for writing extremely efficient, fast-running code, the downside of this is that it can be a tricky language to write in. On the other side there are languages such as Python which are relatively easier to learn but do not offer the same speeds. The Julia programming language attempts to bridge this gap by being designed to be intuitive to use as well as being relatively faster compared to similar scientific programming languages such as Python or R.

The Gillespie algorithm is a commonly used tool to simulate stochastic processes. In this case I am using it to simulate the clonal expansion of mitochondrial DNA (mtDNA). The details of the model being used are beyond the scope of this blog post.

The Gillespie algorithm is a useful way to benchmark the three languages as by its very nature it must include some form of iterative loop, typically in the form of a ‘for’ or ‘while’ statement. The purpose of this exercise is to show which of these languages performs the best in a task that cannot be sped up by vectorisation or parallelisation.

this is an image

The above stripchart shows that Julia is the fastest language by a factor of about 3, and that Python is the slowest language. The black dots represent the mean time for each of the three languages.

To make the comparison between the comparison between the three languages as fair as possible, the algorithm was coded in as similar way as possible in all three of the languages.

function Gillespie_new(N, Tmax)
    tt = 0
    x = N["M"]
    S = transpose(N["Post"]-N["Pre"])
    v = size(N["Pre"],1)
    while tt <= Tmax
        h = N["h"](x, tt)
        a = Exponential(1/sum(h))
        tt = tt + rand(a, 1)[1]
        j = wsample([(i) for i=1:v],h,1)
        x = x + S[:,j]
    end
    return(Dict([("t",tt),("x",x)]))
end

Above: the code for simulating from the Gillespie algorithm in Julia.

gillespie_new = function (N, Tmax, ...) 
{
  tt = 0
  x = N$M
  S = t(N$Post - N$Pre)
  v = ncol(S)
  while (tt<=Tmax) {
    h = N$h(x, tt, ...)
    tt = tt + rexp(1, sum(h))
    j = sample(v, 1, prob = h)
    x = x + S[, j]
  }
  return(list(t = tt, x = x))
}

Above: the code for simulating from the Gillespie algorithm in R.

def gillespie_new(N, Tmax):
    tt = 0
    x = N["M"]
    S = (N["Post"]-N["Pre"]).transpose()
    v = np.shape(S)[1]
    while (tt <= Tmax):
        h = N["h"](x, tt)
        h_norm = h/sum(h)
        tt += np.random.exponential(scale=1/sum(h), size=1)[0]
        j = np.random.choice(v, size=1, replace=False, p=h_norm)[0]
        x += S[:,j]
    return({"t":tt,"x":x})

Above: the code for simulating from the Gillespie algorithm in Python.

Having benchmarked the three languages using a fairly simple model, I repeated the analysis using a more complex model. The mean times came out to be: 0.243s for Julia, 0.549s for R, and 1.629s for Python. The ratios of speeds between the languages hasn’t significantly changed. It is also interesting to note that the although the matrices involved in the Gillespie algorithm are far larger, the matrix operations carried out (e.g. matrix transpose) are not particularly computationally intensive tasks that have probably been well optimised in each of the languages. On the other hand, if, for example, calculating a matrix determinant was a necessary step in the Gillespie algorithm a far larger slow down could be expected with the more complex model.

Ultimately, what conclusions can we draw from this data? In this case, Julia was the fastest language, a factor of between 2 and 3 faster than R, which in turn was quicker than Python. Whilst obviously being faster is better, this isn’t the orders of magnitude speed increase that can often be seen when benchmarking Python and R against C. However, this is obviously a very limited test. Many others have found Julia to be significantly faster than many other languages over a range of tests suggesting that its ease of use and speed means it has great potential as a powerful and widely used programming language.