## Markov chain Monte Carlo

Markov chain Monte Carlo (MCMC) is a general method for sampling probability distributions and can therefore be applied to a very wide range of problems. Here we take as an example the fitting of dust emission spectra.

Multi-wavelength observations in the far-infrared / submillimetre regime are typically fitted with a *modified blackbody spectrum*. The model has three free parameters: intensity, temperature, and spectral index. These can be further related to the dust optical depth or column density, but here we are looking only at the fitting problem *I(f) = I(f0) B(T) (f/f0)^β*. We have observations of the intensity *I(f)* at at least three frequencies *f*, and we examine the modified blackbody spectrum relative to the intensity *I(f0)* at some reference frequency *f0*. Above, *B(T)* stands for the Planck function, which depends on the temperature *T*, and the spectrum is further modified by the (opacity) spectral index *β*. Given observations *I(f)* and their uncertainties *dI(f)*, our task is to estimate the values of the free parameters *I(f0)*, *T*, and *β*, and their probability distributions.
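To make the model concrete, here is a minimal NumPy sketch of the modified blackbody; the function names and the default reference frequency (857 GHz, i.e. 350 µm) are illustrative choices, not taken from the original script:

```python
import numpy as np

# Physical constants in SI units
H_PLANCK = 6.62607015e-34  # Planck constant [J s]
K_BOLTZ = 1.380649e-23     # Boltzmann constant [J/K]

def planck(f, T):
    """Frequency-dependent part of the Planck function B(T) at frequency f [Hz].

    The constant 2h/c^2 prefactor is omitted because it cancels in the
    ratio B_f(T)/B_f0(T) used below.
    """
    return f**3 / (np.exp(H_PLANCK * f / (K_BOLTZ * T)) - 1.0)

def modified_blackbody(f, I_f0, T, beta, f0=857e9):
    """I(f) = I(f0) * B_f(T)/B_f0(T) * (f/f0)**beta, normalised so I(f0) = I_f0."""
    return I_f0 * planck(f, T) / planck(f0, T) * (f / f0)**beta
```

By construction the model returns exactly *I(f0)* at the reference frequency, which makes the normalisation easy to check.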

Below I test four implementations of the MCMC method and compare their run times with χ^2 fits done with Python and the SciPy library. Unlike χ^2 fits, MCMC provides complete probability distributions for the fitted parameters but can be correspondingly slower. However, here we implement MCMC using OpenCL, and more specifically the Python interface pyOpenCL, which makes it possible to run the calculations on a GPU as well as on a CPU. The four MCMC versions are

- Basic Metropolis algorithm. The parameter vector is updated with random steps generated from a normal distribution, and only the overall step size is adjusted according to the acceptance rate.
- Propositions for the steps are generated from a three-dimensional normal distribution (corresponding to the three free parameters). The covariance matrix of the multivariate normal distribution is updated based on past steps. This adaptive Metropolis (AM) method was first described in Haario et al. 2001.
- Hamiltonian (or Hybrid) Monte Carlo (HMC) as explained in Neal 2011.
- Robust Adaptive Metropolis (RAM) algorithm described in Vihola 2011.
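As a rough illustration of the first (basic Metropolis) version, the following NumPy sketch fits one simulated spectrum. The frequencies, step sizes, and noise level are made-up values, and the real script runs many such chains in parallel with pyOpenCL:

```python
import numpy as np

H, K = 6.62607015e-34, 1.380649e-23  # Planck and Boltzmann constants [SI]

def log_likelihood(theta, f, I_obs, dI, f0):
    """Gaussian log-likelihood of the modified blackbody model."""
    I0, T, beta = theta
    if I0 <= 0.0 or T <= 0.0:
        return -np.inf
    B = lambda nu: nu**3 / (np.exp(H * nu / (K * T)) - 1.0)
    model = I0 * B(f) / B(f0) * (f / f0)**beta
    return -0.5 * np.sum(((I_obs - model) / dI)**2)

def metropolis(f, I_obs, dI, f0, theta0, steps=10000,
               scale=np.array([0.02, 0.3, 0.03]), seed=0):
    """Basic Metropolis: normal proposals with fixed per-axis step scales."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    lp = log_likelihood(theta, f, I_obs, dI, f0)
    chain = np.empty((steps, 3))
    accepted = 0
    for i in range(steps):
        prop = theta + scale * rng.standard_normal(3)
        lp_prop = log_likelihood(prop, f, I_obs, dI, f0)
        if np.log(rng.random()) < lp_prop - lp:   # Metropolis acceptance test
            theta, lp = prop, lp_prop
            accepted += 1
        chain[i] = theta
    return chain, accepted / steps

# Simulate one spectrum at three frequencies (input T = 15 K, beta = 1.8)
f = np.array([600e9, 857e9, 1200e9])
f0 = 857e9
I0_true, T_true, beta_true = 1.0, 15.0, 1.8
B = lambda nu: nu**3 / (np.exp(H * nu / (K * T_true)) - 1.0)
I_true = I0_true * B(f) / B(f0) * (f / f0)**beta_true
dI = 0.02 * I_true
rng = np.random.default_rng(1)
I_obs = I_true + dI * rng.standard_normal(3)

chain, acc_rate = metropolis(f, I_obs, dI, f0, (1.0, 15.0, 1.8), steps=5000)
```

Starting the chain at the input values sidesteps burn-in; a robust implementation would discard an initial portion of the chain and tune the step scales automatically.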

These are not necessarily robust implementations (they may be sensitive to the initial parameter values and do not check for convergence, etc.) but they should give some idea of the relative performance. Running the test script plots an example of the fitting of one spectrum:

The first frame shows the individual samples in the (T, β) plane, where the input values for the spectrum simulation were T = 15.0 K and β = 1.8. The colours black, blue, red, and green correspond to the four MCMC versions described above. The middle frames show pieces of the MCMC chains (no thinning, towards the end of 10000-step chains). The basic Metropolis has a lower acceptance rate, in spite of the adjusted step size and the fact that the relative step sizes along the three coordinate axes were first selected manually. The same is visible in the last plot of the (normalised) autocorrelation functions.
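The normalised autocorrelation functions can be computed along these lines (a minimal sketch; the normalisation by the zero-lag value is what makes the curves comparable between chains):

```python
import numpy as np

def autocorr(x, max_lag=100):
    """Normalised autocorrelation function of a 1-D MCMC chain.

    Returns acf[k] for lags k = 0..max_lag, with acf[0] = 1 by construction.
    """
    x = np.asarray(x, dtype=float) - np.mean(x)
    var = np.dot(x, x)
    n = len(x)
    return np.array([np.dot(x[:n - k], x[k:]) / var for k in range(max_lag + 1)])
```

For an uncorrelated (white noise) chain the function drops to roughly zero after lag 0, while a strongly correlated Metropolis chain decays much more slowly.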

We next compare the run times to direct χ^2 minimisation with a Python routine, to see if the knowledge of the posterior probability distributions is worth the extra computation. The run times for 10000 spectra, with 10000 MCMC steps each, are the following:
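For reference, the χ^2 fit of a single spectrum can be written with SciPy roughly as follows; the frequencies and starting values are again illustrative, and this is a sketch rather than the routine actually timed below:

```python
import numpy as np
from scipy.optimize import least_squares

H, K = 6.62607015e-34, 1.380649e-23  # Planck and Boltzmann constants [SI]

def fit_mbb(f, I_obs, dI, f0, x0=(1.0, 15.0, 1.8)):
    """Fit (I(f0), T, beta) by minimising chi^2 with scipy.optimize.least_squares."""
    def residuals(theta):
        I0, T, beta = theta
        B = lambda nu: nu**3 / (np.exp(H * nu / (K * T)) - 1.0)
        model = I0 * B(f) / B(f0) * (f / f0)**beta
        return (I_obs - model) / dI    # weighted residuals; chi^2 = sum(res**2)
    return least_squares(residuals, x0).x
```

On noiseless data the fit recovers the input parameters, but unlike MCMC it returns only the point estimate, not the posterior distribution.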

- χ^2 minimisation: 95.4 seconds
- Metropolis: 7.3 seconds
- AM: 7.3 seconds
- HMC: 229.0 seconds
- RAM: 6.9 seconds

Apart from the third version, the MCMC calculations (with the selected number of steps) are all faster than the Python optimisation. A least squares routine would be less general; it would be faster but still not as fast as the fastest MCMC versions. HMC is slow because of the loop in the generation of the step propositions.
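The loop referred to above is the leapfrog integration used to generate each HMC proposal. A generic sketch (not the OpenCL kernel) looks like this, where `grad_log_post` and the step parameters are placeholders:

```python
import numpy as np

def leapfrog(theta, p, grad_log_post, eps=0.01, n_steps=20):
    """Generate one HMC proposal with n_steps leapfrog steps of size eps.

    grad_log_post(theta) must return the gradient of the log posterior.
    This inner loop is inherently sequential, which is why the HMC version
    costs much more per MCMC step than the Metropolis variants.
    """
    theta = np.array(theta, dtype=float)
    p = np.array(p, dtype=float)
    p += 0.5 * eps * grad_log_post(theta)       # half step for momentum
    for _ in range(n_steps - 1):
        theta += eps * p                        # full step for position
        p += eps * grad_log_post(theta)         # full step for momentum
    theta += eps * p                            # final position step
    p += 0.5 * eps * grad_log_post(theta)       # final half step for momentum
    return theta, -p                            # negate momentum for reversibility
```

Leapfrog nearly conserves the Hamiltonian (potential plus kinetic energy), which is what allows HMC to accept long, nearly deterministic jumps with high probability.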

Because the script uses OpenCL, we can also run it on a GPU, simply by changing one parameter. Going from the CPU (2.9 GHz i7-4910MQ) to a GPU (AMD R9-290M), the run times become

- Metropolis: 1.8 seconds
- AM: 1.4 seconds
- HMC: 9.7 seconds
- RAM: 1.3 seconds

The results could change with tuning of the parameters. AM and RAM perform very similarly, both in run time and in effective sample size. HMC sees the largest speed-up on the GPU but remains slow in comparison. The problem is also too simple for the benefits of Hamiltonian MCMC to be noticeable. The HMC run time on the CPU may also have been affected by CPU throttling, the test being run on a laptop. Basic Metropolis is fast but the samples are badly correlated. In contrast, RAM worked well without any tunable parameters.