LOC - Line radiative transfer with OpenCL

LOC is a program, or rather a family of programs, for the radiative transfer modelling of molecular lines. The main novelty is in the use of OpenCL libraries for the parallelisation of the computations so that they can be run on multi-core CPUs and GPUs. The basic concept is the same as in the case of SOC, which is a corresponding radiative transfer code for calculations of dust emission and scattering.

At this moment there exist versions of LOC for 1D spherically symmetric grids and for hierarchical 3D octree grids (see the ISM page on GitHub ). In addition to “normal” molecular line modelling, one can model spectra with hyperfine structure by assuming either local thermodynamic equilibrium between the components or (in the 1D program) by doing the full calculation, including the effects of line overlap.


Input files ^

Model cloud description ^

Cartesian 3D clouds

The 3D model is described by a separate binary file that defines (in the case of the Cartesian grid) the model dimensions and specifies for each cell seven quantities: the number density, the kinetic temperature, the amount of microturbulence (sigma), three components of the macroscopic velocity, and the fractional abundance. The file starts with the dimensions NX, NY, and NZ (three 4 byte integers), following by the data for the NX×NY×NZ cells, each consisting of the aforementioned seven numbers (4 byte floats; the data for a single cell being consecutive elements in the file).

The values in the file can be in physical units. For example, the density would be in units cm-3 and the microturbulence and velocities in km/s. Usually the density would be the density of H or of H2, the abundance being the abundance of the examined species relative to that density. However, the raw values read from the file may be rescaled using appropriate keywords in the parameter file.

Octree clouds

The octree cloud file starts with five integer (int32) values: NX, NY, NZ, OTL, CELLS. Here (NX, NY, NZ) are the dimensions of the root grid, OTL is the number of hierarchy levels in the octree structure, and CELLS is the total number of cells in the model. This is followed by the data for the root grid, followed by data for the first refinement level and so forth. On each hierarchy level, the file contains data for the same seven quantities as for the regular 3D clouds (density, Tkin, sigma, vx, vy, vz, abundance). Each parameter vector (containing one float32 number per each cell on that hierarchy level) is preceded by the number of cells on the hierarchy level. LOC follows the c-convention where the fastest changing index is called x. If one is working with Python arrays, there the last array index corresponds to the x axis and the first index to the z axis: array[z, y, x]. The overall file structure is

  cells_0, rho_0, cells_0, Tkin_0, cells_0, sigma_0, ...
  cells_1, rho_1, cells_1, Tkin_1, ...
  cells_N, rho_N, ..., cells_N, vz_n, cells_N, abundance_N

In the above cells_* are the number of cells per hierarchy level (int32 scalars) and the others (rho_*, Tkin_*, …) are vectors with those dimensions (e.g. rho_1[cells_1], with float32 values).

The actual grid hierarchy is contained in the rho_* vectors. When a cell is refined (divided into eight subcells) the density value rho_x[i] is replaced by value -(float *)&ind, when ind is the index of the first subcell in the vector rho_(x+1), i.e. the density vector of the next hierarchy level. In other words, one takes the subcell index ind, casts that as a float32 number, and stores it multiplied by -1 (to enable distinction between actual density values and rho_* entries that are links to sub-cells). For the other quantities (Tkin, sigma, etc.) the values in the parent cells (i.e. refined cells) are usually undefined and are not used by LOC.

Initialisation file ^

The following list will describe briefly the keywords used in the initialisation file (ini file). This is the file that is given as a command line parameter for the LOC programme. Each line in the parameter file starts with a keyword, possibly followed by one or more parameters separated by spaces. Comment lines in inline comments are indicated with a ‘#’ character.

  • abundance float scaling of the abundance values read from the model file

  • angle float apparent size of (root) grid cell in units of arcsec

  • bandwidth float total bandwidth used in the simulations and for the output spectra, in units if km/s

  • cabfile filename optional file giving the fractional abundance of each collisional partner. The file starts with NX, NY, NZ, and the number of collisional partners (four 4 byte integers). The rest of the file lists the abundances for every model cell, the values of a given cell being consecutive (index over partners runs faster than index over cells).

  • channels int number of velocity channels (in the simulation and the output spectra)

  • cloud filename name of the model file (with the density etc. values for every model cell)

  • cooling if given, the total cooling rate of the current species is calculated and saved to a file (one value per cell)

  • density float factor by which the density values read from the model file are multiplied before calculations

  • device string select the OpenCL device using a string that appears in the name of the device. When the keyword device is used, LOC will always print out the names of all available devices (see terminal output, strings after CPU device and GPU device").

  • direction float float [string] specifies the direction of the observer as angles (in degrees) from the positive Z axis and rotation from positive X towards Y. The optional third parameter is an additional string included in the name of the output files. For direction 90 0, the observer is in the direction of the positive X axis and the maps are oriented so that cloud Y axis is right and the Z axis up. For direction 0 0, observer is in the direction of positive Z axis and in the maps Y is right and X is down. For direction 90 90, the observer is in the direction of the positive Y axis and in the maps the cloud X axis points left and the cloud Z axis up.

  • distance float specifies the distance between the model and the observer, in units of pc

  • fraction float scaling applied to the abundance values read from the model file (synonym for keyword abundance)

  • gpu int for any value of int>0, the computations are run on a GPU instead of the CPU. See also keyword platform. The keyword device is an alternative way to select the exact platform and device to use.

  • grid float specifies the step between spectra in the output maps (~pixel size). The unit is arcsec.

  • isotropic float specifies the temperature (in degrees of Kelvin) for isotropic background radiation, assumed to have the spectrum of a blackbody.

  • iterations int specifies the number of iterations (consisting of the simulation of the radiation field and the updating of the level populations)

  • load filename at the beginning of the run, load previously saved level populations from the given binary file.

  • levels int specifies the number of lowest energy levels included in the computations (less or equal to the number of levels for which the molecule file contains data for)

  • lowmem turns on some optimisations to minimise the memory usage, at the expense of some increase in the run times

  • molecule filename specifies the name for the file containing the molecular data. The file can be in lamda format.

  • nside int specifies the angular grid for the simulated rays, which follows the healpix scheme. The total number of rays per model surface element is 12 times this value squared.

  • platform int (int) specifies the OpenCL platform to be used in the calculations. The optional second integer parameter selects a device within the platform (running index over all devices within the platform that are of the requested type, CPU or GPU). By default all available platforms are tried in order, choosing the first platform and the first device on the platform. Instead of the platform and gpu keywords, one can use just the device keyword.

  • points int int specifies the dimensions of the output spectrum files (the number of spectra along horizontal and vertical directions)

  • prefix string prefix for the output files

  • save filename gives the name for a binary file where the level populations are saved at the end of a run

  • sigma float scaling to be applied to the microturbulence value read from the file for the model description

  • spectra int int … specifies the transitions for which the spectra are to be computed and saved. The values refer to the running numbering of the transitions (in the order these appear in the molecule file).

  • stop float iterations are automatically stopped when the largest relative change in level populations fall below this limit. Only levels up to the one specified with the keyword uppermost are tested.

  • temperature float scalar scaling applied to the kinetic temperature values read from the model description file.

  • transitions int int … list of transitions (using running index of the transitions) for which the excitation temperatures are to be saved

  • uppermost int specifies the uppermost energy level that is checked for level population changes (see keyword stop)

  • velocity float scaling applied to the velocity values (three components if macroscopic velocity) that are read from the model description file

Molecular parameters ^

LOC reads directly files in the format of the Lamda database .

Output files ^

Spectrum files ^

The spectra are saved to binary files. Currently the file names are similar to prefix.molecule.01-00.bin, where prefix is given in the ini-file, molecule is replaced by the actual name of the molecule (species), and the number give the transition (based on the 0-offset numbering of energy levels). The file contains:

  • three 4-byte integers that give the number of pixels along the two map pixels (NX, NY), and the number of velocity channels (NCHN)
  • two 4-byte floating point numbers giving the velocity of the first channel and the channel width (km/s)
  • a cube with dimensions (NX, NY, NCHN+2), consisting of 4-byte floating point numbers. Each vector of NCHN+2 elements starts with the x- and y-offsets, followed by the channel values

Excitation temperature ^

Excitation temperatures are saved to binary files, named combining the prefix (as given in the ini-file), the name of the molecule and the transition, and with an ending of tex. The file

  • starts with four 4-byte integers, the dimensions of the model cloud (NX, NY, NZ), and the number of excitation levels included in the calculation (LEVELS)
  • ends with a (NX, NY, NZ) cube of excitation temperature values for the chosen transition, saved as 4-byte floating point numbers

Cooling rates ^

The cooling rates are saved to a plain binary file that consists of one 4-byte value per cell. The values are the cooling rates in cgs units.

Examples of run times for a 3D model ^

Below are some example of the run times on different computer systems. The model has a 121³ 3d cartesian grid. One is calculating CO spectra, taking into account the ten first energy levels and running the simulations with 128 velocity channels. The OpenCL platforms include Intel, NVidia, and AMD GPUs as well as CPUs that are used either with the open-source POCL or the Intel’s own OpenCL implementations. The run times are given in seconds, for a complete run that includes five iterations and the final writing of the spectrum files. Therefore, the serial Python host code and the IO stand for several seconds of the total run time.

  Run time    Device
  Dell XPS laptop
  853.87    CPU: Intel i7-8550U 1.80GHz, 4 cores, POCL implementation
  855.89    CPU: Intel i7-8550U 1.80GHz, 4 coresm Intel driver
  279.62    GPU: Intel UHD Graphics 620, Intel driver
  249.54    GPU: Intel UHD Graphics 620, Intel NEO driver
  46.39     GPU: AMD Radeon VII (external GPU enclosure)
  Laptop 2 (Tuxedo)
  351.70    CPU: Intel i7-8700K CPU @ 3.70GHz, 6 cores, POCL
  31.24     GPU: NVidia GeForce GTX 1080

  Desktop computer
  64.68     CPU: AMD Ryzen Threadripper 3960x, 24 cores, POCL
  30.19     GPU: NVidia GeForce GTX 1080 Ti
  14.89     GPU: NVidia GeForce RTX 3090

GPUs are generally more efficient that CPUs but also show a factor of ten differences between the cards. A modern multi-core CPU system does get close. However, the comparison does not include the latest GPU cards, which can be expected to be again a few times faster than the fastest GPUs in the above list.

Full example of a LOC run ^

In the following example we create a 3D model for a cloud, model the CO emission with LOC, and plot some spectra and a cross section of the excitation temperature cube. We use Python to make the model and to plot the results. The scripts can be found in the file LOC_Cartesian_test.zip in the LOC directory of our GitHub page .

Cloud model

We discretise the model onto a regular Cartesian grid of 64³ cells. Therefore, we can use the simpler format where the cloud file starts with the dimensions of the root grid (integers) followed by the four dimensional cube, with three spatial coordinates and the fourth index running over the parameters (density, kinetic temperature, three components of the velocity, and the molecular aundance. The following python script make_cloud.py writes the cloud file.

from numpy import *
from numpy.random import randn

# allocate the cube containing seven parameters for each cell
N       = 64
C       = ones((N, N, N, 7), float32)   # start with values 1.0 for all parameters

# calculate cell distances from the centre
k, j, i = indices((N,N,N), float32)
c       = 0.5*(N-1.0)                 # coordinate value at the centre
i, j, k = (i-c)/c, (j-c)/c, (k-c)/c   # coordinates in the [-1,1] range
r       = sqrt(i*i+j*j+k*k)           # distance from the centre

# Set the values in the cube
C[:,:,:,0]   = exp(-3.0**r*r)         # density values, 3D Gaussian
C[:,:,:,1]  *= 15.0                   # Tkin=15 K in every cell
C[:,:,:,2]  *=  1.0                   # leave microturbulence at 1 km/s
C[:,:,:,3]   = randn(N,N,N)           # macroscopic velocities as 
C[:,:,:,4]   = randn(N,N,N)           #   normal-distributed random numbers
C[:,:,:,5]   = randn(N,N,N)           #   ~N(0,1) for all three axes
C[:,:,:,6]   = 1.0                    # abundance values will be scaled later

# Write the actual file
fp = open('my.cloud', 'wb')
asarray([N, N, N], int32).tofile(fp)

Ini file and LOC run

We define the run-time parameters with a text file my.ini. As described above, this consists of keywords, followed by one or more parameters. The commented file looks like this:

cloud          my.cloud        #  cloud defined on a Cartesian grid
distance       100.0           #  cloud distance [pc]
angle          3.0             #  model cell size  [arcsec]
molecule       co.dat          #  name of the Lamda file
density        1.0e4           #  scaling of densities in the cloud file
temperature    1.0             #  scaling of Tkin values in the cloud file
fraction       1.0e-4          #  scaling o the fraction abundances
velocity       0.5             #  scaling of the velocity
sigma          0.5             #  scaling o the microturbulence
isotropic      2.73            #  Tbg for the background radiation field
levels         10              #  number of energy levels in the calculations
uppermost      3               #  uppermost level tracked for convergence
iterations     5               #  number of iterations
nside          2               #  NSIDE parameter determining the number of rays (angles)
direction      90 0.001        #  theta, phi [deg] for the direction towards the observer
points         64 64           #  number of pixels in the output maps
grid           3.0             #  map pixel size [arcsec]
spectra        1 0  2 1        #  spectra written for transitions == upper lower (pair of integers)
transitions    1 0  2 1        #  Tex saved for transitions (upper and lower levels of each transition)
bandwidth      6.0             #  bandwidth in the calculations [km/s]
channels       128             #  number of velocity channels
prefix         test            #  prefix for output files
load           test.save       #  load level populations
save           test.save       #  save level populations
stop          -1.0             #  stop if level-populations change below a threshold
gpu            1               #  gpu>0 => use the first available gpu

The keyword must be the first string on a line and its arguments are separated by at least by one space (or tab). One can add comments after the arguments or as separate lines starting with ‘#’.

Above the keywords density, temperature, fraction, velocity, and sigma all apply a scaling to the raw values read from the cloud file. We also explicitly choose the number of map pixels (points) and the map pixel size (grid) so that they correspond to the projected size and discretisation of the model cloud. The output files start with the prefix test. We do not specify the OpenCL device so LOC tries to use the first available CPU device. The molecular data are in the file co.dat, which can be downloaded directly from the Leiden Lamda database .

The actual radiative transfer calculations are done from the command line:

LOC_OT.py   my.ini

This should produce a few files, including the spectra in the files test_CO_01-00.spe and test_CO_02-01.spe and the excitation temperatures in test_CO_02-01.tex and test_CO_02-01.tex. The estimated level populations are saved to test.save and can be again loaded (keyword load) as the initial values if the calculations need to be continued.

Plotting spectra

Both the spectra and the excitatio temperatures are saved to binary files. The following Python routines can be used to read the files. These can be used also in the case of octree models (instead of the regular Cartesian grid of this example) - except that the reshape command in LOC_read_Tex_3D must be omitted (there the excitation temperatures are stored as a single, one-dimensional vector that covers all hierarchy levels).

import os, sys
from matplotlib.pylab import *

def LOC_read_spectra_3D(filename):
    Read spectra written by LOC.py (LOC_OT.py; 3D models)
        V, S = LOC_read_spectra_3D(filename)
        filename = name of the spectrum file
        V  = vector of velocity values, one per channel
        S  = spectra as a cube S[NRA, NDE, NCHN] for NRA
             times NDE points on the sky,
             NCHN is the number of velocity channels
    fp              =  open(filename, 'rb')
    NRA, NDE, NCHN  =  fromfile(fp, int32, 3)
    V0, DV          =  fromfile(fp, float32, 2)
    SPE             =  fromfile(fp, float32).reshape(NRA, NDE,2+NCHN)
    OFF             =  SPE[:,:,0:2].copy()
    SPE             =  SPE[:,:,2:]
    return V0+arange(NCHN)*DV, SPE
def LOC_read_Tex_3D(filename):
    Read excitation temperatures written by LOC.py (LOC_OT.py) -- in case of
    a Cartesian grid (no hierarchical discretisation).
        TEX = LOC_read_Tex_3D(filename)
        filename = name of the Tex file written by LOC1D.py
        TEX = Vector of Tex values [K], one per cell.
        In case of octree grids, the returned vector must be
        compared to hierarchy information (e.g. from the density file)
        to know the locations of the cells.
        See the routine OT_GetCoordinatesAllV()
    fp    =  open(filename, 'rb')
    NX, NY, NZ, dummy  =  fromfile(fp, int32, 4)
    TEX                =  fromfile(fp, float32).reshape(NZ, NY, NX)
    return TEX                                                                                

We will plot the J=1-0 and J=2-1 spectra towards the cloud centre, cross sections of the J=1-0 and J=2-1 excitation temperature distributions, and a map of the line area W for the J=1-0 transition.

V, T10 = LOC_read_spectra_3D('test_CO_01-00.spe')
V, T21 = LOC_read_spectra_3D('test_CO_02-01.spe')
Tex10  = LOC_read_Tex_3D('test_CO_01-00.tex')
Tex21  = LOC_read_Tex_3D('test_CO_02-01.tex')
NY, NX, NCHN = T10.shape  # number of pixels and channels

# One spectrum from (about) the model centre
plot(V, T10[NY//2, NX//2, :], 'b-', label='T(1-0)')
plot(V, T21[NY//2, NX//2, :], 'r-', label='T(2-1)')
legend(loc='lower center')
title("Centre spectra (K)")
xlabel(r'$v \/ \/ \rm (km s^{-1})$')
ylabel(r'$T \rm _A \/ \/ (K)$')

# Cross section of the Tex(1-0) cube
imshow(Tex10[:,:, Tex10.shape[2]//2])
title("J=1-0 excitation temperature (K)")

# Cross section of the Tex(2-1) cube
imshow(Tex21[:,:, Tex21.shape[2]//2])
title("J=2-1 excitation temperature (K)")

# J=1-0 line area
W     =  (V[1]-V[0])*sum(T10, axis=2)
title("Line Area W(1-0) (K km/s)")


The script produces the following figure. The grainy appearance of the Tex and W maps is not caused by numerical noise but by the random velocity field. This can be easily confirmed by recalculating the model without the random velocities (e.g. setting “velocity 0.0” in the ini file or recreating the cloud with some systematic velocity field). The above python programmes (cloud creation and plotting) can be found in LOC_Cartesian_test.zip in the LOC directory of our GitHub page .