MAD is calculated as the median of the absolute differences between data values and their median. For normal distribution MAD is equal to 1.48 times the standard deviation but is a more robust estimate of the dispersion of data values. The calculation of MAD is in principle straightforward but can be time-consuming, especially when used in the analysis of large images where MAD estimate is needed for some local environment around each pixel. One such application was the Planck Galactic Cold Clumps Catalogue (PGCC, Planck Collaboration 2015) where the source detection pipeline used MAD values to estimate the reliability of the source detections (Montier et al. 2010). The all-sky Planck maps have of the order of 50 million pixels and thus MAD had to be calculated 50 million times (per channel). In numerical simulations the number of pixels of a single synthetic map can easily be larger than this. If MAD is to be used, it should be as fast as possible.

In Python, MAD could be calculated using the generic_filter routine and a function like this:

def fun(x):
return np.median(np.abs(x-np.median(x)))

This is concise. However, each call to median function probably implies that the input vector is first sorted, which does take some time. For example, for an image of 4096×4096 pixels, it took 16 minutes to calculate MAD estimates for all pixels. In this case each MAD value was calculated using data within a radius of 30 pixels of the central pixel. Unfortunately, I now have some 50 images of 8196×8196 pixels each so that, based on the above example, it will take more than two days to compute the corresponding MAD images. Every time something changes in the analysis, there is this two-day overhead.

Things can be improved by (1) improving the algorithm that computes MAD, by (2) replacing the Python routine with compiled code, (3) parallelising its execution, and (4) possibly running it on GPU instead of a CPU. After doing all this, the run time of the previous example could be reduced from 946 seconds (16 minutes) down to 2.3 seconds. This is a speed-up of just over a factor of 400!

Part of the speed-up comes from an algorithm that does not sort the whole input array. Actually, the impact of this should be larger when MAD is computed for longer data vectors, instead of the 30-pixel radius areas (~100 pixels) of our example. The idea is that one first calculates the mean and standard deviation (one pass of the data), based on these sets up a number of bins (sub-intervals) and counts the number of data falling in each of the them (second pass). At this point one knows in which bin the median value resides. If the number of data points in that bin is still large, one can keep on subdividing the interval until the number of data points in the median-containing bin has fallen below some threshold. Finally, the data entries of that single bin are sorted and the median value is returned. In our test case the total number of data points is similar to the selected number of sub-intervals. This means that once the data are for the first time divided into bins, the bin that contains the median has only very few other data points. Therefore, one can directly proceed with the sorting, after having only twice passed through the full input data vector. The idea is similar to that of the pigeonhole sort, with the difference that to get the median value we do not need to sort the whole input vector, only the data near the actual median (data in one bin).

The method was implemented with pyopencl. Tus, we can calculate MAD using a call to a normal python function. The function transfers the data to the actual computing device and takes care of the compilation of the kernel code. We include the main part of the code below.

def MADFilterCL(X, R1, R2, circle=0, GPU=0):
"""
Median absolute deviation filter on a full 2D image.
Input:
X = 2D numpy array
R1, R2 = inner and outer radius of the annulus [int32], if circle=True,
otherwise only R2 is used and the footprint is 1+2R2 pixels squared
circle = if >0, footprint is circular instead of a full square
GPU = if >0, use GPU instead of CPU
"""
# InitCL() not included here = standard OpenCL initialisations
platform, device, context, queue, mf = InitCL(GPU)
LOCAL = [ 8, 64 ][GPU>0] # set work group size
N, M = X.shape # dimensions of the input image
NFOOT = int((1+2R2)2.0) # maximum number of pixels under the footprint
OPT = " -D N=%d -D M=%d -D R1=%d -D R2=%d -D CIRCLE=%d -D NFOOT=%d -D LOCAL=%d" % (N, M, R1, R2, circle, NFOOT, LOCAL)
program = cl.Program(context, source).build(OPT)
S = np.asarray(X, float32)
SS = np.empty_like(S)
S_buf = cl.Buffer(context, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=S)