Component separation

How to separate the emission from individual cosmological and astrophysical sources?

The component separation problem

The component separation problem can be stated as follows: given a number of observations of the sky, can we isolate the contribution, in the total observed emission, of all the different astrophysical processes contributing to the observed data set?

In the signal processing community, this type of problem, which arises in a variety of applications, is typically treated on the basis of statistical tools, which assume the total emission to arise from the linear superposition of a number of independent components, or 'sources'. Most of the time, nothing is known a priori about the sources, except the fact that their contributions to the total signal can be assumed to be statistically independent.
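
As an illustration, the linear superposition assumed above is often written in the following form. The notation is an assumption made here for clarity and is not taken from the works referenced on this page: x_i is the observation in channel i, s_j is the j-th source, A is the mixing matrix, p indexes the pixel or sample, and the additive noise term n_i is included as is usual in practice.

```latex
x_i(p) = \sum_{j=1}^{k} A_{ij}\, s_j(p) + n_i(p), \qquad i = 1, \dots, m
```

In the blind setting, both the matrix A and the sources s_j are unknown, and only the statistical independence of the s_j is used to recover them.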

In astrophysical and cosmological applications, the problem is distinctive in several ways. First, physical assumptions and prior knowledge usually exist about the components. This, in fact, contributes to defining the components, and should be used for optimal separation of the various contributions to the total observations. Second, the dynamic range of the observations can be very challenging (with important very small details, and sometimes orders of magnitude between the amplitudes of different components). Last but not least, the scientific analysis of the components of interest requires precise characterisation of their reconstruction errors.

Component separation is recognised as one of the key challenges for several present and upcoming cosmological experiments, such as Planck, LOFAR, or the next-generation CMB polarisation mission. Within the Planck collaboration, a special working group has been set up to address the challenges of this particular problem, the 'Component Separation Working Group' or 'Working group 2' (WG2), which I have coordinated jointly with Gianfranco de Zotti (who represents the Planck Low Frequency Instrument).

Component separation is probably the ultimate challenge we have to face in astrophysics. While it is always possible to build more sensitive instruments, in the end astrophysical confusion will be the main source of uncertainty in our observations. This requires developing the data analysis methods needed to address the issue in the best possible way.

A review of the component separation problem and of some classical methods is available here. A comparison of the various methods developed within the Planck collaboration can be found here.

The spectral matching ICA

Spectral matching ICA (or SMICA) solves the problem of measuring parameters of modelled multi-component spectral covariances, using empirical covariances computed on multi-detector data sets. It is a very powerful method for blind spectral estimation, particularly suited to computing the likelihood of the power spectrum of a CMB component (which can itself depend on a set of cosmological parameters) from observations contaminated by foregrounds.
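
The idea of matching modelled covariances to empirical ones can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the papers referenced on this page: the model covariance in each multipole bin is taken to be A diag(P) A^T + diag(N), the fit is measured with a Kullback-Leibler-type divergence between covariance matrices weighted by the number of modes per bin, and all function names, the toy mixing matrix and the numerical values are hypothetical.

```python
import numpy as np

def model_covariance(mixing, source_power, noise_var):
    """Model covariance for one multipole bin: A diag(P) A^T + diag(N)."""
    return mixing @ np.diag(source_power) @ mixing.T + np.diag(noise_var)

def covariance_mismatch(r_emp, r_model):
    """Divergence 0.5 * [tr(Rm^-1 Re) - log det(Rm^-1 Re) - m] between two
    positive-definite matrices; it is zero when they coincide."""
    m = r_emp.shape[0]
    ratio = np.linalg.solve(r_model, r_emp)   # Rm^-1 Re
    _, logdet = np.linalg.slogdet(ratio)
    return 0.5 * (np.trace(ratio) - logdet - m)

def spectral_mismatch(r_emp_bins, r_model_bins, n_modes):
    """Sum of per-bin mismatches, each weighted by its number of modes."""
    return sum(n * covariance_mismatch(r_e, r_m)
               for n, r_e, r_m in zip(n_modes, r_emp_bins, r_model_bins))

# Toy setup (all values hypothetical): 3 channels, 2 components, 2 bins.
mixing = np.array([[1.0, 0.5],
                   [1.0, 1.0],
                   [1.0, 2.0]])
noise_var = np.array([0.1, 0.1, 0.1])
true_power = [np.array([4.0, 1.0]), np.array([1.0, 0.5])]
n_modes = [50, 200]

# For simplicity the "empirical" covariances are set to the true ones,
# i.e. the limit of infinitely many samples.
r_emp = [model_covariance(mixing, p, noise_var) for p in true_power]

# Mismatch at the true parameters and at deliberately wrong ones.
fit_at_truth = spectral_mismatch(r_emp, r_emp, n_modes)
r_wrong = [model_covariance(mixing, 1.5 * p, noise_var) for p in true_power]
fit_off = spectral_mismatch(r_emp, r_wrong, n_modes)
```

In an actual fit, the parameters (mixing matrix, component spectra, noise levels) would be adjusted by a numerical optimiser to minimise this mismatch; here only the criterion itself is shown.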

The original publication that introduces the method in the context of CMB observations is available here. An extension to more flexible models is discussed here and here. An application to the measurement of the tensor-to-scalar ratio of primordial fluctuations from CMB B-modes with a future space mission can be found in this paper.

Internal Linear Combinations

The "Internal linear combination" (ILC) component separation method is a way to extract a component of known emission law (i.e. amplitude as a function of electromagnetic frequency) from multifrequency observations, without use of prior information or of external data sets (hence the name). It consists in forming the linear combination of the observations that preserves the signal of interest while minimising the variance of the output.
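
A minimal sketch of this construction is given below, assuming the standard closed-form solution of the constrained minimisation, w = C^-1 a / (a^T C^-1 a), where C is the empirical covariance of the observations and a is the emission law of the component of interest. The simulated data, the foreground emission law `b` and all function names are hypothetical and serve only as an illustration.

```python
import numpy as np

def ilc_weights(cov, a):
    """Weights minimising w^T C w subject to w^T a = 1."""
    cinv_a = np.linalg.solve(cov, a)
    return cinv_a / (a @ cinv_a)

def simulate(n_pix=20000, seed=0):
    """Toy data: 4 channels containing CMB, one foreground and white noise."""
    rng = np.random.default_rng(seed)
    a = np.ones(4)                       # CMB emission law (flat in CMB units)
    b = np.array([3.0, 1.5, 0.7, 2.0])   # hypothetical foreground emission law
    cmb = rng.normal(size=n_pix)
    foreground = 5.0 * rng.normal(size=n_pix)
    noise = 0.1 * rng.normal(size=(4, n_pix))
    maps = np.outer(a, cmb) + np.outer(b, foreground) + noise
    return maps, a, cmb

maps, a, cmb = simulate()
cov = np.cov(maps)        # empirical channel-channel covariance (4 x 4)
w = ilc_weights(cov, a)
estimate = w @ maps       # linear combination of the observed maps

# The constraint w . a = 1 preserves the CMB; the variance minimisation
# suppresses the foreground, whose emission law was never given to the method.
residual_rms = np.std(estimate - cmb)
```

Only the emission law `a` of the signal of interest enters the computation; the foreground law `b` is used to simulate the data and nowhere else.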

The main advantage of the method is that it assumes very little about the data, and in particular assumes nothing about unknown or poorly known astrophysical components, nor about the exact properties of the instrumental noise. It has hence been widely used for CMB observations.

It is possible to implement independent ILCs in various regions of the sky or in various regions of the harmonic domain. I proposed, and have a particular interest in, the needlet-ILC method, in which different linear combinations are implemented in domains of a decomposition of the data sets on a tight frame of spherical wavelets. The original paper that introduces the method and applies it to WMAP 5-year temperature data, "A full sky, low foreground, high resolution CMB map from WMAP", is accessible here. Further work on WMAP 7-year temperature and polarisation data is available here and here.

The ILC does not use much prior information about the data set. The only requirements are that the component of interest must have a known emission law, and that it must not be correlated with the contaminating foregrounds. However, these assumptions are critical for the performance of the method. For instance, errors in the emission law can have dramatic consequences for the quality of the recovered component of interest. The issue is discussed in detail in this paper.

While the ILC was originally devised to recover one single component of interest with a fixed emission law (e.g. the CMB), it is possible to extend the method to recover more than one component, with vanishing contamination from the other ones. This is discussed in this paper. An extension to multidimensional or correlated components with unknown emission laws is available here.
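
One common way to write the multi-component extension is sketched below; it is an illustration under stated assumptions and is not necessarily the exact formulation of the paper referred to above. With the known emission laws stacked as columns of a matrix A, the weights W = (A^T C^-1 A)^-1 A^T C^-1 satisfy W A = I, so each output has unit response to its own component and zero response to the other modelled ones, while the remaining variance is minimised. The function name, the emission laws and the simulated data are hypothetical.

```python
import numpy as np

def constrained_ilc_weights(cov, mixing):
    """Weights W = (A^T C^-1 A)^-1 A^T C^-1, which satisfy W A = I."""
    cinv_a = np.linalg.solve(cov, mixing)            # C^-1 A, shape (m, k)
    # cov is symmetric, so (C^-1 A)^T equals A^T C^-1.
    return np.linalg.solve(mixing.T @ cinv_a, cinv_a.T)

rng = np.random.default_rng(1)
n_pix = 20000

# Two components with known (hypothetical) emission laws, 5 channels.
mixing = np.column_stack([np.ones(5),
                          np.array([-1.5, -1.0, 0.0, 1.0, 2.5])])
# One extra contaminant whose emission law is NOT given to the method.
unknown_law = np.array([4.0, 2.0, 1.0, 0.5, 0.3])

sources = rng.normal(size=(2, n_pix))
contaminant = 3.0 * rng.normal(size=n_pix)
noise = 0.1 * rng.normal(size=(5, n_pix))
maps = mixing @ sources + np.outer(unknown_law, contaminant) + noise

weights = constrained_ilc_weights(np.cov(maps), mixing)
estimates = weights @ maps     # row i estimates component i

# Row 0 has unit response to component 0 and zero response to component 1.
response = weights @ mixing
```

The nulling of the other modelled component is exact by construction, whereas the unmodelled contaminant is only suppressed through the variance minimisation, as in the single-component case.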