Paper ID sheet UCL-INMA-2017.09


Comparison of location-scale and matrix factorization batch effect removal methods on gene expression datasets

Emilie Renard, P.-A. Absil
Merging gene expression datasets is a simple way to increase the number of samples in an analysis. However, experimental and data processing conditions, which are specific to each dataset or batch, generally influence the expression values and can hide the biological effect of interest. It is therefore important to normalize the larger merged dataset, as failing to adjust for those batch effects may adversely impact statistical inference. Batch effect removal methods are generally based on a location-scale approach; however, less widespread methods based on matrix factorization have also been proposed. We investigate on breast cancer data how those batch effect removal methods improve (or possibly degrade) the performance of simple classifiers. Our results indicate that the matrix factorization approach deserves greater attention, as it gives results at least as good as common location-scale methods, and even significantly better results in specific cases.
Key words
Paper presented at the First Annual Workshop on Reproducibility and Robustness in Biological Data Analysis and Integration (RRoBIn 2017) in conjunction with the 2017 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)