Statistical Modelling 17 (4-5) (2017), 290–299

The need for statistical contributions to bioinformatics at scale, with illustration to mass spectrometry

Andrew W Dowsey
School of Social & Community Medicine and School of Veterinary Sciences,
Faculty of Health Sciences,
University of Bristol,
United Kingdom
e-mail: andrew.dowsey@bristol.ac.uk

Abstract:

In their article, Morris and Baladandayuthapani clearly evidence the influence of statisticians in recent methodological advances throughout the bioinformatics pipeline and advocate for the expansion of this role. The latest acquisition platforms, such as next-generation sequencing (genomics/transcriptomics) and hyphenated mass spectrometry (proteomics/metabolomics), output raw datasets on the order of gigabytes; it is not unusual to acquire a terabyte or more of data per study. The increasing computational burden this brings is a further impediment to the use of statistically rigorous methodology in the pre-processing stages of the bioinformatics pipeline. In this discussion I describe the mass spectrometry pipeline and use it as an example to show that beneath this challenge lies a two-fold opportunity: (a) biological complexity and dynamic range are still well beyond what is captured by current processing methodology; hence, potential biomarkers and mechanistic insights are consistently missed; (b) statistical science could play a larger role in optimizing the acquisition process itself. Data rates will continue to increase as routine clinical omics analysis moves to large-scale facilities with systematic, standardized protocols. Key inferential gains will be achieved by borrowing strength across the sum total of all analyzed studies, a task best underpinned by appropriate statistical modelling.

Keywords:

computational statistics; sparse signal processing; mass spectrometry; proteomics; metabolomics.