An Investigation into the Reproducibility of Computational Tools used in Genomics Research

Liu, Sophia Shou

DOI: https://doi.org/10.21985/n2-ckt0-2a40

Recent studies have shown that many high-profile studies in biology could not be reproduced, calling into question the legitimacy of their findings. These reports have the potential to greatly jeopardize the credibility of scientists in the field and to erode public confidence in the scientific enterprise. Computational research is often perceived as more reproducible than experimental research because researchers have complete control over every step of a computational study. In this thesis, I investigate how poor understanding of the computational tools used to study genomes has led to poor reproducibility in the studies that rely on those tools. I show that (i) improperly defined null models in algorithms used to find sequence motifs lead to the identification of spurious patterns, (ii) the lack of standardization and benchmarking for algorithms used to measure codon bias can produce false-positive and false-negative results in the literature, and (iii) poor documentation of RNA-seq data processing pipelines may make it impossible to reproduce most RNA-seq studies. My work sounds a clarion call for principled and systematic approaches to evaluating the computational tools used to study genomes, as these tools become increasingly essential to biological research.

Creator