Wednesday, August 15th
Room 3167 Graves Hall
“Binary Signals in Oncology”
A fundamental goal when applying high-throughput technologies (starting with gene expression microarrays, but more recently expanding to microRNA arrays, protein arrays, etc.) is to identify biomarkers (individually or in sets) that can distinguish clinically relevant outcomes. This goal is often pursued either by gene-by-gene t-tests or by clustering to find subtypes. However, a preliminary step involves arbitrarily filtering the genes based on the mean or standard deviation of their expression across the samples. It is not clear why or how this filtering step should bring us closer to the "true" structure in the data. In this talk, we propose instead to focus on genes whose expression distribution is bimodal. Each such gene provides individual evidence for a particular binary split of the samples; if many genes provide evidence for the same split, that split is more likely to represent "true" structure. Along the way, we introduce a measure of the strength of bimodality for individual genes that we call the "bimodality index". We also introduce an algorithm to identify sets of bimodal genes that represent the "same" binary signal. We will illustrate these methods by applying them to both simulated and real datasets.
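The abstract does not say how the bimodality index is computed. As the index is commonly stated, one fits a two-component normal mixture with a common standard deviation σ, means μ1 and μ2, and mixing proportion π, then computes BI = sqrt(π(1 − π)) · |μ1 − μ2| / σ, so a well-separated, evenly balanced split scores highest. The Python sketch below assumes that definition. It fits the mixture with a plain EM loop started from the median split, which may differ from the fitting software used in the talk, and the function names `fit_two_normal_mixture` and `bimodality_index` are hypothetical.

```python
import math

# Sketch only: assumes BI = sqrt(pi * (1 - pi)) * |mu1 - mu2| / sigma, computed
# from an equal-variance two-component normal mixture. Names are hypothetical.


def fit_two_normal_mixture(values, n_iter=500, tol=1e-9):
    """Fit a two-component normal mixture with a shared variance by EM.

    Returns (pi, mu1, mu2, sigma), where pi is the weight of component 1.
    """
    xs = sorted(values)
    n = len(xs)
    if n < 4:
        raise ValueError("need at least 4 observations")
    # Initialize by splitting the sorted values at the median.
    lower, upper = xs[: n // 2], xs[n // 2:]
    mu1, mu2 = sum(lower) / len(lower), sum(upper) / len(upper)
    pooled = sum((x - mu1) ** 2 for x in lower) + sum((x - mu2) ** 2 for x in upper)
    sigma = max(math.sqrt(pooled / n), 1e-9)
    pi = 0.5
    prev_ll = -math.inf
    for _ in range(n_iter):
        # E-step: posterior probability that each value came from component 1,
        # computed in log space to avoid underflow.
        resp, ll = [], 0.0
        for x in xs:
            a = math.log(pi) - 0.5 * ((x - mu1) / sigma) ** 2
            b = math.log(1 - pi) - 0.5 * ((x - mu2) / sigma) ** 2
            m = max(a, b)
            log_total = m + math.log(math.exp(a - m) + math.exp(b - m))
            resp.append(math.exp(a - log_total))
            ll += log_total - math.log(sigma)  # log-likelihood up to a constant
        # M-step: update the weight, the two means, and the common variance.
        n1 = sum(resp)
        n2 = n - n1
        if n1 < 1e-9 or n2 < 1e-9:
            break  # one component has emptied; keep the last estimates
        pi = min(max(n1 / n, 1e-6), 1 - 1e-6)
        mu1 = sum(r * x for r, x in zip(resp, xs)) / n1
        mu2 = sum((1 - r) * x for r, x in zip(resp, xs)) / n2
        var = sum(r * (x - mu1) ** 2 + (1 - r) * (x - mu2) ** 2
                  for r, x in zip(resp, xs)) / n
        sigma = max(math.sqrt(var), 1e-9)
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    return pi, mu1, mu2, sigma


def bimodality_index(values):
    """BI = sqrt(pi * (1 - pi)) * |mu1 - mu2| / sigma for one gene's values."""
    pi, mu1, mu2, sigma = fit_two_normal_mixture(values)
    delta = abs(mu1 - mu2) / sigma  # standardized distance between the modes
    return math.sqrt(pi * (1 - pi)) * delta
```

For example, 100 draws from N(0, 1) pooled with 100 draws from N(4, 1) should give an index near 2 (that is, sqrt(0.25) × 4), while a single normal sample gives a noticeably smaller value. Any cutoff for calling a gene bimodal remains a choice for the analyst.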
Kevin Coombes, PhD
Kevin Coombes is a Professor in the Department of Bioinformatics and Computational Biology at the UT M.D. Anderson Cancer Center. He received his Ph.D. in pure mathematics in 1982 from the University of Chicago and worked for many years in the areas of algebraic K-theory and arithmetic algebraic geometry (while rising through the academic ranks at MIT, the University of Michigan, and the University of Maryland in College Park). In the mid-1990s, he shifted his research interests to bioinformatics. He received awards for “Best Presentation” at the 2001 and 2002 CAMDA (Critical Assessment of Microarray Data Analysis) conferences, and for “Best Abstract” at the First Annual Proteomics Data Mining Conference. His current research focuses on statistical, mathematical, and computational methods to process, analyze, and understand highly multivariate biological data arising from high-throughput technologies. He is particularly interested in (1) methods that incorporate existing biological knowledge early in the analytical process and (2) methods that integrate diverse types of biological data with a view toward predicting clinically relevant patient outcomes. With his collaborator, Keith Baggerly, he is known for his work in “forensic bioinformatics”, an effort to uncover and understand the (often poorly described) methods that were actually used to analyze large data sets.