Biomedical Informatics Courses
Although it is not required, a background in or experience with computer science, statistics, anatomy, physiology, and/or medical terminology is strongly recommended.
Core concepts to be reviewed during this course include:
- Basic computational skills (R programming)
- Data integration (data transformation / merging / manipulation, metadata integration)
- Basic probability (conditional probability, Bayes' theorem, probability distributions, sampling distributions)
- Study design (population and sample selection, study design principles)
- Exploratory analysis of data (graphical displays of data, data summarization)
- Statistical analysis of data (estimation, confidence intervals, hypothesis testing, regression, two-group tests, analysis of variance (ANOVA), survival analysis)
- Power and sample size calculations
- In silico hypothesis generation (data mining, text mining, and visualization)
- Introduction to data and methods in bioinformatics (clustering, classification, RNA-seq data analysis)
The purpose of this course is for faculty and external guest speakers to give presentations on current BMI research and theories critical to the advancement and awareness of biomedical informatics within the healthcare and research communities. Alternate classes will consist of journal-club style discussions moderated by faculty, in which trainees will present on their current research projects.
Trainees will attend a number of seminars determined by the course director. Seminars can be chosen either from the departmental CALIBRE seminar series (Fridays, 11:00 AM - 12:00 PM) or from the seminar series, which brings in external speakers to present on their research (the schedule is listed on the departmental calendar).
The goal of this course is to introduce trainees to the fundamental algorithms needed to understand and analyze genome-scale expression data sets. The course will cover three major kinds of applications. (1) Class comparison seeks to describe which features differ between two or more known classes of patient samples (such as normal vs. tumor). The methodology includes (generalized) linear models with careful attention to the issue of multiple comparisons. (2) Class discovery seeks to discover the inherent structure present in a data set. The methodology includes a wide variety of techniques for clustering samples (including K-means as well as various forms of hierarchical clustering) and for assessing the number of clusters and the robustness of cluster assignments. We also cover methods such as principal components analysis that help visualize the data. (3) Class prediction seeks to discover and validate models that can accurately predict the class or the outcomes of new samples. Methods include a wide variety of machine learning and statistical methods for feature selection and model construction. We will also discuss methods for cross-validation and independent validation of predictive models. The course will include an introduction to, and hands-on experience with, the R statistical software environment and the use of R packages that can be applied to these kinds of problems.
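As a rough illustration of why multiple comparisons need attention in class comparison, the sketch below adjusts a list of per-gene p-values with the Benjamini-Hochberg procedure. The course description does not say which correction is taught, so the choice of Benjamini-Hochberg is an assumption, as are the function name `benjamini_hochberg` and the example p-values, which are made up for illustration. The course itself uses R; Python is used here only to keep the sketch short.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values, one per gene, from a two-class comparison.
p_values = [0.001, 0.008, 0.039, 0.041, 0.60]
q_values = benjamini_hochberg(p_values)

# Genes whose adjusted p-value stays at or below a 5% false discovery rate.
significant = [i for i, q in enumerate(q_values) if q <= 0.05]
print(significant)
```

In this made-up example, four of the five raw p-values fall below 0.05, but only the first two remain at or below 0.05 after adjustment.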
It is expected that students have basic knowledge of the following areas:
- Computer science principles (logic, procedural and/or object oriented programming, data structures and algorithms).
- Statistical methods.
- Biomedical terminology.
This course will cover theory and concepts of empirical tools to describe disease burden, health state utilities, patient-reported outcomes, patient experience, and patient preferences. The course will also cover important policy contexts that govern the study of patient preferences (e.g. regulatory, pricing/reimbursement) and good research practice documents that guide empirical studies of the patient experience.
This course has two goals: to teach students across the life sciences how to computationally analyze datasets, and to instill best lab practices in experimental design and analysis. Students will learn the computer language R and use R to analyze datasets from transcriptome, genome, and clinical studies. Students will develop an understanding of sources of bias and the impact of these biases on results and potential conclusions. Examples will be taken from the literature of experimental designs that were rigorous and of designs that had built-in flaws. At the completion of the course, students will have an intermediate level of competency in R and knowledge of how to manage and analyze large datasets.
Independent Studies and Research Credits
Prereq: Permission of instructor. Repeatable to a maximum of 60 credit hours or 4 completions. This course is graded S/U.
Prereq: Permission of instructor. Repeatable to a maximum of 99 credit hours or 20 completions. This course is graded S/U.
Special Topics
Course Directors: Lang Li, PhD and Xia Ning, PhD
Class Time and Location: Tuesdays and Thursdays, 3:55 – 5:15 PM; Virtual classes
The goal of this course is to introduce trainees to natural language processing (NLP) and text mining of biomedical data, including clinical notes from electronic medical records and biomedical literature. Examples of major topics to be discussed in this class include: 1) gold standard annotations on the text, i.e. corpus construction; 2) NLP tasks, including text classification, part-of-speech tagging, chunking, syntactic annotation, semantic annotation, named entity recognition, and relationship extraction; 3) text data processing, feature generation, and representation learning; 4) machine learning and deep learning methods; 5) the applications of machine learning and deep learning methods in NLP; and 6) software.
While there are no strict requirements, a successful student should have basic knowledge of the following areas: 1. Programming skills in R. 2. Some knowledge of machine learning.
Class Time and Location: Mondays, 2:15 – 5:00 PM; Virtual classes
Course description: Artificial Intelligence (AI) and Machine Learning (ML) provide an unprecedented opportunity to accelerate and revolutionize human health and the pace of clinical and translational science. The purpose of this course is to train the next generation of the translational medical workforce by teaching them the primary ML and AI algorithms used in bioinformatics and computational biology. We will cover the theoretical underpinnings of the methodology and explain how to use practical implementations (in R or Python) to apply the methods to real bioinformatics data sets. An important goal of the course is to introduce students to more advanced algorithms that are not covered in other classes in BMI. Examples include modern regression techniques (including ridge regression, lasso, and elastic nets), deep learning (CNN, RNN, GNN using TensorFlow), non-linear dimension reduction (including t-SNE and ISOMAP), directed and undirected graphical models, and association rules. The class will place special emphasis on the fundamentals and applications of deep learning, providing a conceptual understanding of deep learning along with a holistic view of the field and its latest developments. By the end of the course, students will have had practice applying all of these methods to actual data sets.