Course Catalog

Full-Day Courses

Clinical evidence generation using electronic health records data

Instructors: Yong Chen, University of Pennsylvania; Xu Shi, University of Michigan

Target audience: researchers in academia and industry who work on real-world data; students in biostatistics

Prerequisites for participants: linear regression

Computer and software requirements: R

The widespread adoption of electronic health records (EHR) as a means of documenting medical care has created a vast resource for the study of health conditions, interventions, and outcomes in the general population. Using EHR data for research facilitates the efficient creation of large research databases, execution of pragmatic clinical trials, and study of rare diseases. Despite these advantages, there are many challenges for research conducted using EHR data. To make valid inference, statisticians must be aware of data generation, capture, integration, and availability issues and utilize appropriate study designs and statistical analysis methods to account for these issues.

This short course will introduce participants to the basic structure of EHR data and analytic approaches to working with these data through a combination of lectures and hands-on exercises in R. The first part of the course will cover issues related to the structure and quality of EHR data, including different data types, opportunities and challenges, and methods for extracting variables of interest. In the second part of the course, we will discuss statistical methods to mitigate data quality issues arising in EHR, including confounding, error in EHR-derived covariates and outcomes, and data integration across multiple clinical practices. Participants will explore synthetic EHR-derived data sets to gain familiarity with the structure of EHR data and statistical tools for analyzing EHR data.
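
As a flavor of the confounding-adjustment methods the course touches on, here is a minimal R sketch (not course material; the data are simulated and all variable names are hypothetical) of inverse probability of treatment weighting with a logistic propensity model:

```r
# Minimal illustrative sketch, not course material: inverse probability of
# treatment weighting (IPTW) with a logistic propensity model on simulated,
# EHR-like data. All variable names are hypothetical.
set.seed(4)
d <- data.frame(
  age      = rnorm(500, mean = 60, sd = 10),
  diabetes = rbinom(500, 1, 0.3)
)
d$treated <- rbinom(500, 1, plogis(-3 + 0.04 * d$age + 0.8 * d$diabetes))
d$outcome <- rbinom(500, 1, plogis(-2 + 0.5 * d$treated + 0.03 * d$age))

# Propensity scores from a logistic regression of treatment on confounders
ps  <- glm(treated ~ age + diabetes, data = d, family = binomial)$fitted.values
d$w <- ifelse(d$treated == 1, 1 / ps, 1 / (1 - ps))   # IPTW weights

# Weighted outcome regression estimating the treatment effect after weighting
summary(glm(outcome ~ treated, data = d, family = quasibinomial, weights = w))
```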

Categories: methodology; career development

Deep Learning applications in statistical problems

Instructors: Hongtu Zhu, University of North Carolina; Xiao Wang, Purdue University; Runpeng Dai, University of North Carolina

Target audience: researchers with basic knowledge of machine learning and neural networks

Prerequisites for participants: familiarity with Python and PyTorch; basic knowledge of machine learning; proficiency in statistics

Computer and software requirements: access to Google Colab

This short course delves into the intersection of deep learning and statistical analysis. Participants will explore and apply deep learning methodologies to tackle various statistical problems. The course covers advanced topics such as longitudinal data analysis, survival analysis, quantile regression, autoencoders, generative models, and handling spatial-temporal data using deep learning techniques.

Category: methodology

Statistical methods for time-to-event data from multiple sources: A causal inference perspective

Instructors: Xiaofei Wang, Duke University; Shu Yang, North Carolina State University

Target audience: statisticians with interest in applying causal inference and new integrative methods to analyze survival data from randomized clinical trials and observational studies

Prerequisites for participants: knowledge of causal inference and survival analysis is preferred but not required

Computer and software requirements: a laptop computer with the latest version of R is strongly recommended

This short course will review important statistical methods for survival data arising from multiple data sources, including randomized clinical trials and observational studies. The course consists of four parts, all discussed in a unified causal inference framework. In each part, we will review the theoretical background and, supplemented with data examples, emphasize the application of these methods in practice and their implementation in freely available statistical software. Interactive sessions on implementing the new methods in R will be held. Both instructors' methodological work related to this short course has been funded by NIH R01 and FDA U01 grants.

Category: methodology

 

Structural equation modeling and its applications using R and SAS

Instructors: Din Chen, Arizona State University; Yiu-Fai Yung, SAS Institute

Target audience: graduate students, teaching faculty, and anyone interested in learning structural equation modeling. This course is designed for statisticians and data analysts who would like to learn SEM techniques for their own research and applications. 

Prerequisites for participants: basic understanding of regression analysis. Experience using SAS/R software would be helpful but is not required since we will cover the basics first.

Computer and software requirements: none

Originating in the social sciences, structural equation modeling (SEM) is becoming increasingly popular in fields such as education and the health and medical sciences. This one-day short course aims to provide an overview of SEM and to demonstrate its applications using R and SAS software, based on our newly published book, Structural Equation Modeling Using R/SAS: A Step-by-Step Approach with Real Data Analysis (Chapman and Hall/CRC, 2023). We will cover some of the main SEM topics, including path analysis, confirmatory factor analysis, mediation analysis in longitudinal settings, structural relations with latent variables, multiple-group SEM, latent growth-curve modeling, and model modification. Real-world examples are compiled to demonstrate applications in social, educational, behavioral, and marketing research. The mathematical and statistical foundations of SEM are discussed at a level suitable for general understanding. Both the R package “lavaan” (latent variable analysis) and the CALIS procedure of SAS/STAT will be used to demonstrate model specification, fitting, and interpretation of results.
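
For readers unfamiliar with lavaan's model syntax, the minimal sketch below (not taken from the book; it uses the package's built-in HolzingerSwineford1939 example data) shows how a confirmatory factor analysis is specified and fit:

```r
# Illustrative sketch only (not from the book): a three-factor confirmatory
# factor analysis in lavaan, using the package's built-in example data.
library(lavaan)

model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'
fit <- cfa(model, data = HolzingerSwineford1939)
summary(fit, fit.measures = TRUE, standardized = TRUE)
```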

Category: methodology

Half-Day Courses: Morning

An introduction to sample size calculation in experimental design

**CANCELED** – consider registering for “Introduction to biomarker discovery in cancer research” instead

Functional data analysis and its applications

Instructor: Pang Du, Virginia Tech

Target audience: practitioners or researchers with an interest in understanding and using functional data analysis

Prerequisites for participants: a general knowledge of linear regression and multivariate statistics

Computer and software requirements: a computer/laptop installed with R (and RStudio) is recommended

This course aims to introduce the modern field of functional data analysis to a general audience, with emphasis on how the relevant techniques can be applied to real examples. As a generalization of traditional data concepts from numbers and vectors of numbers to curves and surfaces, functional data have attracted a great deal of attention from statisticians and found many interesting applications in a variety of fields over the past few decades. The course will start with the introduction of real examples of functional data. Based on these examples, common functional data analysis techniques such as function smoothing, functional principal component analysis, and functional linear regression models will be presented. R implementations of these techniques will be introduced and demonstrated.
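
As a small taste of the R workflow, the sketch below (not the instructor's course code) smooths the fda package's built-in CanadianWeather temperature curves and runs a functional principal component analysis:

```r
# Illustrative sketch (not course material): smoothing and FPCA with the 'fda'
# package, using the package's built-in CanadianWeather example data.
library(fda)

temp <- CanadianWeather$dailyAv[, , "Temperature.C"]   # 365 days x 35 stations
day  <- 1:365

# Smooth each station's record onto a periodic Fourier basis over the year
basis   <- create.fourier.basis(rangeval = c(1, 365), nbasis = 65)
temp_fd <- Data2fd(argvals = day, y = temp, basisobj = basis)

# Functional principal component analysis of the smoothed curves
pca <- pca.fd(temp_fd, nharm = 4)
pca$varprop          # proportion of variance explained by each harmonic
plot(pca$harmonics)  # the leading modes of variation
```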

Category: methodology

Inference on treatment effects in clinical trials with terminal and non-terminal events in the presence of competing risks

Instructor: Song Yang, NHLBI (National Heart, Lung, and Blood Institute)

Target audience: students, researchers and clinical trial practitioners

Prerequisites for participants: a general knowledge of linear regression and multivariate statistics

Computer and software requirements: none; R and Octave/MATLAB optional

Clinical trials often involve a terminal event (e.g., cardiovascular death) and some non-terminal events (e.g., stroke), where the terminal event may censor the non-terminal events and may also be subject to competing risks. The traditional first-event analysis does not use the data fully and opaquely mixes different events, leading to wide confidence intervals and loss of power. Various methods have been proposed in recent years, some involving complex models and others having intuitive appeal but hidden conditions. Furthermore, some are well suited for etiological studies, while others are more convenient for testing and summarizing treatment effects. It is challenging to navigate this landscape in search of improved efficiency while avoiding results that are biased, difficult to interpret, or non-generalizable.

This user-friendly course addresses these challenges by discussing the strengths and limitations of various approaches, such as copula models, multi-state models, restricted mean time, the win ratio, and their respective variants. Recommendations are given on which methods to use for particular treatment effect scenarios, emphasizing practical and robust analyses. Suggestions are made to facilitate the development of design and analysis plans for future trials. Choices between hazard-oriented methods and cumulative incidence function-based methods, and their alignment with the clinical questions of interest, are discussed. The methods are illustrated with Octave/MATLAB on data from a few recent large trials.
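
The course's illustrations use Octave/MATLAB; as a rough R analogue (not course material; the data are simulated and all names are hypothetical), cumulative incidence functions under competing risks can be estimated with the survival package's Aalen-Johansen estimator:

```r
# Rough R analogue (the course itself uses Octave/MATLAB): Aalen-Johansen
# cumulative incidence estimates under competing risks, on simulated data.
library(survival)

set.seed(2)
n     <- 300
time  <- rexp(n, rate = 0.1)
cause <- factor(sample(c("censor", "stroke", "cv_death"), n, replace = TRUE,
                       prob = c(0.3, 0.4, 0.3)),
                levels = c("censor", "stroke", "cv_death"))  # first level = censoring
arm   <- rep(c("control", "treatment"), each = n / 2)
d     <- data.frame(time, cause, arm)

# Cumulative incidence of each event type, by treatment arm
fit <- survfit(Surv(time, cause) ~ arm, data = d)
plot(fit, col = c(1, 1, 2, 2), lty = c(1, 2, 1, 2),
     xlab = "Years", ylab = "Cumulative incidence")
```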

Category: methodology

Inspection sampling: Concepts and applications

Instructors: Christopher Breen, Eli Lilly; Niamh Ducey, Eli Lilly

Target audience: industry/manufacturing

No prerequisites or computer/software requirements

This introductory training course on the concepts and applications of inspection sampling is designed to provide data scientists, statisticians, and CQEs (certified quality engineers) with an understanding of sampling methodologies crucial for effective control of product quality.

This course will begin by familiarizing participants with the relevant terminology associated with inspection sampling, including acceptance quality limit (AQL), limiting quality level (LQL), attributes sampling for the inspection of qualitative characteristics, and operating characteristic (OC) curves.

This course continues by exploring the statistical and probability concepts involved in attributes sampling, empowering participants with a quantitative approach to assessing the quality of a product or process via inspection sampling.
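
For concreteness, the probability behind an OC curve can be computed directly: under binomial sampling, the probability of accepting a lot is the chance of observing at most c defectives in a sample of n items. A minimal R sketch (not course material; the plan parameters are arbitrary):

```r
# Illustrative sketch (not course material): the operating characteristic (OC)
# curve of a single attributes sampling plan (n, c) under the binomial model.
n <- 80    # items inspected per lot (arbitrary example plan)
c <- 2     # lot accepted if at most c defectives are found

p        <- seq(0, 0.12, by = 0.001)        # true lot fraction defective
p_accept <- pbinom(c, size = n, prob = p)   # P(accept) = P(X <= c), X ~ Bin(n, p)

plot(p, p_accept, type = "l",
     xlab = "Lot fraction defective", ylab = "Probability of acceptance",
     main = "OC curve for the (n = 80, c = 2) plan")
```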

Through case studies, practical exercises, and examples, participants will learn how to select and interpret the appropriate type of inspection sampling plan, ensuring they can directly apply the concepts in their respective fields.

Category: methodology

Unlocking the power of semiparametric models: A practical tutorial for analyzing complex data with minimum assumptions

Instructors: Xin Tu, University of California, San Diego; Tuo Lin, University of Florida; Jinyuan Liu, Vanderbilt University Medical Center

Target audience: All levels of (bio)statisticians and data scientists are welcome. We will cover both the fundamentals and the more advanced topics of semiparametric models accompanied by diverse real-world applications.

Prerequisites for participants: knowledge of statistical inference; basic understanding of large sample theory

Computer and software requirements: basic knowledge of R programming

This half-day short course will give biostatisticians and data scientists an engaging overview of semiparametric modeling via real-world applications with complex structures, such as high-throughput sequencing and network data. Both classical and cutting-edge semiparametric techniques will be explored, highlighting their roles in balancing robustness, flexibility, and efficiency with minimum assumptions.

The foundation of statistical inference relies on models with explicit or implicit assumptions about the underlying data-generating process. Often these models are parametric, characterized by finite-dimensional parameters, and they have only limited robustness in practice. This has spurred the advancement of semiparametric modeling, which blends finite-dimensional parameters of interest with infinite-dimensional nuisance parameters. Such flexibility has led to emerging applications in many research disciplines, notably in causal inference, missing data, survival analysis, and survey studies.

The first half of this short course introduces the fundamental concepts of semiparametric models and outlines their roles in robust inference with and without missing data. Recent advances will be discussed in the second half, covering diverse applications that scale up to high-dimensional microbiome data and HIV viral genetic linkage networks while also scaling down to inference in the presence of outliers and small sample sizes.
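
As a familiar anchor for what "semiparametric" means here (not course material), the Cox proportional hazards model pairs finite-dimensional regression coefficients of interest with an unspecified baseline hazard acting as an infinite-dimensional nuisance parameter:

```r
# Not course material: the Cox model as a canonical semiparametric example,
# fit to the survival package's built-in 'lung' data. The regression
# coefficients are the parametric part; the baseline hazard is left unspecified.
library(survival)

fit <- coxph(Surv(time, status) ~ age + sex, data = lung)
summary(fit)$coefficients   # partial-likelihood estimates of the parametric part
```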

Categories: data-driven methodology and application

 

Half-Day Courses: Afternoon

Everyday reproducibility: Simple flexible tools for making analyses more accessible and reproducible

Instructors: Gregory Hunt, William & Mary; Johann Gagnon-Bartsch, University of Michigan

Target audience: practitioners at all levels

Prerequisites for participants: some familiarity with a high-level statistical programming language like R or Python

Computer and software requirements: ideally a computer with either RStudio or Jupyter; if desired, an installation of Docker

Ensuring that analyses are reproducible is important for statisticians broadly, from academia to industry. Indeed, the ability to reproduce third-party results is fundamental to the scientific process itself, as well as to public confidence in that process. In addition to ensuring that analyses are reproducible, it is important that these analyses can be easily shared, accessed, and explored. While critically important, building analyses that are computationally reproducible, shareable, and accessible is not a trivial task. This “reproducibility crisis” has been recognized in popular science and professional statistical societies alike.

Indeed, the desire to make analyses reproducible, shareable, and accessible has led to significant development of computational tools and discussion of best practices within the statistical community. We offer a half-day course that exposes participants to a conceptual discussion of reproducibility and its role in statistics and data analysis, as well as a concrete survey of pragmatic tools and practices that practitioners can straightforwardly adopt to enhance the reproducibility, shareability, and accessibility of the analyses they create. Our goal is to cover practical tools and paradigms that can be widely adopted by statisticians across an array of fields, computing environments, and reproducibility goals.
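
As one concrete example of the lightweight tooling in this space (one option among many, not necessarily the instructors' workflow), R's renv package records and restores a project's package versions:

```r
# One lightweight option (not necessarily the instructors' workflow): pinning an
# analysis's R package environment with 'renv' so others can re-run it.
# install.packages("renv")   # once per machine
renv::init()       # create a project-local library and renv.lock file
renv::snapshot()   # record the exact package versions used by the analysis
renv::restore()    # on another machine (or later), reinstall those versions
```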

Categories: reproducibility; software tools

Fitting semiparametric transformation models with ordinal regression

Instructors: Bryan Shepherd, Vanderbilt University Medical Center; Chun Li, University of Southern California

Target audience: statisticians and biostatisticians

Prerequisites for participants: Understanding of linear and logistic regression. Understanding of ordinal regression (e.g., proportional odds models) would be ideal. R programming experience.

Computer and software requirement: laptop with R installed

Continuous response data often require transformation prior to analysis. The proper transformation is typically unknown and results are often sensitive to the choice of transformation. Semiparametric transformation models assume that after an unspecified transformation, which will be nonparametrically estimated, the response variable follows a linear model. These models are robust approaches for fitting skewed or ill-behaved continuous response data, but they have historically been difficult to fit. We show that semiparametric transformation models can be fit using ordinal cumulative probability models (e.g., ordered logistic regression) in a computationally simple manner. Students will learn how to fit these models using the rms package in R, and how to extract interpretable quantities including conditional means, quantiles, and relevant confidence intervals. We will also demonstrate how these models can be easily fit to clustered data and data with detection limits.
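
A minimal sketch of the workflow (not the instructors' course code; the data are simulated and all names are hypothetical) using rms::orm and its Mean and Quantile helpers:

```r
# Minimal sketch, not the instructors' code: fitting a cumulative probability
# model (ordered logistic by default) to a skewed continuous response with
# rms::orm, then extracting a conditional mean and median. Simulated data.
library(rms)

set.seed(1)
x <- rnorm(200)
y <- exp(1 + 0.5 * x + rnorm(200))          # skewed continuous response

fit <- orm(y ~ x, x = TRUE, y = TRUE)       # semiparametric transformation model

mean_fun  <- Mean(fit)                      # converts linear predictor to E(y | x)
quant_fun <- Quantile(fit)                  # converts linear predictor to quantiles

lp <- predict(fit, newdata = data.frame(x = 1))   # linear predictor at x = 1
mean_fun(lp)                                # estimated conditional mean at x = 1
quant_fun(0.5, lp)                          # estimated conditional median at x = 1
```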

Category: methodology

Introduction to biomarker discovery in cancer research

Instructors: Xiaoli Zhang, The Ohio State University; Lianbo Yu, The Ohio State University

Target audience: graduate students and biostatisticians who are interested in precision medicine

Prerequisites for participants: basic knowledge of logistic regression, survival analysis, and SAS or R programming

Computer and software requirements: personal laptop with SAS and/or R program

With significant advancements in genomic profiling technologies and the emergence of selective molecular targeted therapies, biomarkers have played an increasingly pivotal role in both the prognosis and treatment of various diseases, most notably cancer. This workshop begins with an introductory overview of basic biomarker concepts, the diverse categories of biomarkers, and commonly employed biotechnologies for biomarker detection, with a special focus on gene mutations and gene expression. We will then discuss the process of biomarker discovery and development, outlining the key steps involved and the analytical methodologies currently used. Following this, we will discuss the identification of driver gene mutations and altered gene expression, using lung cancer data from The Cancer Genome Atlas (TCGA) as an illustrative example, with SAS or R code used in practical demonstrations to enhance understanding. In the latter part of the workshop, we will discuss commonly used biostatistics and bioinformatics tools, including data visualization, survival analysis, and classification methods, which are employed to predict disease progression and patient survival outcomes based on these critical biomarkers. If time permits, we will also discuss the concept and analysis of scRNA-seq data. By the conclusion of this course, participants will have acquired a broad, fundamental understanding of biomarker discovery in cancer research.
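
For orientation, here is a minimal R sketch (not the workshop's TCGA analysis; the data are simulated and the variable names hypothetical) of relating a binary biomarker to survival:

```r
# Minimal illustrative sketch (not the workshop's TCGA code): relating a binary
# biomarker to survival with a Kaplan-Meier plot and a Cox model in R.
# Simulated data; all variable names are hypothetical.
library(survival)

set.seed(3)
d <- data.frame(
  time   = rexp(200, rate = 0.05),
  status = rbinom(200, 1, 0.7),    # 1 = death observed, 0 = censored
  mutant = rbinom(200, 1, 0.4)     # 1 = driver mutation present
)

km <- survfit(Surv(time, status) ~ mutant, data = d)
plot(km, col = c("black", "red"), xlab = "Months", ylab = "Survival probability")

cox <- coxph(Surv(time, status) ~ mutant, data = d)
summary(cox)   # hazard ratio for mutation carriers vs. non-carriers
```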

Category: career development

Introduction to neuroimage analysis for biostatisticians

Instructors: Catie Chang, Vanderbilt University; Sarah Goodale, Vanderbilt University; Simon Vandekar, Vanderbilt University Medical Center

Target audience: statisticians and biostatisticians of any level interested in learning to work with MRI-derived neuroimaging data

Prerequisites for participants: experience with R, Python, or MATLAB

Computer and software requirements: laptop with minimum 8GB of RAM. R, Python, or MATLAB.

Beginning to work with neuroimaging data can be overwhelming for many biostatisticians, whose methodological backgrounds could provide important tools and insights into modern challenges of neuroimage analysis. This course will provide a background on the data types, neuroimaging I/O and analysis resources, some preprocessed open-source datasets, and recent concerns and challenges in the neuroimaging community. The goal of the course is to give biostatisticians the resources to begin developing and implementing statistical methods that may be useful to the neuroimaging community. The course will be primarily didactic, with a follow-along tutorial on loading and working with imaging data in R/Python/MATLAB at the end.
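
As a small example of the kind of I/O covered in the tutorial (not the course's own materials; the file path below is hypothetical), a NIfTI image can be read in R with the RNifti package:

```r
# Minimal sketch (hypothetical file path; not the course tutorial): reading a
# NIfTI image with the RNifti package and inspecting basic voxel information.
library(RNifti)

img <- readNifti("sub-01_T1w.nii.gz")   # hypothetical structural MRI file
dim(img)                                # image dimensions in voxels
pixdim(img)                             # voxel sizes (mm) from the NIfTI header
hist(img[img > 0], main = "Nonzero voxel intensities")
```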

Category: technology training

Model-assisted designs: Make adaptive clinical trials easy and accessible

Instructors: Ying Yuan, MD Anderson Cancer Center; Jack Lee, MD Anderson Cancer Center

Target audience: researchers, pharmaceutical statisticians, and regulatory agencies

Prerequisites for participants: completion of first-year graduate courses in statistics/biostatistics

Computer and software requirements: none. The instructors will demonstrate the use of free online software.

Drug development and clinical research face the challenges of prohibitively high costs, high failure rates, long trial durations, and slow accrual. One important approach to addressing these pressing issues is to use novel adaptive designs, which unfortunately can be hampered by the requirements of complicated statistical modeling, demanding computation, and expensive infrastructure for implementation.

This short course is designed to provide an overview of model-assisted designs, a new class of designs developed to simplify the implementation of adaptive designs in practice. Model-assisted designs are derived from rigorous statistical theory and thus possess superior operating characteristics and great flexibility, yet they can be implemented as simply as algorithm-based designs. Easy-to-use web-based Shiny applications and downloadable standalone programs will be introduced to facilitate study design and conduct. The main application areas include adaptive dose finding, adaptive toxicity and efficacy evaluation, posterior probability and predictive probability for interim monitoring of study endpoints, outcome-adaptive randomization, hierarchical models, multi-arm multi-stage designs, and platform designs. Lessons learned from real trial examples and practical considerations for conducting adaptive designs will be given.
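
As a concrete illustration (the course demonstrates free web-based tools instead), one well-known model-assisted dose-finding design, the Bayesian optimal interval (BOIN) design, is available in the CRAN BOIN package; its pre-tabulated escalation boundaries are what make trial conduct so simple:

```r
# Illustrative sketch only (the course uses free web-based Shiny tools): the
# BOIN design from the CRAN 'BOIN' package, a well-known model-assisted
# dose-finding design. Design parameters below are arbitrary.
library(BOIN)

# Pre-tabulated escalation/de-escalation boundaries for a 30% target toxicity
# rate with up to 10 cohorts of 3 patients; trial conduct reduces to comparing
# the observed toxicity count at the current dose against this table.
get.boundary(target = 0.3, ncohort = 10, cohortsize = 3)

# After the trial, select the maximum tolerated dose from the pooled data
# (npts = patients treated per dose, ntox = toxicities observed per dose)
select.mtd(target = 0.3, npts = c(3, 6, 12, 9, 3), ntox = c(0, 1, 2, 4, 2))
```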

Categories: methodology and application

All courses will be held on Sunday, June 16, 2024.

  • Full-day courses: 8:00 a.m. – 5:00 p.m., with a lunch break from noon to 1:00 p.m.
  • Morning half-day courses: 8:00 a.m. – noon, with a 15-minute break
  • Afternoon half-day courses: 1:00 – 5:00 p.m., with a 15-minute break

The symposium organizers reserve the right to cancel courses with low enrollment. 

View the rest of the symposium schedule on the Program page.
