NEWS & EVENTS

NHLBI Biostatistics Workshop on Recent Advances and Challenges in Statistical Methods

September 26-27, 2016
Auditorium Balcony A&B, Building 45
Natcher Conference Center, NIH Campus, Bethesda, MD 20892

Description

This workshop has two main objectives: (a) to assess recent developments in statistical methods relevant to NHLBI studies; and (b) to identify the major challenges and important issues related to these statistical and analytical methods. Given the rapid development of new technology and the growing need to analyze massive and complex (“big”) data, the workshop will focus in particular on novel statistical models, computational issues for large data sets, and efficient and effective study designs. It will bring together leading experts in biostatistics, big data, clinical trials, statistical genetics, statistical computing, and databases to present recently developed statistical and analytical methods and to discuss their applications to NHLBI studies. The group will also identify gaps in knowledge and recommend future methodological research directions that will meet the specific needs of future NHLBI studies. The workshop will be an excellent opportunity for statisticians, researchers, and investigators who have encountered these statistical and analytical issues to collaborate and to develop novel methodological tools that can be applied to future NHLBI studies.

Agenda

Day 1: September 26, 2016

7:30am-8:05am
-
Registration

8:05am-8:10am
-
Welcome from OBR Director: Nancy L. Geller

8:10am-9:55am
-
Session 1: Practical Issues of Large Clinical Studies

Chair: Nancy L. Geller, NHLBI/NIH

8:10am-8:40am, Ying Lu, Stanford University

Learning VA Healthcare System Through Large Cooperative Studies

Ying Lu, Ph.D.1, Grant Huang, Ph.D.2, Mei-Chiung Shih, Ph.D.1, Ryan Ferguson, Ph.D.3

1. VA Palo Alto Health Care System and Stanford University, Stanford, CA, USA

2. Office of Research and Development, Department of Veterans Affairs, Washington, DC, USA

3. VA Boston Health Care System and Boston University, Boston, MA, USA

The VA Cooperative Studies Program (CSP) is a division of the Office of Research and Development in the US Department of Veterans Affairs (VA). CSP has a 44-year history of planning and conducting large multicenter clinical trials and epidemiological studies initiated by VA investigators within the VA Healthcare System. The mission of CSP is to advance the health and care of Veterans through cooperative research studies that produce innovative and effective solutions to Veteran and national healthcare problems. In this talk, we provide an overview of CSP and examples of our innovations to integrate large clinical trials and observational studies with the largest national healthcare system, including the CSP Network of Dedicated Enrollment Sites (NODES), Point-of-Care (PoC) randomized trials, and the VA Million Veteran Program. We will also discuss statistical challenges in building a learning healthcare system through PoC trials.

8:40am-9:10am, Song Yang, NHLBI/NIH

Improving the testing and description of treatment effect in clinical trials with time-to-event outcomes

Song Yang
NHLBI/NIH

To test for the existence of a treatment effect in clinical trials with survival data, the log-rank test has been the most widely used tool. Under the proportional hazards assumption, the log-rank test is optimal, and hazard ratio estimates from a proportional hazards model can provide simple and useful summary measures, even if the hazard ratio is moderately time dependent.

In many applications, the combination of the log-rank test and hazard ratio estimation works well for testing the treatment effect and for reporting a summary measure of the treatment effect. But such an approach can break down when there is substantial variation of the hazard ratio over time, as exemplified by the WHI estrogen plus progestin trial. For the testing and reporting of treatment effects in trials with time-to-event data, can the ideas of hazard ratio estimation and log-rank type procedures be adapted to meet the challenges when the proportional hazards assumption may or may not hold? We discuss some recent developments and illustrate them in several trials, including the WHI estrogen plus progestin trial.
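
As background for this talk, the standard combination described above can be reproduced with common survival-analysis software. The sketch below uses the Python lifelines package on simulated data (the variable names and simulated effect sizes are illustrative assumptions, not from the talk): a log-rank test compares the two arms, and a Cox model supplies the hazard ratio summary whose adequacy rests on the proportional hazards assumption.

    import numpy as np
    import pandas as pd
    from lifelines import CoxPHFitter
    from lifelines.statistics import logrank_test

    # Simulated two-arm trial with exponential event times and independent censoring
    rng = np.random.default_rng(1)
    n = 400
    treat = rng.integers(0, 2, n)
    t_event = rng.exponential(scale=np.where(treat == 1, 14.0, 10.0))  # treatment prolongs survival
    t_cens = rng.exponential(scale=20.0, size=n)
    df = pd.DataFrame({
        "time": np.minimum(t_event, t_cens),
        "event": (t_event <= t_cens).astype(int),
        "treat": treat,
    })

    # Log-rank test of any treatment effect
    a, b = df[df.treat == 1], df[df.treat == 0]
    lr = logrank_test(a["time"], b["time"],
                      event_observed_A=a["event"], event_observed_B=b["event"])
    print("log-rank p-value:", lr.p_value)

    # Hazard ratio under a proportional hazards working model
    cph = CoxPHFitter().fit(df, duration_col="time", event_col="event")
    cph.print_summary()
    # lifelines' cph.check_assumptions(df) can flag departures from proportional hazards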

9:10am-9:40am, Robert M. Califf, FDA

TBA

9:40am-9:55am, Q&A, Floor Discussion

9:55am-10:10am
-
Break

10:10am-12:05pm
-
Session 2: Innovative Study Designs & Precision Medicine

Chair: Song Yang, NHLBI/NIH

10:10am-10:35am, Tze Leung Lai, Stanford University

Adaptive Clinical Trial Designs for Patient Subgroup Selection

Tze Leung Lai

Depts. of Statistics, Biomed. Data Science, Health Res. Policy, ICME;
Stanford Cancer Institute, Center for Innovative Study Design,
Population Health Science Center, Stanford University

It is widely recognized that the comparative efficacy of a new treatment can depend on certain characteristics of the patients that are difficult to pre-specify at the design stage. On the other hand, narrowly defining the patient characteristics that account for patient heterogeneity for inclusion and exclusion in a clinical trial may unnecessarily limit the applicability of the treatment to a small sub-population. We introduce an adaptive design that resolves this dilemma. We also illustrate how this design is used in the DEFUSE III trial, which compares standard medical therapy with endovascular removal of the clot in ischemic stroke patients.

10:35am-11:00am, Dong-Yun Kim, NHLBI/NIH  

Sequential Patient Recruitment Monitoring (SPRM)

Dong-Yun Kim
NHLBI/NIH

Patient recruitment in a clinical trial has a significant impact on the power and quality of the statistical analysis, and on the continuity of the trial itself. Timely recruitment monitoring gives researchers a window of opportunity to detect a problem before it is too late. In this talk, we introduce a new monitoring method based on fully sequential tests applied to a stream of packet entry data. Simulation studies suggest that the method works well for moderate packet sizes, and it can easily accommodate enrollment changes during the study period. We illustrate the new method using real enrollment data from clinical trials.

11:00am-11:25am, Anastasios Tsiatis, North Carolina State University

Implementing Precision Medicine: Optimal Treatment Regimes and SMARTs, Part I

Butch Tsiatis
North Carolina State University

In the treatment of chronic diseases or disorders like cancer, HIV infection, substance abuse, and cardiovascular disease, clinicians make a series of treatment decisions at milestones in the disease/disorder process. The goal of the clinician is to determine the most beneficial treatment option from among those available at each decision point for a patient, given his/her baseline and evolving physiological, demographic, genetic/genomic, and clinical characteristics and medical history. A treatment regime is a list of decision rules, each corresponding to a key decision point, that formalizes clinical decision-making. Each rule takes the information accrued on a patient up to that point as input and returns the treatment option the patient should receive. Identifying the optimal treatment regime from data, that is, the regime leading to the most favorable expected outcomes for patients treated according to all of its rules, provides an evidence-based approach to clinical decision-making.

In Part I of this two-talk sequence, we consider the case of a single decision point, introduce a formal statistical framework in which an optimal treatment regime can be defined precisely, and discuss statistical methods for estimating an optimal regime from data from a randomized clinical trial or observational study.
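
To make the single-decision-point setting concrete, a minimal sketch of one of the simplest estimators, outcome regression (Q-learning), is given below; the talk itself develops the framework rigorously and covers more robust alternatives. The simulated data, variable names, and regression model are illustrative assumptions only.

    import numpy as np
    from sklearn.linear_model import LinearRegression

    # Simulated randomized trial: covariates X, treatment A in {0, 1}, outcome Y (larger is better)
    rng = np.random.default_rng(2)
    n = 1000
    X = rng.normal(size=(n, 2))
    A = rng.integers(0, 2, n)
    Y = X[:, 0] + A * (1.0 - 2.0 * X[:, 1]) + rng.normal(size=n)  # treatment helps when the second covariate is below 0.5

    # Model E[Y | X, A] with treatment-by-covariate interactions (the "Q-function")
    design = np.column_stack([X, A, A[:, None] * X])
    q_model = LinearRegression().fit(design, Y)

    def q_value(x, a):
        """Predicted mean outcome for covariate vector x under treatment a."""
        row = np.concatenate([x, [a], a * x])
        return q_model.predict(row[None, :])[0]

    def regime(x):
        """Estimated optimal rule: treat when treatment is predicted to do better."""
        return int(q_value(x, 1) > q_value(x, 0))

    print(regime(np.array([0.0, -1.0])), regime(np.array([0.0, 1.0])))  # expected: 1 then 0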

11:25am-11:50am, Marie Davidian, North Carolina State University

Implementing Precision Medicine: Optimal Treatment Regimes and SMARTs, Part II

Marie Davidian
North Carolina State University

In the treatment of chronic diseases or disorders like cancer, HIV infection, substance abuse, and cardiovascular disease, clinicians make a series of treatment decisions at milestones in the disease/disorder process. The goal of the clinician is to determine the most beneficial treatment option from among those available at each decision point for a patient, given his/her baseline and evolving physiological, demographic, genetic/genomic, and clinical characteristics and medical history. A treatment regime is a list of decision rules, each corresponding to a key decision point, that formalizes clinical decision-making. Each rule takes the information accrued on a patient up to that point as input and returns the treatment option the patient should receive. Identifying the optimal treatment regime from data, that is, the regime leading to the most favorable expected outcomes for patients treated according to all of its rules, provides an evidence-based approach to clinical decision-making.

In Part II of this two-talk sequence, we discuss extension of the framework and methods in Part I to the case of sequential decision making over multiple decision points. A Sequential, Multiple Assignment, Randomized Trial (SMART) is an ideal study design for yielding data for this purpose. We provide an introduction to SMARTs and review our collaborative experience designing SMARTs for the development of optimal interventions for cancer pain management, HIV prevention, and other chronic conditions.

11:50am-12:05pm, Q&A, Floor Discussion

12:05pm-1:10pm
-
Break

1:10pm-3:05pm
-
Session 3: Complex Data & Longitudinal Analysis

Chair: Colin O. Wu, NHLBI/NIH

1:10pm-1:35pm, Paul Albert, NCI/NIH

Shared Random Parameter Models for Risk Prediction: A Legacy of the NHLBI Biostatistics Program

Paul Albert
NCI/NIH

Shared random parameter models were first introduced by researchers at the NHLBI Biostatistics Branch for analyzing longitudinal data with informative dropout (Wu and Carroll, 1987; Wu and Bailey, 1988; Follmann and Wu, 1995; Albert and Follmann, 2000; Albert et al., 2002). That work focused on characterizing the longitudinal data process in the presence of an informative missing data mechanism that is treated as a nuisance. Shared random parameter modeling approaches have also been developed from the perspective of characterizing the relationship between longitudinal data and a subsequent outcome, which may be an event time, a dichotomous measurement, or another longitudinal outcome. From an applications-rich perspective, this talk will focus on using shared random parameter models for risk prediction. We will present applications in obstetrics and teenage driving to illustrate how these models work, and use these applications to highlight some of the challenges still remaining in this field.
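
For readers unfamiliar with the approach, a minimal two-part specification of a shared random parameter model for a longitudinal outcome with informative dropout might look as follows (the notation is generic and not necessarily that of the cited papers):

    \[ Y_{ij} = X_{ij}^\top \beta + Z_{ij}^\top b_i + \epsilon_{ij}, \qquad b_i \sim N(0, D), \quad \epsilon_{ij} \sim N(0, \sigma^2), \]
    \[ \lambda_i(t \mid b_i) = \lambda_0(t) \exp(\gamma^\top b_i), \]

where the same subject-level random effects b_i appear both in the mixed model for the repeated measurements Y_{ij} and in the hazard of dropout (or of a subsequent event). This shared dependence is what links the two processes, and it is what allows the model to be used either to correct inference about the longitudinal process or, as in this talk, to predict risk from the longitudinal history.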

1:35pm-2:00pm, Zhiliang Ying, Columbia University

Latent class and latent factor models with applications to psychiatry

Zhiliang Ying
Columbia University

This talk focuses on two classes of statistical models for item response data which often arise from educational assessment and psychological measurement.  The first one is the class of multidimensional item response theory models and the second one is the Q-matrix based class of diagnostic classification models. Recent developments on these models and their extensions will be discussed and applied to psychiatry. 

2:00pm-2:25pm, Xin Tian, NHLBI/NIH

Statistical Indices of Risk Tracking in Longitudinal Studies

Xin Tian
NHLBI/NIH

It is well known that many CVD risk factors have tracking properties, in the sense that subjects at high risk at a younger age are more likely to have high CVD risk at an older age. This type of dynamic tracking may explain why a risk factor measured at an earlier age can be used to predict certain CVD events at a later age. Existing methods and applications of dynamic risk tracking in longitudinal studies have focused on two statistical concepts (Ware and Wu, 1981; Foulkes and Davis, 1981; McMahan, 1981; Larry et al., 1991; Wilsgaard et al., 2001): (a) the ability to predict the future value of an outcome variable from repeated measurements in the past, and (b) the ability of the outcome variable to maintain its relative ranking over time. These existing methods of dynamic tracking, however, depend on the assumption that the longitudinal outcomes and covariates satisfy a parametric model. In many longitudinal studies, the time trends of the risk factors and CVD events may not satisfy known parametric regression models. It is necessary to relax the parametric model assumptions on the outcomes and covariates, so that the dynamic tracking of CVD risks can be defined and estimated under more flexible nonparametric models. We develop a class of nonparametric dynamic tracking indices based on the conditional distributions of the longitudinal variables. These nonparametric dynamic tracking indices can be used to quantify the tracking abilities of CVD risks among study participants.
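
As a rough illustration of concept (b), the toy sketch below computes a simple empirical rank-tracking quantity: how much more likely a subject in the upper quartile of a risk factor at one visit is to remain in the upper quartile at a later visit, relative to chance. This is only a schematic stand-in; the indices developed in the talk are defined through the full conditional distributions of the longitudinal variables under nonparametric models, and the simulated blood-pressure data are hypothetical.

    import numpy as np

    def rank_tracking_index(x1, x2, q=0.75):
        """P(above the q-quantile at visit 2 | above it at visit 1) divided by the
        marginal P(above the q-quantile at visit 2); values > 1 indicate tracking."""
        x1, x2 = np.asarray(x1), np.asarray(x2)
        hi1 = x1 >= np.quantile(x1, q)
        hi2 = x2 >= np.quantile(x2, q)
        return np.mean(hi2[hi1]) / np.mean(hi2)

    # Hypothetical systolic blood pressure measured on the same subjects at two visits
    rng = np.random.default_rng(3)
    bp1 = rng.normal(120, 15, size=1000)
    bp2 = 0.6 * bp1 + 0.4 * rng.normal(120, 15, size=1000)  # correlated across visits
    print(rank_tracking_index(bp1, bp2))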

2:25pm-2:50pm, Scott L. Zeger, Johns Hopkins University             

A Statistical Perspective on Population and Individualized Health (Precision Medicine): Two Sides of the Same Coin

Scott L. Zeger
Johns Hopkins University

More than 125 years ago, American universities created the current academic medical model on a foundation of the emerging biological sciences. Today, because of the intertwined revolutions in biological and information technologies, biomedicine has become increasingly data intensive. Medical research is being driven by new biomedical measurements and analyses, with the goal of providing more accurate and precise answers to clinical questions including: what is an individual’s current health state; what is her health trajectory; and what are the likely benefits and harms associated with each available intervention? While the aspiration is to provide answers tailored to the individual, in practice, progress occurs through iterative sub-setting of populations into ever more homogeneous subgroups. Obtaining scientific answers to these and similar questions for each subgroup leads to better clinical decisions and ultimately better population health outcomes. But the answers depend upon expertise and infrastructure for the acquisition and scientific use of health data.

This talk presents a statistical perspective for pursuing scientific answers to clinical questions like the ones above. The framework makes clear that population and individualized health are two sides of the same coin. We use hierarchical models that describe the flow of information from the population level to the subgroup to inform health decisions, whose outcomes in turn update population-level knowledge. We consider the implications of this perspective for the infrastructure required to improve health outcomes at more affordable costs. We also consider the key steps necessary to effectively mobilize modern biomedical and data science to improve the clinical care of Americans, analogous to the steps taken a century ago.

2:50pm-3:05pm, Q&A, Floor Discussion

3:05pm-3:20pm
-
Break

3:20pm-5:15pm
-
Session 4: NHLBI Studies & OBR Statistical Research

Chair: Myron A. Waclawiw, NHLBI/NIH

3:20pm-3:45pm, David L. DeMets, University of Wisconsin

Early Contributions for RCTs by NHLBI Biostatisticians

David L. DeMets
University of Wisconsin

Since the early 1950s, the National Heart, Lung, and Blood Institute (NHLBI) has conducted a long series of influential randomized clinical trials in heart, lung, and blood diseases. Five decades ago, cardiovascular disease was the overwhelming cause of mortality, and several potential risk factors were being identified that needed to be intervened upon and evaluated in rigorous studies for potential benefit or possible harm. NHLBI biostatisticians have been central to the design, conduct, monitoring, and final analyses of these trials. In those early years, there were few relevant textbooks or literature for clinical trials, as this was a new frontier. This impelled the NHLBI statisticians and their colleagues at academic coordinating centers to develop methodology that would address challenging questions related to the design, conduct, and analysis of these emerging RCTs. The new methodology included organizational structures, operational procedures, and statistical methods. Ten of many possible contributions will be briefly highlighted. Perhaps most importantly, the individual members of the group had a collective vision, passed from member to member over time, that new methodology must fit the questions being asked and that they must be engaged as statistical scientists throughout the course of a trial.

3:45pm-4:10pm, Janet Wittes, Statistics Collaborative, Inc.        

Dual Spaces and Back Translation – From Question to Statistics to Question

Janet Wittes
Statistics Collaborative, Washington DC

In his presidential address to the American Statistical Association, Jerry Cornfield (Chief of the Biometrics Research Branch from 1963 to 1967) described how, when someone asked him a scientific question, he routinely replaced the nouns with letters (A, B, …). But of course he reported these letters back as nouns. The letters allowed abstraction; the nouns constituted the concrete that held the clinical researcher and the statistician together. In this talk, I reflect on some of the questions we at the Biometrics Research Branch dealt with in the 1980s, how we abstracted those questions into more general situations, and how that abstraction then allowed us to address a series of seemingly unrelated problems.

4:10pm-4:35pm, Nancy L. Geller, NHLBI/NIH

Cardiovascular clinical trials in the 21st century:
Pros and Cons of an inexpensive paradigm

Nancy L. Geller
NHLBI/NIH

The expense of large cardiovascular clinical trials with cardiovascular event endpoints has led to attempts to simplify trials, making them less complex and easier to implement. The new paradigm is not-too-large, simple, and inexpensive. Innovations include the use of registries to find eligible participants and lower per-subject costs, simplified data collection, and the use of surrogate endpoints rather than cardiovascular events in order to decrease sample size and complete trials more quickly. Several examples will be given to illustrate the pros and cons of this new paradigm.

4:35pm-5:00pm, James Troendle, NHLBI/NIH

How to Control for Unmeasured Confounding in an Observational
Time-To-Event Study With Exposure Incidence Information:
The Treatment Choice Cox Model

James Troendle
NHLBI/NIH

In an observational study of the effect of a treatment on a time-to-event outcome, a major problem is accounting for confounding due to unknown or unmeasured factors. We propose including covariates in a Cox model that can partially account for an unknown time-independent frailty that is related to starting or stopping treatment as well as to the outcome of interest. These covariates capture the times at which treatment is started or stopped and so are called treatment choice (TC) covariates. Three such models are developed. The first is a nonparametric TC model, which assumes a very general form for the respective hazard functions of starting treatment, stopping treatment, and the outcome of interest. The second is a parametric TC model, which assumes that the log hazard functions for starting treatment, stopping treatment, and the outcome event include the frailty as an additive term. The third is a hybrid TC model that combines attributes of the parametric and nonparametric TC models. Compared to an ordinary Cox model, the TC models are shown to substantially reduce the bias of the estimated hazard ratio for treatment when data are simulated from a realistic Cox model with residual confounding due to unobserved frailty. The simulations also indicate that the bias decreases as the sample size increases. A TC model is illustrated by analyzing the Women's Health Initiative observational study of hormone replacement for post-menopausal women.
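
A minimal sketch of the general setup, not of the TC models themselves, is shown below: data are simulated with an unobserved frailty that drives both when treatment is started and the outcome hazard, and an ordinary Cox model is contrasted with a Cox model that additionally includes the observed treatment-start time as a covariate. The actual construction of the TC covariates and the nonparametric, parametric, and hybrid variants follow the speaker's work; everything here (variable names, effect sizes, the single crude start-time covariate) is an illustrative assumption.

    import numpy as np
    import pandas as pd
    from lifelines import CoxPHFitter

    # Simulate residual confounding: an unobserved frailty drives both treatment uptake and the outcome
    rng = np.random.default_rng(4)
    n = 2000
    frailty = rng.normal(size=n)
    t_start = rng.exponential(scale=np.exp(-0.5 * frailty))          # higher frailty -> starts treatment earlier
    treated = (t_start < 2.0).astype(int)                            # ever treated within the first 2 years
    t_event = rng.exponential(scale=np.exp(-0.8 * frailty + 0.3 * treated))  # true protective effect, log HR = -0.3
    df = pd.DataFrame({
        "time": np.minimum(t_event, 5.0),
        "event": (t_event <= 5.0).astype(int),
        "treated": treated,
        "t_start": np.minimum(t_start, 5.0),                         # crude stand-in for a treatment-choice covariate
    })

    # Ordinary Cox model vs. a model with the treatment-start-time covariate added
    cph_plain = CoxPHFitter().fit(df[["time", "event", "treated"]], "time", "event")
    cph_tc = CoxPHFitter().fit(df, "time", "event")
    print("HR (treatment only):", cph_plain.hazard_ratios_["treated"])
    print("HR (with start-time covariate):", cph_tc.hazard_ratios_["treated"])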

5:00pm-5:15pm, Q&A, Floor Discussion

Day 2: September 27, 2016

8:00am-9:55am
-
Session 5: Big Data, Data Mining & Machine Learning

Chair: James Troendle, NHLBI/NIH

8:00am-8:25am, Hemant Ishwaran, University of Miami

Random Survival Forests

Hemant Ishwaran
Division of Biostatistics, University of Miami

Ensemble learning involves taking elementary procedures (called base learners) and strategically combining them to form an ensemble; for example, a standard practice in classification is to take a collection of simple classifiers and average their votes. One of the most successful ensembles is random forests (RF), a tree-based learning method introduced by Leo Breiman that has been shown to have state-of-the-art prediction performance. Originally RF focused on regression and classification problems, but it was later extended to the survival setting by the author in a method called "Random Survival Forests" (RSF). In this talk, I review some of the challenges that had to be overcome in developing RSF, and I identify important future challenges for forests, including their extension to big data.
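
RSF is available in several open-source implementations (randomForestSRC in R by the speaker, scikit-survival in Python, among others). The sketch below uses scikit-survival on simulated right-censored data purely to show the workflow; the dataset, parameter settings, and variable names are illustrative assumptions.

    import numpy as np
    from sklearn.model_selection import train_test_split
    from sksurv.ensemble import RandomSurvivalForest
    from sksurv.util import Surv

    # Simulated right-censored survival data: only the first covariate affects the hazard
    rng = np.random.default_rng(5)
    n, p = 500, 10
    X = rng.normal(size=(n, p))
    t_event = rng.exponential(scale=np.exp(-0.5 * X[:, 0]))
    t_cens = rng.exponential(scale=2.0, size=n)
    y = Surv.from_arrays(event=t_event <= t_cens, time=np.minimum(t_event, t_cens))

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
    rsf = RandomSurvivalForest(n_estimators=200, min_samples_leaf=15, random_state=0)
    rsf.fit(X_tr, y_tr)
    print("Held-out concordance index:", rsf.score(X_te, y_te))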

8:25am-8:50am, Wei-Yin Loh, University of Wisconsin

Subgroup identification and inference with regression trees

Wei-Yin Loh
University of Wisconsin

There is increasing interest in employing machine learning algorithms to find patient subgroups with differential treatment effects in randomized experiments. As a result, a large number of algorithms have appeared. Regression tree methods are particularly well-suited because the terminal nodes of the trees directly specify the subgroups.  Many tree methods, however, have undesirable statistical and computational properties that make them unattractive for serious application: (i) selection bias in the predictor variables that define the subgroups, (ii) overly optimistic treatment effects in the subgroups, and (iii) potential for long computation time. The common cause is greedy search. This talk will present a regression tree algorithm called GUIDE that does not have these properties. In addition, the talk will present a solution to the problem of post-selection inference -- how to perform statistical inference on the estimated treatment effects in the subgroups.

8:50am-9:15am, Daniela Witten, University of Washington

Flexible and Interpretable Regression Using Convex Penalties

Daniela Witten
University of Washington

We consider the problem of fitting a regression model that is both flexible and interpretable. We propose two procedures for this task: the Fused Lasso Additive Model (FLAM), which is an additive model of piecewise constant fits; and Convex Regression with Interpretable Sharp Partitions (CRISP), which extends FLAM to allow for non-additivity. Both FLAM and CRISP are the solutions to convex optimization problems that can be efficiently solved. We show that FLAM and CRISP outperform competitors, such as sparse additive models (Ravikumar et al., 2009), CART (Breiman et al., 1984), and thin plate splines (Duchon, 1977), in a range of settings. We propose unbiased estimators for the degrees of freedom of FLAM and CRISP, which allow us to characterize their complexity.
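
To give a flavor of the convex-optimization formulation, the sketch below fits a stripped-down version of the FLAM idea with the Python cvxpy package: each additive component is a free vector of fitted values at the observed points, with a fused (total-variation) penalty applied along the ordering of the corresponding covariate. The published FLAM includes an additional sparsity term and a specialized algorithm, and CRISP generalizes to non-additive fits; the toy data and tuning parameter below are illustrative assumptions.

    import numpy as np
    import cvxpy as cp

    def flam_like_fit(X, y, lam):
        """Minimal fused-lasso additive fit: one piecewise-constant component per covariate."""
        n, p = X.shape
        theta0 = cp.Variable()                                  # intercept
        thetas = [cp.Variable(n) for _ in range(p)]             # fitted component values at the data points
        fit = theta0 + sum(thetas)
        penalty, constraints = 0, []
        for j in range(p):
            order = np.argsort(X[:, j])
            penalty += cp.norm1(cp.diff(thetas[j][order]))      # fused penalty along the sorted covariate
            constraints.append(cp.sum(thetas[j]) == 0)          # center each component for identifiability
        prob = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(y - fit) + lam * penalty), constraints)
        prob.solve()
        return theta0.value, [t.value for t in thetas]

    # Toy data: a step function in the first covariate plus a linear trend in the second
    rng = np.random.default_rng(6)
    X = rng.uniform(-1, 1, size=(200, 2))
    y = 2.0 * (X[:, 0] > 0) + X[:, 1] + 0.3 * rng.normal(size=200)
    intercept, components = flam_like_fit(X, y, lam=5.0)
    print(intercept, components[0][:5])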

9:15am-9:40am, Colin O. Wu, NHLBI/NIH          

Prediction of Cardiovascular Events in Long-Term Observational
Studies by Machine Learning

Colin O. Wu
NHLBI/NIH

Machine learning methods are potentially powerful tools for characterizing cardiovascular risks, predicting health outcomes, and identifying biomarkers in deeply phenotyped population-based observational studies. The Multi-Ethnic Study of Atherosclerosis (MESA) is one of several NHLBI long-term population-based observational cohort studies that could benefit from machine learning techniques. As part of the MESA data, we have longitudinal observations of 735 variables from traditional cardiovascular risk assessment, electrocardiography (ECG), cardiac magnetic resonance imaging (MRI), chest computed tomography, carotid ultrasonography, questionnaires, and biomarker panels obtained from 6,841 participants aged 45 to 84 years who were initially free of cardiovascular disease. In this project, we compared the results of variable selection via traditional Cox proportional hazards (CPH) regression and the random survival forest (RSF) for the prediction of six cardiovascular outcomes (incident heart failure, stroke, atrial fibrillation, coronary heart disease, cardiovascular disease outcomes, and all-cause mortality) over 12 years of follow-up. Our findings identify a list of risk factors that are potentially influential for the prediction of these six outcomes. We will discuss the clinical interpretations of our findings, limitations of our chosen machine learning methods, and some potential directions for future statistical methodological research.
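
The kind of comparison described above can be organized along the following lines. The sketch (which does not use MESA data and is not the speakers' analysis pipeline) fits a Cox proportional hazards model and a random survival forest with scikit-survival on the same simulated data, compares their held-out concordance, and uses permutation importance as one generic way to rank predictors.

    import numpy as np
    from sklearn.inspection import permutation_importance
    from sklearn.model_selection import train_test_split
    from sksurv.ensemble import RandomSurvivalForest
    from sksurv.linear_model import CoxPHSurvivalAnalysis
    from sksurv.util import Surv

    # Simulated cohort: a linear effect of x0 and a nonlinear effect of x1 on the hazard
    rng = np.random.default_rng(7)
    n, p = 800, 8
    X = rng.normal(size=(n, p))
    risk = 0.6 * X[:, 0] + 0.8 * (X[:, 1] ** 2)
    t_event = rng.exponential(scale=np.exp(-risk))
    t_cens = rng.exponential(scale=2.0, size=n)
    y = Surv.from_arrays(event=t_event <= t_cens, time=np.minimum(t_event, t_cens))
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

    cph = CoxPHSurvivalAnalysis().fit(X_tr, y_tr)
    rsf = RandomSurvivalForest(n_estimators=200, min_samples_leaf=15, random_state=0).fit(X_tr, y_tr)
    print("C-index, Cox PH:", cph.score(X_te, y_te))
    print("C-index, RSF:   ", rsf.score(X_te, y_te))

    # Permutation importance as a generic variable-ranking device for the forest
    imp = permutation_importance(rsf, X_te, y_te, n_repeats=10, random_state=0)
    print("Top predictors by importance:", np.argsort(imp.importances_mean)[::-1][:3])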

9:40am-9:55am, Q&A, Floor discussion

9:55am-10:10am
-
Break

10:10am-11:35am
-
Session 6: Genetics & Next Generation Sequencing

Chair: Xin Tian, NHLBI/NIH

10:10am-10:35am, Xihong Lin, Harvard University

Statistical Inference in Massive Whole-Genome Genetic and Genomic Studies

Xihong Lin
Harvard University

Massive genetic and genomic data present many exciting opportunities as well as challenges in data analysis and result interpretation, e.g., how to develop effective strategies for signal detection using massive genetic and genomic data when signals are weak and sparse. Many variable selection methods have been developed for the analysis of high-dimensional data in the statistical literature. However, limited work has been done on statistical inference for massive data. In this talk, I will discuss hypothesis testing for the analysis of high-dimensional data motivated by gene- and pathway/network-based analysis in whole-genome array and sequencing studies. I will focus on signal detection when signals are weak and sparse, in genetic association studies and mediation analysis. I will discuss hypothesis testing for signal detection using the Generalized Higher Criticism (GHC) and Berk-Jones tests, as well as challenges in inference for whole-genome mediation analysis. The results are illustrated using several datasets from genome-wide epidemiological studies.

10:35am-11:00am, Yi-Ping Fu, NHLBI/NIH

Low-frequency and rare coding variants associate with myocardial infarction and coronary heart disease

Yi-Ping Fu
NHLBI/NIH

Low-frequency and rare DNA sequence variants are inadequately assessed by earlier generations of genome-wide association studies (GWAS). While sequencing across the exome or genome can directly examine these variants, such an approach is still costly to apply in large-scale population studies. The Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) Consortium used the “ExomeChip” to genotype about 200,000 low-frequency and rare coding variants across the genome in 55,736 individuals of European ancestry from nine prospective cohorts, and tested these variants for association with incident myocardial infarction and coronary heart disease. Single-variant and gene-based analyses were performed adjusting for age, sex, and population substructure. We confirm previously reported associations with non-coding common variants at the chromosomal 9p21 region and the PHACTR1 gene, and also report that a low-frequency variant in ANGPTL4 is associated with reduced risk of incident coronary heart disease in this prospective meta-analysis.

11:00am-11:25am, Nilanjan Chatterjee, Johns Hopkins University

TBA

11:25am-11:40am, Q&A, Floor Discussion

11:40am-12:30pm
-
Panel Discussion, Summary & Recommendations to NHLBI

Chair: Nancy L. Geller, NHLBI/NIH

Panelists:

  • Paul Albert, NCI/NIH
  • David L. DeMets, University of Wisconsin
  • Hemant Ishwaran, University of Miami
  • Xihong Lin, Harvard University
  • Janet Wittes, Statistics Collaborative, Inc.

12:30pm
-
Adjourn