2013/14 Fall Journal Club Schedule
Date/Room Person Title Abstract
  • 09/19/2013
  • W2017
Elizabeth Sweeney
The Structure of Structural Magnetic Resonance Imaging Data and Applications in Multiple Sclerosis
Structural magnetic resonance imaging (MRI) of the brain and other parts of the body provides detailed and accurate anatomical information. Much work has been done on the analysis of statistics and summary metrics derived from these images, but until recently statisticians and biostatisticians have played a limited role in working directly with the data. To familiarize you with these exciting data, I will provide an introduction to structural MRI, including visualization and resources for learning to work with the data in R. I will also discuss two algorithms we developed for detecting and segmenting lesions in structural MRI of the brains of patients with multiple sclerosis: Subtraction-Based Logistic Inference for Modeling and Estimation (SuBLIME) and OASIS is Automated Statistical Inference for Segmentation (OASIS).
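A minimal sketch of loading and visualizing a structural MRI volume in R with the oro.nifti package; the file name "T1.nii.gz" is a hypothetical placeholder for any NIfTI volume you have on hand.

```r
library(oro.nifti)

# "T1.nii.gz" is a placeholder; substitute any NIfTI-format structural volume
img <- readNIfTI("T1.nii.gz", reorient = FALSE)

dim(img)             # voxel dimensions of the 3D volume
orthographic(img)    # axial, sagittal, and coronal slices
hist(img[img > 0], breaks = 100,
     main = "Voxel intensities", xlab = "Intensity")   # non-background voxels
```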
  • 10/03/2013
  • W2017
Leonardo Collado Torres
Implementation of fast, annotation-agnostic differential expression analysis across groups with biological replicates
Since the development of high-throughput technologies for probing the genome, we have been interested in finding differences across groups that could potentially explain the phenotypic differences we observe; in other words, methods for hypothesis generation at a large scale, where we try our best to remove artifacts. Traditional tools have focused on the transcriptome and are highly dependent on existing annotation. Alyssa Frazee et al. developed a statistical framework for finding candidate Differentially Expressed Regions (DERs), which I have attempted to make faster. I will introduce the problem we are trying to solve and show examples of https://github.com/lcolladotor/derfinder applied to various data sets.
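A toy sketch of the core region-finding step behind annotation-agnostic DER detection, not the derfinder implementation itself: derfinder thresholds a per-base statistic comparing groups, but thresholding mean coverage is enough to show the idea.

```r
set.seed(5)
# Simulated base-level coverage over a small genomic window, with one
# expressed region around bases 2000-2300
pos <- 1:5000
coverage <- rpois(5000, lambda = ifelse(pos %in% 2000:2300, 20, 2))

cutoff <- 5
above <- coverage > cutoff            # bases passing the cutoff

# Merge contiguous bases above the cutoff into candidate regions
runs <- rle(above)
ends <- cumsum(runs$lengths)
starts <- ends - runs$lengths + 1
data.frame(start = starts[runs$values], end = ends[runs$values])
```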
  • 10/17/2013
  • W2017
Aaron Fisher
Intro to Adaptive Trials, and the EAGLE Visualization Tool
This talk will be an introduction to how adaptive trials work and the basic theoretical framework behind them. We'll gloss over the more complicated details, so if you know what a multivariate normal distribution is and you know what a null distribution is, it should be a breeze. The specific application we'll discuss is a trial design where we start by enrolling everyone, and then decide whom to continue enrolling based on results from the currently enrolled patients. For example, if one subpopulation doesn't appear to be benefiting, we might stop enrolling it toward the end of the trial. This summer I worked on a Shiny app with Harris Jaffee and Michael Rosenblum that helps users explore trial designs with this kind of adaptive enrollment. If there's time, I'll show some of the app's bells and whistles.
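A simplified simulation of this kind of adaptive enrollment, not the EAGLE design itself: the futility threshold and final critical value below are illustrative, and multiplicity adjustment is ignored.

```r
set.seed(6)
n_sim <- 5000
n1 <- 100; n2 <- 100     # per-subpopulation sample sizes, stages 1 and 2
effect <- c(0.3, 0)      # standardized effects: subpop 1 benefits, subpop 2 does not
futility_z <- 0.5        # interim threshold for continued enrollment
crit <- qnorm(0.975)     # final critical value (multiplicity ignored)

reject <- matrix(FALSE, n_sim, 2)
for (s in 1:n_sim) {
  z1 <- rnorm(2, mean = effect * sqrt(n1))   # stage-1 z-statistics
  for (k in 1:2) {
    if (z1[k] > futility_z) {                # adaptive enrollment decision
      z2 <- rnorm(1, mean = effect[k] * sqrt(n2))
      z  <- (sqrt(n1) * z1[k] + sqrt(n2) * z2) / sqrt(n1 + n2)  # pooled z
      reject[s, k] <- z > crit
    }
  }
}
colMeans(reject)   # empirical rejection rates for the two subpopulations
```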
  • 10/31/2013
  • W2017
Chen Yue
Principal curves and surfaces
The concept of principal curves and surfaces was proposed in Trevor Hastie's PhD thesis in 1984 and later published in JASA (1989). The idea is to find a lower-dimensional manifold embedded in a higher-dimensional space. In this presentation, I will share my limited knowledge of the intuition, algorithm, and applications of this interesting concept. I will also show how the original algorithm has been improved. In addition, I will introduce several related concepts such as ISOMAP (published in Science), a more recent algorithm that helps find low-dimensional manifolds.
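A minimal example of fitting a principal curve in R with the princurve package, on noisy points sampled around a circle:

```r
library(princurve)   # provides principal_curve() in princurve >= 2.0

set.seed(7)
theta <- runif(200, 0, 2 * pi)
x <- cbind(cos(theta), sin(theta)) + matrix(rnorm(400, sd = 0.1), ncol = 2)

fit <- principal_curve(x)     # alternating projection and smoothing steps

plot(x, col = "grey", asp = 1, main = "Principal curve through noisy circle data")
lines(fit$s[fit$ord, ], lwd = 2)   # fitted curve, ordered along arc length
```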
  • 11/12/2013
  • W2017
Amanda Mejia
A Layered Grammar of Graphics
A grammar of graphics is a tool that enables us to concisely describe the components of a graphic. Such a grammar allows us to move beyond named graphics (e.g., the “scatterplot”) and gain insight into the deep structure that underlies statistical graphics. This article builds on Wilkinson, Anand, and Grossman (2005), describing extensions and refinements developed while building an open source implementation of the grammar of graphics for R, ggplot2. The topics in this article include an introduction to the grammar by working through the process of creating a plot, and discussing the components that we need. The grammar is then presented formally and compared to Wilkinson’s grammar, highlighting the hierarchy of defaults, and the implications of embedding a graphical grammar into a programming language. The power of the grammar is illustrated with a selection of examples that explore different components and their interactions, in more detail. The article concludes by discussing some perceptual issues, and thinking about how we can build on the grammar to learn how to create graphical “poems.”
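A short ggplot2 example that builds a plot layer by layer, making the grammar components (data, aesthetic mappings, geoms, stats, scales, theme) explicit:

```r
library(ggplot2)

# The "scatterplot" decomposed into grammar components
ggplot(mtcars, aes(x = wt, y = mpg)) +        # data + aesthetic mappings
  geom_point(aes(colour = factor(cyl))) +     # geom: points; colour mapped to cylinders
  stat_smooth(method = "lm", se = TRUE) +     # stat: a fitted-line layer
  scale_colour_brewer(palette = "Set1") +     # scale: controls the colour mapping
  labs(x = "Weight (1000 lbs)", y = "MPG", colour = "Cylinders") +
  theme_minimal()                             # theme: presentation, not data
```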
  • 12/05/2013
  • W2017
Ivan Diaz
Reaping the Computational Benefits of Targeted Maximum Likelihood Estimation using Exponential Families
Targeted maximum likelihood estimation (TMLE) is a general template for constructing estimators of parameters in semi- and nonparametric models. A crucial step in the implementation of TML estimators in a nonparametric model is the proposal of a parametric submodel for the relevant components of the likelihood. In the context of causal inference, where TMLE has been most studied, TML estimators can often be implemented by running standard regression software on carefully constructed auxiliary variables, usually referred to as "clever covariates". In this paper we examine targeted maximum likelihood estimation in a more general setting, exploring the use of an exponential family to define the parametric submodel. We illustrate the method in four examples involving estimation of the mean of an outcome missing at random, median regression, variable importance, and the causal effect of a continuous exposure. We take advantage of the fact that estimation of a parameter in an exponential family is a convex optimization problem, a well-developed area for which software implementing reliable and computationally efficient methods exists. This implementation of TMLE provides a completely general framework in which TML estimators can be computed for any parameter that can be defined in the nonparametric model.
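For orientation, here is a sketch of the standard clever-covariate TMLE for the mean of an outcome missing at random, the first of the paper's four examples. It uses the usual logistic fluctuation rather than the exponential-family submodel the paper proposes, and the data-generating model is made up for illustration.

```r
set.seed(2)
n <- 1000
W <- rnorm(n)                          # covariate
Delta <- rbinom(n, 1, plogis(0.5 + W)) # missingness indicator, depends on W
Y <- plogis(-1 + 2 * W + rnorm(n))     # outcome bounded in (0, 1)
Y[Delta == 0] <- NA

# 1. Initial outcome regression among observed cases
fitQ <- glm(Y ~ W, family = quasibinomial, subset = Delta == 1)
Qbar <- predict(fitQ, newdata = data.frame(W = W), type = "response")

# 2. Missingness mechanism g(W) = P(Delta = 1 | W)
g <- predict(glm(Delta ~ W, family = binomial), type = "response")

# 3. Fluctuate along the "clever covariate" H = Delta / g via a logistic submodel
H <- Delta / g
eps <- unname(coef(glm(Y ~ -1 + H + offset(qlogis(Qbar)),
                       family = quasibinomial, subset = Delta == 1)))

# 4. Targeted update (H evaluated at Delta = 1) and plug-in estimate
Qbar_star <- plogis(qlogis(Qbar) + eps / g)
c(complete_case = mean(Y, na.rm = TRUE), tmle = mean(Qbar_star))
```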
  • 02/27/2014
  • W2008
Huitong Qiu
Robust covariance estimation with application in portfolio optimization
In portfolio optimization, estimating the covariance matrix (or the scatter matrix, which is a matrix proportional to the covariance matrix) of stock returns is the key step. In this paper, we propose a new robust portfolio optimization strategy by resorting to a quantile-based scatter matrix estimator. Computationally, the proposed robust portfolio optimization method is as efficient as its Gaussian-based alternative. Theoretically, by exploiting the quantile-based statistics, we show that the actual portfolio risk approximates the oracle risk at a parametric rate even under very heavy-tailed distributions and a stationary time series with weak dependence. The rate of convergence is set in a double asymptotic framework where the portfolio size may scale exponentially with the sample size. The empirical effectiveness of the proposed estimator is demonstrated on both synthetic and real data. The experiments show that the proposed method can significantly stabilize portfolio risk under highly volatile stock returns and effectively avoid extreme losses.
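As a rough illustration of the quantile-based idea, not the authors' estimator: the sketch below builds a scatter matrix from the quantile-based Qn scale (robustbase) via the polarization identity cov(X, Y) = (var(X + Y) - var(X - Y)) / 4, then plugs it into minimum-variance portfolio weights.

```r
library(robustbase)   # provides the quantile-based Qn scale estimator

set.seed(3)
n <- 500; d <- 5
R <- matrix(rt(n * d, df = 3), n, d)   # heavy-tailed pseudo-returns

# Scatter matrix with Qn^2 in place of var in the polarization identity.
# Note: this simple construction is not guaranteed to be positive semidefinite.
qn_scatter <- function(X) {
  d <- ncol(X)
  S <- diag(apply(X, 2, Qn)^2)
  for (i in 1:(d - 1)) for (j in (i + 1):d) {
    S[i, j] <- S[j, i] <- (Qn(X[, i] + X[, j])^2 - Qn(X[, i] - X[, j])^2) / 4
  }
  S
}

# Minimum-variance portfolio weights: w proportional to S^{-1} 1
S <- qn_scatter(R)
w <- solve(S, rep(1, d))
round(w / sum(w), 3)
```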
  • 03/27/2014
  • W2029
John Muschelli
Statistical Modeling: The Two Cultures
The link to this paper is: http://projecteuclid.org/euclid.ss/1009213726
  • 04/10/2014
  • E9519
Parichoy Pal Choudhury
Mendelian Randomization: A Review from a Causal Inference Perspective
In epidemiology it is often of interest to study the causal effect of a modifiable phenotype on the risk of a disease. Though randomized controlled trials are considered the "gold standard" for such questions, they are often not ethical or practical to conduct. Moreover, it is difficult to draw causal inference from observational data due to the problems of confounding and reverse causation. One statistical approach to deal with unmeasured confounding is through the use of instrumental variables. When genes are considered as instrumental variables, the method is called "Mendelian randomization". In this talk, I review some of the statistical methods that exploit the idea of Mendelian randomization. In particular, when a gene satisfies the core conditions that define an instrumental variable, the average causal effect of the phenotype on the outcome is not identified, but it is possible to derive bounds for this parameter. I discuss additional parametric restrictions and other assumptions needed to identify causal parameters of interest. I also highlight some of the challenging open research questions in this area.
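A small simulation of the basic instrumental-variable logic behind Mendelian randomization: with an unmeasured confounder, the naive regression of outcome on phenotype is biased, while the Wald ratio using genotype as the instrument recovers the causal effect (all numbers are illustrative).

```r
set.seed(4)
n <- 10000
G <- rbinom(n, 2, 0.3)           # genotype: the instrument
U <- rnorm(n)                    # unmeasured confounder
X <- 0.5 * G + U + rnorm(n)      # modifiable phenotype
Y <- 0.8 * X + U + rnorm(n)      # outcome; true causal effect is 0.8

coef(lm(Y ~ X))["X"]             # naive regression: biased upward by U

# Wald (ratio) estimator: gene-outcome effect over gene-phenotype effect
beta_GY <- coef(lm(Y ~ G))["G"]
beta_GX <- coef(lm(X ~ G))["G"]
unname(beta_GY / beta_GX)        # approximately 0.8
```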
  • 04/24/2014
  • E2133
Lei Huang
Bayesian scalar-on-image regression with application to association between intracranial DTI and cognitive outcomes
Diffusion tensor imaging (DTI) measures water diffusion within white matter, allowing for in vivo quantification of brain pathways. These pathways often subserve specific functions, and impairment of those functions is often associated with imaging abnormalities. As a method for predicting clinical disability from DTI images, we propose a hierarchical Bayesian “scalar-on-image” regression procedure. Our procedure introduces a latent binary map that estimates the locations of predictive voxels and penalizes the magnitude of effect sizes in these voxels, thereby resolving the ill-posed nature of the problem. By inducing a spatial prior structure, the procedure yields a sparse association map that also maintains spatial continuity of predictive regions. The method is demonstrated on a simulation study and on a study of association between fractional anisotropy and cognitive disability in a cross-sectional sample of 135 multiple sclerosis patients.
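A stripped-down sketch of the latent-binary-map idea, using an ordinary spike-and-slab Gibbs sampler with independent (non-spatial) inclusion indicators and fixed hyperparameters; the paper's procedure additionally imposes a spatial prior on the map, which is omitted here.

```r
set.seed(1)
n <- 100; p <- 50
X <- matrix(rnorm(n * p), n, p)              # p "voxel" predictors
beta_true <- c(rep(2, 5), rep(0, p - 5))     # only the first 5 are predictive
y <- as.vector(X %*% beta_true + rnorm(n))

sigma2 <- 1; tau2 <- 4; pi0 <- 0.1           # fixed hyperparameters for simplicity
n_iter <- 2000; burn <- 500
beta <- rep(0, p)
incl <- matrix(0, n_iter, p)
r <- y - as.vector(X %*% beta)               # current residual

for (it in 1:n_iter) {
  for (j in 1:p) {
    r <- r + X[, j] * beta[j]                # remove coefficient j from the fit
    v <- 1 / (sum(X[, j]^2) / sigma2 + 1 / tau2)   # conditional posterior variance
    m <- v * sum(X[, j] * r) / sigma2              # conditional posterior mean
    log_bf <- 0.5 * log(v / tau2) + 0.5 * m^2 / v  # log Bayes factor for inclusion
    p1 <- 1 / (1 + (1 - pi0) / pi0 * exp(-log_bf)) # P(include j | rest), stably
    beta[j] <- if (runif(1) < p1) rnorm(1, m, sqrt(v)) else 0
    r <- r - X[, j] * beta[j]
  }
  incl[it, ] <- as.numeric(beta != 0)
}
round(colMeans(incl[-(1:burn), ]), 2)        # posterior inclusion probabilities
```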