Biostatistics - Johns Hopkins Bloomberg School of Public Health

Department of Biostatistics

ABSTRACT

Generalized Association Plots (GAP): Dimension Free Information Visualization Environment for Multivariate Data Structure

Dr. Chun-houh Chen, Assistant Research Fellow, Institute of Statistical Science, Academia Sinica, Taiwan

Conventional visualization tools for high-dimensional data sets usually adopt a dimension reduction technique, such as principal component analysis, to project the data structure from a higher-dimensional space down to a lower-dimensional configuration. This dimension reduction procedure is also an information reduction process. Dimension reduction is necessary because conventional tools always use a scatter-plot type of display to represent the metric relationship between observations geometrically. GAP is a dimension-free visualization environment for multivariate data structure. Given a multivariate data set, GAP first computes the proximity matrices for variables as well as for subjects. Proper seriations (permutations) are then sought to rearrange these two matrices so that they satisfy certain properties. The doubly sorted raw data matrix, together with the two sorted proximity matrices, is then projected through appropriate color spectra to create matrix maps. These three maps should be cross-examined to identify the three major pieces of information contained in any multivariate data set: 1. the linkage among the n subject points in the p-dimensional space (subject-clusters); 2. the linkage among the p variable vectors in the n-dimensional space (variable-groups); and 3. the interaction linkage between the sets of subjects and variables.
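The three steps described above (proximity matrices, seriation, double sorting) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the GAP implementation: the abstract does not say which proximity measure or seriation algorithm GAP uses, so the sketch assumes Pearson correlation as the proximity and substitutes a simple spectral ordering (sorting by the leading eigenvector of the proximity matrix). The function names `proximity`, `seriate`, and `gap_sort` are hypothetical.

```python
import numpy as np


def proximity(data):
    """Pearson correlation between the rows of `data` (assumed proximity measure)."""
    return np.corrcoef(data)


def seriate(prox):
    """Return a permutation that orders objects along the leading eigenvector.

    This spectral ordering is a stand-in chosen for brevity; it is not
    claimed to be the seriation used by GAP.
    """
    _, eigvecs = np.linalg.eigh(prox)   # eigenvalues come back in ascending order
    leading = eigvecs[:, -1]            # eigenvector of the largest eigenvalue
    return np.argsort(leading)


def gap_sort(data):
    """Build the three sorted matrices for an n-subjects by p-variables array."""
    subj_prox = proximity(data)         # n x n, subjects against subjects
    var_prox = proximity(data.T)        # p x p, variables against variables
    subj_order = seriate(subj_prox)
    var_order = seriate(var_prox)
    sorted_data = data[np.ix_(subj_order, var_order)]
    sorted_subj = subj_prox[np.ix_(subj_order, subj_order)]
    sorted_var = var_prox[np.ix_(var_order, var_order)]
    return sorted_data, sorted_subj, sorted_var, subj_order, var_order


# Toy data: six subjects alternating between two opposite profiles.
rng = np.random.default_rng(0)
base = np.array([[1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0]])
data = base[[0, 1, 0, 1, 0, 1]] + 0.05 * rng.standard_normal((6, 4))
sorted_data, sorted_subj, sorted_var, subj_order, var_order = gap_sort(data)
```

On the toy data, the subject ordering places the three subjects that share a profile next to each other, so the sorted subject proximity matrix shows two blocks along its diagonal. Rendering each of the three returned matrices with a color map (for example, an image plot) would give the matrix maps that the abstract describes.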

Several modules have been added to GAP. A dynamic clustering procedure using GAP (DynaGAP) has been developed to search systematically for clustering patterns in both subjects and variables. When a data profile is observed more than once, a longitudinal version of GAP (LongGAP), with parallel linkage and overlapping linkage, is designed to study the three linkages over time. CateGAP (Categorical GAP) was created to visualize the information structure of categorical data sets. CanoGAP (Canonical GAP) is suited to comparing the similarity and difference structures of two sets of variables measured on the same set of subjects.

GAP was originally developed for analyzing data sets from the Taiwan multidimensional psychopathological group research program (MPGRP). It has since become a powerful information visualization environment for assisting general-purpose multivariate analyses.
