My research in statistical methodology is motivated by making sense of data and answering scientific questions. Many of the problems I work on involve wearable devices, including accelerometers and heart monitors. These sensors are now extremely popular in both research studies and the consumer market because they are small, long-lasting, and easy to use. They can be found in most activity trackers, smart bands, and modern cell phones. While the big hope is that they will replace notoriously unreliable self-reports, researchers have met great challenges in analyzing the enormous amounts of data these devices generate.

Sample activity data.

My research interests span the direct analysis of raw accelerometry data, the processing of raw data into summaries, and the association of the processed data with human health. The following are a few examples.


Predicting activity type

Predicting the type of activity a subject performed from accelerometry data is an important component of many epidemiological studies, as it helps researchers understand what the subjects chose to do and how much time they spent on each activity.

Raw data from a tri-axial accelerometer during two periods corresponding to different activities.

Supervised learning using movelets

Examining raw accelerometry signals annotated with what the subjects were doing at each moment, we identified substantial differences between the signal patterns during activities such as walking, lying down, and standing up. Following this idea, we developed a prediction method based on a concept called the "movelet" and successfully predicted various types of activities (paper). Although the paper applied the movelet method only to data collected by a single accelerometer, the method can be generalized to accommodate data from multiple accelerometers. In another study, my colleagues and I predicted activity type from two wrist-worn accelerometers and one hip-worn accelerometer (paper).
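
The idea of matching short signal windows against annotated examples can be sketched as follows. This is a minimal illustration and not the published movelet algorithm: the window width, the Euclidean distance, the majority vote, the function names, and the synthetic single-axis signals are all assumptions made for the example.

```python
import math
from collections import Counter


def windows(signal, width):
    """Return all overlapping windows of `width` consecutive samples."""
    return [signal[i:i + width] for i in range(len(signal) - width + 1)]


def distance(a, b):
    """Euclidean distance between two equal-length windows."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_dictionary(annotated, width):
    """Map each activity label to the windows cut from its annotated signal."""
    return {label: windows(sig, width) for label, sig in annotated.items()}


def predict(signal, dictionary, width):
    """Label each window by its closest annotated window, then majority-vote."""
    votes = Counter()
    for w in windows(signal, width):
        _, label = min(
            (distance(w, ref), label)
            for label, refs in dictionary.items()
            for ref in refs
        )
        votes[label] += 1
    return votes.most_common(1)[0][0]


# Synthetic annotated data (hypothetical): a periodic "walking" signal
# and a nearly flat "lying" signal.
walking = [math.sin(2 * math.pi * t / 10) for t in range(50)]
lying = [0.02 * ((-1) ** t) for t in range(50)]
dictionary = build_dictionary({"walking": walking, "lying": lying}, width=10)

# A new, unlabeled signal: same rhythm, different phase and amplitude.
new_signal = [0.9 * math.sin(2 * math.pi * (t + 3) / 10) for t in range(30)]
predicted = predict(new_signal, dictionary, width=10)
```

Because the annotated windows overlap, they cover every phase of the periodic signal, so a new recording does not need to be aligned with the training data before matching.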

From supervised to unsupervised learning

The original movelet method requires annotated signals as training data, which are usually acquired during in-lab sessions. Collecting such annotations is nearly impossible in many large-scale studies. We are therefore extending the movelet method so that it can still predict specific types of activity without training data.


Raw data to summarized data

Quantifying the duration and magnitude of activities is another main aspect of my research, since in many cases we care less about what people did during the day than about how much they did. I have therefore worked on how to extract such information from the raw data.

Activity Intensity of a subject over about 4 days.

Activity Intensity: a summary of the raw accelerometry signal

We introduced a set of metrics that summarize the raw accelerometry data into less dense but more interpretable variables (paper). One of them, "Activity Intensity", is shown in the figure above. It measures the variability of the raw acceleration signals after removing the systematic variability of the device. Activity Intensity is our answer to the commonly used "Activity Count", whose definition often differs across accelerometer manufacturers and is not publicly available.
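
One way to read "variability after removing the systematic variability of the device" is sketched below. It is an illustration under stated assumptions rather than the published definition: estimating device noise from a resting recording, flooring the result at zero, taking a square root, and the function names and simulated data are all hypothetical choices.

```python
import random
import statistics


def axis_variances(window):
    """Sample variance of each axis; `window` is a list of (x, y, z) tuples."""
    return [statistics.variance(axis) for axis in zip(*window)]


def intensity(window, noise_var):
    """Square root of the total variance in excess of device noise (floored at 0)."""
    excess = sum(axis_variances(window)) - sum(noise_var)
    return max(excess, 0.0) ** 0.5


def simulate(n, sd, rng):
    """Simulated tri-axial samples: gravity on z plus Gaussian variation."""
    return [(rng.gauss(0, sd), rng.gauss(0, sd), rng.gauss(1, sd)) for _ in range(n)]


rng = random.Random(0)
rest = simulate(100, 0.01, rng)    # device lying still: only sensor noise
active = simulate(100, 0.30, rng)  # subject moving: much larger variability

noise_var = axis_variances(rest)   # assumed estimate of the systematic variability
rest_intensity = intensity(rest, noise_var)
active_intensity = intensity(active, noise_var)
```

In this sketch a window that contains only device noise maps to zero, so the summary reflects movement rather than sensor characteristics.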

Activity Index: a new and improved Activity Intensity

To address some limitations of the original Activity Intensity, we proposed the "Activity Index" as a replacement (paper). This new metric has three important properties: 1) it is easy to implement in large studies; 2) it is additive; and 3) it is unaffected by rotation of the accelerometer. Compared with the commonly used "Activity Count" from ActiGraph, the Activity Index performed much better both at distinguishing various types of activities and at predicting energy expenditure. The R package for calculating the Activity Index is available on GitHub.
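
The rotation property can be illustrated with a general fact: the sum of the per-axis variances equals the trace of the covariance matrix, which does not change when the axes are rotated. The sketch below checks this numerically. It is not the Activity Index implementation from the paper or the R package, and the rotation angles and simulated data are arbitrary choices.

```python
import math
import random
import statistics


def total_variance(samples):
    """Sum of the sample variances of the x, y, and z axes."""
    return sum(statistics.variance(axis) for axis in zip(*samples))


def rotate_z(samples, theta):
    """Rotate every (x, y, z) sample by `theta` radians about the z axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in samples]


def rotate_x(samples, theta):
    """Rotate every (x, y, z) sample by `theta` radians about the x axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [(x, c * y - s * z, s * y + c * z) for x, y, z in samples]


rng = random.Random(1)
samples = [(rng.gauss(0, 0.5), rng.gauss(0, 0.1), rng.gauss(1, 0.05))
           for _ in range(200)]

# Simulate the device being worn in a different orientation.
rotated = rotate_x(rotate_z(samples, 0.7), -1.2)

before = total_variance(samples)
after = total_variance(rotated)

# Individual axes do change, which is why single-axis summaries
# depend on how the device is mounted.
x_before = statistics.variance([p[0] for p in samples])
x_after = statistics.variance([p[0] for p in rotated])
```

Here `before` and `after` agree up to floating-point error, while `x_before` and `x_after` differ noticeably.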

Activity Index and Activity Count vs. Energy Expenditure

Association studies

The purpose of collecting all these accelerometry data is to associate them with human health. I am working on multiple projects with this goal. In one of them, my colleagues and I proposed a two-stage model that captures both the transition dynamics between active and inactive periods and the activity intensity dynamics during active periods. We applied this method to data collected in the Baltimore Longitudinal Study of Aging. I also work with scientists from the Early Infant Care and Risk of Obesity study to learn how physical activity affects the growth of newborns.