The Undergraduate Student Research Awards (USRA) programs expose upper-division undergraduates to research, with the goal of encouraging them to pursue graduate studies leading to research careers. Students must be available to work full-time during the 16-week tenure of the award, from May to August 2020. Previous students have been able to link their USRA to a co-op work term; please contact the Science Co-op office for more information on this possibility. If you are assigned to a project and would like to link the USRA to a co-op work term, please check first with your supervisor. The supervisor's co-op participation is limited to a short midterm check-in with the co-op program of about 15 minutes and a short online evaluation form at the end of the work term. Students requiring further feedback on or assistance with the co-op report or paperwork should consult the co-op office.
Science students have access to two programs, the Natural Sciences and Engineering Research Council (NSERC) USRA program and the SFU Vice President, Research (VPR) USRA program. NSERC USRAs are restricted to Canadian citizens and permanent residents; VPR USRAs are open to international students. Canadian citizens and permanent residents should apply to the NSERC program only. For more information, please see the SFU Dean of Graduate Studies USRA website.
The Department of Statistics and Actuarial Science is hoping to appoint multiple students for Summer 2020. Proposed projects are as follows, and information on how to apply is given below. Applications are due by midnight of January 20th.
Identification and Registration of Hits in Genome-Wide Association Studies
Supervisor: Lloyd Elliott
Multivariate genome-wide association studies (GWAS) provide a vector function of the genome (the elements of the vector are the p-values for each phenotype). Typically, these objects are examined by hand in order to determine "hits" (local minima of the p-values). However, when the number of phenotypes is large, this process must be automated. The project involves 1) implementation of a method for determining "hits", 2) extension of the method to allow registration of hits between phenotypes (i.e., determining when two hits on two phenotypes are close enough to be called the same location), and 3) development of Bayesian nonparametric theory for this problem.
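For applicants wondering what the first two steps might look like in practice, here is a minimal Python sketch. It is an illustration only, not the supervisor's method. The function names, the toy p-values, the 5e-8 significance threshold, and the fixed distance window used for registration are all assumptions made for this example.

```python
from typing import List, Tuple


def find_hits(p_values: List[float], threshold: float = 5e-8) -> List[int]:
    """Return indices of strict local minima whose p-value is below threshold.

    The threshold is an assumed cut-off for this sketch; the project
    description does not specify one.
    """
    hits = []
    n = len(p_values)
    for i, p in enumerate(p_values):
        # Treat positions beyond either end as infinitely large p-values.
        left = p_values[i - 1] if i > 0 else float("inf")
        right = p_values[i + 1] if i < n - 1 else float("inf")
        if p < threshold and p < left and p < right:
            hits.append(i)
    return hits


def register_hits(
    positions_a: List[int], positions_b: List[int], max_distance: int
) -> List[Tuple[int, int]]:
    """Greedily pair each hit in A with the nearest unused hit in B.

    Two hits are called the same location when they lie within
    max_distance of each other (a hypothetical rule for this sketch).
    """
    pairs = []
    unused_b = sorted(positions_b)
    for pos_a in sorted(positions_a):
        candidates = [b for b in unused_b if abs(b - pos_a) <= max_distance]
        if candidates:
            best = min(candidates, key=lambda b: abs(b - pos_a))
            pairs.append((pos_a, best))
            unused_b.remove(best)
    return pairs


# Toy data: genomic positions and p-values for two phenotypes (made up).
positions = [100, 200, 300, 400, 500, 600, 700]
pvals_a = [0.5, 1e-9, 0.3, 0.2, 1e-10, 0.4, 0.6]
pvals_b = [0.7, 0.2, 1e-9, 0.5, 0.3, 1e-12, 0.9]

hits_a = [positions[i] for i in find_hits(pvals_a)]
hits_b = [positions[i] for i in find_hits(pvals_b)]
matched = register_hits(hits_a, hits_b, max_distance=150)
print(hits_a, hits_b, matched)
```

A fixed window is the simplest possible registration rule. Replacing it with a principled, model-based criterion is the kind of question that the third step of the project addresses.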
Visualisation of Hierarchical Clustering
Supervisor: Lloyd Elliott
Methods such as decision trees, linkage-based clustering (dendrograms), binary space partitioning, and Ward's algorithm all provide hierarchical clusterings in which the data are organised at the leaves of a tree. Visualisation of the validity of this tree is difficult when the tree is large. This project involves development and implementation of a new method for visualising hierarchical clustering in which each tree split is associated with a one-dimensional projection of the partitioned data. These projections are rotated and translated within a single plot, forming a fractal-like organisation for which the nature of the clustering can be readily appreciated.
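To make the idea of a one-dimensional projection at a split concrete, the Python sketch below projects the two children of a single split onto the axis joining their centroids. The choice of axis, the function names, and the toy points are assumptions for illustration; the project description does not specify which projection the new method will use.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def centroid(points: List[Point]) -> List[float]:
    """Coordinate-wise mean of a non-empty list of points."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def project_split(
    left: List[Point], right: List[Point]
) -> Tuple[List[float], List[float]]:
    """Project both children of a split onto the unit axis joining their centroids.

    Scores are measured from the midpoint between the two centroids, so a
    clean split gives negative scores on the left and positive on the right.
    """
    c_left, c_right = centroid(left), centroid(right)
    axis = [r - l for l, r in zip(c_left, c_right)]
    norm = sum(a * a for a in axis) ** 0.5
    if norm == 0.0:
        raise ValueError("centroids coincide; the projection axis is undefined")
    unit = [a / norm for a in axis]
    midpoint = [(l + r) / 2 for l, r in zip(c_left, c_right)]

    def score(p: Point) -> float:
        return sum((x - m) * u for x, m, u in zip(p, midpoint, unit))

    return [score(p) for p in left], [score(p) for p in right]


# Toy split: two well-separated groups in the plane (made-up data).
left_child = [(0.0, 0.0), (0.0, 2.0)]
right_child = [(4.0, 0.0), (4.0, 2.0)]
left_scores, right_scores = project_split(left_child, right_child)
print(left_scores, right_scores)
```

Under this assumed projection, overlap between the two sets of scores would suggest a weak split, while a clear gap would suggest a well-supported one.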
A New Approach to Estimating the Distribution of HIV Infection Time with Interval Censored Data
Supervisor: X. Joan Hu
The available information on time to HIV infection in an AIDS study is often in the form of interval censored data. Well-established approaches such as the Turnbull estimator are believed to be rather inefficient, especially when the censoring intervals are relatively wide. Motivated partly by a wildfire control study, Xiong (PhD thesis, 2020) presents a methodology for making inference on the distribution of an event duration from data with missing time origin. She links available longitudinal measures to the event duration and estimates the duration distribution via the first-hitting-time model (e.g. Lee and Whitmore, 2006). This project aims to adapt Xiong's approaches to AIDS studies where the time to HIV infection is missing and biomarkers are available over time. It includes the following two stages.
Stage 1. We will explore two to three recent AIDS studies to identify biomarker measures relevant to HIV infection.
Stage 2. We will then model HIV infection time jointly with the identified biomarker measures, and conduct inference on the distribution of the infection time.
Collective Defined Contribution Pension Plan with Explicit Stabilization Mechanisms
Supervisors: Barbara Sanders and Jean-François Bégin
In Collective Defined Contribution (CDC) pension plans, savers pool their money into a fund and share investment and longevity risks. Many of these plans include explicit stabilization mechanisms such as countercyclical buffers and contingency reserves. Most of the academic literature, however, assumes mandatory participation when studying these plans even though it is often not the case in real life. Given that Allen and Gale (1997) showed that, without mandatory participation, the plan would break down, some conclusions from prior studies might not be valid.
This USRA project (May 2020 to August 2020) aims to understand CDC pension plans with explicit stabilization mechanisms and the impact of mandatory participation. Specifically, this research project is divided into two parts: (1) derive and analyze optimal pension arrangements with explicit stabilization mechanisms—so-called corridor plans—and (2) investigate the issue of mandatory participation. The student will be responsible for:
- Familiarizing themselves with the current literature on CDC plans.
- Analyzing explicit stabilization features for CDC plans in the spirit of Cui et al. (2011).
- Relaxing the mandatory participation assumption and finding ways to mitigate potential breakdowns.
- Writing code to implement the model.
- Documenting all work.
Parts of this research project build on Lu Yi’s Master’s thesis (i.e., Yi, Sanders and Bégin, 2019).
Merging Data for Genetic Studies
Supervisor: Brad McNeney
Alzheimer's disease is the most common form of dementia, accounting for 60-80 percent of cases. The disease is progressive and there is no known cure. Genetics influences our risk of developing Alzheimer's disease and could play a role in early detection. This project involves merging genetic data from two studies of Alzheimer's Disease. To merge the studies, missing data on genetic variation of the subjects will need to be filled in through the use of a comprehensive and publicly available reference panel of genetic variation. The student will implement the merging process and write a reproducible document describing the workflow. We are looking for a student with good writing skills who is curious about the analysis of data on genetic variation. Prior experience with computing in a Unix environment (e.g. shell scripting) and with documenting workflows in R and RMarkdown would be an asset.
Making Sense of Genetic Variation Through Ancestries
Supervisor: Jinko Graham
Researchers study patterns of genetic variation to learn about evolution, migration and population growth, and to identify specific genetic variants that influence inherited traits. A recent breakthrough in the study of genetic variation is the succinct tree sequence (STS), a novel data structure that represents the ancestry of whole genomes and allows the efficient storage of population-scale data. Recently developed software for simulation and inference of STSs from population samples is made freely available as Python modules. This research project will involve getting to know the software, exploring its features and adding 3-5 new tutorials to a collection of existing RMarkdown tutorials for statisticians. We are looking for someone who is curious about data on genetic variation and its uses. Experience with RMarkdown and Python would be an asset.
Rough Volatility Modelling
Supervisor: Jean-François Bégin
The number of financial derivatives available in the markets has increased dramatically in recent years. As argued in Renault (1997), it is a good idea to combine primitive and derivative asset prices because the information about the stochastic properties of an asset is contained both in the history of the underlying series and the price of any option written on it.
This USRA project (May 2020 to August 2020) investigates the use of financial derivatives as an additional source of information in the context of rough volatility models (e.g., Gatheral, Jaisson and Rosenbaum, 2014). Specifically, the student will put forward an estimation method for rough volatility models that combines both return and option data. The student will be responsible for:
- Familiarizing themselves with the current literature on rough volatility modelling.
- Proposing an estimation methodology for rough volatility models that combines both returns and options.
- Extracting data from commonly used datasets in finance.
- Writing code to implement the model and its estimation.
- Documenting all work.
Genetics Association with Brain Networks and Alzheimer's Disease
Supervisor: Jiguo Cao
This project is motivated by an imaging genetics study of the Alzheimer's Disease Neuroimaging Initiative (ADNI), where the objective is to examine the association between images of volumetric and cortical thickness values summarizing the structure of the brain as measured by magnetic resonance imaging (MRI) and a set of 486 SNPs from 33 Alzheimer's Disease (AD) candidate genes obtained from 632 subjects. The goal is to explore the data and identify genes associated with brain networks and Alzheimer's Disease.
Compression for Population Genetic Data
Supervisor: Lloyd Elliott
Previous work on dictionary methods for compression of population genetics data has provided little improvement, even when shared dictionaries and modern compression methods such as Facebook's cutting-edge Zstandard (zstd) and the classical zlib are employed. In this work, the student will explore compression methods based on source coding theory or other compression methods, with application to efficiency in storage and access time of population-level genetic data, and to multivariate genome-wide association studies. This work leverages the Bayesian nonparametric aspects of exchangeability and partial exchangeability of population-level genetic data. Applications include summary statistics for genome-wide association studies, logistic regression and advances in compression theory. Familiarity with Intel AVX instructions and with parallel architectures such as POSIX threads and OpenMP or forks is recommended, as is fundamental knowledge of the C programming language or Fortran.
If you are interested in applying, please follow the procedure below:
Note: We thank all applicants for their interest but request that they refrain from contacting prospective supervisors. Supervisors will contact applicants selected for an interview.
Canadian citizens and permanent residents should apply for the NSERC USRA only; please do not apply for the VPR USRA. If you are eligible for the NSERC USRA:
- Go online at the NSERC site: www.nserc-crsng.gc.ca/Students-Etudiants/UG-PC/USRA-BRPC_eng.asp.
- Please submit the following: the printed NSERC Form 202 (Part I), with your NSERC Online Reference #; an up-to-date unofficial transcript (not an advising transcript); and, optionally, a short statement (<200 words) about yourself and your interest in the USRA programme. These must be submitted to the Grad Secretary in the Department of Statistics & Actuarial Science, Room SC K10547, by January 20, 2020.
- Selected students will be notified by the department. These students must verify applications online by January 31, 2020. Note that this includes uploading transcripts and having a supervisor start Form 202 Part II.
If you are only eligible for the VPR USRA:
- Go to www.sfu.ca/dean-gradstudies/awards/undergraduate-awards/sciences-awards.html and complete the student portion of the application form. In the section on Award Information, filling in the proposed supervisor is optional.
- Once you have completed the student portion, print it and attach an up-to-date unofficial transcript (not an advising transcript) and, optionally, a short statement (<200 words) about yourself and your interest in the USRA programme. These must be submitted to the Grad Secretary in the Department of Statistics & Actuarial Science, Room SC K10547, by January 20, 2020.
- Selected students will be notified by the project supervisors.
- Supervisors will fill out the 'Supervisor Information' and 'Research Project' sections of the form by January 31, 2020.