USRA Awards

The Undergraduate Student Research Awards (USRA) programs expose undergraduate students to research, with the goal of encouraging students to pursue graduate studies leading to research careers. Previous students have been able to link their USRA to a co-op work term; please contact the Science Co-op office for more information on this possibility.  If you are assigned to a project and would like to link the USRA to a co-op work term, please check first with your supervisor.  The supervisor's co-op participation is limited to a short midterm check-in with the co-op program of about 15 minutes and a short online evaluation form at the end of the work term.  Students requiring further feedback on or assistance with the co-op report or paperwork should consult the co-op office.

Science students have access to two programs, the Natural Sciences and Engineering Research Council (NSERC) USRA program and the SFU Vice President, Research (VPR) USRA program. NSERC USRAs are restricted to Canadian citizens and permanent residents; VPR USRAs are open to international students. Canadian citizens and permanent residents should apply to the NSERC program only.  For more information, please see the SFU Dean of Graduate Studies USRA website.  

The Department of Statistics and Actuarial Science is hoping to appoint up to eight students during the Summer of 2019. A list of proposed projects and information on how to apply are given below. We are asking for applications by January 21 so that supervisors have enough time to interview applicants and complete the NSERC/university nomination forms by the January 31 deadline.

Fast Compression and Data Structures for Genetics

Supervisor: Lloyd Elliott

In the past decade, new consortia for genomics involving half a million or more subjects have become available to researchers. The scale of these consortia greatly surpasses all previous studies. Advanced software packages such as bgenie and plink2 are designed to discover genome-wide associations in this context at speed. However, some inefficiencies in the file formats used by advanced genetics software are amplified by the scale of these consortia, and improvements to these file formats could greatly reduce the cost of these studies.


In this project, the student will research methods to improve random access and compression of genetic file formats. This will involve developing pre-seeded dictionaries for compression libraries such as zlib and Zstandard. The student will also design new database formats for compressed matrices that allow random access on both the rows and the columns of the matrix. Experience with C/C++, compression specifications, or genetic file formats such as bed or bgen is preferred.
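As a rough illustration of what a pre-seeded dictionary means in practice, the sketch below uses the preset-dictionary support in Python's standard zlib module. The dictionary contents, the record layout, and the helper names are hypothetical and chosen only for illustration; they do not reflect the bed or bgen formats or the approach the project will take.

```python
import zlib

# Hypothetical seed dictionary: byte strings expected to recur across records.
# A real dictionary would be built from representative genetic data.
SEED_DICT = b"rs0000000 chr1 chr2 A C G T 0/0 0/1 1/1 "


def compress_with_dict(record: bytes, zdict: bytes = SEED_DICT) -> bytes:
    """Compress one record on its own, priming zlib with the seed dictionary."""
    comp = zlib.compressobj(level=9, zdict=zdict)
    return comp.compress(record) + comp.flush()


def decompress_with_dict(blob: bytes, zdict: bytes = SEED_DICT) -> bytes:
    """Decompress one record; the same dictionary must be supplied."""
    decomp = zlib.decompressobj(zdict=zdict)
    return decomp.decompress(blob) + decomp.flush()


# Made-up record, for illustration only.
record = b"rs0000042 chr1 A G 0/1 1/1 0/0 0/1 "
with_dict = compress_with_dict(record)
without_dict = zlib.compress(record, 9)
restored = decompress_with_dict(with_dict)
```

Because each record is compressed independently, any single record can be decompressed without reading its neighbours, which is one simple route to random access. Comparing len(with_dict) with len(without_dict) on realistic records would show whether a given dictionary actually helps.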

Deep Learning for Gene Substitution and Multi-Gene Groups

Supervisor: Lloyd Elliott

Much recent work has been done on using neural networks and deep learning to study the relationship between genetics and disease phenotypes. In applications to genome-wide association studies, such work has been met with limited success. This is in part due to the context-free nature of genome-wide association studies, wherein genetic variants are considered with reference to neither flanking genetic material nor epigenetic markers. In this project, the student will examine the feasibility of using multi-gene groups and gene substitution to create invariants for the pooling layers of a neural network trained to perform genome-wide association on the exome. Experience with TensorFlow, R or genetics is preferred.

Modelling Economies of Scale in Pension Funds

Supervisors: Jean-François Bégin & Barbara Sanders

Plan mergers have significantly reshaped the pension landscape in several European countries over the past decade and are now emerging as a possible trend in Canada as well. One obvious result of a pension plan merger is a larger fund size, which may lead to economies of scale in pension administration and investment. Empirical studies show that small-sized funds tend to have larger costs and vice versa (Bikker and de Dreu, 2009), and that larger funds have more negotiation power in investment and can spread their fixed costs across a larger number of members (Bikker et al., 2012). On the other hand, larger plans may be prone to higher costs. For example, the fund may be too large in relation to the number of high-quality investment opportunities available, which negatively impacts returns. Nonetheless, most studies conclude that the advantages of larger size outweigh the disadvantages (Bikker and de Dreu, 2009; Dyck and Pomorski, 2011).
Following these stylized facts, this USRA project (May 2019 to August 2019) aims to put forward a model for administrative and investment costs in pension plans. Specifically, the student will be responsible for:

  1. Familiarizing themselves with the current literature on economies of scale.
  2. Developing a mathematical model.
  3. Writing code to implement the model.
  4. Documenting all work.

This summer project is part of a larger research program on pension mergers. The model developed by the student will be an integral piece of a framework used to quantify the welfare impact of bringing together different pension plans.

Intergenerational Risk Sharing in Funded Pension Plans: A Game-Theoretic Approach

Supervisors: Jean-François Bégin & Barbara Sanders

Most funded pension plans rely on intergenerational risk sharing to create stable retirement income: when deficits arise, generations can provide subsidies instead of reducing pension benefits. Studies have shown that this type of cooperation is beneficial to all participants because reducing uncertainty in retirement increases the expected utility of members’ consumption over their lifetime. Yet, this cooperation has limits: for instance, the younger generation might be unwilling to subsidize the older generation when the deficit is too large. When this limit is exceeded, tensions between generations might lead to the demise of the pension plan.
To address the threat of non-cooperation, game theory can be used to design self-enforceable pension contracts, i.e., ones that reduce tension by taking into account each generation’s self-interest. Recently, Wang (2018) explored the threshold above which cooperation in funded pension plans should not be enforced. In her model, the younger generation’s cost of cooperation is capped from above but not from below: the young can grasp all the upside potential from the old but only bear the downside risk up to a certain limit. In addition, cooperation is an all-or-nothing deal in Wang’s framework: either there is full cooperation or there is none.
This USRA project (May 2019 to August 2019) aims to put forward two generalizations of Wang (2018): applying a lower bound to the cost of cooperation for a more equitable treatment of surpluses, and exploring partial cooperation. Specifically, the student will be responsible for:

  1. Familiarizing themselves with the current literature on game theory, especially with respect to applications to pension funds.
  2. Extending Wang’s (2018) game-theoretic framework as described above.
  3. Writing code to implement the model.
  4. Documenting all work.

Seeding the simRVsequences R package

Supervisor: Jinko Graham

Identifying DNA variants that cause increased susceptibility to disease is of great scientific and clinical interest.  One strategy for identifying causal rare DNA variants is to study families with more than their fair share of the disease. However, accumulating a large enough sample of well-characterized families for a family-based study is time-consuming and expensive.

To assist with study planning and inference, in collaboration with the BC Cancer Agency's Lymphoid Cancer Families Study, we are developing simRVsequences, an R package to simulate DNA sequence data in families enriched for disease-affected relatives.  Families can be simulated dynamically, allowing for birth, disease-onset, and death at the individual level. At the family level, users can model complex sampling criteria for a family to enter the study. The package is unique in allowing users to simulate the underlying genetic cause and exome sequences in disease-enriched families.

This project will involve gathering, processing and assembling seed data for simRVsequences using publicly-available resources for human genetic variation such as the 1000 Genomes Project. The student will be part of a team working on package development.  Experience with R and with working with large data would be an asset.

LDheatmap Code Enhancements

Supervisor: Brad McNeney

LDheatmap is an R package for graphical display of association (linkage disequilibrium) between pairs of genetic markers on the same chromosome. In the 10 years since its publication on the Comprehensive R Archive Network (CRAN), LDheatmap has been cited in 139 research papers in fields such as agriculture and medicine. Development of the package has been slow in the past few years, and the list of feature requests from users is growing.  The NSERC USRA will be responsible for:

  1. Familiarizing themselves with the LDheatmap code,
  2. Compiling feature requests and providing a preliminary assessment of the feasibility of each one,
  3. Writing R code to implement new features,
  4. Documenting all work.

Goodness-of-fit with weighted data

Supervisor: Richard Lockhart

Physicists are often interested in comparing simulated data to real data to see if the two samples come from the same distribution. Classically, these comparisons are made by 'binning' the data and using Pearson's chi-squared statistic. But there are many reasons for thinking this might not be a very sensitive method of comparison. In particular, tests based on empirical distribution methods might do better.
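To make the contrast concrete, here is a minimal sketch in Python comparing a binned two-sample chi-squared statistic with the two-sample Kolmogorov-Smirnov statistic, one example of an empirical distribution method. It assumes unweighted data, and the function names, bin edges, and toy samples are illustrative choices rather than anything specified by the project.

```python
from bisect import bisect_right


def binned_chi_squared(x, y, edges):
    """Two-sample Pearson chi-squared statistic on histogram counts.

    Bins are [edges[k], edges[k+1]); values outside the edges are dropped.
    """
    def counts(sample):
        c = [0] * (len(edges) - 1)
        for v in sample:
            k = bisect_right(edges, v) - 1
            if 0 <= k < len(c):
                c[k] += 1
        return c

    n, m = counts(x), counts(y)
    big_n, big_m = sum(n), sum(m)
    stat = 0.0
    for ni, mi in zip(n, m):
        if ni + mi > 0:  # skip bins that are empty in both samples
            diff = ni * (big_m / big_n) ** 0.5 - mi * (big_n / big_m) ** 0.5
            stat += diff ** 2 / (ni + mi)
    return stat


def ks_two_sample(x, y):
    """Largest absolute gap between the two empirical distribution functions."""
    xs, ys = sorted(x), sorted(y)
    d = 0.0
    for v in xs + ys:
        fx = bisect_right(xs, v) / len(xs)
        fy = bisect_right(ys, v) / len(ys)
        d = max(d, abs(fx - fy))
    return d


# Toy samples that differ only inside a single bin.
x = [0.1, 0.2]
y = [0.3, 0.4]
edges = [0.0, 0.5, 1.0]
chi2 = binned_chi_squared(x, y, edges)
ks = ks_two_sample(x, y)
```

In this toy case both samples fall entirely in the first bin, so the binned statistic is zero, while the empirical distribution functions are as far apart as possible. Real comparisons would also need null distributions or Monte Carlo calibration, which this sketch omits.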

The USRA student would have the following responsibilities and be asked to do whichever ones turn out to be most practical:

  1. Literature review in the statistics literature for weighted tests of fit -- one sample and two sample.
  2. Literature review in the physics literature for real examples of methods used by physicists.
  3. Development of R code for both chi-squared and empirical distribution tests and then general code for doing Monte Carlo studies of the performance.
  4. Evaluation of the properties of the various tests under realistic (as determined from the physics literature review) situations.
  5. Working with the Supervisor to describe the theoretical properties of these procedures.

Bayesian evaluation and interpretation of LASSO model fits

Supervisor: Richard Lockhart

The LASSO method of fitting linear models uses a so-called penalty. The penalty may be thought of as a prior distribution for the unknown regression coefficients. For a single fixed choice of the parameter in that prior distribution, the posterior mode is then the usual LASSO estimate. It has the appealing property that many of the estimated slopes will be exactly zero if the penalty parameter has been picked correctly.
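The link between penalty and prior can be seen in the simplest case of a single coefficient. The sketch below assumes a Gaussian likelihood with unit variance and a Laplace (double-exponential) prior with rate lam, so that the negative log posterior is, up to a constant, the LASSO objective. The function names and numerical values are illustrative only.

```python
def neg_log_posterior(b, z, lam):
    """Negative log posterior, up to a constant, for one coefficient b.

    0.5 * (z - b)**2 is the Gaussian log-likelihood term for observation z;
    lam * abs(b) is the Laplace log-prior term, i.e. the LASSO penalty.
    """
    return 0.5 * (z - b) ** 2 + lam * abs(b)


def soft_threshold(z, lam):
    """Closed-form minimizer of neg_log_posterior, i.e. the posterior mode."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def grid_mode(z, lam, lo=-3.0, hi=3.0, steps=6000):
    """Brute-force check: minimize the objective over a fine grid."""
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return min(grid, key=lambda b: neg_log_posterior(b, z, lam))


small = soft_threshold(0.3, 0.5)   # observation within the threshold
large = soft_threshold(2.0, 0.5)   # observation beyond the threshold
check = grid_mode(2.0, 0.5)
```

When the observation z is smaller in magnitude than lam, the mode sits exactly at zero, which is the sparsity property described above. With several correlated predictors there is no such closed form, and a fully Bayesian analysis would look at the whole posterior rather than only its mode.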

In this project we will investigate this Bayesian interpretation.  The USRA student will:

  1. Conduct a literature review finding papers which consider the penalty as a prior in a fully Bayesian analysis.
  2. Develop code in R to treat the penalty as a prior and do Bayesian analysis on this basis.
  3. Investigate the use of a hyperprior on the penalty parameter by repeating 1) with a variety of hyperpriors.
  4. Investigate methods proposed by the supervisor for doing 'calibrated' Bayesian inference -- adjusting priors or loss functions to produce Bayesian procedures with specified coverage or Type I error rates.

 

If you are interested in applying, please follow the procedure below:

Note: We thank all applicants for their interest but request that they refrain from contacting prospective supervisors during the selection process. Supervisors will contact applicants selected for an interview.

NSERC Award:

Canadian citizens and permanent residents should apply for the NSERC USRA only; please do not apply for the VPR USRA. If you are eligible for the NSERC USRA:

  • Go online at the NSERC site: www.nserc-crsng.gc.ca/Students-Etudiants/UG-PC/USRA-BRPC_eng.asp
  • Please submit the following: the printed NSERC Form 202 (Part I), with your NSERC Online Reference #; an up-to-date unofficial transcript (not an advising transcript); and, optionally, a short statement (<200 words) about yourself and your interest in the USRA programme. These must be submitted to the Grad Secretary in the Department of Statistics & Actuarial Science, Room SC K10547, by January 21, 2019.
  • Selected students will be notified by the department. These students must verify applications online by January 31, 2019. Note that this includes uploading transcripts and having a supervisor start Form 202 Part II.

VPR Award:

If you are only eligible for the VPR USRA:

  • Go to www.sfu.ca/dean-gradstudies/awards/undergraduate-awards/sciences-awards.html and complete the student portion of the application form.  In the section on Award Information, filling in the proposed supervisor is optional.
  • Once you have completed the student portion, print it and attach an up-to-date unofficial transcript (not an advising transcript) and, optionally, a short statement (<200 words) about yourself and your interest in the USRA programme. These must be submitted to the Grad Secretary in the Department of Statistics & Actuarial Science, Room SC K10547, by January 21, 2019.
  • Selected students will be notified by the project Supervisors.
  • Supervisors will fill out the 'Supervisor Information' and 'Research Project' sections of the form by January 31, 2019.

USRA Recipients

2018 

Description | Supervisor | Student | Semester | Award Type
Alternatives to Gaussian Processes for Model Calibration | D. Bingham | R. Groenewald | 1184 | NSERC
Microbiome Research | L. Wang | Y. Gu | 1184 | NSERC
Improving the Hosmer-Lemeshow Goodness-of-Fit Test | T. Loughin | N. Surjanovic | 1184 | NSERC
Dealing with Statistical Challenges in Analysis of SFU-FLIP Study Data | J. Hu | P. Wei | 1184 | NSERC
Genotype Imputation for Statistical Analysis of Alzheimer's Disease | J. Graham | S. Liu | 1184 | VPR
LDheatmap code enhancements | B. McNeney | M. Reyers | 1184 | VPR
LDheatmap documentation revamp | B. McNeney | X. Yang | 1184 | VPR
Effect of Air Pollution on Public Health | J. Cao | B. Thind | 1184 | Big Key Data