The Undergraduate Student Research Awards (USRA) programs expose undergraduate students to research, with the goal of encouraging students to pursue graduate studies leading to research careers. Previous students have been able to link their USRA to a co-op work term; please contact the Science Co-op office for more information on this possibility. If you are assigned to a project and would like to link the USRA to a co-op work term, please check first with your supervisor. The supervisor's co-op participation is limited to a short midterm check-in with the co-op program of about 15 minutes and a short online evaluation form at the end of the work term. Students requiring further feedback on or assistance with the co-op report or paperwork should consult the co-op office.
Science students have access to two programs, the Natural Sciences and Engineering Research Council (NSERC) USRA program and the SFU Vice President, Research (VPR) USRA program. NSERC USRAs are restricted to Canadian citizens and permanent residents; VPR USRAs are open to international students. Canadian citizens and permanent residents should apply to the NSERC program only. For more information, please see the SFU Dean of Graduate Studies USRA website.
The Department of Statistics and Actuarial Science is hoping to appoint up to eight students during the Summer of 2019. A list of proposed projects and information on how to apply are given below. We are asking for applications by January 21 so that supervisors have enough time to interview applicants and complete the NSERC/university nomination forms by the January 31 deadline.
Fast Compression and Data Structures for Genetics
Supervisor: Lloyd Elliott
In the past decade, new consortia for genomics involving half a million or more subjects have become available to researchers. The scale of these consortia greatly surpasses all previous studies. Advanced software tools such as bgenie and plink2 are designed to discover genome-wide associations in this context at speed. However, some inefficiencies in the file formats used by advanced genetics software are amplified by the scale of these consortia, and improvements to these file formats could greatly reduce the cost of these studies.
In this project, the student will research methods to improve random access and compression of genetic file formats. This will involve developing pre-seeded dictionaries for compressors such as zlib and Zstandard. The student will also design new database formats for compressed matrices that allow random access on both the rows and the columns of the matrix. Experience with C/C++, compression specifications, or genetic file formats such as bed or bgen would be an asset.
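As a rough illustration of what a pre-seeded dictionary does, the sketch below uses the preset-dictionary support in Python's standard zlib module (the zdict argument). Python is used here only for brevity. The dictionary contents, the sample record and the helper names compress_record and decompress_record are assumptions made for illustration; they are not taken from the bed or bgen specifications or from the project itself.

```python
import zlib

# Hypothetical seed dictionary: byte strings expected to recur in the records.
# Its contents are an assumption for illustration only.
SEED_DICT = b"chr1\trs\tA\tC\tG\tT\t0/0\t0/1\t1/1\t"


def compress_record(record: bytes, zdict: bytes = b"") -> bytes:
    """Compress one record on its own, optionally with a preset dictionary."""
    if zdict:
        comp = zlib.compressobj(level=9, zdict=zdict)
    else:
        comp = zlib.compressobj(level=9)
    return comp.compress(record) + comp.flush()


def decompress_record(blob: bytes, zdict: bytes = b"") -> bytes:
    """Invert compress_record; the same dictionary must be supplied."""
    if zdict:
        decomp = zlib.decompressobj(zdict=zdict)
    else:
        decomp = zlib.decompressobj()
    return decomp.decompress(blob) + decomp.flush()


# A made-up, tab-separated record standing in for one row of genotype data.
record = b"chr1\trs12345\tA\tG\t0/1\t0/0\t1/1\t0/1\n"

plain = compress_record(record)
seeded = compress_record(record, SEED_DICT)

# Compare the raw size with the two compressed sizes.
print(len(record), len(plain), len(seeded))
```

Because each record is compressed on its own, any record can be decompressed without reading its neighbours, which is the property random access relies on. The shared dictionary is meant to recover some of the redundancy that independent compression would otherwise lose; whether the seeded output is actually smaller depends on how well the dictionary matches the data.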
Modelling Economies of Scale in Pension Funds
Supervisors: Jean-François Bégin & Barbara Sanders
Plan mergers have significantly reshaped the pension landscape in several European countries over the past decade and are now emerging as a possible trend in Canada as well. One obvious result of a pension plan merger is a larger fund size, which may lead to economies of scale in pension administration and investment. Empirical studies show that small-sized funds tend to have larger costs and vice-versa (Bikker and de Dreu, 2009), and that larger funds have more negotiation power in investment and can spread their fixed costs across a larger number of members (Bikker et al., 2012). On the other hand, larger plans may be prone to higher costs. For example, the fund may be too large in relation to the number of high-quality investment opportunities available, which negatively impacts returns. Nonetheless, most studies conclude that the advantages of larger size outweigh the disadvantages (Bikker and de Dreu, 2009; Dyck and Pomorski, 2011).
Following these stylized facts, this USRA project (May 2019 to August 2019) aims to put forward a model for administrative and investment costs in pension plans. Specifically, the student will be responsible for:
- Familiarizing themselves with the current literature on economies of scale.
- Developing a mathematical model.
- Writing code to implement the model.
- Documenting all work.
This summer project is part of a larger research program on pension mergers. The model developed by the student will be an integral piece of a framework used to quantify the welfare impact of bringing together different pension plans.
Intergenerational Risk Sharing in Funded Pension Plans: A Game-Theoretic Approach
Supervisors: Barbara Sanders & Jean-François Bégin
Most funded pension plans rely on intergenerational risk sharing to create stable retirement income: when deficits arise, generations can provide subsidies instead of reducing pension benefits. Studies have shown that this type of cooperation is beneficial to all participants because reducing uncertainty in retirement increases the expected utility of members’ consumption over their lifetime. Yet, this cooperation has limits: for instance, the younger generation might be unwilling to subsidize the older generation when the deficit is too large. When this limit is exceeded, tensions between generations might lead to the demise of the pension plan.
To address the threat of non-cooperation, game theory can be used to design self-enforceable pension contracts, i.e., ones that reduce tension by taking into account each generation’s self-interest. Recently, Wang (2018) explored the threshold above which cooperation in funded pension plans should not be enforced. In her model, the younger generation’s cost of cooperation is capped from above but not from below: the young can grasp all the upside potential from the old but only bear the downside risk up to a certain limit. In addition, cooperation is an all-or-nothing deal in Wang’s framework: either there is full cooperation or there is none.
This USRA project (May 2019 to August 2019) aims to put forward two generalizations of Wang (2018): applying a lower bound to the cost of cooperation for a more equitable treatment of surpluses, and exploring partial cooperation. Specifically, the student will be responsible for:
- Familiarizing themselves with the current literature on game theory, especially with respect to applications to pension funds.
- Extending Wang’s (2018) game-theoretic framework as described above.
- Writing code to implement the model.
- Documenting all work.
Seeding the simRVsequences R package
Supervisor: Jinko Graham
Identifying DNA variants that cause increased susceptibility to disease is of great scientific and clinical interest. One strategy for identifying causal rare DNA variants is to study families with more than their fair share of the disease. However, accumulating a large enough sample of well-characterized families for a family-based study is time-consuming and expensive.
To assist with study planning and inference, in collaboration with the BC Cancer Agency's Lymphoid Cancer Families Study, we are developing simRVsequences, an R package to simulate DNA sequence data in families enriched for disease-affected relatives. Families can be simulated dynamically, allowing for birth, disease-onset, and death at the individual level. At the family level, users can model complex sampling criteria for a family to enter the study. The package is unique in allowing users to simulate the underlying genetic cause and exome sequences in disease-enriched families.
This project will involve gathering, processing and assembling seed data for simRVsequences using publicly-available resources for human genetic variation such as the 1000 Genomes Project. The student will be part of a team working on package development. Experience with R and with working with large data would be an asset.
LDheatmap Code Enhancements
Supervisor: Brad McNeney
LDheatmap is an R package for graphical display of association (linkage disequilibrium) between pairs of genetic markers on the same chromosome. In the 10 years since its publication on the Comprehensive R Archive Network (CRAN), LDheatmap has been cited in 139 research papers in fields such as agriculture and medicine. Development of the package has been slow in the past few years, and the list of feature requests from users is growing. The NSERC USRA student will be responsible for:
- Familiarizing themselves with the LDheatmap code,
- Compiling feature requests and providing a preliminary assessment of the feasibility of each one,
- Writing R code to implement new features,
- Documenting all work.
Goodness-of-fit with weighted data
Supervisor: Richard Lockhart
Physicists are often interested in comparing simulated data to real data to see if the two samples come from the same distribution. Classically these comparisons are made by 'binning' the data and using Pearson's chi-squared statistic. But there are many reasons for thinking this might not be a very sensitive method of comparison. In particular, tests based on empirical distribution methods might do better.
The USRA student would have the following responsibilities and be asked to do whichever ones turn out to be most practical:
- Literature review in the statistics literature for weighted tests of fit (one-sample and two-sample).
- Literature review in the physics literature for real examples of methods used by physicists.
- Development of R code for both chi-squared and empirical distribution tests and then general code for doing Monte Carlo studies of the performance.
- Evaluation of the properties of the various tests under realistic (as determined from the physics literature review) situations.
- Working with the Supervisor to describe the theoretical properties of these procedures.
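To make the contrast between the two families of tests concrete, here is a minimal sketch in Python that computes a binned Pearson chi-squared statistic and a Kolmogorov-Smirnov-type distance between two samples. It is only one possible starting point: the function names, the bin edges and the toy data are assumptions, the chi-squared version shown is the ordinary unweighted two-sample statistic, and no p-values are computed because the null distributions under weighting are part of what the project would study.

```python
from bisect import bisect_right
from itertools import accumulate


def weighted_ecdf(values, weights=None):
    """Return a function x -> weighted empirical CDF evaluated at x."""
    if weights is None:
        weights = [1.0] * len(values)
    pairs = sorted(zip(values, weights))
    xs = [v for v, _ in pairs]
    cum = list(accumulate(w for _, w in pairs))
    total = cum[-1]

    def cdf(x):
        k = bisect_right(xs, x)  # number of sample points <= x
        return cum[k - 1] / total if k else 0.0

    return cdf


def ks_distance(sample_a, sample_b, weights_a=None, weights_b=None):
    """Largest gap between the two (optionally weighted) empirical CDFs."""
    f_a = weighted_ecdf(sample_a, weights_a)
    f_b = weighted_ecdf(sample_b, weights_b)
    # Both CDFs are right-continuous step functions that jump only at sample
    # points, so the largest gap is attained at one of those points.
    return max(abs(f_a(x) - f_b(x)) for x in list(sample_a) + list(sample_b))


def bin_counts(sample, edges):
    """Count points in half-open bins [edges[i], edges[i+1]); others are dropped."""
    counts = [0] * (len(edges) - 1)
    for x in sample:
        k = bisect_right(edges, x) - 1
        if 0 <= k < len(counts):
            counts[k] += 1
    return counts


def pearson_chi2(sample_a, sample_b, edges):
    """Unweighted two-sample Pearson chi-squared statistic on binned data."""
    a = bin_counts(sample_a, edges)
    b = bin_counts(sample_b, edges)
    n_a, n_b = sum(a), sum(b)  # assumes each sample has points inside the bins
    stat = 0.0
    for a_i, b_i in zip(a, b):
        pooled = a_i + b_i
        if pooled == 0:
            continue  # a bin that is empty in both samples contributes nothing
        exp_a = n_a * pooled / (n_a + n_b)
        exp_b = n_b * pooled / (n_a + n_b)
        stat += (a_i - exp_a) ** 2 / exp_a + (b_i - exp_b) ** 2 / exp_b
    return stat


# Toy data (made up): "simulated" versus "real" measurements.
simulated = [1, 2, 3, 4]
real = [3, 4, 5, 6]
print(ks_distance(simulated, real))
print(pearson_chi2(simulated, real, edges=[0, 2.5, 4.5, 7]))
```

The chi-squared statistic depends on the choice of bin edges, while the empirical-distribution statistic does not; that dependence is one reason the binned approach may lose sensitivity.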
Bayesian evaluation and interpretation of LASSO model fits
Supervisor: Richard Lockhart
The LASSO method of fitting linear models uses a so-called penalty. The penalty may be thought of as a prior distribution for the unknown regression coefficients. For a single fixed choice of the parameter in that prior distribution, the posterior mode is then the usual LASSO estimate. It has the appealing property that many of the estimated slopes will be exactly zero if the penalty parameter has been picked correctly.
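The simplest case shows why the posterior mode can be exactly zero. Assume, purely for illustration, a single coefficient b observed once as z with unit Gaussian noise and a Laplace (double-exponential) prior with parameter lam. Up to constants, the negative log-posterior is 0.5 * (z - b)**2 + lam * abs(b), and its minimiser is the soft-thresholding rule sketched below. The function names are hypothetical, and a general design matrix would require an iterative solver rather than this closed form.

```python
def neg_log_posterior(b: float, z: float, lam: float) -> float:
    """Gaussian negative log-likelihood plus Laplace negative log-prior (constants dropped)."""
    return 0.5 * (z - b) ** 2 + lam * abs(b)


def soft_threshold(z: float, lam: float) -> float:
    """Posterior mode: the value of b that minimises neg_log_posterior(b, z, lam)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0  # observations with |z| <= lam are shrunk exactly to zero


# Made-up observations and penalty parameter.
lam = 1.0
for z in (-3.0, -0.4, 0.5, 2.0):
    mode = soft_threshold(z, lam)
    print(z, mode, neg_log_posterior(mode, z, lam))
```

Larger values of lam set more estimates to zero, which is why the choice of the penalty parameter matters for the property described above.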
In this project we will investigate this Bayesian interpretation. The USRA student will:
1) Conduct a literature review finding papers which consider the penalty as a prior in a fully Bayesian analysis.
2) Develop code in R to treat the penalty as a prior and do Bayesian analysis on this basis.
3) Investigate the use of a hyperprior on the penalty parameter by repeating 1) with a variety of hyperpriors.
4) Investigate methods proposed by the supervisor for doing 'calibrated' Bayesian inference, that is, adjusting priors or loss functions to produce Bayesian procedures with specified coverage or Type I error rates.
Inference Conditional on a Tuning Parameter Selected by Cross Validation
Supervisor: Richard Lockhart
There are now many penalized methods for fitting linear models; examples include ridge regression, LASSO, and SCAD. These methods have a so-called tuning parameter which is often selected by cross-validation. Recent work by the supervisor and co-authors has led to methods for conditional inference after model selection for a fixed value of the tuning parameter. The goal of this project is to investigate using these methods of conditional inference when cross-validation is used to pick the tuning parameter. The penalty may be thought of as a prior distribution for the unknown regression coefficients.
In this project we will numerically investigate this idea for conditional inference and try to draw conclusions about the usefulness of the tactic. The USRA student will:
1) Conduct a literature review finding papers which describe such penalized methods and seek out those with implementations in R or another high-level language for statistical research and analysis.
2) Develop code in R to implement the conditional inference procedures in question.
3) Investigate the limits on the size of models and data sets which can comfortably be handled in this way.
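For readers unfamiliar with the selection step that the inference would condition on, the sketch below picks a tuning parameter by K-fold cross-validation. It is written in Python and uses ridge regression with one predictor and no intercept, only because that case has a one-line closed form; the function names, the grid and the data are assumptions, and the conditional inference procedures themselves are not shown.

```python
import random


def ridge_slope(xs, ys, lam):
    """Closed-form ridge estimate for one predictor and no intercept."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


def cv_error(xs, ys, lam, n_folds):
    """Mean squared prediction error of ridge_slope under K-fold cross-validation."""
    n = len(xs)
    total = 0.0
    for fold in range(n_folds):
        test_idx = set(range(fold, n, n_folds))  # every n_folds-th point
        train_x = [xs[i] for i in range(n) if i not in test_idx]
        train_y = [ys[i] for i in range(n) if i not in test_idx]
        slope = ridge_slope(train_x, train_y, lam)
        total += sum((ys[i] - slope * xs[i]) ** 2 for i in test_idx)
    return total / n


def select_lambda(xs, ys, grid, n_folds=5):
    """Return the grid value with the smallest cross-validated error."""
    return min(grid, key=lambda lam: cv_error(xs, ys, lam, n_folds))


# Made-up data: a true slope of 2 plus Gaussian noise.
rng = random.Random(0)
xs = [rng.gauss(0.0, 1.0) for _ in range(50)]
ys = [2.0 * x + rng.gauss(0.0, 1.0) for x in xs]

grid = [0.0, 0.1, 1.0, 10.0, 100.0]
chosen = select_lambda(xs, ys, grid)
print(chosen, ridge_slope(xs, ys, chosen))
```

The selected value depends on the same data that are later used for inference, which is the complication the project sets out to study.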
If you are interested in applying, please follow the procedure below:
Note: We thank all applicants for their interest but request that they refrain from contacting prospective supervisors during the selection process. Supervisors will contact applicants selected for an interview.
Canadian citizens and permanent residents should apply for the NSERC USRA only; please do not apply for the VPR USRA. If you are eligible for the NSERC USRA:
- Go online at the NSERC site: www.nserc-crsng.gc.ca/Students-Etudiants/UG-PC/USRA-BRPC_eng.asp.
- Please submit the following: the printed NSERC Form 202 (Part I), with your NSERC Online Reference #; an up-to-date unofficial transcript (not an advising transcript); and, optionally, a short statement (<200 words) about yourself and your interest in the USRA programme. These must be submitted to the Grad Secretary in the Department of Statistics & Actuarial Science Room SC K10547 by January 21, 2019.
- Selected students will be notified by the department. These students must verify applications online by January 31, 2019. Note that this includes uploading transcripts and having a supervisor start Form 202 Part II.
If you are only eligible for the VPR USRA:
- Go to www.sfu.ca/dean-gradstudies/awards/undergraduate-awards/sciences-awards.html and complete the student portion of the application form. In the section on Award Information, filling in the proposed supervisor is optional.
- Once you have completed the student portion, print it and attach an up-to-date unofficial transcript (not an advising transcript) and, optionally, a short statement (<200 words) about yourself and your interest in the USRA programme. These must be submitted to the Grad Secretary in the Department of Statistics & Actuarial Science Room SC K10547 by January 21, 2019.
- Selected students will be notified by the project Supervisors.
- Supervisors will fill out the 'Supervisor Information' and 'Research Project' sections of the form by January 31, 2019.
|Project|Supervisor|Student|Term|Program|
|---|---|---|---|---|
|Fast Compression and Data Structures for Genetics|L. Elliott|W. Chen|1194|NSERC|
|Modelling Economies of Scale in Pension Funds|J.-F. Bégin|Q.R. Li|1194|NSERC|
|Intergenerational Risk Sharing in Funded Pension Plans: A Game-Theoretic Approach|B. Sanders|F. E. Chen|1194|VPR|
|Seeding the simRVsequences R package|J. Graham|W. W. Wang|1194|NSERC|
|LDheatmap Code Enhancements|B. McNeney|Y. Yan|1194|VPR|
|Goodness-of-fit with weighted data|R. Lockhart|L. Yao|1194|VPR|
|Bayesian evaluation and interpretation of LASSO model fits|R. Lockhart|X. T. Zhang|1194|VPR|
|Inference Conditional on a Tuning Parameter Selected by Cross Validation|R. Lockhart|J. C. Liu|1194|VPR|