
Website The Garvan Institute of Medical Research
Job Title: Garvan Summer Research Scholarships 2023
Location: Sydney
Time Type: Full time
Posted On: August 30, 2023
Job Requisition ID: PRF6944
THE OPPORTUNITY
The summer scholarship program offers currently enrolled undergraduate students the opportunity to carry out research projects during summer 2023/2024. The program will run for 8 weeks and we have 12 opportunities available. The scholarship will be up to a value of $5000, dependent on duration in the program ($625 per week).
These projects provide hands-on research experience across a range of topics:
Project 1: Cutting-edge DNA sequencing technology comparison
Many DNA sequencing technologies can be used to read a person’s genome, with varying accuracy, read lengths, and epigenetic information. As these rival technologies compete to provide the best results for users, it can be time-consuming to keep track of the improvements and how they will impact various analyses and applications. Furthermore, independent testing of claims made by sequencing providers is essential to separate real impact from marketing hype.
The project aims to develop an automated evaluation framework to compare data from competing DNA sequencing technologies (Oxford Nanopore, Pacific Biosciences, Illumina, etc.) across a number of areas, and to create intuitive visualisations to present these comparisons. The areas are:
1. Base calling models for nanopore sequencing
2. Read quality, length and cost-effectiveness
3. Detection of DNA methylation and other epigenetic modifications
4. Single nucleotide variant calling
5. Copy number variant calling
6. Structural variant calling
The project will involve working with data from the latest DNA sequencing technologies and associated bioinformatic software. The candidate will get hands-on experience in genomics data analysis, visualisation and interpretation, with a focus on usability for clinical applications.
Project 2: Sequencing Metadata Portal
The Garvan Sequencing Laboratory processes thousands of samples using state-of-the-art applications to generate Whole Genome Sequencing libraries, Whole Exome libraries and Single Cell Transcriptomics libraries. These libraries are then sequenced on the Illumina NovaSeq platform. Each sequenced sample is processed through the production bioinformatics QC pipeline to generate hundreds of QC metrics.
To assist the lab in managing the large number of samples, a data visualisation portal is to be built to help visualise the data and flag any samples that may require the lab’s attention. The portal needs to be able to query hundreds of thousands of entries and return results quickly. The second aspect of the project will involve optimising the storage of metadata in a database to enable the visualisation.
You will gain experience with cutting-edge sequencing technologies and bioinformatics tools.
Project 3: Workflows for production bioinformatics ancillary analyses
The Production Bioinformatics group routinely runs bioinformatics analyses of whole genome sequencing (WGS) data. These are mostly in the form of analysis pipelines, each comprising a number of steps such as quality control, alignment, variant calling, annotation and delivery. For convenience, portability and reproducibility, these pipelines are implemented as workflows. There are a number of useful “add-on” bioinformatic WGS analyses, such as pharmacogenomics, mitochondrial variant calling, antigen typing, genotyping of “troublesome” genes like SMN1/SMN2, and detection of pathogenic repeat expansion alleles, that we use occasionally but which are not yet incorporated into our workflows. In this project you will use the Nextflow workflow management system to develop workflows that allow us to easily run these ancillary analyses.
The project will provide an opportunity to become familiar with a range of bioinformatics programs, and with use of a high-performance computing environment.
Project 4: Development of a database and dashboard for sample metadata
The Cellular Genomics Platform requires a database for tracking the metadata associated with the samples processed through the facility. Information such as sample ID, library type, multiplexing indices, and species needs to be tracked for each sample, and the samples loaded onto each sequencing flow cell also need to be recorded. Ideally, the database would also track whether a given sample is waiting to be analysed, in processing, successfully analysed or failed.
This project would involve creating a proof-of-concept database as well as a lightweight, web-based dashboard for creating, viewing, and updating records.
Project 5: Updating production bioinformatics workflows
This project aims to improve and update our existing bioinformatics pipelines by translating them from Nextflow’s DSL1 to DSL2 and migrating them from our on-premises cluster to external compute providers, such as the National Computational Infrastructure (NCI). This project will result in enhanced pipeline performance, improved maintainability, and increased scalability, allowing us to leverage the advanced capabilities of Nextflow DSL2 and world-class, high-end computing services. In this project you will be exposed to high-performance computing (HPC) environments, bioinformatics best practices and the state-of-the-art workflow management tool Nextflow.
In addition, the resulting pipelines will be benchmarked against the current workflows to ensure the validity of the results generated.
Project 6: Software tools for enhancing multi-gene panels in genomic research
Genome sequencing can be used to diagnose and study various diseases. A single change in a gene can cause a rare disease, but it can be difficult to find these changes among normal variation. Databases of disease-linked genes can help to prioritise the analysis, and this project aims to develop software to find more likely gene candidates.
Project 7: Developing a framework for RNA-seq analysis reporting
RNA-seq analysis can be structured in modules, including data pre-processing, quality assessment, unsupervised and supervised analysis. Common supervised analysis tasks involve comparing predefined sample groups using differential expression and gene set enrichment analysis to identify deregulated genes and pathways, respectively. The goal of this project is to develop a framework for generating RNA-seq analysis reports to support Garvan researchers with analysing and interpreting their RNA-seq data.
Project 8: Understanding genetic disease at single-cell resolution
Disease-causing genetic variants are either inherited (and present in every cell), or acquired (and present in only some cells). Our ability to understand the impact of genetic variation in disease has been transformed by a suite of single-cell technologies, the latest of which can detect full-length transcripts and acquired genetic variants using long-read sequencing. Using samples collected from patients with an acquired genetic disease (VEXAS), the aim of this project is to pilot a new long-read single-cell sequencing technology (PacBio MAS-seq), and compare it to data from short-read (Illumina) and long-read (Oxford Nanopore) platforms generated in parallel. This project would suit a motivated individual familiar with genomic data, with an interest in applying the latest genomic technologies to understand disease.
Project 9: Population-scale analysis of acquired genetic disease
As we age, our cells and tissues acquire and accumulate genetic mutations. In some cases this can lead to cancer, although there is an increasingly large number of other diseases (mostly immune and inflammatory) now recognised to be caused by acquired mutations. The emergence of population-scale genomic datasets provides a unique opportunity to discover such diseases, and better understand their clinical presentations and trajectories. This project will leverage genomic and clinical data from hundreds of thousands of individuals, from both public and private sources, to identify and characterise groups of individuals with acquired genetic variants. This project would suit a motivated individual familiar with genomic data, with an interest in working with large genomic and clinical datasets.
Project 10: Predicting enhancer-promoter interactions by deep learning
Genes are regulated by bringing regulatory elements, such as enhancers, into close three-dimensional (3D) proximity, and therefore assigning enhancers to target genes remains an important question in understanding gene regulation. CTCF binding is one of the most critical determinants of 3D genome organisation.
Recent technologies such as high-throughput chromosome-conformation capture (in situ Hi-C) allow for direct identification of enhancer-promoter interactions in a cell-type-specific manner. However, the availability of 3D genome organisation data from experiments remains limited due to the high costs involved.
Deep learning has emerged as a powerful approach for studying genomic features and reducing the need for experimental analyses of chromatin organisation. These methods frequently utilise DNA sequence-encoded motifs alone or in conjunction with cell-type-specific genomic features (CTCF binding, chromatin structure). However, their predictive accuracy remains limited and rigorous evaluation is required for their future use.
This project will take advantage of matched Hi-C, CTCF binding, and chromatin accessibility data generated in our Laboratory (Head: Prof Susan Clark) to assess the performance of the top deep learning approaches for predicting enhancer-promoter interactions. This will form a base for integration into current and future studies of gene deregulation in cancer.
Project 11: Enhancing reproducibility and scalability in mass spec imaging data analysis through optimised workflow management
Mass spectrometry imaging (MSI) is a state-of-the-art technique allowing spatial characterisation of proteins in tissues. We have developed an R-based analysis pipeline that enables researchers to interrogate MSI data to uncover novel biological insights in cancer.
In this project, the student will work on enhancing the utility of our pipeline through designing and implementing a comprehensive workflow management system. The existing code serves as the foundation, but lacks streamlined procedures, code modularisation, and error checking that are necessary for managing, reproducing and scaling complex analyses, crucial in the context of MSI data due to its intricate and voluminous nature.
The student will leverage their computer science expertise to construct a dynamic workflow framework that automates the pre-processing, computation, analysis and visualisation of MSI data, whilst also learning about the generation and interpretation of large biological ‘omics datasets and their application to the study and treatment of solid cancers.
Project 12: Technical and biological assessment of methylation at enhancer regions in prostate cancer
DNA methylation is one of the earliest molecular changes to occur in prostate cancer. A growing body of evidence suggests that DNA methylation at enhancers, regulatory regions of the genome, plays an important role in gene regulation in cancer. The new EPICv2 methylation array platform provides increased coverage of enhancer regions compared to older versions of the platform, including at prostate-specific enhancers. Our lab has generated some of the first EPICv2 data globally, including from prostate cell lines and prostate tumour tissue. In this project you will integrate our prostate EPICv2 methylation data with matched whole genome bisulfite sequencing data and public epigenetic datasets. The study aims are 1) technical – to assess the coverage and accuracy of methylation measurements at prostate enhancer regions, and 2) biological – to identify novel prostate cancer-associated methylation changes at enhancers.
ABOUT GARVAN
The Garvan Institute of Medical Research is an independent Medical Research Institute (MRI) in Sydney, delivering scientific and clinical impact on a global basis and in partnership with organisations that share our vision. We are proud to be one of Australia’s largest and most highly regarded MRIs.
Our vision is global leadership in discoveries to impact and our enduring purpose is to impact human health, by harnessing information encoded in our genome.
We seek to see our world-class discovery research achieve life-changing impacts, not only for individual patients with rare diseases, but for the many thousands affected by complex, common disease.
Garvan promotes a diverse workplace and is committed to the principles of equity, diversity, inclusion and belonging.
HOW TO APPLY
All applications must be submitted via the Garvan Careers site [Workday].
Applications from other sites/channels will not be considered.
Part One:
Your application via the Garvan Careers Site/Workday should include:
- Copy of your CV/resume [no more than five (5) pages]
- Cover letter outlining which project(s) you are applying for [one page only]
- Copy of your academic transcript/s
[Note – Our system requires these documents to be compiled into one PDF document]
Part Two:
In addition to submitting your application via Workday, please complete the Student Applicant Form at: https://tinyurl.com/garvansummer23
CLOSING DATE
The position will remain open until filled. We will be reviewing applications as they are received, and so we encourage you to submit your application as soon as possible.
We aim to have positions filled by the end of October 2023 for project commencement in mid-November 2023. All applicants will be notified of the outcome of their application by the end of October 2023.
To apply for this job please visit garvan.wd3.myworkdayjobs.com.