Leveraging whole genome sequencing data to tackle missing heritability in rare eye diseases
Genome sequencing, a sneak peek into our biological blueprint
Dr. Ir. Mattias Van Heetvelde is a postdoctoral researcher and bioinformatician at the Center for Medical Genetics Ghent (CMGG). Within the BeSolveRD project [1], he focuses on omics [2] analyses in the context of rare eye diseases (RED) in the lab of Prof. Elfride De Baere [3].
Two important hereditary RED, for example, are Retinitis Pigmentosa (RP) and Leber Congenital Amaurosis (LCA). These two diseases are caused by harmful changes in a wide range of genes. For LCA, the age of onset is early, and children with this condition are born with poor vision or are often even blind. This disease occurs in 2-3 births per 100,000. RP is much more common and occurs in 1 in 3,000 to 4,000 people. RP begins with a phase of night blindness (often around puberty), after which peripheral vision gradually deteriorates. At a later age, patients have only central vision (tunnel vision) or very poor overall vision.
Both RP and LCA are genetically heterogeneous diseases, often involving genetic changes in multiple possible genes. Understanding the genetic causes of these diseases is crucial for developing effective treatments, especially as gene therapy becomes a viable option for many patients. However, identifying the specific genetic mutations responsible for these diseases can be challenging due to the wide range of genes involved. This is where advanced techniques like genome sequencing come into play, offering a more comprehensive view of the genetic landscape and increasing the chances of finding the underlying cause of these conditions.
A genome is the complete set of DNA sequences in an organism and contains all of the instructions required for that organism to function [4]. Think of our DNA as a string of letters that are arranged in a specific order. Genome sequencing is the process of figuring out the exact order of these letters in the DNA strand. Once scientists have the sequence, they can begin to understand what it means. For example, certain sequences can tell you about traits such as eye colour and height but also predisposition to certain diseases.
The current standard technique, Whole-Exome Sequencing (WES), only looks at the parts of DNA that provide instructions for making proteins (the coding part of the genome). Unfortunately, this method leaves half of the patients with RED without a molecular diagnosis.
This can be frustrating and problematic for patients' treatment. A definitive diagnosis enables appropriate therapy and follow-up, especially as gene therapy continues to advance and become more widely available in clinical trials.
Mattias Van Heetvelde: “For RED, several gene therapies are already under development. These therapies are often specifically targeted at certain genetic defects, making a precise diagnosis essential. Without this, patients may be excluded from potentially life-changing treatments. Additionally, early intervention is critical; the sooner therapy begins, the greater the chance of preserving vision. Neural tissue, such as photoreceptors in the eye, does not regenerate, so any loss of vision is permanent.”
Mattias Van Heetvelde: “Also, from a family planning perspective, having a definitive molecular diagnosis is crucial. This allows for the selection of embryos for implantation in women who wish to conceive, keeping relevant genetic defects in mind. By screening each embryo for the presence of the defect, only those without the defect are implanted, which almost entirely eliminates the risk of passing on the genetic condition.”
Leveraging Whole-Genome Sequencing for Advanced Diagnostics
Humans have two copies of each gene, one inherited from each parent. If a patient inherits a defective copy from one parent but an intact copy from the other, the intact copy might still function correctly. This functioning copy could potentially compensate for the defective one, theoretically preventing the disease.
Some patients still exhibit symptoms of RED even though only one defective gene copy has been found and the other copy seems intact. This suggests that the cause in the copy from the second parent might be more complex than a defect in the gene regions that WES typically examines.
For these patients, Whole-Genome Sequencing (WGS) could be a solution. Unlike WES, WGS examines the entire genome, including the non-coding regions, the ‘dark matter’ of the genome (areas that do not make proteins but may regulate other genes or have other important functions).
Mattias Van Heetvelde: “Only ~1% of the genome is protein-coding. The rest is all non-coding. For a long time, the non-coding region was considered not to be important, but actually, we know it regulates the coding part. Therefore, it is useful to look at the whole genome to possibly find the genetic causes of the diseases that WES missed. This way, we hope to find additional genetic factors in these non-coding regions that could be contributing to the disease. A better understanding of the genetic causes can lead us to better diagnosis and treatment options.”
Within the BeSolveRD project, several Belgian genetic centres, including the Center for Medical Genetics Ghent (CMGG), engaged in a multicentric, prospective, randomised controlled trial with the aim of technically validating WGS for clinical use. The study also includes a retrospective arm, the focus of Dr. Miriam Bauwens’ work, where a hundred and fifty ‘half-solved’ RED cases, mostly with a mono-allelic (likely) pathogenic variant previously identified with WES, are investigated with WGS in the hope of providing a definitive molecular diagnosis. If WGS proves successful, it could lead to more accurate diagnoses for patients with RED.
Supercomputing to overcome obstacles of WGS Implementation
However, the amount of data resulting from WGS experiments comes with substantial memory, computational power, and storage requirements. That is why CMGG turns to the Tier-2 infrastructure of Vlaams Supercomputer Centrum (VSC) for this project.
Mattias Van Heetvelde: “The average genome produces 80 Gb of raw data, which becomes 30 Gb of data after analysis (excluding intermediate files). The average laptop would only be able to hold two genomes at a time. Moreover, analysing one batch of 18 genomes requires more than 5,000 CPU hours, which is more than 625 real-time hours, or close to a month, on an 8-core laptop. By using the VSC Tier-2 HPC infrastructure, we can execute this analysis in under 3 days (queuing time included) by dividing the necessary work into well-defined tasks that are executed in parallel on different nodes and, more efficiently, by allocating the necessary resources to each task.”
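The arithmetic behind these figures can be reproduced with a short Python sketch. It uses only the numbers quoted above; the function names are hypothetical, and the assumption that work scales perfectly across cores is a simplification that real pipelines do not achieve.

```python
import math

# Figures quoted in the interview; the units are kept as stated there.
CPU_HOURS_PER_BATCH = 5_000   # lower bound for one batch of genomes
GENOMES_PER_BATCH = 18
RAW_PER_GENOME = 80           # raw data per genome
FINAL_PER_GENOME = 30         # data kept after analysis


def wall_clock_hours(cpu_hours: float, cores: int) -> float:
    """Idealised wall-clock time, ASSUMING perfect scaling across cores."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return cpu_hours / cores


def min_cores_for_deadline(cpu_hours: float, deadline_hours: float) -> int:
    """Smallest whole number of cores that meets a deadline (same assumption)."""
    if deadline_hours <= 0:
        raise ValueError("deadline_hours must be positive")
    return math.ceil(cpu_hours / deadline_hours)


# An 8-core laptop working on one batch.
laptop_hours = wall_clock_hours(CPU_HOURS_PER_BATCH, cores=8)
laptop_days = laptop_hours / 24

# Cores needed to finish the same batch within 3 days (72 hours).
cores_for_3_days = min_cores_for_deadline(CPU_HOURS_PER_BATCH, deadline_hours=72)

# Storage footprint of one batch, before and after analysis.
batch_raw = GENOMES_PER_BATCH * RAW_PER_GENOME
batch_final = GENOMES_PER_BATCH * FINAL_PER_GENOME

print(f"Laptop: {laptop_hours:.0f} hours ({laptop_days:.1f} days)")
print(f"Cores needed for a 72-hour run: {cores_for_3_days}")
print(f"Batch storage: {batch_raw} raw, {batch_final} after analysis")
```

Under these assumptions, the laptop needs 625 hours (about 26 days), and roughly 70 cores working in parallel would be needed to finish within 72 hours. Real runs also include queuing time and imperfect scaling, so these figures are best read as lower bounds.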
As the requirements for WGS analyses are much more demanding than those for routine WES analyses, running WGS analyses on local infrastructure would interfere with routine diagnostic workflows and, therefore, be inefficient. Using cloud services could provide a solution. However, sequencing data is subject to privacy regulations, which makes it very sensitive to store or analyse with commercial services. Also, the cost of storage and analysis on the cloud runs high.
Mattias Van Heetvelde: “The Tier-2 infrastructure from the VSC allows us the use of significant (temporary) storage and compute resources at no extra cost and ensures our data is handled securely and efficiently. As a safeguard, all patient-derived sequencing data on the VSC infrastructure are pseudonymised, meaning the key to retracing the sequencing data to patient information is not kept on VSC infrastructure.”
Advantages and Potential of Whole-Genome Sequencing in Medical Diagnosis
With WGS, it is possible to conduct more complex analyses, and the diagnostic yield for RED will increase. Also, offering a uniform test (‘one size fits all’) leads to higher efficiency. Currently, WES is performed as the first-line test, followed by WGS if nothing is found or if patients are ‘half-solved’. With WGS as the single test, a condition that can be explained by changes outside the coding part can be identified with that same test.
Future of WGS in diagnostic routine
Mattias Van Heetvelde acknowledges that while the integration of WGS into routine medical diagnostics holds promise, it brings significant challenges. One of the primary obstacles is the interpretation of the vast amount of data generated by WGS, especially for non-coding regions of the genome, where much is still unknown. Although there is optimism, as technology advances and more data are gathered, it will take time to build the necessary databases and perform synchronized multi-omics and functional studies to understand these changes.
Additionally, infrastructure and storage capacity are significant concerns, as current systems are primarily optimised for WES. Upscaling to accommodate WGS would require substantial investment and careful consideration of data privacy regulations. Researchers, physicians, and informed patients are eager to see WGS implemented in clinical practice, although practical and ethical issues, such as how to handle uncertain findings, need to be considered as it becomes a routine diagnostic tool.
[1] https://beshg.be/workgroups/besolverd
[2] Omics: In biology, the word omics refers to the sum of constituents within a cell. The omics sciences share the overarching aim of identifying, describing, and quantifying the biomolecules and molecular processes that contribute to the form and function of cells and tissues. (Source: Britannica). Example: just as economics is the study of all aspects related to production, consumption and trade, genomics is the study of all aspects related to the genome.
[3] https://www.debaerelab.com/
[4] https://institute.global/insights/public-services/what-genomic-sequencing-and-why-does-it-matter-future-health
Mattias Van Heetvelde
Dr. Ir. Mattias Van Heetvelde is a postdoctoral researcher and bioinformatician at the Center for Medical Genetics in Ghent. His current work mainly focuses on omics analyses in the context of rare (eye) diseases in the lab of Prof. De Baere. He is a Bioscience Engineer by training and holds a PhD in Health Sciences (2019, Ghent University), in which he researched the attenuation of BRCA1 and BRCA2 expression by small non-coding RNAs and their potential for synthetic lethality.