In agrigenomics, low-pass whole genome sequencing (LP-WGS) and Skim-Seq offer cost-effective strategies for genotyping and genomic selection in plant and animal breeding. However, the success of these approaches relies heavily on the accuracy of genotype imputation, which in turn depends on the selection of an appropriate reference panel.
For a detailed comparison between LP-WGS and Skim-Seq, including their advantages, limitations, and suitable applications, please refer to our blog: Skim-seq vs. Low-Pass Whole Genome Sequencing (LP-WGS) in Agrigenomics: Choosing the Right Strategy.
Imputation is a statistical method used to infer missing genotype data, enhancing the resolution of genomic analyses without the need for high-coverage sequencing. In LP-WGS and Skim-Seq workflows, imputation enables researchers to predict unobserved genotypes based on observed data and a reference panel, thereby increasing the utility of low-coverage sequencing data.
To learn more about imputation analysis in next-generation sequencing for agrigenomics, visit our blog: The Logic Behind Imputation Analysis in Next-Generation Sequencing for Agrigenomics.
Reference Panel Selection
The accuracy of genotype imputation in agrigenomics is profoundly influenced by the choice of reference panel. Key factors to consider include:
- Genetic Diversity and Population Specificity: A reference panel should closely reflect the genetic background of the target population to maximize imputation accuracy. Panels that are well matched in ancestry and population structure are more likely to capture relevant haplotype patterns, especially for rare variants, whereas mismatches between the reference and study populations reduce imputation performance, particularly for low-frequency alleles. In agrigenomics, where regional adaptation and breeding history shape genetic variation, using region-specific breeding lines as reference panels has been shown to yield superior imputation results compared to generic datasets. This matters most for species with high genetic diversity or complex stratification (see the case study below).
- Panel Size and Quality: Larger panels with high-quality genotype data enhance imputation accuracy by providing a comprehensive representation of genetic variation. However, the benefits plateau beyond a certain size, emphasizing the importance of quality over sheer quantity.
- Challenges in Agrigenomics: Non-model organisms often lack extensive reference panels. In such cases, constructing custom panels using local germplasm or region-specific breeding lines can outperform generic reference sets. Combining these with larger public panels can further enhance performance.
Case Study
Rice (Oryza sativa japonica) is a species with a complex breeding history and high genetic diversity. FASTQ data obtained from different individuals of the Presido cultivar were uploaded to the CURIO platform and analyzed for imputation using two publicly available reference panels: the Global Oryza Sativa Reference Panel (GORP) and the Plant-Impute DB Project’s Reference Panel (PIRP). GORP is part of an initiative to catalog the genetic diversity of rice worldwide and includes extensive coverage of different rice ecotypes and landraces. PIRP is designed to facilitate genotype imputation but does not cover such an extensive range of ecotypes.
We found that the number of imputed SNVs, concordance, and imputation quality score (IQS) were all higher with GORP than with PIRP at every depth tested (Table 1). Accuracy with GORP also held up better as depth decreased: concordance fell from 96.92% at 4× to 96.60% at 0.5×, whereas with PIRP it fell from 92.65% to 91.18%.
Average Depth (×) | SNV number | GORP: Imputed SNVs | GORP: Concordance | GORP: IQS | PIRP: Imputed SNVs | PIRP: Concordance | PIRP: IQS |
---|---|---|---|---|---|---|---|
4 | 1,014,042 | 5,231,433 | 96.92% | 0.874 | 4,897,277 | 92.65% | 0.782 |
2 | 603,490 | 5,231,433 | 96.94% | 0.874 | 4,897,277 | 92.21% | 0.767 |
1 | 292,209 | 5,231,433 | 96.64% | 0.863 | 4,897,277 | 91.57% | 0.747 |
0.5 | 132,245 | 5,231,433 | 96.60% | 0.861 | 4,897,277 | 91.18% | 0.734 |
Table 1. Imputation results for one rice sample at different coverage depths, using Beagle with two publicly available reference panels.
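The article does not say how the concordance and IQS values in Table 1 were computed. As a rough illustration only, the sketch below assumes the common definitions: concordance is the fraction of genotypes where the imputed call matches a truth call, and IQS is a chance-corrected agreement score of the form (observed agreement − chance agreement) / (1 − chance agreement). The function name and the toy genotype vectors are hypothetical, and genotypes are assumed to be coded as alternate-allele counts 0, 1, or 2.

```python
import numpy as np

def concordance_and_iqs(true_gt, imputed_gt, n_classes=3):
    """Return (concordance, IQS) for genotypes coded as 0/1/2 alt-allele counts.

    Assumed definitions, not taken from the article:
      concordance = observed agreement between imputed and true genotypes
      IQS         = (observed - chance agreement) / (1 - chance agreement)
    """
    true_gt = np.asarray(true_gt)
    imputed_gt = np.asarray(imputed_gt)

    # Confusion table: rows = imputed genotype, columns = true genotype.
    table = np.zeros((n_classes, n_classes), dtype=float)
    np.add.at(table, (imputed_gt, true_gt), 1)

    n = table.sum()
    p_obs = np.trace(table) / n
    # Agreement expected by chance from the marginal genotype frequencies.
    p_chance = (table.sum(axis=0) * table.sum(axis=1)).sum() / n**2

    if p_chance == 1.0:  # only one genotype class present: IQS is undefined
        return p_obs, float("nan")
    return p_obs, (p_obs - p_chance) / (1.0 - p_chance)

# Toy data (hypothetical): one heterozygote is imputed as homozygous reference.
truth = [0, 0, 0, 0, 1, 1, 2, 2, 0, 1]
imputed = [0, 0, 0, 0, 1, 0, 2, 2, 0, 1]
conc, iqs = concordance_and_iqs(truth, imputed)
print(f"concordance={conc:.3f}  IQS={iqs:.3f}")
```

Because IQS discounts agreement that would occur by chance, it is typically lower than raw concordance, which matches the pattern in Table 1 and makes it the stricter of the two metrics for low-frequency variants.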
Strategies for Selecting an Appropriate Reference Panel
To optimize imputation accuracy in agrigenomics studies, consider the following strategies:
- Assess Population Structure: Utilize tools like Principal Component Analysis (PCA) to understand genetic relationships and ensure the reference panel aligns with the study population's genetic makeup. This alignment is crucial, as mismatches can lead to reduced imputation accuracy.
- Evaluate Panel Composition: Incorporate individuals from genetically diverse backgrounds to ensure the reference panel captures a broad range of haplotypes. Striking a balance between genetic diversity and relevance to the target population is key: too much divergence can reduce accuracy, while too little may fail to represent all variant types present in the study group.
- Implement Quality Control Measures: Ensure high-quality genotype data in the reference panel to prevent the propagation of errors during imputation. This includes filtering out low-quality variants and verifying sample integrity.
- Leverage Publicly Available Panels: Utilize existing reference panels when available, assessing their relevance to the genetic background of the study population. Supplementing these panels with additional samples that closely match the target population can improve imputation accuracy, particularly for rare and population-specific variants.
- Balance Sequencing Depth and Sample Size: When working within budget constraints, it is often more effective to sequence a larger number of samples at lower coverage rather than fewer samples at high depth. Sequencing more individuals at low depth captures a wider spectrum of haplotypes, often improving imputation accuracy for low-frequency variants.
- Choose Appropriate Imputation Algorithms: Choose an imputation engine that is validated for your species, matches your reference-panel format, and scales to your sample number. Curio Genomics, in partnership with Revvity, offers rapid, cloud-based imputation analysis tailored for plant and animal breeding programs.
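The population-structure check described above can be sketched in a few lines. The example below is a minimal illustration, not the workflow used on the CURIO platform: it fits a PCA to reference-panel genotypes, projects target samples onto the same axes, and reports which panel group each target sits closest to. All function names, the group labels, and the simulated genotypes are hypothetical; in practice the dosage matrices would come from VCF files, and dedicated population-genetics tools may be more appropriate.

```python
import numpy as np

def fit_pca(ref_gt, n_components=2):
    """Fit PCA on a reference genotype matrix (samples x variants, 0/1/2)."""
    ref_gt = np.asarray(ref_gt, dtype=float)
    mean = ref_gt.mean(axis=0)  # per-variant mean, from the panel only
    _, _, vt = np.linalg.svd(ref_gt - mean, full_matrices=False)
    return mean, vt[:n_components].T  # axes: variants x components

def project(gt, mean, axes):
    """Project samples onto PCA axes fitted on the reference panel."""
    return (np.asarray(gt, dtype=float) - mean) @ axes

def nearest_panel_group(ref_pcs, ref_labels, target_pcs):
    """Assign each target sample to the closest panel-group centroid."""
    ref_labels = np.asarray(ref_labels)
    groups = np.unique(ref_labels)
    centroids = np.stack([ref_pcs[ref_labels == g].mean(axis=0) for g in groups])
    dists = np.linalg.norm(target_pcs[:, None, :] - centroids[None, :, :], axis=2)
    return groups[dists.argmin(axis=1)]

# Simulated data (assumption): two strongly differentiated panel groups,
# with target samples drawn from the same allele frequencies as group_A.
rng = np.random.default_rng(0)
n_variants = 500
freq_a = rng.uniform(0.1, 0.9, n_variants)
freq_b = 1.0 - freq_a
panel = np.vstack([
    rng.binomial(2, freq_a, size=(30, n_variants)),
    rng.binomial(2, freq_b, size=(30, n_variants)),
])
labels = ["group_A"] * 30 + ["group_B"] * 30
targets = rng.binomial(2, freq_a, size=(5, n_variants))

mean, axes = fit_pca(panel)
assigned = nearest_panel_group(
    project(panel, mean, axes), labels, project(targets, mean, axes)
)
print(assigned)
```

In this simulated setup the targets cluster with group_A, which would suggest restricting or weighting the panel toward that group. Targets that sit far from every panel group in such a plot are a warning sign that the panel may not represent the study population.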
Integrating Revvity's Solutions into Your Workflow
Revvity offers the NEXTFLEX™ HT Agrigenomics Low-Pass WGS Kit, a streamlined solution for preparing DNA libraries suitable for LP-WGS and Skim-Seq applications. This kit is compatible with Illumina® and Element® platforms and is designed for a wide range of sample qualities and quantities, making it ideal for high-throughput projects.
In partnership with Curio Genomics, Revvity provides comprehensive bioinformatics solutions, including rapid, cloud-based imputation analysis tailored for plant and animal breeding programs. Curio's platform offers a user-friendly interface for data interpretation and decision-making, facilitating the integration of sequencing data into breeding strategies.
Conclusion
Selecting suitable reference panels is crucial for maximizing the benefits of LP-WGS and Skim-Seq in agrigenomics. By carefully considering factors such as genetic diversity, population specificity, and panel quality, researchers can enhance imputation accuracy and, consequently, the effectiveness of breeding programs. Ongoing developments in reference panel construction and imputation algorithms will continue to improve the precision and utility of genomic selection strategies.
References:
- Rubinacci, S., Ribeiro, D. M., Hofmeister, R. J., & Delaneau, O. (2021). Efficient phasing and imputation of low-coverage sequencing data using large reference panels. Nature Genetics, 53(1), 120–126. https://doi.org/10.1038/s41588-020-00756-0
- Wragg, D., Zhang, W., Peterson, S., Yerramilli, M., Mellanby, R., & Schoenebeck, J. J. (2024). A cautionary tale of low-pass sequencing and imputation with respect to haplotype accuracy. Genetics Selection Evolution, 56, 6. https://doi.org/10.1186/s12711-024-00875-w
- Heidaritabar, M., Huisman, A., Krivushin, K., Stothard, P., Dervishi, E., Charagu, P., … Plastow, G. S. (2022). Imputation to whole-genome sequence and its use in genome-wide association studies for pork colour traits in crossbred and purebred pigs. Frontiers in Genetics, 13, 1022681. https://doi.org/10.3389/fgene.2022.1022681
- González-Recio, O., López-Catalina, A., Peiró-Pastor, R., Nieto-Valle, A., & Fernández, A. (2023). Evaluating genotype by low-pass nanopore sequencing for genomic prediction in dairy cattle. Journal of Animal Science and Biotechnology, 14, 98. https://doi.org/10.1186/s40104-023-00896-3
- Sánchez-Roncancio, C., García, B., Gallardo-Hidalgo, J., & Yáñez, J. M. (2023). GWAS on imputed whole-genome sequence variants reveal genes associated with resistance to Piscirickettsia salmonis in rainbow trout (Oncorhynchus mykiss). Genes, 14(1), 114. https://doi.org/10.3390/genes14010114