RNA sequencing has long complemented DNA sequencing in oncology by revealing how tumors express and regulate their genomes. Despite this, variant calling from RNA has historically been viewed with skepticism because of alignment complexity, variable expression, and RNA editing. As a result, RNA-Seq has been used mainly for expression profiling and fusion detection rather than mutation discovery.
A recent study by Bollas et al.¹ introduced VarRNA, an open-source pipeline that challenges this limitation. VarRNA integrates machine-learning classifiers into the established GATK best-practices framework for RNA short-variant discovery²,³ to classify RNA-Seq variant calls as true or artifactual and to further distinguish germline from somatic variants. By coupling these classifications with knowledge bases such as OncoKB™ and REDIportal⁴, which catalogs A-to-I RNA editing sites, VarRNA turns transcriptomic data into a credible source of variant information for clinical research.
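A-to-I editing appears in RNA-Seq as A>G mismatches (T>C for minus-strand transcripts) that can masquerade as SNVs, which is why an editing catalog is useful alongside cancer annotations. The sketch below is a minimal illustration of flagging such candidates, assuming the known editing sites have already been loaded into an in-memory set of `(chrom, pos)` pairs. The `Variant` record and `flag_editing_candidates` function are hypothetical and are not VarRNA's own implementation.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple


@dataclass(frozen=True)
class Variant:
    chrom: str
    pos: int  # 1-based genomic position
    ref: str
    alt: str


# A-to-I editing is read as A>G against the reference when the transcript is on
# the plus strand, and as T>C when it is on the minus strand.
EDITING_SUBSTITUTIONS = {("A", "G"), ("T", "C")}


def flag_editing_candidates(
    variants: Iterable[Variant],
    known_editing_sites: Set[Tuple[str, int]],
) -> Tuple[List[Variant], List[Variant]]:
    """Split calls into (retained, flagged).

    known_editing_sites is a set of (chrom, pos) pairs assumed to have been
    loaded beforehand from an editing-site catalog (loading step not shown).
    """
    retained: List[Variant] = []
    flagged: List[Variant] = []
    for v in variants:
        editing_type = (v.ref, v.alt) in EDITING_SUBSTITUTIONS
        if editing_type and (v.chrom, v.pos) in known_editing_sites:
            flagged.append(v)
        else:
            retained.append(v)
    return retained, flagged


sites = {("chr1", 1000)}
calls = [
    Variant("chr1", 1000, "A", "G"),  # editing-type change at a known site
    Variant("chr1", 2000, "A", "G"),  # editing-type change, but not catalogued
    Variant("chr2", 500, "C", "T"),   # not an editing-type change
]
kept, flagged = flag_editing_candidates(calls, sites)
print(len(kept), len(flagged))  # 2 1
```

Flagged calls are set aside rather than silently deleted, so a reviewer can still inspect an editing-type change that happens to coincide with a genuine mutation.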
Description of the VarRNA pipeline
The VarRNA pipeline assumes unbiased, full-length coverage across transcripts and sufficient read depth to support variant calling. Stranded, poly(A)-selected or rRNA-depleted libraries, such as those generated by the NEXTFLEX™ Rapid Directional RNA-Seq Kit 2.0, are therefore compatible, while 3′-tag, 5′-capture, and single-cell workflows are not appropriate for VarRNA analysis.
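One way to see why 3′-tag protocols violate the full-length assumption is to compare coverage at the two ends of a transcript. The sketch below is a hypothetical quality check, not part of VarRNA: it assumes per-bin read depth has already been computed along a transcript (ordered 5′ to 3′), and the `max_ratio` cut-off is purely illustrative.

```python
from typing import Sequence


def three_prime_bias(coverage_bins: Sequence[float]) -> float:
    """Ratio of mean coverage in the 3'-most fifth of a transcript to the 5'-most fifth.

    coverage_bins holds per-bin read depth ordered 5' -> 3' along the transcript.
    A ratio near 1 suggests even coverage; large ratios suggest 3'-biased libraries.
    """
    n = len(coverage_bins)
    if n < 5:
        raise ValueError("need at least 5 coverage bins")
    k = n // 5
    five_prime = sum(coverage_bins[:k]) / k
    three_prime = sum(coverage_bins[-k:]) / k
    if five_prime == 0:
        return float("inf")
    return three_prime / five_prime


def looks_full_length(coverage_bins: Sequence[float], max_ratio: float = 2.0) -> bool:
    # max_ratio is an illustrative cut-off, not a published VarRNA parameter.
    return three_prime_bias(coverage_bins) <= max_ratio


uniform = [40] * 10
tag_like = [0, 1, 1, 2, 3, 5, 8, 20, 60, 90]
print(looks_full_length(uniform), looks_full_length(tag_like))  # True False
```

A variant in the first exons of a gene sequenced with a 3′-tag protocol would sit in the near-empty 5′ bins, so there would simply be too few reads to call it.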
VarRNA applies a two-stage machine-learning strategy to increase the accuracy of variant detection from RNA-Seq data. In the first stage, a classifier identifies and removes false variant calls caused by sequencing or alignment artifacts. The second classifier then separates germline variants, which are inherited, from somatic variants that arise in the tumor.
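The two-stage cascade can be sketched as follows. This is a minimal illustration, assuming each trained model is exposed as a function that returns a probability; `two_stage_classify`, the cut-offs, and the toy feature names (`alt_reads`, `in_population_db`) are hypothetical and do not reflect VarRNA's actual models or features.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Features = Dict[str, float]
Classifier = Callable[[Features], float]  # returns a probability in [0, 1]


@dataclass
class Call:
    variant_id: str
    features: Features


def two_stage_classify(
    calls: List[Call],
    artifact_model: Classifier,
    germline_model: Classifier,
    artifact_cutoff: float = 0.5,
    germline_cutoff: float = 0.5,
) -> List[Tuple[str, str, float]]:
    """Return (variant_id, label, confidence) for each call."""
    results = []
    for call in calls:
        # Stage 1: set aside calls the artifact model considers likely false.
        p_artifact = artifact_model(call.features)
        if p_artifact >= artifact_cutoff:
            results.append((call.variant_id, "artifact", p_artifact))
            continue
        # Stage 2: only calls that survive stage 1 are split into germline/somatic.
        p_germline = germline_model(call.features)
        if p_germline >= germline_cutoff:
            results.append((call.variant_id, "germline", p_germline))
        else:
            results.append((call.variant_id, "somatic", 1.0 - p_germline))
    return results


# Toy stand-ins for trained models; the features and rules are hypothetical.
def toy_artifact_model(f: Features) -> float:
    return 0.9 if f["alt_reads"] < 3 else 0.1


def toy_germline_model(f: Features) -> float:
    return 0.8 if f["in_population_db"] else 0.2


calls = [
    Call("v1", {"alt_reads": 1, "in_population_db": 0}),
    Call("v2", {"alt_reads": 25, "in_population_db": 1}),
    Call("v3", {"alt_reads": 18, "in_population_db": 0}),
]
for vid, label, conf in two_stage_classify(calls, toy_artifact_model, toy_germline_model):
    print(vid, label)
```

Running the germline/somatic model only on calls that pass the first stage means it never has to learn what an artifact looks like, which keeps each model focused on one decision.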
Compared with traditional rule-based filtering, VarRNA’s machine-learning approach substantially reduces low-frequency artifacts and the burden of manual review. Earlier tools each address only part of the problem: RNAIndel⁵ focuses on somatic coding indels, and DeepVariant RNA⁶ on germline variant calling. VarRNA offers a unified, explainable framework explicitly designed to distinguish somatic from germline variants in tumor RNA-Seq data.
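For contrast, a rule-based baseline applies fixed thresholds to every call. The sketch below is modeled on the hard-filter thresholds GATK's RNA-seq guidance recommends (FisherStrand `FS > 30`, QualByDepth `QD < 2`, and clusters of 3 SNVs within a 35 bp window). It is a simplified Python approximation, not GATK's `VariantFiltration` implementation, and the dictionary layout of each call is an assumption.

```python
from collections import defaultdict
from typing import Dict, List


def hard_filter(
    calls: List[Dict],
    fs_max: float = 30.0,
    qd_min: float = 2.0,
    window: int = 35,
    cluster_size: int = 3,
) -> List[str]:
    """Return a FILTER-style label per call: 'PASS' or ';'-joined failure names."""
    labels: List[List[str]] = [[] for _ in calls]
    for i, c in enumerate(calls):
        if c["FS"] > fs_max:  # strong strand bias
            labels[i].append("FS")
        if c["QD"] < qd_min:  # low variant quality relative to depth
            labels[i].append("QD")

    # Flag runs of cluster_size calls whose positions span fewer than `window` bases.
    by_chrom = defaultdict(list)
    for i, c in enumerate(calls):
        by_chrom[c["chrom"]].append((c["pos"], i))
    for entries in by_chrom.values():
        entries.sort()
        for start in range(len(entries) - cluster_size + 1):
            run = entries[start:start + cluster_size]
            if run[-1][0] - run[0][0] < window:
                for _, idx in run:
                    if "SnpCluster" not in labels[idx]:
                        labels[idx].append("SnpCluster")

    return [";".join(names) if names else "PASS" for names in labels]


calls = [
    {"chrom": "chr1", "pos": 100, "FS": 5.0, "QD": 12.0},
    {"chrom": "chr1", "pos": 110, "FS": 5.0, "QD": 12.0},
    {"chrom": "chr1", "pos": 120, "FS": 5.0, "QD": 12.0},
    {"chrom": "chr2", "pos": 500, "FS": 45.0, "QD": 1.5},
    {"chrom": "chr2", "pos": 9000, "FS": 1.0, "QD": 20.0},
]
print(hard_filter(calls))  # ['SnpCluster', 'SnpCluster', 'SnpCluster', 'FS;QD', 'PASS']
```

Each rule looks at one annotation in isolation, so a call just past a cut-off is treated the same as a blatant artifact; a learned classifier can instead weigh many features jointly.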
In the study, VarRNA was trained on datasets with matched tumor and normal exome sequencing, which provided a high-confidence ground truth. When evaluated on pediatric and adult tumor cohorts, VarRNA achieved high precision and recall in filtering artifacts and strong accuracy in germline classification.
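How matched exomes become a ground truth, and how precision and recall are then computed, can be sketched as below. This is a simplified illustration under stated assumptions, not the study's exact labeling procedure: `label_from_exomes` and `precision_recall` are hypothetical helpers, and the sketch labels every RNA call without exome support as an artifact, whereas a careful setup would restrict evaluation to loci with adequate DNA coverage.

```python
from typing import Dict, Hashable, Iterable, Set, Tuple


def label_from_exomes(
    rna_calls: Iterable[Hashable],
    tumor_exome: Set[Hashable],
    normal_exome: Set[Hashable],
) -> Dict[Hashable, str]:
    """Truth labels for RNA calls from matched tumor/normal exome calls."""
    labels = {}
    for v in rna_calls:
        if v in normal_exome:
            labels[v] = "germline"  # present in the patient's normal tissue
        elif v in tumor_exome:
            labels[v] = "somatic"   # present in tumor DNA only
        else:
            labels[v] = "artifact"  # no DNA support (see caveat in the lead-in)
    return labels


def precision_recall(
    predicted: Dict[Hashable, str], truth: Dict[Hashable, str], positive: str
) -> Tuple[float, float]:
    """Precision and recall for one class, treating all other labels as negative."""
    tp = sum(1 for v, p in predicted.items() if p == positive and truth[v] == positive)
    fp = sum(1 for v, p in predicted.items() if p == positive and truth[v] != positive)
    fn = sum(1 for v, p in predicted.items() if p != positive and truth[v] == positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


truth = label_from_exomes(["v1", "v2", "v3", "v4"], tumor_exome={"v1", "v2"}, normal_exome={"v1"})
predicted = {"v1": "germline", "v2": "somatic", "v3": "somatic", "v4": "artifact"}
print(precision_recall(predicted, truth, "somatic"))  # (0.5, 1.0)
```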
A particularly interesting result from the study is the consistent detection of allele-specific expression in pathogenic cancer variants. Roughly half of the exome-confirmed mutations identified in RNA-Seq showed marked allelic imbalance, with the mutant allele often dominating transcript expression. This trend, evident in genes such as MSH6, TP53, BRAF, and STAG2, illustrates that the transcriptional representation of a mutation may differ substantially from its DNA allele frequency. Such findings complement DNA-based profiling by identifying which variants are not only present but transcriptionally active, offering insight into their potential biological impact.
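Allelic imbalance at a single site is often assessed by asking whether the alternate-allele read count is plausible under an expected allele fraction. The sketch below swaps in a generic exact two-sided binomial test for that purpose; it is not necessarily the statistic used in the study, and the function names are hypothetical. Passing a DNA-derived VAF as `expected_fraction` compares RNA against DNA, but ignores sampling uncertainty in the DNA estimate itself.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_test_two_sided(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial test: sum P(X=i) over outcomes no more likely than k."""
    observed = binom_pmf(k, n, p)
    # The small relative tolerance keeps ties from being lost to rounding error.
    total = sum(
        binom_pmf(i, n, p)
        for i in range(n + 1)
        if binom_pmf(i, n, p) <= observed * (1 + 1e-7)
    )
    return min(1.0, total)


def allelic_imbalance(alt_reads: int, ref_reads: int, expected_fraction: float = 0.5):
    """Return (RNA allele fraction, p-value) against an expected allele fraction.

    expected_fraction = 0.5 models a heterozygous variant with balanced expression;
    a DNA-derived VAF could be passed instead to compare RNA against DNA.
    """
    if not 0.0 < expected_fraction < 1.0:
        raise ValueError("expected_fraction must lie strictly between 0 and 1")
    depth = alt_reads + ref_reads
    if depth == 0:
        raise ValueError("no reads at this position")
    return alt_reads / depth, binom_test_two_sided(alt_reads, depth, expected_fraction)


print(allelic_imbalance(45, 5))   # mutant allele dominates: very small p-value
print(allelic_imbalance(26, 24))  # close to balanced: large p-value
```

In practice, p-values from many sites would also need multiple-testing correction before a variant is called allelically imbalanced.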
In translational research, this capability provides three main advantages. First, RNA evidence can confirm expression of pathogenic variants already detected in DNA-seq. Second, it can reveal expressed variants absent from exome data due to incomplete capture or shallow coverage. Third, it quantifies allele-specific expression, offering a functional dimension that can refine the interpretation of oncogenic drivers.
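The first two advantages amount to cross-tabulating DNA and RNA call sets; the third is the per-site imbalance question discussed above. The hypothetical sketch below shows the cross-tabulation, with one assumption made explicit: a DNA-only call is informative about expression only where the locus was actually covered in RNA. The `min_depth` of 10 reads is illustrative, not a VarRNA setting.

```python
from typing import Dict, Hashable, Iterable, Mapping, Set


def categorize_evidence(
    dna_calls: Iterable[Hashable],
    rna_calls: Iterable[Hashable],
    rna_depth: Mapping[Hashable, int],
    min_depth: int = 10,
) -> Dict[str, Set[Hashable]]:
    dna, rna = set(dna_calls), set(rna_calls)
    dna_only = dna - rna
    return {
        # DNA variant with RNA support: evidence that the variant is expressed.
        "confirmed_expressed": dna & rna,
        # Seen only in RNA: candidate missed by exome capture or shallow coverage.
        "rna_only": rna - dna,
        # DNA-only calls at well-covered RNA loci: plausibly not expressed.
        "dna_only_covered": {v for v in dna_only if rna_depth.get(v, 0) >= min_depth},
        # DNA-only calls at poorly covered RNA loci: RNA evidence is inconclusive.
        "dna_only_low_coverage": {v for v in dna_only if rna_depth.get(v, 0) < min_depth},
    }


out = categorize_evidence({"a", "b", "c"}, {"a", "d"}, {"b": 30, "c": 2})
for category, variants in out.items():
    print(category, sorted(variants))
```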
Despite its strengths, RNA-based variant detection remains sensitive to several biological and technical variables. VarRNA was evaluated on samples with 60–150M aligned reads, and the authors note that its performance is sensitive to tumor purity (the fraction of tumor cells in a sample relative to normal, non-tumor cells), tending to improve as purity rises. For practical implementation, reports should therefore include RNA coverage metrics and probability-based confidence scores, helping users judge whether the RNA evidence for a given variant is reliable or inconclusive.
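Why purity matters can be made concrete with a simple mixture model: a clonal heterozygous somatic variant in a diploid region has an expected allele fraction of about half the purity, and fewer alternate reads means lower power to detect it at a given depth. The sketch below assumes diploid normal cells, a mutation present in every tumor cell, and balanced allelic expression (which, as noted above, RNA often violates). The functions and the reporting thresholds in `report_label` are hypothetical placeholders, not values from the VarRNA study.

```python
from math import comb


def expected_somatic_vaf(purity: float, mutant_copies: int = 1, tumor_ploidy: int = 2) -> float:
    """Expected allele fraction of a clonal somatic variant in a mixed sample.

    Assumes normal cells are diploid and every tumor cell carries the mutation.
    """
    tumor_alleles = purity * tumor_ploidy
    normal_alleles = (1 - purity) * 2
    return purity * mutant_copies / (tumor_alleles + normal_alleles)


def detection_power(depth: int, vaf: float, min_alt_reads: int) -> float:
    """P(at least min_alt_reads alt reads) when alt reads ~ Binomial(depth, vaf)."""
    below = sum(comb(depth, k) * vaf**k * (1 - vaf) ** (depth - k) for k in range(min_alt_reads))
    return 1.0 - below


def report_label(depth: int, probability: float, min_depth: int = 20, min_probability: float = 0.9) -> str:
    # Thresholds are illustrative placeholders, not values from the VarRNA study.
    if depth < min_depth:
        return "inconclusive: low RNA coverage"
    if probability < min_probability:
        return "inconclusive: low model confidence"
    return "supported"


for purity in (0.2, 0.5, 0.8):
    vaf = expected_somatic_vaf(purity)
    print(purity, round(vaf, 2), round(detection_power(30, vaf, 5), 3))
```

Reporting the depth and classifier probability alongside such a label lets a reader distinguish "not detected" from "not assessable" at a given site.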
Conclusion
VarRNA marks a new phase for RNA-Seq in oncology. What was once a risky and underused experimental curiosity has matured into an analytical capability that can yield clinically interpretable insights. Combined with RNA-Seq library preparation that produces good coverage across the full transcript length, VarRNA offers a blueprint for incorporating RNA-based variant calling into modern oncology workflows, strengthening confidence in molecular findings and enhancing the biological resolution of cancer genomics.
References:
- Bollas, A., et al. (2025). Variant calling from RNA-Seq data reveals allele-specific differential expression of pathogenic cancer variants. Commun Med. 5, 202. doi:10.1038/s43856-025-00901-y.
- DePristo, M., et al. (2011). A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat Genet. 43, 491–498. doi:10.1038/ng.806.
- Van der Auwera, G.A., et al. (2013). From FastQ Data to High-Confidence Variant Calls: The Genome Analysis Toolkit Best Practices Pipeline. Curr Protoc Bioinform. 43, 11.10.1–11.10.33. doi:10.1002/0471250953.bi1110s43.
- D'Addabbo, P., et al. (2025). REDIportal: toward an integrated view of the A-to-I editing. Nucleic Acids Res. 53(D1), D233–D242. doi:10.1093/nar/gkae1083.
- Hagiwara, K., et al. (2020). RNAIndel: discovering somatic coding indels from tumor RNA-Seq data. Bioinformatics. 36(5), 1382–1390. doi:10.1093/bioinformatics/btz753. Erratum in: Bioinformatics (2020). 36(14), 4231. doi:10.1093/bioinformatics/btaa247.
- Cook, D.E., et al. (2023). A deep-learning-based RNA-Seq germline variant caller. Bioinform Adv. 3(1), vbad062. doi:10.1093/bioadv/vbad062.
For research use only. Not for use in diagnostic procedures.