Why talk about fragmentation?
Before any read is generated on an Illumina® or Element Biosciences™ AVITI™ short-read platform, genomic DNA must be cleaved into fragments in the 150–600 bp size range. The ideal scenario is one where breaks approach true molecular randomness, meaning that every phosphodiester bond along a population of DNA molecules has an equal and independent probability of being cleaved during fragmentation. Molecular randomness is shorthand for the absence of systematic cleavage bias.
When this happens, the number of breakpoints per interval will follow a Poisson distribution, and break frequency will show no correlation with GC%, methylation status, or repetitive motifs1. Adjacent breaks will be uncorrelated, meaning that knowing where one fragment starts tells you nothing about where the next will occur. And finally, when mapped back to the reference, breakpoint density will be flat across chromosomes, mimicking a random walk2.
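If you want to convince yourself of this, a few lines of Python are enough (the genome length, break count, and window size below are arbitrary illustration values, not recommendations): placing breaks uniformly at random yields per-window break counts whose variance roughly equals their mean, the index-of-dispersion (~1) signature of a Poisson process.

```python
import random

random.seed(0)
GENOME_LEN = 1_000_000   # hypothetical genome length (bp)
N_BREAKS = 5_000         # breaks placed uniformly at random
WINDOW = 10_000          # window size for counting breaks

# Every position has an equal, independent chance of breaking
breaks = [random.randrange(GENOME_LEN) for _ in range(N_BREAKS)]

# Count breaks per fixed-size window
counts = [0] * (GENOME_LEN // WINDOW)
for b in breaks:
    counts[b // WINDOW] += 1

mean = sum(counts) / len(counts)
var = sum((c - mean) ** 2 for c in counts) / len(counts)
dispersion = var / mean  # ~1 for a Poisson process

print(f"mean={mean:.1f} variance={var:.1f} dispersion={dispersion:.2f}")
```

Systematic cleavage bias (e.g., hot spots) would push the dispersion well above 1; a dispersion near 1 is what truly random shearing looks like.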
Random shearing maximises library complexity, so duplicate collapse removes only PCR copies, not alternative alleles. When fragmentation approaches this ideal, it also drives coverage uniformity. Downstream coverage depth will approximate a Poisson distribution, so that under high-quality, PCR-free conditions at ≥30× mean depth, at least 95% of bases fall within ~2-fold of the mean. Uniform coverage, in turn, lowers the coefficient of variation for variant allele fractions, improving sensitivity for low-frequency SNVs and indels, and reducing false CNV breakpoints that arise when coverage dips are mistaken for deletions3. For de novo assembly, random fragmentation minimises “fragile sites” where read clouds abruptly terminate, yielding longer contigs and higher NGA50 metrics4.
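The “95% within ~2-fold” figure can be checked directly against the Poisson model; this standard-library sketch evaluates the probability that a base at 30× mean depth falls in the 15–60× window (the values from the text):

```python
from math import exp, lgamma, log

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam), computed in log space for stability."""
    return exp(k * log(lam) - lam - lgamma(k + 1))

MEAN_DEPTH = 30.0  # mean coverage depth (x)

# Probability a base's depth falls within ~2-fold of the mean: 15-60x inclusive
frac_within = sum(poisson_pmf(k, MEAN_DEPTH) for k in range(15, 61))
print(f"P(15 <= depth <= 60) = {frac_within:.4f}")
```

Under a pure Poisson model the fraction comfortably exceeds 95%; real libraries fall short of this bound exactly to the extent that fragmentation and amplification introduce bias.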
In other words, fragmentation is a determinant of data quality: the more stochastic the breakage, the more even and reliable every downstream analysis will be. In this blog we discuss the three main methods currently used to fragment DNA, focusing on the biases associated with each of them.
Mechanical fragmentation
Mechanical (acoustic or hydrodynamic) shearing comes closest to that ideal. Focused ultrasonication disperses mechanical stress almost uniformly along the duplex backbone, and with carefully calibrated sonication energy it yields tight insert-size distributions and the flattest coverage of any method, for both short-read and long-read applications.
Sequence-dependent breakpoints still exist at the sub-kilobase level (for example, ultrasonic cavitation shows mild preferences for GC-rich tetranucleotides), but these artefacts are small compared with the context bias seen in other workflows. The trade-offs are hardware cost, heat and oxidative damage at high energy settings (which can elevate C>A artefacts), and relatively modest throughput compared with other approaches5,6.
Mechanically sheared libraries, such as those generated with the NEXTFLEX™ Rapid DNA-Seq Kit 2.0 and the NEXTFLEX Cell Free DNA-Seq Library Prep Kit 2.0, usually deliver the flattest coverage profile, which is crucial for copy-number calling or de novo assembly.
Endonuclease fragmentation
Endonuclease fragmentation, such as the solution incorporated into the NEXTFLEX Rapid XP V2 DNA-Seq Kit and the NEXTFLEX HT Agrigenomics Low-pass WGS Kit, combines the fragmentation, end-repair, and A-tailing steps in a single tube, eliminating the need for capital equipment and significantly improving yield recovery from low inputs (<100 ng) compared with sonication, all while increasing throughput. This approach also has the advantage that fragment lengths are easily tuned by varying incubation time or enzyme concentration.
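That tunability is a direct consequence of random-cleavage statistics: if each internal bond is cut independently with probability p, the mean fragment length is approximately 1/p, so to first order doubling enzyme exposure halves the mean fragment size. A toy simulation (hypothetical cleavage probabilities, not actual kit conditions) illustrates the relationship:

```python
import random

random.seed(1)

def mean_fragment_len(cleave_prob: float, n_bonds: int = 200_000) -> float:
    """Simulate random cleavage of a linear molecule: each internal bond
    breaks independently with probability cleave_prob; return the mean
    length of the resulting fragments."""
    cut_sites = [i for i in range(1, n_bonds) if random.random() < cleave_prob]
    edges = [0] + cut_sites + [n_bonds]
    frags = [b - a for a, b in zip(edges, edges[1:])]
    return sum(frags) / len(frags)

# Doubling the per-bond cleavage probability (a stand-in for longer digestion
# or more enzyme, in the linear regime) roughly halves mean fragment size
for p in (0.002, 0.004, 0.008):
    print(f"p={p}: mean fragment ~ {mean_fragment_len(p):.0f} bp")
```

In a real reaction the per-bond probability scales with digestion time and enzyme concentration only approximately, which is why over-digestion drifts away from this simple model.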
Modern enzymes produce data approaching the randomness of mechanical shearing for whole genome sequencing (WGS). Nevertheless, sequence dependence arises from residual motif preferences of the nucleases: for example, some workflows have been shown to cleave AT-rich sites less efficiently when reactions are over-digested7, and some kits exhibit GC skew at the extremes (<25% or >70% GC), requiring PCR-free protocols or spike-in standards for correction7.
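In practice, this kind of skew is easy to spot by binning fragments by GC content and comparing their representation in the final library against the input. The sketch below simulates it with made-up numbers (a flat input GC distribution and a 40% recovery penalty outside the 25–70% GC window, purely for illustration):

```python
import random

random.seed(2)

# Hypothetical input pool: fragment GC fractions spread across the spectrum
gc_values = [random.random() for _ in range(20_000)]

def recovered(gc: float) -> bool:
    """Toy kit bias: 40% recovery efficiency outside the 25-70% GC window
    (illustrative numbers only, not measured values)."""
    eff = 1.0 if 0.25 <= gc <= 0.70 else 0.4
    return random.random() < eff

observed = [gc for gc in gc_values if recovered(gc)]

def bin_counts(values, n_bins=10):
    """Count values per 10%-wide GC bin."""
    counts = [0] * n_bins
    for gc in values:
        counts[min(int(gc * n_bins), n_bins - 1)] += 1
    return counts

inp, out = bin_counts(gc_values), bin_counts(observed)
for i, (a, b) in enumerate(zip(inp, out)):
    print(f"GC {i*10:2d}-{(i+1)*10:3d}%: library/input representation = {b/a:.2f}")
```

A flat representation ratio across bins indicates unbiased recovery; dips at the GC extremes are the signature that calls for PCR-free protocols or spike-in correction.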
Tagmentation (transposase-mediated "cut & tag")
Transposase-mediated tagmentation trades some uniformity for speed. Tn5 and MuA transposomes integrate adapters while cleaving DNA, producing ready-to-sequence libraries with under 30 minutes of hands-on time. Their palindromic 9-bp target consensus creates cosine-like coverage patterns that inflate apparent copy-number oscillations and under-represent AT-rich islands, biases that persist even after PCR normalisation8-11. Fragment-length control is coarser than with mechanical or endonuclease fragmentation; over-fragmentation (<200 bp) is common if the enzyme is overdosed. Engineered bead-linked Tn5 has reduced, but not eliminated, GC bias.
When speed matters more than absolute uniformity, tagmentation is the method of choice. For quantitative WGS or high-confidence variant calling at GC extremes, however, mechanical or optimised enzymatic fragmentation remains the safer option.
Conclusions
No single fragmentation chemistry is universally optimal. In practice, the choice of fragmentation should be guided by biological and operational priorities. Mechanical shearing remains the gold standard for quantitative WGS or any assay that demands maximal uniformity across the entire GC spectrum. Endonuclease fragmentation offers a pragmatic middle ground when capital investment or sample throughput demands dominate, provided reaction kinetics are optimised for the target insert range. Tagmentation excels when time-to-data is the most important consideration, but its sequence biases necessitate cautious interpretation of depth-dependent metrics.
References:
- Ross, M.G., et al. (2013). Characterizing and measuring bias in sequence data. Genome Biol. 14:R51.
- Lander, E.S., Waterman, M.S. (1988). Genomic mapping by fingerprinting random clones: A mathematical analysis. Genomics 2(3):231-239.
- Poptsova, M.S., et al. (2014). Non-random DNA fragmentation in next-generation sequencing. Sci. Rep. 4:4532.
- Daviso, E. (2025). Optimizing germline testing: the importance of high-quality sample prep. Covaris blog.
- Costello, M., et al. (2013). Discovery and characterization of artifactual mutations in deep-coverage targeted-capture sequencing data due to oxidative DNA damage during sample preparation. Nucleic Acids Res. 41(6):e67.
- Chen, H., et al. (2024). Characterization and mitigation of artifacts derived from NGS library preparation due to structure-specific sequences in the human genome. BMC Genomics 25:227.
- Ribarska, T., et al. (2022). Optimization of enzymatic fragmentation is crucial to maximize genome coverage: a comparison of library preparation methods for Illumina sequencing. BMC Genomics 23:92.
- Adey, A., et al. (2010). Rapid, low-input, low-bias construction of shotgun fragment libraries by high-density in vitro transposition. Genome Biol. 11(12):R119.
- Picelli, S., et al. (2014). Tn5 transposase and tagmentation procedures for massively scaled sequencing projects. Genome Res. 24(12):2033-2040.
- Gunasekera, S., et al. (2021). Evaluating coverage bias in next-generation sequencing of E. coli. PLoS One 16:e0253440.
- Segerman, B., et al. (2022). The efficiency of Nextera XT tagmentation depends on G and C bases in the binding motif leading to uneven coverage and overestimation of relative abundance in metagenomic sequencing. Front. Microbiol. 13:944770.