Jul 4th 2025

The critical role of Hamming distance in NGS barcodes.

Accurate sample identification in multiplex sequencing is critical. Imagine discovering a low-frequency cancer mutation, only to realize later that a barcode misassignment had created a false positive. Such scenarios underscore why proper barcode design, guided by Hamming distance, is indispensable. The integrity of the demultiplexing step hinges on the Hamming distance: the number of nucleotide substitutions required to transform one index into another. A larger minimum Hamming distance across an index set keeps single-base errors and most index-hopping events from being mistaken for legitimate reads, sharply reducing cross-sample contamination and safeguarding research outcomes.

Hamming Distance: From Coding Theory to Molecular Barcodes

Richard Hamming’s work in the late 1940s, published in 1950, demonstrated that codes whose codewords are separated by a greater Hamming distance (that is, strings differing at more positions) allow detection and correction of more errors. This concept is now applied in many areas beyond computer science.

For instance, two 8-nucleotide indices, ACGTAGCT and ACGTGGCT, differ at a single position, giving them a Hamming distance of 1. Such proximity offers no protection against sequencing errors, because one miscalled base turns one index into the other. Increasing the minimum distance to 4 ensures that multiple independent errors, an unlikely scenario at typical Illumina per-base error rates (around 0.1%), would be required for misassignment. Understanding and applying this concept is crucial to maintaining the integrity of multiplex sequencing experiments.
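The calculation described above is simple enough to sketch in a few lines of Python. The function name `hamming_distance` and the second comparison index (`TCGAAGGA`) are illustrative assumptions, not part of any vendor's index set.

```python
def hamming_distance(a: str, b: str) -> int:
    """Count the positions at which two equal-length indices differ."""
    if len(a) != len(b):
        raise ValueError("indices must have the same length")
    return sum(x != y for x, y in zip(a, b))


# The pair from the text differs only at position 5 (A vs G).
print(hamming_distance("ACGTAGCT", "ACGTGGCT"))  # 1

# A hypothetical, better-separated partner differs at four positions.
print(hamming_distance("ACGTAGCT", "TCGAAGGA"))  # 4
```

With a distance of 1, one miscalled base is enough to swap the two indices; with a distance of 4, four specific miscalls are needed.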

Mechanisms of Index Misassignment

Misassignment predominantly arises from two sources:

Base-calling substitutions occur due to weak fluorescent signals or phasing drift, and their frequency rises as sequencing quality scores fall. Even slight declines in sequencing quality can significantly increase the likelihood of these errors.

Index hopping involves the physical transfer of adapter sequences between clusters, observed on patterned-flow-cell instruments such as Illumina’s NovaSeq® X and NextSeq® 2000.

Published studies report index hopping rates ranging from 0.1% to 2%, significant enough to impact sensitive assays like oncology or single-cell sequencing. This issue has prompted Illumina and other vendors to recommend unique dual indexing approaches as a standard practice, emphasizing the need for robust indexing strategies.

Quantitative Impact of Hamming Distance

Consider an 8-nt index with a per-base substitution error rate of 0.1% (Q30 quality score). If the minimum Hamming distance is 4 and indices are matched exactly, at least four of the eight bases must be miscalled before one index can be read as another. The probability of that happening is roughly C(8,4) × (0.001)⁴ ≈ 7 × 10⁻¹¹. At 50 million reads per library, that equates to far fewer than one misassigned read, effectively negligible for most applications. Laboratories switching from combinatorial dual indices (distance ≈ 3) to Unique Dual Indices (UDIs; distance ≥ 4) report 10- to 20-fold reductions in swapped reads, minimizing artifacts and enhancing data integrity. This quantitative benefit is especially important in contexts where precision and reproducibility are paramount.
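The figure of roughly 7 × 10⁻¹¹ can be reproduced with a short binomial calculation. This is a sketch under stated assumptions: errors are treated as independent across positions, indices are matched exactly, and the result is an upper bound, because it counts any four or more miscalls rather than the specific substitutions needed to reach another valid index. The function name is hypothetical.

```python
from math import comb


def prob_at_least_k_errors(n: int, k: int, p: float) -> float:
    """Binomial probability that k or more of n bases are miscalled,
    assuming independent errors at per-base rate p."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


index_length = 8       # 8-nt index
min_distance = 4       # minimum Hamming distance of the index set
error_rate = 0.001     # 0.1% per base (Q30)
reads = 50_000_000     # reads per library

bound = prob_at_least_k_errors(index_length, min_distance, error_rate)
print(f"upper bound on substitution-driven misassignment: {bound:.1e}")
print(f"expected misassigned reads per library: {reads * bound:.4f}")
```

The expected count comes out well below one read per library, consistent with the estimate in the text.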

Indexing Architectures Viewed Through a Distance Lens

Early single-index systems offered minimal error protection (distance ~2), making them vulnerable to single-base errors. Combinatorial dual-index systems increased protection, requiring simultaneous errors in two indices for misassignment, yet these still suffered from index-hopping issues due to shared indices. Because the number of possible index sequences grows exponentially with length (4ⁿ for an n-nucleotide index), extending indices from 8 to 10 to 12 nucleotides dramatically expands sequence diversity, making it far easier to maintain a minimum Hamming distance of ≥ 4 across thousands of samples without reusing barcodes.
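To illustrate how sequence space and minimum distance interact, here is a minimal sketch that prints the size of the sequence space at each length and then greedily picks a set of short indices that are all at least four substitutions apart. The greedy strategy and the function names are assumptions made for illustration; production designs also screen for base balance, homopolymers, and other constraints that this sketch ignores.

```python
from itertools import combinations, product


def hamming_distance(a: str, b: str) -> int:
    """Count the positions at which two equal-length indices differ."""
    return sum(x != y for x, y in zip(a, b))


def greedy_index_set(length: int, min_distance: int) -> list:
    """Scan all sequences in lexicographic order and keep each one that is
    at least min_distance away from every index kept so far."""
    chosen = []
    for bases in product("ACGT", repeat=length):
        candidate = "".join(bases)
        if all(hamming_distance(candidate, kept) >= min_distance for kept in chosen):
            chosen.append(candidate)
    return chosen


# Sequence space grows as 4**n.
for n in (8, 10, 12):
    print(f"{n}-nt indices: {4 ** n:,} possible sequences")

# Length 6 keeps the exhaustive scan fast; longer indices need smarter search.
demo_set = greedy_index_set(length=6, min_distance=4)
worst_pair = min(hamming_distance(a, b) for a, b in combinations(demo_set, 2))
print(f"{len(demo_set)} indices of length 6, minimum pairwise distance {worst_pair}")
```

Only a fraction of the raw sequence space survives a distance constraint, which is why longer indices make large, well-separated sets easier to build.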

Unique Dual Indices (UDIs) address both problems by assigning each sample a pair of indices in which neither index is shared with any other sample, with each index engineered for a minimum Hamming distance of ≥4. A read whose two indices do not form an expected pair is rejected automatically, which neutralizes most index-hopping events and significantly reduces misassignments. Laboratories adopting UDIs consistently report improved data quality, enhanced sensitivity, and reduced operational costs associated with fewer reruns.
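The difference between combinatorial and unique dual indexing can be sketched as a lookup on index pairs. Everything here is hypothetical: the index sequences, the sample names, and the `assign_read` helper are illustrative assumptions, and exact matching is used for simplicity rather than any particular demultiplexer's behavior.

```python
from typing import Dict, Optional, Tuple

# Hypothetical index sequences, for illustration only.
I7 = ["ACGTAGCT", "TCGAAGGA"]
I5 = ["TTGCAACG", "GACTTCAT"]

# Unique dual indexing: each index appears in exactly one sample's pair.
udi_sheet: Dict[Tuple[str, str], str] = {
    (I7[0], I5[0]): "sample_A",
    (I7[1], I5[1]): "sample_B",
}

# Combinatorial dual indexing: every i7/i5 combination is a valid sample.
combinatorial_sheet: Dict[Tuple[str, str], str] = {
    (I7[0], I5[0]): "sample_A",
    (I7[0], I5[1]): "sample_B",
    (I7[1], I5[0]): "sample_C",
    (I7[1], I5[1]): "sample_D",
}


def assign_read(i7: str, i5: str, sheet: Dict[Tuple[str, str], str]) -> Optional[str]:
    """Return the sample for an exact index-pair match, or None to reject."""
    return sheet.get((i7, i5))


# A hopped read carries sample_A's first index and sample_B's second index.
hopped = (I7[0], I5[1])
print(assign_read(*hopped, udi_sheet))            # None: pair is not expected
print(assign_read(*hopped, combinatorial_sheet))  # sample_B: silently misassigned
```

Under the unique scheme the hopped pair has no entry and the read is discarded; under the combinatorial scheme the same pair is a legitimate sample, so the swap goes undetected.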

Practical Evaluation of NGS Index Sets

Laboratories that use commercially available index plates can perform a concise, three-point inspection. First, check the vendor-reported minimum Hamming distance. If that is not available, it can be calculated with a tool such as the DNABarcodes R/Bioconductor package. Second, ensure your demultiplexing parameters are conservative: Illumina’s BCL Convert defaults to one allowed mismatch per index, and raising that threshold is justified only when your index set has a minimum distance greater than four. Third, track index-read Q30 scores from run to run. An unexpected dip almost always precedes a surge in crosstalk, so catching it early gives you time to troubleshoot before data integrity is compromised.
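The first two checks can be approximated in a few lines. This is a rough sketch, not a replacement for DNABarcodes or for a vendor's specification: the three indices are hypothetical, and the helper names are assumptions. The mismatch rule follows from the triangle inequality: allowing m mismatches is unambiguous only if 2m is smaller than the minimum distance d, so the largest safe tolerance is (d - 1) // 2.

```python
from itertools import combinations


def hamming_distance(a: str, b: str) -> int:
    """Count the positions at which two equal-length indices differ."""
    if len(a) != len(b):
        raise ValueError("indices must have the same length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(indices) -> int:
    """Smallest Hamming distance between any two indices in the set."""
    return min(hamming_distance(a, b) for a, b in combinations(indices, 2))


def max_safe_mismatches(min_distance: int) -> int:
    """Largest mismatch tolerance at which no read can match two indices."""
    return (min_distance - 1) // 2


# Hypothetical index set, for illustration only.
index_set = ["ACGTAGCT", "TCGAAGGA", "GATCCTAG"]

d = min_pairwise_distance(index_set)
print(f"minimum Hamming distance: {d}")                      # 4
print(f"largest safe mismatch setting: {max_safe_mismatches(d)}")  # 1
```

A minimum distance of 4 supports the default of one mismatch; two mismatches become safe only at a distance of 5 or more, which matches the "greater than four" guidance above.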

Hypothetical Impact of Hamming Distance: NGS Case Studies

Oncology Panel:

A research team analyzing oncology hotspot mutations initially experienced misassigned reads of approximately 0.25% using combinatorial dual indexing. This rate resulted in false-positive variant calls, complicating the analysis. Upon transitioning to UDIs with a minimum Hamming distance of ≥4, the misassignment rate dropped dramatically to below 0.02%. This improvement increased confidence in the results, streamlined variant-calling workflows, and supported accurate interpretations.

Single-cell RNA-seq:

A core facility performing single-cell RNA sequencing faced challenges with artificial doublets arising from barcode misassignments. Implementing UDIs substantially reduced these artificial doublets, enhancing the accuracy of downstream clustering and differential expression analysis. Laboratories report usable cell yields improving by up to 8%, translating to better statistical power and resource efficiency.

Conclusion

Hamming distance is a critical factor in multiplex sequencing, directly influencing data reliability and accuracy. A minimum distance of ≥3 provides basic error detection, ≥4 substantially reduces misassignment, and ≥6 virtually eliminates contamination, which is crucial for sensitive applications. Properly leveraging this principle transforms indices from mere labels into powerful error-correcting tools essential for biological research. As read depths and stakes rise, error-correcting indices are no longer optional; they are required.

Learn about NEXTFLEX™ Unique Dual Indices

NEXTFLEX™ Unique Dual Index plates deliver reliability for high-throughput sequencing, incorporating carefully optimized barcode designs, balanced base composition, and seamless integration with demultiplexing software defaults. Designed to meet the rigorous demands of sequencing laboratories, the NEXTFLEX indices provide robust protection against sequencing errors, index hopping, and misassignment.

References

Costello M, Fleharty M, Abreu J, et al. Characterization and remediation of sample index swaps by non-redundant dual indexing on massively parallel sequencing platforms. BMC Genomics. 2018;19:332. doi:10.1186/s12864-018-4703-0.
Van der Valk T, Vezzi F, Ormestad M, Götherström A, Guschanski K. Index hopping on the Illumina HiSeq X platform and its consequences for ancient DNA studies. Mol Ecol Resour. 2019;20(5):1171-1181. doi:10.1111/1755-0998.13009.
Heaton H, Talman AM, Knights A, et al. Souporcell: robust clustering of single-cell RNA-seq data by genotype without reference genotypes. Nat Methods. 2020;17(6):615-620. doi:10.1038/s41592-020-0820-1.
Illumina, Inc. Index Hopping: What It Is, Why It Matters, and How to Minimize It. Technical Note (Pub. No. 770-2017-004). Illumina; 2017. Available at: https://www.illumina.com/techniques/sequencing/ngs-library-prep/multiplexing/index-hopping.html. Accessed 10 June 2025.
Li Q, Zhao X, Zhang W, et al. Reliable multiplex sequencing with rare index mis-assignment on DNB-based NGS platform. BMC Genomics. 2019;20:215. doi:10.1186/s12864-019-5569-5.