Synthetic solutions to long-read problems
Over the past two decades, advances in next-generation sequencing (NGS) technology have significantly increased throughput, delivering more genomic data at a lower cost. This increase, however, comes at the expense of read length and, by extension, long-read applications. Traditional short-read technology reads molecules base by base, and the resulting cumulative deterioration of signal over hundreds of cycles restricts reads to a few hundred nucleotides. To overcome this limitation, molecular biologists devising early synthetic long-read (SLR) library prep methods used molecular tags and limiting dilution when preparing short-read libraries. The preserved tags were then used to informatically piece together longer sequences.
Several years later, 10X Genomics first commercialized SLR, introducing a microfluidic device that increased throughput and streamlined the workflow. LoopSeq long-read library prep pushes the technology even further, obviating the need for microfluidics and expanding the range of applications.
Unlocking a range of long-read applications
In contrast to its SLR predecessors, whose per-molecule coverage was too low for true single-molecule long-read sequencing, LoopSeq targets complete coverage of long molecules. This novel technology assembles continuous long reads from single molecules, generating the same type of data as legacy long-read platforms from companies such as PacBio and Oxford Nanopore.
The ability to assemble continuous long reads dramatically broadens the scope of NGS applications to include transcriptome, microbiome, and immune repertoire sequencing, to name a few. Moreover, LoopSeq chemistry eliminates the requirement to physically compartmentalize DNA molecules by limiting dilution, enabling a more flexible workflow that does not require dedicated instrumentation. Figure 1 details the end-to-end workflow.
Lower error rates, higher accuracy
Arguably the biggest advantage LoopSeq offers compared to legacy SLR technology is its lower error rate. LoopSeq starts from raw short reads, which typically have lower error rates than the raw reads of legacy long-read platforms. It then applies consensus-based error correction at each position of a long read, using the many independent short reads that cover that position. The resulting long reads are extremely accurate, as demonstrated in Figure 2.¹
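The consensus step described above can be illustrated with a minimal Python sketch. It is not the LoopSeq pipeline: it assumes the short reads have already been assigned to one long molecule and placed at known offsets, and it uses a simple per-position majority vote. The function name `consensus` and the toy reads are hypothetical.

```python
from collections import Counter


def consensus(aligned_reads):
    """Majority-vote consensus over short reads placed on one long molecule.

    Each read is a (start, sequence) pair, where `start` is the 0-based
    offset of the read on the long molecule. Assumes at least one read.
    Positions with no coverage are reported as 'N'.
    """
    length = max(start + len(seq) for start, seq in aligned_reads)
    columns = [Counter() for _ in range(length)]
    # Tally the base each read reports at every position it covers.
    for start, seq in aligned_reads:
        for offset, base in enumerate(seq):
            columns[start + offset][base] += 1
    # Keep the most frequent base per position.
    return "".join(
        col.most_common(1)[0][0] if col else "N" for col in columns
    )


# Toy data (hypothetical): the second read carries an error at position 4.
reads = [
    (0, "ACGTAC"),
    (0, "ACGTTC"),
    (2, "GTACGG"),
    (3, "TACGGA"),
]
print(consensus(reads))  # prints ACGTACGGA
```

Because three of the four reads covering position 4 agree, the single erroneous base is outvoted. The more independent reads cover a position, the less likely a random error is to survive into the consensus.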
A key observation in Figure 2 and the study that produced it is the nonlinear relationship between the length of a long read and the probability that it is error-free. The probability that a molecule is error-free falls exponentially, not linearly, with its length. As a result, the low error rate of 1.5 kb LoopSeq reads yields roughly 50% more error-free reads than PacBio. For 5 kb reads, the advantage grows to 500% more error-free reads. In other words, a roughly three-fold increase in length produces a 10-fold increase in that advantage, so LoopSeq accuracy becomes increasingly important as read length grows, improving confidence in results.
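The length dependence can be made concrete with a simple model. The sketch below assumes errors occur independently at a uniform per-base rate, so the probability that a read of length L is error-free is (1 - e)^L. This model and both error rates are illustrative assumptions, not values taken from the cited study, and the sketch is not expected to reproduce the study's figures.

```python
def error_free_probability(per_base_error: float, length: int) -> float:
    """Probability that a read of `length` bases contains no errors,
    assuming independent errors at a uniform per-base rate."""
    return (1.0 - per_base_error) ** length


def error_free_ratio(low_error: float, high_error: float, length: int) -> float:
    """How many times more error-free reads the lower-error platform yields."""
    return (
        error_free_probability(low_error, length)
        / error_free_probability(high_error, length)
    )


# Hypothetical per-base error rates, chosen only for illustration.
LOW_ERROR = 1e-5
HIGH_ERROR = 3e-4

for length in (1500, 2300, 5000):
    ratio = error_free_ratio(LOW_ERROR, HIGH_ERROR, length)
    print(f"{length} bp: {ratio:.2f}x as many error-free reads")
```

Under this model the ratio itself grows exponentially with length, which is why a fixed per-base accuracy advantage matters far more at 5 kb than at 1.5 kb.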
More specifically, the study examined how to sequence complete 1.5 kb bacterial 16S DNA, 2.3 kb fungal 18S-ITS, and entire 5 kb bacterial ribosomal DNA (rDNA) clusters with a single long read to achieve extremely high accuracy. The study also addresses how accurate long reads enable species- and strain-level microbiome classifications.
Another study of LoopSeq error rates, this one in the transcriptome space, provides a comparative analysis of human messenger RNA (mRNA) spanning multiple sequencing technologies.² In this study, the authors find that LoopSeq error rates are significantly lower than those of other short- and long-read technologies across all error types, as shown in Table 1. Crucially, cancer progression involves isoform switching, in which specific clonotypes evolve to express isoform-specific single-point mutations. This tiny variation in the sequence of isoforms leads to vastly different phenotypes, a discovery made possible only by the low error rates inherent in Element LoopSeq long-read technology.
| Error Type | PacBio-CCS RNA | ONT-2D RNA | Illumina RNA | LoopSeq RNA | LoopSeq DNA |
Table 1: Comparing error rates across diverse sequencing platforms²
In sum, obtaining highly accurate long reads has been an enduring challenge. The enhanced resolution of LoopSeq long reads enables interrogation of long DNA and RNA molecules in previously inaccessible ways, supporting the continued growth and variety of applications across the field of genomics.
To learn more about Element LoopSeq and the benefits of our exclusive offering of both short- and long-read sequencing, visit Element LoopSeq Long-Read Workflows.
¹ Callahan, Benjamin J., Dmitry Grinevich, Siddhartha Thakur, et al., “Ultra-accurate microbial amplicon sequencing with synthetic long reads,” Microbiome 9, no. 130 (June 2021): https://doi.org/10.1186/s40168-021-01072-3.
² Liu, Silvia, Indira Wu, Yan-Ping Yu, et al., “Targeted transcriptome analysis using synthetic long read sequencing uncovers isoform reprograming in the progression of colon cancer,” Communications Biology 4, no. 506 (April 2021): https://doi.org/10.1038/s42003-021-02024-1.