An NGS Replacement for Sanger Sequencing: A Protein Engineering Case Study - Video and Q&A Transcript from AGBT 2022

AGBT 2022 Spotlight Talk - Presented by Matt Kellinger, PhD

Q: How is the long-read Sanger sequencing achieved? If you have a barcode ligated there, once you fragment, do some pieces not have the barcode?

A: In traditional approaches, that is what happens. Some of the cleverness of this method comes through in the molecular biology, and it is worth jumping back to the distribution reaction. If the gene of interest is 2 kb or 5 kb, what happens in that step is that the barcode on that end is inserted uniformly all along the gene. You have thousands of copies, each with a unique address, and then you have a known barcode next to some part of the gene that you can access. That is how you make the jump from short reads to long reads. You are actually sequencing maybe only 150 bases, but you are doing it at all these different places along the gene, and you have the barcode for reassembly.

Q: Even if you distribute uniformly, some of your short reads won’t have the UMI?

A: Actually they will, because of the way the fragmentation is done. The UMI winds up at the beginning of every read, so you can aggregate by UMI and then throw each group into your assembler.
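The "aggregate by UMI" step described in this answer can be sketched roughly as follows. This is a minimal illustration, not the assay's actual pipeline: it assumes the UMI is a fixed-length prefix of each read, and the `UMI_LENGTH` value, the function name, and the toy reads are all hypothetical. Each resulting group would then be handed to an assembler of your choice.

```python
from collections import defaultdict

# Hypothetical value: the talk does not state the UMI length.
UMI_LENGTH = 12


def group_reads_by_umi(reads, umi_length=UMI_LENGTH):
    """Bucket reads by their leading UMI and strip the UMI from each read.

    Assumes the UMI sits at the very start of every read, as described
    in the answer above. Reads too short to hold a UMI plus at least
    one base of insert are skipped.
    """
    groups = defaultdict(list)
    for read in reads:
        if len(read) <= umi_length:
            continue  # nothing left to assemble after removing the UMI
        umi, insert = read[:umi_length], read[umi_length:]
        groups[umi].append(insert)
    return dict(groups)


# Toy reads (made up): two share one UMI, one has another, one is too short.
reads = [
    "AAAACCCCGGGG" + "ATGGCTAGCTAG",
    "AAAACCCCGGGG" + "GCTAGCTAGGAT",
    "TTTTGGGGCCCC" + "ATGGCTAGCTAG",
    "ACGT",
]
groups = group_reads_by_umi(reads)

for umi, inserts in groups.items():
    # In a real workflow, each list of inserts would go to an assembler.
    print(umi, len(inserts))
```

A real implementation would also need to handle sequencing errors inside the UMI itself, which this sketch ignores.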

Q: Have you tried to use a short-read instrument to do long reads and compared its performance with PacBio or Nanopore?

A: This system is short-read technology. Today it gives you 2x150 for a total read length. You can play with insert size to get it longer, but you will still end up with 2x150 base pair reads. For this application, that would mean a lot of missed content in the middle, and if everything is not individually barcoded, reassembly would not happen. That is what this assay overcomes with some molecular biology: a way to get to a synthetic long read, limited by long-range PCR, where 20 kb is typically the upper end of the size range. In terms of comparison to PacBio or ONT, there are several publications that address this topic.

Q: One of the constant challenges with Illumina from the very beginning is the absolute requirement for high complexity. Is that a similar requirement for Element’s AVITI? Or can we now simply take homopolymer stretches, a whole bunch of clones with exactly the same sequence, and put them on with one change within a thousand clones and directly sequence it? Or do we need something like this that is going to randomly fragment to give us that complexity?

A: We have low-diversity methods implemented. What we require is reasonable diversity within the first 5 cycles. We are working on eliminating that requirement in our own Elevate prep, but if you are using compatibility, we do require that diversity within the first 5 cycles. After that it is pretty robust.

Q: How low can diversity be outside the initial cycles?

A: We should be good with arbitrarily low diversity. We haven’t worked it out extensively, but we would be excited to try your use case and then figure out whether we need maybe a 5% PhiX spike-in or whether we are good to go.
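One rough way to judge whether a library has "reasonable diversity within the first 5 cycles" is to look at the base composition at each of those positions. The sketch below is only an illustration: the function names and the 0.9 dominance threshold are hypothetical and are not an Element specification, and the toy reads are made up.

```python
from collections import Counter


def per_cycle_base_fractions(reads, n_cycles=5):
    """Return, for each of the first n_cycles, the fraction of A/C/G/T."""
    fractions = []
    for cycle in range(n_cycles):
        bases = [read[cycle] for read in reads if len(read) > cycle]
        total = len(bases)
        counts = Counter(bases)
        if total:
            fractions.append({b: counts.get(b, 0) / total for b in "ACGT"})
        else:
            fractions.append({})  # no read is long enough for this cycle
    return fractions


def low_diversity_cycles(reads, n_cycles=5, max_fraction=0.9):
    """List the 1-based cycles where a single base exceeds max_fraction.

    max_fraction is an arbitrary illustrative threshold, not a vendor
    requirement.
    """
    flagged = []
    for i, frac in enumerate(per_cycle_base_fractions(reads, n_cycles), start=1):
        if frac and max(frac.values()) > max_fraction:
            flagged.append(i)
    return flagged


# Toy inputs: identical clones versus reads that differ at every early cycle.
clonal = ["ACGTACGT"] * 10
mixed = ["ACGTACGT", "CGTACGTA", "GTACGTAC", "TACGTACG"]

print(low_diversity_cycles(clonal))
print(low_diversity_cycles(mixed))
```

A check like this could help decide whether a spike-in is worth testing, but the actual requirement should be confirmed with the instrument vendor.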

Q: What level of sample multiplexing can be achieved?

A: It is a single barcode on the front, but there are 3 layers of barcoding that can happen, so the multiplexing levels can get high. You can have a well-based barcode, which is this one. Then you can have plate-based barcodes, so you don’t need endless well barcodes, and then you can have a barcode for the NGS run itself. So in a 96-plex sequencing run, one index could be dedicated to an application like this, and inside that one index you have 2 more layers of indexing, which let you put thousands of samples inside that one NGS run.
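The arithmetic behind the three barcode layers can be sketched as below. The counts used (96 well barcodes, 24 plate barcodes) and the function names are illustrative assumptions; the talk gives only the 96-plex run figure and "thousands of samples" per index.

```python
def samples_per_run_index(n_well_barcodes, n_plate_barcodes):
    """Samples that fit under a single run-level index.

    Each sample is identified by a (plate barcode, well barcode) pair,
    so the capacity is the product of the two layers.
    """
    return n_well_barcodes * n_plate_barcodes


def sample_key(run_index, plate_barcode, well_barcode):
    """Combine the three layers into one lookup key for demultiplexing."""
    return (run_index, plate_barcode, well_barcode)


# Hypothetical layout: 96 well barcodes reused across 24 plate barcodes.
capacity = samples_per_run_index(n_well_barcodes=96, n_plate_barcodes=24)
print(capacity)  # 2304 samples under one of the run's indexes

# Two samples in the same well position on different plates stay distinct.
key_a = sample_key("run_idx_07", "plate_01", "well_A1")
key_b = sample_key("run_idx_07", "plate_02", "well_A1")
print(key_a != key_b)
```

Reusing the same well barcodes on every plate is what keeps the number of distinct barcode reagents small while the sample count grows multiplicatively.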
