AVITI™ Genomic Analysis on AWS – Simple, Powerful, and Cost-Effective Cloud Computing

By: Francisco Garcia, Rosi Bajari, Claudia Dennler, Max Mass, Bryan Lajoie

Sequencing with Element on the AVITI™ System

Genomic analysis is playing an increasingly pivotal role in both research and clinical operations. To achieve efficient and cost-effective sample-to-answer workflows, labs face the challenge of building a cross-functional team with expertise in DNA and RNA sequencing, informatics, cloud computing, and IT. This is particularly daunting for labs that are new to sequencing.

While the AVITI System provides significantly lower run costs than other benchtop sequencing systems, some customers remain daunted by the perceived complexity and cost of analyzing the generated data. In this blog post, we show how Element Biosciences has partnered with Amazon Omics to provide customers with an analysis solution that is fast, flexible, powerful, and very cost-effective using ready-to-run Bases2Fastq workflows.

From Sample to Answer: A Journey in Sequencing

Sequencing data analysis is divided into primary, secondary, and tertiary analysis. Secondary analysis comprises three stages: demultiplexing, data transposition, and genomic analysis.

  • Primary analysis involves processing the signal generated by sequencing instruments to generate base calls and associated quality scores. The AVITI uses an on-board FPGA for primary analysis to accommodate the high data rates produced during sequencing. In fact, the data rate is so high that only the processed data reach the on-board CPU.
  • The first part of secondary analysis is demultiplexing. With mid-throughput systems like the AVITI, customers typically pool multiple samples together on one flow cell, tagging each with a unique molecular barcode. Demultiplexing identifies the data associated with each barcode.
  • Transposition, the second part of secondary analysis, involves taking the data that was generated per-cycle and saving it as short read fragments in FASTQ format.
  • The third stage of secondary analysis, genomic analysis, depends on the intended sequencing application. For DNA resequencing applications, the short reads are aligned to a reference genome. Differences between the given sample and the reference are identified as genomic variants.
  • Tertiary analysis involves the interpretation of those variants to determine biological significance.
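To make the demultiplexing and transposition stages concrete, here is a toy sketch in Python. It is not how Bases2Fastq is implemented: the function names, the in-memory data layout, and the exact-match barcode lookup (no mismatch tolerance) are all assumptions chosen for illustration.

```python
from collections import defaultdict


def assign_samples(barcodes, sample_sheet):
    """Demultiplexing: map each read index to a sample via its barcode.

    Hypothetical simplification: barcodes must match exactly; anything
    not listed in the sample sheet goes to "Unassigned".
    """
    assignments = defaultdict(list)
    for read_index, barcode in enumerate(barcodes):
        sample = sample_sheet.get(barcode, "Unassigned")
        assignments[sample].append(read_index)
    return dict(assignments)


def transpose(cycles, read_indices):
    """Transposition: turn per-cycle calls into per-read sequences.

    cycles[c][r] is the (base, quality_char) pair called for read r
    at cycle c, which is an assumed layout for this example.
    """
    reads = []
    for r in read_indices:
        seq = "".join(cycle[r][0] for cycle in cycles)
        qual = "".join(cycle[r][1] for cycle in cycles)
        reads.append((seq, qual))
    return reads


def to_fastq(sample, reads):
    """Format reads as four-line FASTQ records."""
    lines = []
    for i, (seq, qual) in enumerate(reads):
        lines += [f"@{sample}_{i}", seq, "+", qual]
    return "\n".join(lines) + "\n"


# Three cycles, three reads; made-up values for illustration only.
cycles = [
    [("A", "I"), ("C", "I"), ("G", "F")],
    [("C", "I"), ("G", "F"), ("T", "I")],
    [("G", "I"), ("T", "I"), ("A", "#")],
]
barcodes = ["AAAA", "CCCC", "TTTT"]
sample_sheet = {"AAAA": "S1", "CCCC": "S2"}

for sample, indices in assign_samples(barcodes, sample_sheet).items():
    print(to_fastq(sample, transpose(cycles, indices)), end="")
```

The point of the sketch is the change of orientation: the instrument produces one slice of data per cycle across all reads, while downstream tools expect one record per read, grouped by sample.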

The AVITI can be configured to stream base calls, quality scores, and run metrics directly to a customer’s S3 bucket on AWS as they are generated. Subsequent analysis can be performed immediately, without the need to copy data to an intermediate location or export it from a managed and costly subscription-based service. With our upcoming fall release, customers will be able to seamlessly configure the automatic launch of Element’s Bases2Fastq Ready2Run workflow via Amazon Omics in their own AWS account as soon as their sequencing run completes.

This configuration takes minutes, uses a secure AWS IAM role and External ID, and can apply to all sequencing runs in a customer's account or to a subset of sequencing runs streaming to a particular S3 bucket. Once the execution is completed, Elembio™ Cloud complements Amazon Omics with execution records and the ability to visualize QC statistics in an interactive report.
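For readers unfamiliar with the IAM role and External ID pattern, the sketch below builds a generic AWS trust policy of that kind in Python. It is an assumption about what such a role could look like, not Element's actual configuration: the account ID and External ID are placeholders, and which account is the trusted party is assumed here for illustration.

```python
import json


def build_trust_policy(trusted_account_id, external_id):
    """Return a standard IAM trust policy that lets another AWS account
    assume a role only when it presents the agreed External ID."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{trusted_account_id}:root"},
                "Action": "sts:AssumeRole",
                "Condition": {
                    "StringEquals": {"sts:ExternalId": external_id}
                },
            }
        ],
    }


# Placeholder values; substitute the ones shown during your own setup.
policy = build_trust_policy("111122223333", "example-external-id")
print(json.dumps(policy, indent=2))
```

The External ID condition means that knowing the role name alone is not enough to assume it, which is why this pattern is commonly used for cross-account integrations.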

From run start to analysis completion, customers have full control of their own data in their account. Users can optionally allow Element access to run metadata so that QC analysis can be presented in Elembio Cloud.

Analysis with the Element AVITI and the Omics Ecosystem

Once FASTQs are generated, users can take advantage of the elasticity of Amazon Omics to trigger subsequent secondary analysis for each sample in parallel. A run setup in Elembio Cloud can be configured to launch independent per-sample analysis workflows automatically. For example, one flow cell run comprising three ~30x human genomes can be configured to launch downstream secondary analysis tools on independent instances as soon as demultiplexing is complete. Because the analyses are fully parallelized, the time to compute all samples is the same as it would be for a single sample.

  • Data transfer occurs during sequencing.
  • Demultiplexing with Bases2Fastq takes ~1 hour.
  • Sentieon DNAscope analysis, included as a 6-month free trial with an AVITI purchase, takes ~35 minutes for an AVITI 30x genome in AWS.
  • Fully processed human genome data, from base calls to FASTQ to BAM to VCF (variant calls), is available within 2 hours of run completion.
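The effect of parallelization on turnaround can be checked with a few lines of Python. This is a simplified model using the approximate timings quoted above; it assumes data transfer is already finished when the run completes and ignores workflow start-up overhead.

```python
def turnaround_minutes(demux_minutes, per_sample_minutes, parallel=True):
    """Wall-clock time from run completion to the last finished sample.

    In parallel mode every sample starts as soon as demultiplexing ends,
    so only the slowest sample counts; in serial mode the times add up.
    """
    if parallel:
        return demux_minutes + max(per_sample_minutes)
    return demux_minutes + sum(per_sample_minutes)


# Three pooled 30x genomes: ~60 min demultiplexing, ~35 min DNAscope each.
samples = [35, 35, 35]
print(turnaround_minutes(60, samples))                  # 95
print(turnaround_minutes(60, samples, parallel=False))  # 165
```

Run in parallel, the three genomes finish in roughly 95 minutes, inside the 2-hour window; processed one after another on a single machine, the same work would take about 165 minutes.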

Cost Effective Base Calling with Bases2Fastq on Amazon Omics

So, how much does this analysis really cost? While there has been alarming messaging in the press that analysis costs are greater than sequencing costs, in truth the cost of cloud analysis is only a very small fraction of a sequencing budget. Let's delve into the actual numbers.

For a custom AWS setup, running Bases2Fastq on an m5.12xlarge (48 vCPU/192 GiB) spot instance with a single attached 800 GB auto-scaling EBS gp3 volume, it takes less than an hour to generate FASTQs for a typical 300G AVITI run. However, this setup requires substantial configuration time and ongoing management by a bioinformatics professional, resulting in potentially significant labor costs.

In contrast, utilizing Amazon Omics and the Element-managed Ready2Run Bases2Fastq workflow offers an easier and more cost-effective solution.

  • FASTQ generation costs approximately $3 for a 300G 2x150 run.
  • Running DNAscope on a 30x AVITI genome costs approximately $1.60 per sample.
  • For three pooled 30x human whole-genome sequencing samples on a single flow cell, the total cost is ~$2.60 per sample: $1.00 for demultiplexing ($3 per flow cell / 3 samples) plus $1.60 for alignment and variant calling.
  • The total analysis cost is therefore $7.80 per flow cell, less than 1% of the cost of the flow cell itself.
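The arithmetic behind these figures is easy to reproduce and adapt to other pooling levels. The short Python sketch below uses the approximate prices quoted above; the function and parameter names are ours, and real charges will vary with run size and current AWS pricing.

```python
def cost_per_sample(fastq_cost_per_flow_cell, analysis_cost_per_sample, n_samples):
    """Split the per-flow-cell FASTQ generation cost across the pooled
    samples, then add the per-sample alignment and variant calling cost."""
    return fastq_cost_per_flow_cell / n_samples + analysis_cost_per_sample


def cost_per_flow_cell(fastq_cost_per_flow_cell, analysis_cost_per_sample, n_samples):
    """Total analysis cost for all samples on one flow cell."""
    return fastq_cost_per_flow_cell + analysis_cost_per_sample * n_samples


# ~$3 FASTQ generation per flow cell, ~$1.60 DNAscope per sample, 3 samples.
print(f"${cost_per_sample(3.00, 1.60, 3):.2f} per sample")        # $2.60 per sample
print(f"${cost_per_flow_cell(3.00, 1.60, 3):.2f} per flow cell")  # $7.80 per flow cell
```

Note that the FASTQ generation cost is fixed per flow cell, so pooling more samples lowers the per-sample share of that step while the per-sample analysis cost stays constant.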

From the perspectives of turnaround time, ease of use, and flexibility, utilizing Omics for genomic data analysis emerges as a significantly superior option compared to local operations. Furthermore, the cost per sample is negligible, making it an exceptionally compelling choice for users of Element Biosciences' AVITI sequencing platform.

To get started with Element Biosciences’ Bases2Fastq Ready2Run workflows, visit the Amazon Omics console.

To learn how an AVITI can help you achieve your research goals, contact an Element scientist.