Results Interpretation Guide


Whole Plasmid, ZeroPrep, & RCA Sequencing Services


Lots of reasons!

  • Scientific rigor and peace of mind.
  • E. coli and other hosts will go to great lengths to avoid expressing your leaky toxic gene, including modifying your plasmid in unexpected ways that are invisible to targeted Sanger sequencing.
  • Plasmid inserts are getting longer and more complex. Instead of multiple Sanger runs or synthesizing a sequencing primer or doing primer walking, sequence the whole plasmid.
  • Long reads are ideal for resolving repetitive regions that stymie Sanger sequencing.
  • Are you sure your plasmid isn't a dimer? Are you sure there aren't multiple plasmids in your strain? Sanger sequencing won't tell you, and we see it all the time.
  • It's neither much more expensive nor slower.


We sequence each sample with Oxford Nanopore long reads to very high depth before generating a consensus/assembly using the latest basecalling and polishing software:

  • We construct an amplification-free long-read sequencing library using the newest v14 library prep chemistry, including linearization of the circular input DNA in a sequence-independent manner.
  • We sequence the library with a primer-free protocol using the most accurate R10.4.1 flow cells (raw data is delivered in .fastq format).
  • We generate a high-accuracy circular consensus sequence from the raw reads.
  • For standard size plasmids, we will also return a set of feature annotations.

Plasmid and circular samples are sequenced WITHOUT primers or amplification. Please do not ship any primers with your samples or mix primers into your samples.


In the vast majority of cases, we deliver plasmid sequencing results within one business day of receipt of your samples.


This service is intended for a clonal population of molecules. You can send mixtures of molecular species, but since we can't predict the analysis outcome, it's at your own risk.

  • If your species are very similar (e.g. differ by only a few nucleotides), the pipeline will most likely create a single .gbk consensus file, with mixed peaks observed in the .ab1 file at SNP/indel locations.
  • If your species are sufficiently distinct (e.g. vastly different in size or sequence), the pipeline will generate a single consensus sequence for the molecular species that produces the largest amounts of total sequencing data. (Note that concatemer forms such as dimers, trimers, etc. are not considered different molecular species by the pipeline, so you will only receive the monomer consensus sequence by default).

Ultimately, which species ends up producing a consensus will vary depending on overall sample quality, coverage, and relative abundance/degradation of each species.

Sequencing is considered successful if the pipeline is able to generate any consensus, even if it is not your target. Re-sequencing mixtures won't change the relative proportions of the species (and thus which species generates a consensus), but you can submit multiple aliquots if you need higher overall coverage.

If you'd like to sequence a known mixture (e.g. barcode or variant libraries), please consider submitting instead to our Custom Sequencing Service.


As per Oxford Nanopore's specs for the chemistry and flowcells we currently use for plasmid sequencing, the consensus accuracy is typically >99.99%.


The most common error modes for Oxford Nanopore are deletions in homopolymer stretches (especially if longer than 8 bp), errors at the Dam methylation site GATC, and errors at the middle position of the Dcm methylation site CCTGG or CCAGG. These limitations are expected to improve with future updates to ONT sequencing chemistry and basecalling software.

sample .fastq file

This figure shows an example .ab1 file indicating overlapping peaks for A and G at position 371 due to conflicting basecalls at a GATC methylation site, resulting in a lower confidence basecall. However, in this example, the consensus did correctly call this base as G (no sequencing error, just lower confidence).
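
If you want to see where these error modes could apply in your own consensus, the short Python sketch below scans a sequence for long homopolymers and for the Dam and Dcm motifs. It is an illustration only and does not describe the analysis pipeline: the function name flag_error_prone_sites and the min_homopolymer parameter are hypothetical, and the scan treats the sequence as linear, so a motif that spans the start/end junction of a circular plasmid is not reported.

```python
import re

def flag_error_prone_sites(seq, min_homopolymer=9):
    """Return (start, end, label) tuples, 1-based inclusive, for regions
    where lower-confidence basecalls are more likely.
    min_homopolymer=9 is an assumption matching "longer than 8 bp"."""
    seq = seq.upper()
    sites = []
    # Homopolymer runs of a single base
    for m in re.finditer(r"A+|C+|G+|T+", seq):
        if m.end() - m.start() >= min_homopolymer:
            sites.append((m.start() + 1, m.end(), "homopolymer"))
    # Dam methylation site GATC (lookahead also catches overlapping hits)
    for m in re.finditer(r"(?=GATC)", seq):
        sites.append((m.start() + 1, m.start() + 4, "Dam GATC"))
    # Dcm methylation site CCTGG or CCAGG
    for m in re.finditer(r"(?=CC[AT]GG)", seq):
        sites.append((m.start() + 1, m.start() + 5, "Dcm CCWGG"))
    return sorted(sites)

example = "ACGT" + "A" * 10 + "CGATCGCCTGGA"
for site in flag_error_prone_sites(example):
    print(site)
```

Positions flagged this way are simply places to look more closely in the .ab1 file; a flagged site is not necessarily an error.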

We do not guarantee any specific level of coverage, as the number of raw reads generated can vary substantially depending on sample quality.

Successful samples sent at the required concentration typically yield from the high dozens to hundreds (or thousands!) of raw sequencing reads.

Average coverage is reported in the SAMPLE_summary.tsv file. Coverage over ~20x indicates a very accurate consensus.
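
As a rough illustration of the ~20x rule of thumb, the Python sketch below takes a list of per-position read depths (for example, values you have copied out of the per-base data file) and reports the average coverage together with any positions that fall below a threshold. The function name coverage_report and the threshold parameter are hypothetical, and the sketch does not assume any particular column layout in the delivered files.

```python
def coverage_report(depths, threshold=20):
    """Return (mean coverage, 1-based positions with depth below threshold).
    `depths` is a list with one read-depth value per consensus position."""
    if not depths:
        raise ValueError("depths must contain at least one position")
    mean_coverage = sum(depths) / len(depths)
    low_positions = [i + 1 for i, d in enumerate(depths) if d < threshold]
    return mean_coverage, low_positions

mean_cov, low = coverage_report([30, 40, 10, 20])
print(mean_cov, low)
```

A sample can have a high average coverage and still contain a few low-depth positions, which is why it can be worth checking both numbers.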


  • Consensus sequence (.fasta file): Provides the polished consensus sequence of the plasmid, generated from the raw reads.
  • Consensus sequence (.gbk file): Provides the polished consensus sequence of the plasmid, generated from the raw reads. Also includes a plasmid map and feature annotations from the excellent pLannotate tool from the Barrick Lab:
    McGuffie,M.J. and Barrick,J.E. (2021) pLannotate: engineered plasmid annotation. Nucleic Acids Research DOI: 10.1093/nar/gkab374
  • Plasmid map (.html file): An interactive version of the pLannotate plasmid map.
  • Read length histogram (.png file): Displays the read length distribution of the raw reads produced by your sample, thereby providing unique insight into the contents of your samples. See more details about how to interpret your histograms below.
  • Virtual gel (.png file): Displays the raw read lengths from all samples in the order in a virtual gel format, resembling what you’d see if you ran the DNA fragments on a gel.
  • Chromatogram (.ab1 file): Displays the relative abundance of each nucleotide (A, T, G, C) for all raw reads that align to the consensus at each position of the sequence. See more details about how to interpret your chromatograms below.
  • Coverage plot (.png file): Displays the relative sequencing coverage at each position of the consensus sequence. A region with a large gap or a sudden large increase suggests either an assembly issue or a mixture of multiple plasmid species.
  • Per-base data (.txt and .tsv files): Includes 3 sub-files for each sample:
    • SAMPLE.tsv: Indicates how well the raw reads agree with the consensus sequence at each position. The list includes the consensus basecalls at each position, along with the total number of raw reads aligning at that position and the basecall distributions in the raw reads for that position (A, T, G, C, matches, mismatches, insertions, deletions, etc.).
    • SAMPLE_multimer_analysis.txt: Indicates the % distribution of the various concatemer forms of the consensus sequence (monomer, dimer, trimer, etc.).
    • SAMPLE_summary.tsv: Indicates the length, average coverage, relative composition (by moles and mass), total reads, total bases, and % E. coli genomic DNA contamination for the consensus sequence.
  • Raw read sequences (.fastq.gz file): Provides the sequences of individual raw reads that align to the consensus. Please note that these reads are NOT delivered in the default download, but can be downloaded separately by clicking the "Download Raw FASTQ" button at the top of the "Order Information" page. Note that any raw reads that do not align to the consensus (e.g. host genomic DNA, lower abundance molecular species) are excluded.
    FASTQ is a sequence format similar to FASTA, with additional Phred quality score information that can be displayed graphically by most modern sequence viewers. Since each basecalled position only has one quality score, certain sequence features, such as insertions or deletions, must be inferred from looking at adjacent bases.
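
To make the Phred quality information more concrete, here is a minimal Python sketch that decodes the quality line of a FASTQ record. It assumes the common Phred+33 encoding (each quality character's ASCII code minus 33), and the function names are hypothetical. A Phred score Q corresponds to an estimated error probability of 10^(-Q/10), so Q10 means a 1 in 10 chance of error and Q20 means 1 in 100.

```python
def phred_scores(quality_line):
    """Decode a FASTQ quality string into integer Phred scores (Phred+33 assumed)."""
    return [ord(ch) - 33 for ch in quality_line]

def mean_error_probability(quality_line):
    """Average per-base error probability implied by the quality string."""
    scores = phred_scores(quality_line)
    return sum(10 ** (-q / 10) for q in scores) / len(scores)

# A made-up four-base record: header, sequence, separator, quality
record = ["@read_1", "ACGT", "+", "!+5I"]
print(phred_scores(record[3]))
print(mean_error_probability("++"))
```

Averaging error probabilities rather than averaging the Q scores themselves gives a fairer picture of a read, because a few very low-quality bases dominate the expected number of errors.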

sample .fastq file

The figure above shows examples of the file types you will receive for successful plasmid sequencing.

Our ability to deliver these target outputs is directly dependent on the quantity, quality, and purity of the DNA sent to us, so we do not guarantee results. If we are not able to generate a consensus sequence from your sample, our failure policy applies.


The histogram displays the lengths of the raw reads produced by your sample, with read length (bp) on the x-axis and thousands of bases of data collected (kb) at that length on the y-axis. The histogram is therefore weighted by amount of sequencing data produced by different sizes of molecules; for example, two DNA fragments of different lengths that produce the same number of reads will produce different amounts of total data.
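
The weighting described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code that draws the delivered plots: the function name weighted_length_histogram and the 500 bp bin_size are hypothetical choices.

```python
from collections import Counter

def weighted_length_histogram(read_lengths, bin_size=500):
    """Bin reads by length and sum the data (in kb) that falls in each bin,
    so each read contributes its own length rather than a count of 1."""
    histogram = Counter()
    for length in read_lengths:
        bin_start = (length // bin_size) * bin_size
        histogram[bin_start] += length / 1000  # bases -> kb
    return dict(sorted(histogram.items()))

# Ten reads from a 2,000 bp molecule and ten reads from an 8,000 bp molecule
reads = [2000] * 10 + [8000] * 10
print(weighted_length_histogram(reads))
```

Both molecules contribute the same number of reads here, yet the 8,000 bp bin holds four times as much data as the 2,000 bp bin, which is why longer species look taller in a data-weighted histogram.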

The x-axis is automatically scaled to the maximum read length produced by your sample. Before sequencing your plasmids, we linearize them so that we get mostly full-length sequence reads. As a result, the lengths of the raw sequencing reads reflect the lengths of the molecular species in your sample.

Additionally, the histogram color key indicates what fraction of the raw data maps to the consensus sequence:

The data from a raw read is colored as follows:

  • Dark blue (ASSEMBLY read): the raw read aligns to the consensus/assembly sequence.
  • Orange (E. COLI read): the raw read aligns to the E. coli genome.
  • Light blue (UNMAPPED read): the raw read does not align to either of these (it could be sequencing noise, a genome other than E. coli, a lower abundance plasmid species that does not generate a consensus, etc.).

Ideally, your target plasmid will be the only species in the sample, and we will see one dominant peak in the read length histogram:

histo-one-species

A dominant peak (~4,800 bp in this case) typically suggests a clean prep with a single plasmid.

(Please note that even a single apparent peak MAY contain multiple plasmids of the same size, or multiple plasmids of different lengths that happen to fall into the same histogram bin. Sequences that are very similar are assumed by the analysis pipeline to be variations of a single species, and the pipeline will attempt to make a single consensus (with potentially low confidence positions reported); if the sequences are very distinct, it will only produce a consensus for the most abundant species.)

If your raw reads contain varying numbers of indels (common for noisy raw reads), this may sometimes cause the read lengths to straddle a bin boundary and artifactually create an appearance of two separate peaks:

histo-double-peak

These reads most likely all come from a single plasmid (~2,500 bp in this case), but varying numbers of insertion and deletion sequencing errors result in different lengths that cause them to straddle a bin boundary.

(Please note that a peak straddling a bin boundary MAY contain multiple plasmids of the same size, or multiple plasmids of different lengths that happen to fall into two adjacent histogram bins. Sequences that are very similar are assumed by the analysis pipeline to be variations of a single species, and the pipeline will attempt to make a single consensus (with potentially low confidence positions reported); if the sequences are very distinct, it will only produce a consensus for the most abundant species.)

More often than you would expect, though, we see multiple peaks corresponding to multiple plasmids, or a peak of a different size than the customer expected:

histo-multi-species-1

Uh oh! Good thing you did whole plasmid sequencing: Sanger sequencing might not have shown you all these plasmid species!

histo-multi-species-2

This sample contains 3 unique plasmid species, only 2 of which (~8,600 bp and ~4,200 bp, corresponding to the target plasmid and the empty vector, respectively) yielded enough coverage to produce a consensus.

If your sample contains a mixture, we will return only a single consensus for the molecular species that produces the largest amount of total sequencing data. If you’d like us to try generating a consensus for an alternate peak instead, you can email us at support@plasmidsaurus.com to inquire.

Occasionally we see a sample with a dominant peak in addition to an abundance of degraded DNA (genomic and/or plasmid). In some cases the dominant peak may still produce a consensus, if read coverage and accuracy are sufficient:

histo-one-with-degradation

This sample produced a consensus for the ~14,000 bp peak, with the degraded plasmid fragments contributing to its coverage.

Sometimes we see a decent number of reads for the sample but there is NO dominant peak, indicating an abundance of degraded DNA (genomic and/or plasmid) from a poor plasmid prep, or that the strain contains no plasmids:

histo-no-peak

No dominant peak was observed in this sample, despite high read count. No consensus was generated.

Often, the read count is too low to distinguish any peaks or to generate any consensus:

histo-low-read-count

If read count is too low, usually it is because samples are not prepared at the required concentration.

We see concatemers like this all the time -- they are not a sequencing artifact. Sanger sequencing can't detect them and you won't see them on a gel of your digested/linearized plasmid, so you're not used to seeing them, but they turn out to be very common. If you run your sample uncut on a gel with a supercoiled ladder, you will see the concatemer band.

They often seem to be formed in vivo during growth in a RecA+ strain (such as NEB Turbo cells), and are more common when plasmids have large repetitive regions or other complex structures. Even plasmid manufacturers like Addgene observe that concatemers occur frequently, and that only long-read sequencing technologies like the one we use here at Plasmidsaurus (that is, Oxford Nanopore Technologies) can detect them!

Please note that concatemer forms such as dimers, trimers, etc. are not considered different molecular species by the pipeline, so you will only receive the monomer consensus sequence by default, even if other concatemer forms produced more sequencing data.

histogram

Fig. 1. This histogram shows that most of the data is produced by the monomer, but there is also a small amount of data from the dimer.

virtualgel

Fig. 2. Each sample shown in this virtual gel displays two distinct bands, one for the monomer and one for the dimer.
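
One simple way to estimate concatemer proportions from read lengths is to check which whole-number multiple of the monomer length each read is closest to. The Python sketch below illustrates that idea; it is not a description of how SAMPLE_multimer_analysis.txt is produced. The function name multimer_distribution, the 10% length tolerance, and the choice to count reads (rather than bases) are all assumptions made for the example.

```python
from collections import Counter

def multimer_distribution(read_lengths, monomer_length, tolerance=0.1):
    """Return {copy number: % of classified reads}, where copy number 1 is
    the monomer, 2 the dimer, and so on. Reads whose length is not within
    `tolerance` * monomer_length of a whole multiple are ignored."""
    counts = Counter()
    for length in read_lengths:
        copies = round(length / monomer_length)
        if copies >= 1 and abs(length - copies * monomer_length) <= tolerance * monomer_length:
            counts[copies] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {copies: 100 * n / total for copies, n in sorted(counts.items())}

# Eight monomer-length reads, two dimer-length reads, one degraded fragment
reads = [5000] * 8 + [10000] * 2 + [3000]
print(multimer_distribution(reads, monomer_length=5000))
```

Because a dimer read carries twice as many bases as a monomer read, the same sample looks more dimer-rich when measured by data than when measured by read count, so it matters which of the two a percentage refers to.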

The .ab1 format has widespread use in Sanger sequencing and normally indicates the intensity of fluorescent nucleotides (A, T, G, C) at each position of the consensus. Since fluorescence is not employed in the Oxford Nanopore Sequencing technology that we use here at Plasmidsaurus, we generate this .ab1 file synthetically using the relative abundance of each nucleotide (A, T, G, C) from the raw reads at each position of the consensus sequence. Because this file type was originally used for Sanger sequencing (which is limited to much shorter read lengths than we get with Oxford Nanopore), the file has a maximum size limit and therefore we must often report sequences in multiple pieces.

This file gives a visual representation of polymorphisms and molecular mixtures present in the sample, and putative insertions can be observed more clearly. An ideal high accuracy basecall will have a sharp, distinct peak of a single color indicating a single nucleotide, whereas a low accuracy basecall will have less defined or mixed (overlapping) peaks.

sample .fastq file

The figure above shows examples of quality reporting in .fastq and .ab1 format.
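
The idea of a "peak" built from relative nucleotide abundance can be illustrated with a short Python sketch. Given per-position base counts from the raw reads, it computes the fraction of each nucleotide and flags positions where no single base clearly dominates. This is only a sketch: the function names, the input layout (one dictionary of counts per position), and the 0.8 cutoff are hypothetical and are not taken from the pipeline.

```python
def base_fractions(counts):
    """Convert raw base counts at one position into fractions for A, C, G, T."""
    total = sum(counts.get(base, 0) for base in "ACGT")
    if total == 0:
        return {base: 0.0 for base in "ACGT"}
    return {base: counts.get(base, 0) / total for base in "ACGT"}

def mixed_positions(pileup, min_major_fraction=0.8):
    """Return (position, major base, major fraction) for positions whose most
    abundant base falls below min_major_fraction. Positions are 1-based."""
    flagged = []
    for position, counts in enumerate(pileup, start=1):
        fractions = base_fractions(counts)
        major = max(fractions, key=fractions.get)
        if fractions[major] < min_major_fraction:
            flagged.append((position, major, round(fractions[major], 2)))
    return flagged

pileup = [{"A": 98, "G": 2}, {"A": 40, "G": 60}, {"C": 100}]
print(mixed_positions(pileup))
```

In this made-up example the second position would appear as overlapping A and G peaks, while the first and third would appear as sharp single-color peaks.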

For plasmids, "failure" means that your sample did not produce data of sufficient quality and quantity for the pipeline to generate a consensus sequence.

Our low sequencing prices and fast turnaround times do not include extensive QC to determine why your plasmid samples failed (or had low coverage). Although we do not provide definitive reasons for failure, by far the most common reasons are:

  • Samples are not prepared at the required DNA concentration.
    The most common cause of this is using a Nanodrop to quantify DNA concentration. We strongly recommend using a Qubit or equivalent.
    You may see evidence of this failure mode in the low amount of total data reported in the raw read length histogram and in the low consensus coverage reported in the SAMPLE_summary.tsv file.
  • Samples contain a mixture of plasmid species and/or fragmented genomic DNA or fragmented plasmids.
    You may see evidence of this failure mode in a wide range of read lengths reported in the raw read length histogram.

To achieve optimal sequencing results, please follow our recommended plasmid sample prep instructions, ZeroPrep cell prep instructions, or RCA sample prep instructions.


It is relatively rare that we cannot return a consensus sequence, but some rate of failure is unavoidable. You are welcome to submit a rerun request for any failed plasmid samples through your Order Info page (please note that ZeroPrep and RCA Sequencing Services are NOT eligible for reruns). We will evaluate whether your plasmid sample quality and quantity permits rerunning your sample (we may also ask you to provide a reference sequence). We do still charge for failed samples.

Linear/PCR Sequencing Service


We sequence each sample with Oxford Nanopore long reads to very high depth before generating a consensus/assembly using the latest basecalling and polishing software:

  • We construct an amplification-free long-read sequencing library using the newest v14 library prep chemistry, including minimal fragmentation of the input linear DNA in a sequence-independent manner.
  • We sequence the library with a primer-free protocol using the most accurate R10.4.1 flow cells (raw data is delivered in .fastq format).
  • We generate a high-accuracy linear consensus sequence from the re-assembled raw reads.
  • For standard linear/PCR samples, we will also return a set of feature annotations.

Linear/PCR samples are sequenced WITHOUT primers or amplification. Please do not ship any primers with your samples or mix primers into your samples.


In the vast majority of cases, we deliver linear/PCR sequencing results within one business day of receipt of your samples.


This service is intended for a clonal population of molecules. You can send mixtures of molecular species, but since we can't predict the analysis outcome, it's at your own risk.

  • If your species are very similar (e.g. differ by only a few nucleotides), the pipeline will most likely create a single .gbk consensus file, with mixed peaks observed in the .ab1 file at SNP/indel locations.
  • If your species are sufficiently distinct (e.g. vastly different in size or sequence), the pipeline will generate a single consensus sequence for the molecular species that produces the largest amounts of total sequencing data. (Note that concatemer forms such as dimers, trimers, etc. are not considered different molecular species by the pipeline, so you will only receive the monomer consensus sequence by default).

Ultimately, which species ends up producing a consensus will vary depending on overall sample quality, coverage, and relative abundance/degradation of each species.

Sequencing is considered successful if the pipeline is able to generate any consensus, even if it is not your target. Re-sequencing mixtures won't change the relative proportions of the species (and thus which species generates a consensus), but you can submit multiple aliquots if you need higher overall coverage.

If you'd like to sequence a known mixture (e.g. barcode or variant libraries), please consider submitting instead to our Custom Sequencing Service or Premium PCR Service.


As per Oxford Nanopore’s specs for the chemistry and flowcells we currently use for linear/PCR sequencing, the consensus accuracy is typically >99.99%.


Depending on the sequence of your sample, the assembler does sometimes have difficulty reconstructing the terminal ends of linear DNA, which may result in up to ~25 nucleotides missing from the 3’ and/or 5’ ends of your insert.
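
If you have a reference sequence for your insert, a quick way to see how many terminal nucleotides are missing is to locate the consensus inside the reference. The Python sketch below does this with an exact substring search, which is a deliberate simplification: the function name missing_ends is hypothetical, the consensus must match the reference exactly and be in the same orientation, and a real comparison would normally use a sequence aligner instead.

```python
def missing_ends(reference, consensus):
    """Return (bases missing at the start, bases missing at the end) of the
    consensus relative to the reference, or None if the consensus is not an
    exact substring of the reference."""
    reference = reference.upper()
    consensus = consensus.upper()
    start = reference.find(consensus)
    if start == -1:
        return None
    missing_at_end = len(reference) - (start + len(consensus))
    return start, missing_at_end

print(missing_ends("AAACCCGGGTTT", "CCCGGGT"))
```

A result of None only means the exact-match shortcut failed (for example because of a single mismatch or a reverse-complemented consensus), not that the consensus is wrong.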

The most common error modes for Oxford Nanopore are deletions in homopolymer stretches (especially if longer than 8 bp), errors at the Dam methylation site GATC, and errors at the middle position of the Dcm methylation site CCTGG or CCAGG. These limitations are expected to improve with future updates to ONT sequencing chemistry and basecalling software.

sample .fastq file

This figure shows an example .ab1 file indicating overlapping peaks for A and G at position 371 due to conflicting basecalls at a GATC methylation site, resulting in a lower confidence basecall. However, in this example, the consensus did correctly call this base as G (no sequencing error, just lower confidence).

We do not guarantee any specific level of coverage, as the number of raw reads generated can vary substantially depending on sample quality.

Successful samples sent at the required concentration typically yield from the high dozens to hundreds (or thousands!) of raw sequencing reads.

Coverage over ~20x indicates a very accurate consensus.


  • Consensus sequence (.fasta file): Provides the polished consensus sequence of the linear/PCR molecule, generated from the raw reads.
  • Consensus sequence (.gbk file): Provides the polished consensus sequence of the linear/PCR molecule, generated from the raw reads. Also includes a molecular map and feature annotations from the excellent pLannotate tool from the Barrick Lab:
    McGuffie,M.J. and Barrick,J.E. (2021) pLannotate: engineered plasmid annotation. Nucleic Acids Research DOI: 10.1093/nar/gkab374
  • Molecular map (.html file): An interactive version of the molecular map. Note that this map will be depicted as circular, but the bold black bar at position 1 indicates that it is indeed linear.
  • Read length histogram (.png file): Displays the read length distribution of the raw reads produced by your sample, thereby providing unique insight into the contents of your samples. Note that you will typically see a smear of read lengths due to minimal fragmentation during the library prep process.
  • Virtual gel (.png file): Displays the raw read lengths from all samples in the order in a virtual gel format, resembling what you’d see if you ran the DNA fragments on a gel. Note that you will typically see a smear of read lengths due to minimal fragmentation during the library prep process.
  • Chromatogram (.ab1 file): Displays the relative abundance of each nucleotide (A, T, G, C) for all raw reads that align to the consensus at each position of the sequence. See more details about how to interpret your chromatograms below.
  • Coverage plot (.png file): Displays the relative sequencing coverage at each position of the consensus sequence. A region with a large gap or a sudden large increase suggests either an assembly issue or a mixture of multiple molecular species.
  • Per-base data (.tsv file): Indicates how well the raw reads agree with the consensus sequence at each position. The list includes the consensus basecalls at each position, along with the total number of raw reads aligning at that position and the basecall distributions in the raw reads for that position (A, T, G, C, matches, mismatches, insertions, deletions, etc.).
  • Summary file (.txt file): Indicates the % distribution of the various concatemer forms of the consensus sequence.
  • Raw read sequences (.fastq.gz file): Provides the sequences of individual raw reads that align to the consensus. Please note that these reads are NOT delivered in the default download, but can be downloaded separately by clicking the "Download Raw FASTQ" button at the top of the "Order Information" page. Note that any raw reads that do not align to the consensus (e.g. host genomic DNA, lower abundance molecular species) are excluded.
    FASTQ is a sequence format similar to FASTA, with additional Phred quality score information that can be displayed graphically by most modern sequence viewers. Since each basecalled position only has one quality score, certain sequence features, such as insertions or deletions, must be inferred from looking at adjacent bases.

Our ability to deliver these target outputs is directly dependent on the quantity, quality, and purity of the linear/PCR DNA sent to us, so we do not guarantee results. If we are not able to generate a consensus sequence from your sample, our failure policy applies.


The .ab1 format has widespread use in Sanger sequencing and normally indicates the intensity of fluorescent nucleotides (A, T, G, C) at each position of the consensus. Since fluorescence is not employed in the Oxford Nanopore Sequencing technology that we use here at Plasmidsaurus, we generate this .ab1 file synthetically using the relative abundance of each nucleotide (A, T, G, C) from the raw reads at each position of the consensus sequence. Because this file type was originally used for Sanger sequencing (which is limited to much shorter read lengths than we get with Oxford Nanopore), the file has a maximum size limit and therefore we must often report sequences in multiple pieces.

This file gives a visual representation of polymorphisms and molecular mixtures present in the sample, and putative insertions can be observed more clearly. An ideal high accuracy basecall will have a sharp, distinct peak of a single color indicating a single nucleotide, whereas a low accuracy basecall will have less defined or mixed (overlapping) peaks.

sample .fastq file

The figure above shows examples of quality reporting in .fastq and .ab1 format.

For linear/PCR samples, "failure" means that your sample did not produce data of sufficient quality and quantity for the pipeline to generate a consensus sequence.

Our low sequencing prices and fast turnaround times do not include extensive QC to determine why your linear/PCR samples failed (or had low coverage). Although we do not provide definitive reasons for failure, by far the most common reasons are:

  • You submitted to the "unpurified" service.
    Unpurified DNA (with DNA still intermixed with the original reaction reagents) is more likely to fail. We recommend performing cleanup and submitting the "purified" service next time.
  • Samples are not prepared at the required DNA concentration.
    The most common cause of this is using a Nanodrop to quantify DNA concentration. We strongly recommend using a Qubit or equivalent.
    You may see evidence of this failure mode in the low amount of total data reported in the raw read length histogram.
  • Samples contain a mixture of linear/PCR species, fragmented linear/PCR products, and/or fragmented genomic DNA.
    Because this service includes minimal fragmentation, even successful samples will display a smear of read lengths on the read length histograms. Therefore this failure mode can be difficult to diagnose from the histogram alone, so you may need to rely on other metrics.

To achieve optimal sequencing results, please follow our recommended linear/PCR sample prep instructions.


It is relatively rare that we cannot return a consensus sequence, but some rate of failure is unavoidable. You are welcome to submit a rerun request for any failed samples through your Order Info page. We will evaluate whether your sample quality and quantity permits rerunning your sample (we may also ask you to provide a reference sequence). We do still charge for failed samples.

Premium PCR Sequencing Service


We sequence each sample with Oxford Nanopore long reads to very high depth before generating a consensus/assembly using the latest basecalling and polishing software:

  • We construct an amplification-free long-read sequencing library using the newest v14 library prep chemistry, including end-ligation of the linear input DNA. Your DNA is not fragmented.
  • We sequence the library with a primer-free protocol using the most accurate R10.4.1 flow cells (raw data is delivered in .fastq format).
  • We generate a high-accuracy linear consensus sequence from the raw reads for the most abundant molecular species.

Premium PCR samples are sequenced WITHOUT primers or amplification. Please do not ship any primers with your samples or mix primers into your samples.


In the vast majority of cases, we deliver Premium PCR sequencing results within one week of receipt of your samples.

Premium PCR takes longer to sequence (one week) and is more expensive ($30 per sample) to run than the regular Linear/PCR service, but is ideal in the following scenarios:

  • If your linear DNA sample is NOT clonal, but rather contains a mixture of different molecules (e.g. barcode or variant libraries) that you would like to fully characterize.
  • If you require full-length, end-to-end reads that are sequenced without any fragmentation.
  • If you require a larger number of sequencing reads beyond the typical yield from the regular Linear/PCR Service. Premium PCR yields up to 5,000 raw reads per sample, for DNA up to 25 kb in length.
  • If you require more accurate raw reads than the regular Linear/PCR Service, with higher per-base confidence.
  • If you need ALL raw reads produced by your sample, rather than just the raw reads that align to your consensus as with the regular Linear/PCR Service. (Please note that returning ALL raw reads means there is a small chance of demultiplexing error, so a few reads from your sample might be returned to another customer on the same sequencing run.)

Yes! Just note that by default, we will generate only one high-accuracy linear consensus sequence for the single most abundant molecular species in your sample. If your sample contains multiple molecular species, you may perform your own analyses on the raw reads that are included with your results, or email support@plasmidsaurus.com to ask for additional consensus sequences to be generated.

If you need more than 5,000 reads to fully characterize your molecular mixture, you can submit multiple aliquots of each sample to the Premium PCR service, or you can submit instead to our Custom Sequencing Service where we can obtain as much data as you specifically require.
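
As a back-of-the-envelope aid for planning, the Python sketch below estimates how many aliquots to submit for a target read count. The function name aliquots_needed is hypothetical, and the default of 5,000 reads per aliquot is the upper end of the stated yield, so the result is a minimum. Passing a more conservative per-aliquot figure (such as 3,000) gives a safer estimate.

```python
import math

def aliquots_needed(reads_required, reads_per_aliquot=5000):
    """Smallest number of aliquots whose combined expected yield reaches
    reads_required, assuming each aliquot yields reads_per_aliquot reads."""
    if reads_per_aliquot <= 0:
        raise ValueError("reads_per_aliquot must be positive")
    if reads_required <= 0:
        return 0
    return math.ceil(reads_required / reads_per_aliquot)

print(aliquots_needed(12000))        # optimistic: 5,000 reads per aliquot
print(aliquots_needed(12000, 3000))  # conservative: 3,000 reads per aliquot
```

Actual yields vary with sample quality and concentration, so this is a planning estimate rather than a guarantee.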


As per Oxford Nanopore's specs for the chemistry and flowcells we currently use for Premium PCR sequencing, the consensus accuracy is typically >99.99%. The raw reads from this service are also more accurate than the raw reads from the regular Linear/PCR Service, with higher per-base confidence.


The most common error modes for Oxford Nanopore are deletions in homopolymer stretches (especially if longer than 8 bp), errors at the Dam methylation site GATC, and errors at the middle position of the Dcm methylation site CCTGG or CCAGG. These limitations are expected to improve with future updates to ONT sequencing chemistry and basecalling software.

sample .fastq file

This figure shows an example .ab1 file indicating overlapping peaks for A and G at position 371 due to conflicting basecalls at a GATC methylation site, resulting in a lower confidence basecall. However, in this example, the consensus did correctly call this base as G (no sequencing error, just lower confidence).

For samples sent at the correct concentration, we typically collect 3,000-5,000 raw sequencing reads.


  • Consensus sequence (.fasta file): Provides the polished consensus sequence of the most abundant molecular species in your sample.
  • Consensus sequence (.gbk file): Provides the polished consensus sequence of the most abundant molecular species in your sample, generated from the raw reads. We also return a molecular map and feature annotations from the excellent pLannotate tool from the Barrick Lab:
    McGuffie,M.J. and Barrick,J.E. (2021) pLannotate: engineered plasmid annotation. Nucleic Acids Research DOI: 10.1093/nar/gkab374
  • Molecular map (.html file): An interactive version of the molecular map.
  • Read length histogram (.png file): Displays the read length distribution of the raw reads produced by your sample, thereby providing unique insight into the contents of your samples.
  • Virtual gel (.png file): Displays the raw read lengths from all samples in the order in a virtual gel format, resembling what you’d see if you ran the DNA fragments on a gel.
  • Chromatogram (.ab1 file): Displays the relative abundance of each nucleotide (A, T, G, C) for all raw reads that align to the consensus at each position of the sequence. See more details about how to interpret your chromatograms below.
  • Coverage plot (.png file): Displays the relative sequencing coverage at each position of the consensus sequence. A region with a large gap or a sudden large increase suggests either an assembly issue or a mixture of multiple plasmid species.
  • Per-base data (.txt and .tsv files): Includes 3 sub-files for each sample:
    • SAMPLE.tsv: Indicates how well the raw reads agree with the consensus sequence at each position. The list includes the consensus basecall at each position, along with the total number of raw reads aligning at that position and the basecall distribution in the raw reads for that position (A, T, G, C, matches, mismatches, insertions, deletions, etc.).
    • SAMPLE_multimer_analysis.txt: Indicates the % distribution of the various concatemer forms of the consensus sequence (monomer, dimer, trimer, etc.).
    • SAMPLE_summary.tsv: Indicates the length, average coverage, relative composition (by moles and mass), total reads, total bases, and % E. coli genomic DNA contamination for the consensus sequence.
  • Raw read sequences (.fastq.gz file): Provides the sequences of ALL individual raw reads produced by your sample. Please note that returning all raw reads means there is a small chance of demultiplexing error, so a few reads from your sample might be returned to another customer on the same sequencing run.
    FASTQ is a sequence format similar to FASTA, with additional Phred quality score information that can be displayed graphically by most modern sequence viewers. Since each basecalled position only has one quality score, certain sequence features, such as insertions or deletions, must be inferred from looking at adjacent bases.
Our ability to deliver these target outputs is directly dependent on the quantity, quality, and purity of the linear/PCR DNA sent to us, so we do not guarantee results. If we are not able to generate a consensus sequence from your sample, our failure policy applies.
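
The Phred quality scores stored in a FASTQ file (described above) can also be inspected programmatically. The following is a minimal sketch, assuming four-line FASTQ records and the standard Phred+33 ASCII encoding; the function names and the example record are hypothetical and for illustration only.

```python
def phred_scores(quality_line, offset=33):
    """Decode a FASTQ quality string into integer Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_line]


def mean_error_probability(scores):
    """Average the per-base error probabilities, where P = 10^(-Q/10)."""
    return sum(10 ** (-q / 10) for q in scores) / len(scores)


def read_fastq(lines):
    """Yield (name, sequence, quality) tuples from four-line FASTQ records."""
    it = iter(lines)
    for header in it:
        seq = next(it)
        next(it)  # skip the '+' separator line
        qual = next(it)
        yield header.strip()[1:], seq.strip(), qual.strip()


# Hypothetical record, not real data: four bases, each with quality character '5' (Q20).
example = ["@read1", "GATC", "+", "5555"]
for name, seq, qual in read_fastq(example):
    scores = phred_scores(qual)
    print(name, seq, scores, mean_error_probability(scores))
```

For a delivered .fastq.gz file, the lines could be supplied by opening the file with Python's `gzip.open(path, "rt")`.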


The .ab1 format has widespread use in Sanger sequencing and normally indicates the intensity of fluorescent nucleotides (A, T, G, C) at each position of the consensus. Since fluorescence is not employed in the Oxford Nanopore Sequencing technology that we use here at Plasmidsaurus, we generate this .ab1 file synthetically using the relative abundance of each nucleotide (A, T, G, C) from the raw reads at each position of the consensus sequence. Because this file type was originally used for Sanger sequencing (which is limited to much shorter read lengths than we get with Oxford Nanopore), the file has a maximum size limit and therefore we must often report sequences in multiple pieces.

This file gives a visual representation of polymorphisms and molecular mixtures present in the sample, and putative insertions can be observed more clearly. An ideal high-accuracy basecall will have a sharp, distinct peak of a single color indicating a single nucleotide, whereas a low-accuracy basecall will have less defined or mixed (overlapping) peaks.
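
To illustrate how the relative abundance of each nucleotide at a position corresponds to a sharp or a mixed peak, here is a minimal sketch. The read counts, the function names, and the 0.8 cutoff for calling a position "mixed" are assumptions for illustration only, not values used in our pipeline.

```python
def base_fractions(counts):
    """Convert per-base read counts at one position into relative abundances."""
    total = sum(counts.values())
    if total == 0:
        return {base: 0.0 for base in counts}
    return {base: n / total for base, n in counts.items()}


def is_mixed(counts, threshold=0.8):
    """Flag a position as mixed when no single base reaches the (assumed) threshold."""
    return max(base_fractions(counts).values()) < threshold


# Hypothetical counts: one clean position and one with conflicting A/G basecalls.
clean = {"A": 2, "C": 1, "G": 995, "T": 2}
mixed = {"A": 400, "C": 0, "G": 600, "T": 0}

print(base_fractions(clean), is_mixed(clean))
print(base_fractions(mixed), is_mixed(mixed))
```

In this sketch the first position would appear as a single sharp G peak, while the second would show overlapping A and G peaks even though G remains the majority call.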

[Figure: sample .fastq file]

The figure above shows examples of quality reporting in .fastq and .ab1 format.

For Premium PCR samples, "failure" means that your sample did not produce data of sufficient quality and quantity for the pipeline to generate any consensus sequence from your mixture.

Our low sequencing prices and fast turnaround times do not include extensive QC to determine why your Premium PCR samples failed (or had low coverage). Although we do not provide definitive reasons for failure, by far the most common reasons are:

  • You submitted unpurified DNA.
    Unpurified samples (with DNA still intermixed with the original reaction reagents) are more likely to fail and/or you may be charged a purification fee.
  • Samples are not prepared at the required DNA concentration. The most common cause of this is using a Nanodrop to quantify DNA concentration. We strongly recommend using a Qubit or equivalent. You may see evidence of this failure mode in the low amount of total data reported in the read length histogram or in the small number of raw .fastq reads received.
  • Samples contain a mixture of DNA species, but none of them obtained sufficient coverage on its own to produce a consensus.

To achieve optimal sequencing results, please follow our recommended Premium PCR sample prep instructions.


It is relatively rare that we cannot return a consensus sequence, but some rate of failure is unavoidable. We do still charge for failed samples.
  • If your purified DNA sample failed to produce a consensus, but you received MORE THAN 2,000 full-length raw reads, this suggests that your sample contains a diverse mixture of products and/or the DNA is fragmented. You can email us at support@plasmidsaurus.com to request assembly of a different molecule in the mixture (we may also ask you to provide a reference sequence).
  • If your purified DNA sample failed to produce a consensus and you received LESS THAN 2,000 full-length raw reads, you are welcome to submit a rerun request through your Order Info page. We will evaluate whether your sample quality and quantity permits rerunning your sample (we may also ask you to provide a reference sequence).

Bacterial Genome Sequencing Service


We sequence each sample with Oxford Nanopore long reads to very high depth before generating a consensus/assembly using the latest basecalling and polishing software:

  • We construct an amplification-free long-read sequencing library using the newest v14 library prep chemistry, including minimal fragmentation of the input genomic DNA in a sequence-independent manner.
  • We sequence the library with a primer-free protocol using the most accurate R10.4.1 flow cells (raw data is delivered in .fastq format).
  • We produce a high-quality genome assembly (see How is the bacterial genome assembly generated?).
  • We produce a set of bacterial genome annotations with Bakta (delivered in various file formats).

Bacterial DNA samples are sequenced WITHOUT primers or amplification. Please do not ship any primers with your samples or mix primers into your samples.


When you send pre-extracted gDNA, we deliver bacterial genome sequencing results within 1-2 business days of receipt of your samples. When our genomic DNA extraction option is selected, we deliver bacterial genome sequencing results within 3-5 business days of receipt of your samples. When the Hybrid sequencing option is selected, we deliver hybrid results within 6-8 business days (or 8-10 business days if you also include the extraction option).


We require a minimum raw read Qscore of 10 (90% accuracy) during sequencing, although most raw reads are above Q20 (99% accuracy). We are also able to use a higher-accuracy basecalling model on these raw reads than with our Whole Plasmid and Linear/PCR services.

During assembly, we filter the reads for quality as described below. If sufficient coverage to meet our target is obtained, we typically see assembled contigs with ~Q40 (99.99%) accuracy.
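
The Qscore values quoted above follow the standard Phred relationship, accuracy = 1 - 10^(-Q/10). The short sketch below shows the conversion in both directions; the function names are illustrative.

```python
import math


def qscore_to_accuracy(q):
    """Convert a Phred quality score to the expected per-base accuracy."""
    return 1 - 10 ** (-q / 10)


def accuracy_to_qscore(accuracy):
    """Convert a per-base accuracy (between 0 and 1) back to a Phred score."""
    return -10 * math.log10(1 - accuracy)


# The three Qscores mentioned in the text: Q10, Q20, and Q40.
for q in (10, 20, 40):
    print(f"Q{q}: {qscore_to_accuracy(q):.4%}")
```

Each step of 10 in Qscore corresponds to a tenfold reduction in the expected error rate.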

We can obtain even higher accuracy in the known error-prone homopolymers and methylated motifs with our Hybrid sequencing option that polishes with Illumina data.


The most common error modes for Oxford Nanopore are deletions in homopolymer stretches (especially if longer than 8 bp), errors at the Dam methylation site GATC, and errors at the middle position of the Dcm methylation site CCTGG or CCAGG. These limitations are expected to improve with future updates to ONT sequencing chemistry and basecalling software. If you know that you need single-nucleotide accuracy in your assembly for these regions, please consider submitting to the Hybrid sequencing option to polish out those errors with Illumina data.

[Figure: sample .fastq file]

This figure shows an example .ab1 file indicating overlapping peaks for A and G at position 371 due to conflicting basecalls at a GATC methylation site, resulting in a lower confidence basecall. However, in this example, the consensus did correctly call this base as G (no sequencing error, just lower confidence).

Successful sequencing is defined by achieving at least one of the following deliverables:

  • A high-quality genome assembly
  • Target amount of raw data
    • 210 Mb of raw sequencing data for the "standard" service (i.e. 30x genome coverage of a single 7 Mb genome)
    • 360 Mb of raw sequencing data for the "big" service (i.e. 30x genome coverage of a single 12 Mb genome)

    If you select the Hybrid sequencing option we target the same amount of ONT data listed above, PLUS an equal amount of Illumina short-read data:

  • Target amount of raw Illumina data
    • 210 Mb of raw Illumina data for the "standard" service (i.e. 30x genome coverage of a single 7 Mb genome)
    • 360 Mb of raw Illumina data for the "big" service (i.e. 30x genome coverage of a single 12 Mb genome)

    However, our ability to deliver these target outputs is directly dependent on the quantity, quality, and purity of the gDNA that is sent to us. We do not guarantee any specific output.
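
The data targets above are simple coverage arithmetic: total bases = genome size x fold coverage. The sketch below reproduces the 210 Mb and 360 Mb figures and shows how the same amount of data translates to coverage for a smaller genome; the function names and the 5 Mb example genome are illustrative assumptions.

```python
MB = 1_000_000  # one megabase, in bases


def bases_needed(genome_size, target_coverage):
    """Total raw bases required to reach a target fold coverage."""
    return genome_size * target_coverage


def fold_coverage(total_bases, genome_size):
    """Average fold coverage obtained from a given amount of raw data."""
    return total_bases / genome_size


print(bases_needed(7 * MB, 30) / MB)      # "standard" service target, in Mb
print(bases_needed(12 * MB, 30) / MB)     # "big" service target, in Mb
print(fold_coverage(210 * MB, 5 * MB))    # same data, hypothetical 5 Mb genome
```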


    1. Remove the bottom 5% worst fastq reads via Filtlong v0.2.1 (default parameters)
    2. Downsample the reads to 250 Mb via Filtlong to create a rough sketch of the assembly with Miniasm v0.3
    3. Using information acquired from the Miniasm assembly, re-downsample the reads to ~100x coverage (do nothing if there isn't at least 100x coverage) with heavy weight applied to removing low quality reads (helps small plasmids stick around)
    4. Run a Flye v2.9.1 assembly with parameters selected for high quality ONT reads
    5. Polish Flye assembly via Medaka v1.8.0 using the reads generated in step 3
    6. Run several analyses:

    If you select the NEW Hybrid sequencing option, we target the same deliverables listed above for your ONT-only assembly, PLUS we run the following additional step:

    7. Polish the ONT .fna assembly with Illumina .fastq reads using Polypolish v0.6.0, which yields a new .fasta polished hybrid assembly file
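
For readers who want to approximate a similar workflow on their own data, the sketch below builds (but does not execute) example command lines for some of the tools named above. The flags shown, the mapping of "bottom 5%" to `--keep_percent 95`, the use of `--nano-hq` for high-quality ONT reads, and all file names are assumptions chosen for illustration; they are not necessarily the commands used in our pipeline, and the Miniasm and re-downsampling steps are omitted.

```python
def sketch_commands(reads, illumina_sams=None):
    """Return example command lines (as argument lists) for a Filtlong/Flye/Medaka workflow.

    All file names are placeholders. Filtlong writes to stdout, so in practice
    its output would be redirected to a file before the next step.
    """
    cmds = [
        # Discard the worst 5% of reads (assumed flag for this step).
        ["filtlong", "--keep_percent", "95", reads],
        # Downsample to roughly 250 Mb of reads.
        ["filtlong", "--target_bases", "250000000", "filtered.fastq"],
        # Assemble with Flye in its high-quality ONT mode (assumed mode).
        ["flye", "--nano-hq", "downsampled.fastq", "--out-dir", "flye_out"],
        # Polish the Flye assembly with Medaka.
        ["medaka_consensus", "-i", "downsampled.fastq",
         "-d", "flye_out/assembly.fasta", "-o", "medaka_out"],
    ]
    if illumina_sams:
        # Hybrid option: polish with Illumina alignments (SAM files) via Polypolish.
        cmds.append(["polypolish", "polish", "medaka_out/consensus.fasta", *illumina_sams])
    return cmds


for cmd in sketch_commands("reads.fastq.gz", ["illumina_1.sam", "illumina_2.sam"]):
    print(" ".join(cmd))
```

Printing the commands rather than running them keeps the sketch safe to try without any of the tools installed.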

    • FASTA files:
      • .fna (contig nucleotide sequences) = polished consensus sequence of the genome
      • .faa (protein amino acid sequences)
      • .ffn (gene nucleotide sequences)
    • GenBank files:
      • .gbff (annotated contig sequences) = polished and annotated consensus sequence of the genome
    • FASTQ file:
      • .fastq.gz = a compressed file of all the raw ONT sequencing reads
    • Report:
      • .html = An analytical report of the key metrics for the assembly (including completeness of the assembly based on CheckM v1.2.2 and general species identification of the contigs)
    • Various other Bakta annotation files

    If you select the Hybrid sequencing option, we deliver the same files listed above for your ONT-only assembly, PLUS we deliver the following additional files:

    • FASTQ file:
      • .fastq.gz = a compressed file of all the raw Illumina sequencing reads
    • FASTA file:
      • .fasta polished hybrid assembly file, made by polishing the .fna ONT assembly with Illumina .fastq sequence reads

    For the Hybrid service, we do not repeat genome assembly or contig identification after Illumina polishing, as Illumina reads are only used for resolving SNPs and other errors (they are too short to affect contiguity). The contiguity of the assembly is entirely determined by the longer ONT reads, and therefore the quality of your gDNA.


    If we are not able to achieve at least one of the target deliverables, then we will repeat sequencing as per our bacterial failure repeat policy.

    Even when a high-quality assembly cannot be generated, we still provide the raw data and the report, and you may also still receive some of the other file types.

    Although we do not provide definitive reasons on why each specific sample failed (or had low coverage), by far the most common reasons are:

    • Your samples are not shipped at the required DNA concentration of 50 ng/uL.
      The most common cause of this is using a Nanodrop to quantify DNA concentration. We strongly recommend using a Qubit or equivalent fluorometric assay.
    • The gDNA in your samples is degraded or fragmented.
      At least 50% of the DNA should be above 15kb in length, and samples should be handled with utmost care:
      • Pipetting with wide-bore tips
      • Minimal freeze/thaw cycles
      • No vortexing
      • No extreme temperature/pH
      • No intercalating dyes
      • No UV radiation
      • Not over-dried
    • Your samples contain inhibitors, such as:
      • RNA
      • Denaturants (guanidinium salts, phenol, etc.)
      • Detergents (SDS, Triton-X100, etc.)
      • Residual contaminants from the organism/tissue (heme, humic acid, polyphenols, polysaccharides, lipids, etc.)
      • Insoluble, colored, or cloudy material
      • Other inhibitors (EDTA, etc.)
    • The DNA you sent is not from a single bacterial isolate.
      This service is intended for a clonal population (single species) of bacteria. If your sample contains a mixture of different bacterial species, it may fail to produce an assembly. Please refer to our FAQ on sequencing mixtures for more information.
    • For the extraction option: You did not ship us the required number of bacterial cells, or the cells you shipped did not come from a single bacterial isolate.
      This service is intended for a clonal population (single species) of bacteria. If your sample contains a mixture of different bacterial species, it may fail to produce an assembly. Please refer to our FAQ on sequencing mixtures for more information. Additionally, please perform a cell count while preparing your preserved cells to confirm that you are sending the number of cells we require.

    To increase chances of successful sequencing on the first attempt, please adhere closely to our sample prep guidelines and cell pellet guidelines.


    If we are not able to achieve at least one of the target deliverables on the 1st sequencing attempt, we will evaluate the results of the initial sequencing attempt to determine whether additional sequencing may produce a more successful outcome, and if so we will repeat the sequencing (with possible protocol adjustments) at no additional charge. We will also combine the data from the two runs together to increase chances of success on the repeat attempt.

    If we are not able to achieve at least one of the target deliverables after the 2nd attempt, we will not perform further repeats. We do still charge for failed samples, since we spend more time and resources on them than we do on successes.

    If you wish to sequence the sample again, please prepare new samples that meet all the QC requirements before submitting a new sequencing request.


    This service is intended for a clonal population (single species) of bacteria. You can send mixtures of different bacterial species for sequencing, but since we can't predict the assembly outcome, it's at your own risk.

    The total amount of raw data obtained for your sample will be divided up between however many species are present in your sample, thereby reducing each species’ own genome coverage and possibly inhibiting assembly of particular species in the sample. Re-sequencing mixtures won't change the relative proportions of the species, but you can submit multiple aliquots if you need higher total coverage. Ultimately, which species end up producing an assembly will vary depending on overall sample quality, coverage, and relative abundance/degradation of each species.
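
As a rough illustration of how a mixture dilutes coverage, the sketch below splits a fixed amount of raw data between species in proportion to their share of the sequenced bases. The species names, fractions, and genome sizes are hypothetical, and real outcomes also depend on DNA quality and degradation.

```python
def per_species_coverage(total_bases, species):
    """Estimate fold coverage per species.

    `species` maps a name to (fraction of sequenced bases, genome size in bases).
    """
    return {
        name: total_bases * fraction / genome_size
        for name, (fraction, genome_size) in species.items()
    }


# Hypothetical 90/10 mixture of two 5 Mb genomes sharing 210 Mb of raw data.
mixture = {
    "species_A": (0.9, 5_000_000),
    "species_B": (0.1, 5_000_000),
}
coverage = per_species_coverage(210_000_000, mixture)
for name, cov in coverage.items():
    print(f"{name}: {cov:.1f}x")
```

In this example the minor species receives only about 4x coverage, which would likely be too low for assembly even though the major species is well covered.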

    If you require even larger amounts of data for metagenomic applications, please consider submitting instead to our Custom Sequencing Service.


    We sequence all molecules in the received sample without primers, so if your extracted bacterial DNA also contains plasmid DNA, then yes you will probably receive some plasmid reads. Most of the sequenced DNA fragments < 3kb are omitted during data processing, but otherwise we do not select against or omit plasmid-sized reads during sequencing or assembly.

    The number of raw reads produced by each type of DNA will vary based on their relative abundance and quality. As for assembly outcomes, we do usually see that plasmid contigs are produced along with the gDNA chromosome contig(s) during assembly. However, since this bacterial genome sequencing service is optimized for assembly of the chromosomal genome (not for plasmids), we cannot guarantee that the raw plasmid reads will always yield an assembled plasmid contig. If you do need assemblies for the plasmids, you may need to isolate reads that align to your expected plasmids and assemble them yourself with a different pipeline.

    Ultimately, when submitting mixtures, which types of DNA in your sample end up producing an assembled contig will vary depending on overall sample quality, coverage, and relative abundance/degradation of each type.


    Yes, we can provide yeast sequencing & assembly through this service! You can submit your purified yeast gDNA (not preserved or live cells, as we are not currently offering yeast extractions) under the "big bacteria" service, then email us at support@plasmidsaurus.com to let us know your 6-character order ID and expected yeast species. We will manually generate yeast annotations and send them to you via email, and you would want to ignore the default bacterial annotations provided by the pipeline.


    Yes, any species can technically be sequenced and assembled with this method, but submitting samples for non-microbial applications is at your own risk since we have not optimized the amount of data required for each specimen type, and our assembly/annotation pipeline is targeted for microbes. Further, you might need to submit multiple aliquots of each sample in order to get enough genome coverage, and you would need to combine the data from all your aliquots prior to running your own assembly pipeline.

    When larger amounts of data are needed (more than 1 Gb, and up to several Tb), we can sequence your eukaryotic genomes instead through our Custom Sequencing Service! With our Custom service, we can also:

    • Obtain as much data as you specifically require
    • Optionally add on Illumina data if you need it for known error-prone motifs
    • Optionally perform custom genome assembly & annotation for your particular species

    AAV Genome Sequencing Service


    This service is performed using the newest long-read sequencing technology from Oxford Nanopore Technologies (ONT), and includes the following components:

    • We extract whole AAV genomes from your intact viral capsids (also includes a DNase pre-treatment to deplete any non-encapsulated DNA).
    • We construct an amplification-free long-read sequencing library using the newest v14 library prep chemistry, including end-ligation for your linear ssAAV or scAAV DNA.
    • We sequence the library with a primer-free protocol using the most accurate R10.4.1 flow cells (all the raw data produced by your sample is delivered in .fastq format).
    • We identify and assemble subspecies from the raw sequencing reads to generate high-accuracy linear consensus sequences for all detectable AAV genome subspecies (full-length, truncations, etc.) that comprise at least 1-5% of the total subspecies, depending on the sample. We also deliver metrics on the relative quantification of each viral subspecies and histograms of genome size vs. read count.

    AAV samples are sequenced WITHOUT primers or amplification. Please do not ship any primers with your samples or mix primers into your samples.

    In the vast majority of cases, we deliver AAV sequencing results within 3 business days of receipt of your samples.

    If your AAV genomes are contained within purified, intact viral capsids (cell-free encapsulated AAV genomes in either ssAAV or scAAV genome configuration), please submit them to this AAV service!

    If your AAV genomes are cloned into dsDNA circular plasmids, those can be sequenced through our Whole Plasmid sequencing service instead.

    Please contact support@plasmidsaurus.com if you are interested in sequencing pre-extracted AAV DNA rather than the intact capsids that are required for this AAV service.

    Yes! We return high-accuracy linear consensus sequences (.fasta) for all detectable AAV genome subspecies (isoforms) that comprise at least 1-5% of the total subspecies. We also provide the .fastq sequences of all raw reads produced by your sample.

    As per Oxford Nanopore's specs for the chemistry and flowcells we currently use for AAV sequencing, the consensus accuracy is typically >99.99%. The raw reads from this service are also more accurate than the raw reads from the regular Whole Plasmid sequencing service, with higher per-base confidence.

    The most common error modes for Oxford Nanopore are deletions in homopolymer stretches (especially if longer than 8 bp), errors at the Dam methylation site GATC, and errors at the middle position of the Dcm methylation site CCTGG or CCAGG. These limitations are expected to improve with future updates to ONT sequencing chemistry and basecalling software.

    For intact viral capsids sent at the required concentration, we typically collect 500-3,000 or more full-length isoform reads.

    • Consensus sequences (.fasta files): Provides high-accuracy linear consensus sequences for all detectable AAV genome subspecies (full-length, truncations, etc.) that comprise at least 1-5% of the total subspecies, depending on the sample.
    • Read length histogram (.png file): Displays the read length distribution of all the raw reads produced by your sample, with read length (genome size) vs. number of reads (molecular counts).
    • Isoform quantifications (.tsv): Indicates relative quantification of each AAV genome subspecies (isoform) as a fraction of total reads obtained.
    • Raw read sequences (.fastq.gz file): Provides the sequences of all raw reads produced by your sample. Please note that returning all raw reads means there is a small chance of demultiplexing error, so a few reads from your sample might be returned to another customer on the same sequencing run.

    Our ability to deliver these target outputs is directly dependent on the quantity, quality, and purity of the viral capsids sent to us, so we do not guarantee results.

    For AAV samples, "failure" means that your sample did not produce at least one consensus/assembly.

    Our low sequencing prices and fast turnaround times do not include extensive QC to determine why your AAV samples failed (or had low coverage). Although we do not provide definitive reasons for failure, by far the most common reasons are:

    • Viral capsids are not prepared at the required concentration, or all of the viral capsids we received are empty (contain no AAV genomes).
      A commonly used method for AAV capsid quantification (titration) and verification of genome content is digital droplet PCR (ddPCR) (see protocol from Addgene). Please refer to published literature for additional capsid titration protocols.
    • The viral capsids we received are degraded (not intact).
      We perform a DNase pre-treatment on your viral capsids to deplete any non-encapsulated DNA before sequencing. If your capsids are degraded (not intact) upon receipt, the viral DNA that has leaked from the capsids will be digested away by our DNase treatment.
    • The viral capsids we received were not sufficiently purified from cell culture.
      Please verify that your purified AAV samples contain no remaining host cells or cell lysate. A commonly used method for AAV capsid purification is ultracentrifugation with a cesium chloride (CsCl) density gradient or iodixanol gradient (IOD) (see protocols in Lamla et al, 2015). Please refer to published literature for additional capsid purification protocols.
    • The wrong genome configuration (ssAAV or scAAV) was selected during order submission in your Dashboard.
      Submitting under the wrong service type means that your data will be processed under the wrong analysis pipeline. Please contact support@plasmidsaurus.com to request that we re-analyze your data through the correct pipeline.

    For best results, please carefully adhere to our AAV Sample Prep Instructions.

    If your AAV sample fails (i.e. we are not able to generate at least one consensus/assembly from your sample), you can contact us at support@plasmidsaurus.com to inquire whether the extracted yield of AAV DNA was sufficient to repeat sequencing. Please note that because we extract your entire AAV sample on the first attempt, AAV samples are typically not eligible for reruns.

    Custom Sequencing Service


    Yes, with our Custom Sequencing Service we can provide full-length sequencing of ANY linear or circular DNA, for any double-stranded DNA molecules between 100 bp and 300 kb in length! Custom Sequencing allows us to collect the specific amount of data you need in order to achieve your experimental objectives, and we are also able to use a higher-accuracy basecalling model than with our other regular services. Custom Sequencing is ideal for expected mixtures of molecular species (such as barcode or variant libraries) or eukaryotic genomes that require large amounts of data.

    Single-stranded DNA is not currently a supported application for our Custom Sequencing Service. Some customers do send ssDNA to this service, but the results are highly variable and we cannot guarantee success; if you opt to submit ssDNA, please be aware this would be at your own risk, and please let us know during order set-up.

    Ready to get started? Email us at support@plasmidsaurus.com to provide all your sample details and set up your custom project.


    We sequence each sample with Oxford Nanopore long reads to collect the amount of data that you specifically request:

    • We construct an amplification-free long-read sequencing library using the newest v14 library prep chemistry:
      • For circular input DNA, we use sequence-independent linearization.
      • For linear input DNA, we use sequence-independent end-ligation.
      • For genomic input DNA, we use sequence-independent tagmentation that minimally fragments the DNA (unless you specifically request that we use end-ligation instead).
    • We sequence the library with a primer-free protocol using the most accurate R10.4.1 flow cells. Please note that we ONLY deliver the raw reads (in .fastq.gz format) for this service, unless we specifically agree to perform analysis during project set-up.

    Custom sequencing samples are sequenced WITHOUT primers or amplification. Please do not ship any primers with your samples or mix primers into your samples.


    In the vast majority of cases, we deliver custom sequencing results within 3-5 business days of receipt of your samples. Projects that require very large amounts of data may take longer, due to more instrument time needed for data collection and processing. Optionally adding DNA extraction, Illumina services, or bioinformatics will also extend turnaround times accordingly.


    For most custom projects, we deliver only the raw reads in .fastq.gz format. Any analyses (demultiplexing customer’s internal barcodes, generating consensus, binning or aligning variants, etc.) must typically be performed by the researcher, unless we specifically agree to perform analysis during project set-up.


    We will collect the specific amount of data that you request during project set-up. If your samples do not meet all of our QC requirements, this will reduce our ability to achieve the data target, but we may still need to charge you for the work performed.


    We require a minimum raw read Qscore of 10 (90% accuracy) during sequencing, although most raw reads are above Q20 (99% accuracy). We are also able to use a higher-accuracy basecalling model than with our Whole Plasmid and Linear/PCR services.

    Since we typically do not perform any further analysis of the raw reads for custom sequencing, the final accuracy of your own analysis will depend on your analysis pipeline and quality filtering.


    Glad you asked! Email us at support@plasmidsaurus.com to provide all your sample details and set up your custom project.


    The cost for each custom sequencing project starts at $500 for up to 1 Gb of total raw data, then adds $50 for each additional 1 Gb. If you submit more than 1 sample in a project and they need to be multiplexed, we add a $50 per-sample barcoding surcharge. We calculate your project price as follows and will send you a price estimate (and custom quote if you need it) when you email us to discuss your project:

    Project Cost = $500 base price for 1st Gb data + $50 for each extra Gb data + $50 for barcoding each sample
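
The pricing formula above can be sketched as a small function for rough budgeting. Two details are assumptions not stated in the text: partial gigabases are rounded up to the next whole Gb, and the barcoding surcharge is applied to every sample whenever a project contains more than one sample. The function name is illustrative, and the result is an estimate only, not an official quote.

```python
import math


def estimate_project_cost(total_gb, n_samples=1):
    """Estimate project cost in USD from total data (Gb) and number of samples."""
    if total_gb <= 0 or n_samples < 1:
        raise ValueError("total_gb must be positive and n_samples at least 1")
    base_price = 500                               # covers the first Gb
    extra_gb = max(0, math.ceil(total_gb) - 1)     # assumption: round partial Gb up
    barcoding = 50 * n_samples if n_samples > 1 else 0  # assumption: only when multiplexed
    return base_price + 50 * extra_gb + barcoding


print(estimate_project_cost(1))               # single sample, 1 Gb
print(estimate_project_cost(5, n_samples=4))  # four multiplexed samples, 5 Gb total
```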

    FOR VARIANT LIBRARIES:
    Total Data Required
    = Number of samples x Insert length x Number of variants (barcodes, mutants, etc.) x Coverage required per variant

    FOR GENOMIC SEQUENCING:
    Total Data Required
    = Number of samples x Expected genome size x Coverage required per genome
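
    The two data-requirement formulas above translate directly into code. The sketch below is illustrative only: the function names and the example project values are hypothetical, and lengths are expressed in bases.

```python
GB = 1_000_000_000  # one gigabase, in bases


def variant_library_bases(n_samples, insert_length, n_variants, coverage_per_variant):
    """Total data required for variant libraries (barcodes, mutants, etc.)."""
    return n_samples * insert_length * n_variants * coverage_per_variant


def genomic_bases(n_samples, genome_size, coverage_per_genome):
    """Total data required for genomic sequencing."""
    return n_samples * genome_size * coverage_per_genome


# Hypothetical projects for illustration.
library = variant_library_bases(n_samples=2, insert_length=1_500,
                                n_variants=10_000, coverage_per_variant=50)
genomes = genomic_bases(n_samples=3, genome_size=12_000_000, coverage_per_genome=30)

print(f"Variant library: {library / GB:.2f} Gb")
print(f"Genomic: {genomes / GB:.2f} Gb")
```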

    If you need an official quote, just ask us for one during order set-up.