Results Interpretation Guide

Whole Plasmid, ZeroPrep, & RCA Sequencing Services

Lots of reasons!

Scientific rigor and peace of mind.
E. coli and other hosts will go to great lengths to avoid expressing your leaky toxic gene, including modifying your plasmid in unexpected ways that are invisible to targeted Sanger sequencing.
Plasmid inserts are getting longer and more complex. Instead of running multiple Sanger reactions, synthesizing custom sequencing primers, or primer walking, sequence the whole plasmid.
Long reads are ideal for resolving repetitive regions that stymie Sanger sequencing.
Are you sure your plasmid isn't a dimer? Are you sure there aren't multiple plasmids in your strain? Sanger sequencing won't tell you, and we see it all the time.
It's neither much more expensive nor slower.

We sequence each sample with Oxford Nanopore long reads to very high depth before generating a consensus/assembly using the latest basecalling and polishing software:

We construct an amplification-free long-read sequencing library using the newest v14 library prep chemistry, including linearization of the circular input DNA in a sequence-independent manner.
We sequence the library with a primer-free protocol using the most accurate R10.4.1 flow cells (raw data is delivered in .fastq format).
We generate a high-accuracy circular consensus sequence from the raw reads.
For standard size plasmids, we will also return a set of feature annotations.

Plasmid and circular samples are sequenced WITHOUT primers or amplification. Please do not ship any primers with your samples or mix primers into your samples.

In the vast majority of cases, we deliver plasmid sequencing results within one business day of receipt of your samples.

This service is intended for a clonal population of molecules. You can send mixtures of molecular species, but since we can't predict the analysis outcome, it's at your own risk.

If your species are very similar (e.g. differ by only a few nucleotides), the pipeline will most likely create a single .gbk consensus file, with mixed peaks observed in the .ab1 file at SNP/indel locations.
If your species are sufficiently distinct (e.g. vastly different in size or sequence), the pipeline will generate a single consensus sequence for the molecular species that produces the largest amounts of total sequencing data. (Note that concatemer forms such as dimers, trimers, etc. are not considered different molecular species by the pipeline, so you will only receive the monomer consensus sequence by default).

Ultimately, which species ends up producing a consensus will vary depending on overall sample quality, coverage, and relative abundance/degradation of each species.

Sequencing is considered successful if the pipeline is able to generate any consensus, even if it is not your target. Re-sequencing mixtures won't change the relative proportions of the species (and thus which species generates a consensus), but you can submit multiple aliquots if you need higher overall coverage.

If you'd like to sequence a known mixture (e.g. barcode or variant libraries), please consider submitting instead to our Custom Sequencing Service.

As per Oxford Nanopore's specs for the chemistry and flowcells we currently use for plasmid sequencing, the consensus accuracy is typically >99.99%.

Figure (example .ab1 file): Overlapping peaks for A and G at position 371 due to conflicting basecalls at a GATC methylation site, resulting in a lower-confidence basecall. In this example, the consensus still correctly called this base as G (no sequencing error, just lower confidence).

We do not guarantee any specific level of coverage, as the number of raw reads generated can vary substantially depending on sample quality.

Successful samples sent at the required concentration typically yield from the high dozens to hundreds (or even thousands!) of raw sequencing reads.

Average coverage is reported in the SAMPLE_summary.tsv file. Coverage over ~20x indicates a very accurate consensus.

Consensus sequence (.fasta file): Provides the polished consensus sequence of the plasmid, generated from the raw reads.
Consensus sequence (.gbk file): Provides the polished consensus sequence of the plasmid, generated from the raw reads. Also includes a plasmid map and feature annotations from the excellent pLannotate tool from the Barrick Lab:
McGuffie,M.J. and Barrick,J.E. (2021) pLannotate: engineered plasmid annotation. Nucleic Acids Research DOI: 10.1093/nar/gkab374
Plasmid map (.html file): An interactive version of the pLannotate plasmid map.
Read length histogram (.png file): Displays the read length distribution of the raw reads produced by your sample, thereby providing unique insight into the contents of your samples. See more details about how to interpret your histograms below.
Virtual gel (.png file): Displays the raw read lengths from all samples in your order in a virtual gel format, resembling what you’d see if you ran the DNA fragments on a gel.
Chromatogram (.ab1 file): Displays the relative abundance of each nucleotide (A, T, G, C) for all raw reads that align to the consensus at each position of the sequence. See more details about how to interpret your chromatograms below.
Coverage plot (.png file): Displays the relative sequencing coverage at each position of the consensus sequence. A region with a large gap or a sudden large increase suggests either an assembly issue or a mixture of multiple plasmid species.
Per-base data (.txt and .tsv files): Includes 3 sub-files for each sample:

SAMPLE.tsv: Indicates how well the raw reads agree with the consensus sequence at each position. The list includes the consensus basecalls at each position, along with number of total raw reads aligning at that position and the basecall distributions in the raw reads for that position (A, T, G, C, matches, mismatches, insertions, deletions, etc.).

SAMPLE_multimer_analysis.txt: Indicates the % distribution of the various concatemer forms of the consensus sequence (monomer, dimer, trimer, etc.).

SAMPLE_summary.tsv: Indicates the length, average coverage, relative composition (by moles and mass), total reads, total bases, and % E. coli genomic DNA contamination for the consensus sequence.

Raw read sequences (.fastq.gz file): Provides the sequences of individual raw reads that align to the consensus. Please note that these reads are NOT delivered in the default download, but can be downloaded separately by clicking the "Download Raw FASTQ" button at the top of the "Order Information" page. Note that any raw reads that do not align to the consensus (e.g. host genomic DNA, lower abundance molecular species) are excluded.
FASTQ is a sequence format similar to FASTA, with additional Phred quality score information that can be displayed graphically by most modern sequence viewers. Since each basecalled position only has one quality score, certain sequence features, such as insertions or deletions, must be inferred from looking at adjacent bases.

Our ability to deliver these target outputs is directly dependent on the quantity, quality, and purity of the DNA sent to us, so we do not guarantee results. If we are not able to generate a consensus sequence from your sample, our failure policy applies.

The histogram displays the lengths of the raw reads produced by your sample, with read length (bp) on the x-axis and thousands of bases of data collected (kb) at that length on the y-axis. The histogram is therefore weighted by amount of sequencing data produced by different sizes of molecules; for example, two DNA fragments of different lengths that produce the same number of reads will produce different amounts of total data.
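To make the weighting concrete, here is a minimal Python sketch of a bases-weighted read length histogram. It assumes the read lengths are already available as a list of integers and uses a hypothetical 100 bp bin width; it illustrates the idea and is not the pipeline's actual plotting code.

```python
from collections import Counter


def weighted_length_histogram(read_lengths, bin_width=100):
    """Return {bin_start: kilobases of data} for a list of read lengths.

    Each read adds its full length (converted to kb) to its bin, so the
    result reflects total data per size rather than the number of reads.
    """
    kb_per_bin = Counter()
    for length in read_lengths:
        bin_start = (length // bin_width) * bin_width
        kb_per_bin[bin_start] += length / 1000  # bases -> kb
    return dict(sorted(kb_per_bin.items()))


# Two species with the same number of reads (10 each) but different lengths:
reads = [2_000] * 10 + [8_000] * 10
print(weighted_length_histogram(reads))
# -> {2000: 20.0, 8000: 80.0}
```

Both species produced 10 reads, but the 8,000 bp species contributes four times as much data, so its bar is four times taller.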

The x-axis is automatically scaled to the maximum read length produced by your sample. Before sequencing your plasmids, we linearize them so that we get mostly full-length sequence reads. As a result, the lengths of the raw sequencing reads reflect the lengths of the molecular species in your sample.

Additionally, the histogram color key indicates what fraction of the raw data maps to the consensus sequence:

The data from a raw read is colored as...	If...
Dark blue (ASSEMBLY read)	Raw read aligns to the consensus/assembly sequence
Orange (E. COLI read)	Raw read aligns to the E. coli genome
Light blue (UNMAPPED read)	Raw read does not align to any of these categories (Could be sequencing noise, a genome other than E. coli, a lower abundance plasmid species that does not generate a consensus, etc.)
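Because the histogram is weighted by data, the color fractions are fractions of total bases, not of reads. A minimal Python sketch of that calculation follows; it assumes each read has already been assigned one of the three color-key categories (in practice by aligning it to the consensus and to the E. coli genome), and the example reads are hypothetical.

```python
def category_fractions(reads):
    """Return the fraction of total raw bases falling in each color category.

    `reads` is an iterable of (read_length, category) pairs. Category labels
    mirror the histogram color key; assigning them is assumed already done.
    """
    totals = {"ASSEMBLY": 0, "E. COLI": 0, "UNMAPPED": 0}
    for length, category in reads:
        totals[category] += length
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {name: 0.0 for name in totals}
    return {name: bases / grand_total for name, bases in totals.items()}


reads = [(5_000, "ASSEMBLY")] * 8 + [(4_000, "E. COLI")] * 2 + [(500, "UNMAPPED")] * 4
print(category_fractions(reads))
# -> {'ASSEMBLY': 0.8, 'E. COLI': 0.16, 'UNMAPPED': 0.04}
```

Here four of the 14 reads are unmapped, yet because they are short they account for only 4% of the data.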

Ideally, your target plasmid will be the only species in the sample, and we will see one dominant peak in the read length histogram:

Figure (read length histogram): A dominant peak (~4,800 bp in this case) typically suggests a clean prep with a single plasmid.

(Please note that even a single apparent peak MAY contain multiple plasmids of the same size, or multiple plasmids of different lengths that happen to fall into the same histogram bin. Sequences that are very similar are assumed by the analysis pipeline to be variations of a single species, and it will attempt to make a single consensus (with potentially low-confidence positions reported); if the sequences are very distinct, it will only produce a consensus for the most abundant species.)

If your raw reads contain varying numbers of indels (common for noisy raw reads), this may sometimes cause the read lengths to straddle a bin boundary and artifactually create an appearance of two separate peaks:

Figure (read length histogram): These reads most likely all come from a single plasmid (~2,500 bp in this case), but varying numbers of insertion and deletion sequencing errors produce slightly different lengths that straddle a bin boundary.

(Please note that a peak straddling a bin boundary MAY contain multiple plasmids of the same size, or multiple plasmids of different lengths that happen to fall into two adjacent histogram bins. Sequences that are very similar are assumed by the analysis pipeline to be variations of a single species, and it will attempt to make a single consensus (with potentially low-confidence positions reported); if the sequences are very distinct, it will only produce a consensus for the most abundant species.)

More often than you would expect, though, we see multiple peaks corresponding to multiple plasmids, or a peak of a different size than the customer expected:

Figure (read length histogram): Uh oh! Good thing you did whole plasmid sequencing; Sanger sequencing might not have revealed all of these plasmid species!

Figure (read length histogram): This sample contains 3 unique plasmid species, only 2 of which (~8,600 bp and ~4,200 bp, corresponding to the target plasmid and the empty vector, respectively) yielded enough coverage to produce a consensus.

If your sample contains a mixture, we will return only a single consensus for the molecular species that produces the largest amount of total sequencing data. If you’d like us to try generating a consensus for an alternate peak instead, you can email us at support@plasmidsaurus.com to inquire.

Occasionally we see a sample with a dominant peak in addition to an abundance of degraded DNA (genomic and/or plasmid). In some cases the dominant peak may still produce a consensus, if read coverage and accuracy are sufficient:

Figure (read length histogram): This sample produced a consensus for the ~14,000 bp peak, with the degraded plasmid fragments contributing to its coverage.

Sometimes we see a decent number of reads for the sample but there is NO dominant peak, indicating an abundance of degraded DNA (genomic and/or plasmid) from a poor plasmid prep, or that the strain contains no plasmids:

Figure (read length histogram): No dominant peak was observed in this sample, despite a high read count. No consensus was generated.

Often, the read count is too low to distinguish any peaks or to generate any consensus:

Figure (read length histogram): When the read count is too low, it is usually because samples were not prepared at the required concentration.

We see concatemers like this all the time; they are not a sequencing artifact. Sanger sequencing can't detect them, and you won't see them on a gel of your digested/linearized plasmid, so you're not used to seeing them, but they turn out to be very common. If you run your sample uncut on a gel alongside a supercoiled ladder, you will see the concatemer band.

They often seem to form in vivo during growth in a RecA+ strain (such as NEB Turbo cells), and are more common when plasmids have large repetitive regions or other complex structures. Even plasmid repositories like Addgene observe that concatemers occur frequently, and that only long-read sequencing technologies like the one we use here at Plasmidsaurus (Oxford Nanopore Technologies) can detect them!

Please note that concatemer forms such as dimers, trimers, etc. are not considered different molecular species by the pipeline, so you will only receive the monomer consensus sequence by default, even if other concatemer forms produced more sequencing data.
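Because linearized plasmids mostly yield full-length reads, concatemer forms can be estimated by comparing read lengths to the monomer (consensus) length. The Python sketch below illustrates that idea and is not the pipeline's multimer analysis: it assumes one read per molecule, so read counts stand in for moles and read bases for mass, and it skips reads that do not round to a whole multiple of the monomer up to a hypothetical max_order.

```python
from collections import Counter


def multimer_composition(read_lengths, monomer_length, max_order=4):
    """Return {order: (% by moles, % by mass)} for monomer, dimer, trimer, ...

    Each read is assigned to the nearest whole multiple of monomer_length;
    reads rounding to 0 or above max_order (fragments, other species) are
    skipped. Read counts approximate moles and read bases approximate mass.
    """
    reads_by_order = Counter()
    bases_by_order = Counter()
    for length in read_lengths:
        order = round(length / monomer_length)
        if 1 <= order <= max_order:
            reads_by_order[order] += 1
            bases_by_order[order] += length
    total_reads = sum(reads_by_order.values())
    total_bases = sum(bases_by_order.values())
    if total_reads == 0:
        return {}
    return {
        order: (100 * reads_by_order[order] / total_reads,
                100 * bases_by_order[order] / total_bases)
        for order in sorted(reads_by_order)
    }


# 90 monomer reads and 10 dimer reads of a hypothetical 5 kb plasmid:
print(multimer_composition([5_000] * 90 + [10_000] * 10, 5_000))
```

In this example the dimer is 10% of molecules but about 18% of the data, since each dimer read is twice as long.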

**Fig1.** This histogram shows that most of the data is produced by the monomer, but there is also a small amount of data from the dimer.

**Fig2.** Each sample shown in this virtual gel displays two distinct bands, one for the monomer and one for the dimer.

The .ab1 format has widespread use in Sanger sequencing and normally indicates the intensity of fluorescent nucleotides (A, T, G, C) at each position of the consensus. Since fluorescence is not employed in the Oxford Nanopore Sequencing technology that we use here at Plasmidsaurus, we generate this .ab1 file synthetically using the relative abundance of each nucleotide (A, T, G, C) from the raw reads at each position of the consensus sequence. Because this file type was originally used for Sanger sequencing (which is limited to much shorter read lengths than we get with Oxford Nanopore), the file has a maximum size limit and therefore we must often report sequences in multiple pieces.

This file gives a visual representation of polymorphisms and molecular mixtures present in the sample, and putative insertions can be observed more clearly. An ideal high-accuracy basecall will have a sharp, distinct peak of a single color indicating a single nucleotide, whereas a low-accuracy basecall will have less defined or mixed (overlapping) peaks.
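The synthetic chromatogram is built from per-position nucleotide counts (like the A, T, G, C columns in SAMPLE.tsv). A minimal Python sketch of that idea follows; the input dictionary format and the 80% cutoff for flagging a mixed position are illustrative assumptions, not values used by the pipeline.

```python
BASES = "ACGT"


def base_fractions(counts):
    """Relative abundance of each nucleotide among reads at one position."""
    total = sum(counts.get(base, 0) for base in BASES)
    if total == 0:
        return {base: 0.0 for base in BASES}
    return {base: counts.get(base, 0) / total for base in BASES}


def is_mixed(counts, min_major_fraction=0.8):
    """True if no single nucleotide dominates (hypothetical 80% cutoff)."""
    return max(base_fractions(counts).values()) < min_major_fraction


sharp = {"G": 95, "A": 3, "C": 1, "T": 1}  # clean, single-color peak
overlap = {"A": 40, "G": 60}               # e.g. a GATC methylation site
print(is_mixed(sharp), is_mixed(overlap))  # -> False True
```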

For plasmids, "failure" means that your sample did not produce data of sufficient quality and quantity for the pipeline to generate a consensus sequence.

Our low sequencing prices and fast turnaround times do not include extensive QC to determine why your plasmid samples failed (or had low coverage). Although we do not provide definitive reasons for failure, by far the most common reasons are:

Samples are not prepared at the required DNA concentration.
The most common cause of this is using a Nanodrop to quantify DNA concentration. We strongly recommend using a Qubit or equivalent.
You may see evidence of this failure mode in the low amount of total data reported in the raw read length histogram and in the low consensus coverage reported in the SAMPLE_summary.tsv file.
Samples contain a mixture of plasmid species and/or fragmented genomic DNA or fragmented plasmids.
You may see evidence of this failure mode in a wide range of read lengths reported in the raw read length histogram.

To achieve optimal sequencing results, please follow our recommended plasmid sample prep instructions, ZeroPrep cell prep instructions, or RCA sample prep instructions.

It is relatively rare that we cannot return a consensus sequence, but some rate of failure is unavoidable. You are welcome to submit a rerun request for any failed plasmid samples through your Order Info page (please note that ZeroPrep and RCA Sequencing Services are NOT eligible for reruns). We will evaluate whether your plasmid sample quality and quantity permits rerunning your sample (we may also ask you to provide a reference sequence). We do still charge for failed samples.

When you upload a reference sequence, we use Minimap to determine the alignment of your assembly to the reference sequence you provided.

If your sample is a “likely match”, this means that all the mismatches in your sample fall into patterns common to ONT sequencing artifacts: errors at DNA methylation sites, and insertions or deletions in long runs of the same nucleotide (homopolymers, e.g. A10->A9).

If your sample is a “mismatch”, it contains mismatches that do not fall into ONT sequencing artifact patterns.

If your sample is “no match”, this means we did not find any alignment between your assembly and your uploaded references.
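The three outcomes can be sketched as a simple decision rule. This Python sketch assumes the Minimap alignment has already been reduced to a list of differences, each tagged with hypothetical flags for whether it lies in a homopolymer or at a methylation site; it is an illustration, not the actual comparison code. The guide does not say how a perfect match is labeled, so here an alignment with no differences simply falls under “likely match”.

```python
from dataclasses import dataclass


@dataclass
class Difference:
    kind: str                  # "substitution", "insertion", or "deletion"
    in_homopolymer: bool       # lies in a long single-nucleotide run (e.g. A10)
    at_methylation_site: bool  # overlaps a methylation motif (e.g. GATC)


def is_ont_artifact_pattern(diff):
    """Methylation-site errors, or indels inside homopolymers."""
    if diff.at_methylation_site:
        return True
    return diff.kind in ("insertion", "deletion") and diff.in_homopolymer


def classify(alignment_found, differences):
    """Return "no match", "likely match", or "mismatch"."""
    if not alignment_found:
        return "no match"
    if all(is_ont_artifact_pattern(d) for d in differences):
        return "likely match"  # also covers an alignment with no differences
    return "mismatch"


print(classify(True, [Difference("deletion", True, False)]))       # -> likely match
print(classify(True, [Difference("substitution", False, False)]))  # -> mismatch
```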

Premium PCR Sequencing Service

We sequence each sample with Oxford Nanopore long reads to very high depth before generating a consensus/assembly using the latest basecalling and polishing software:

We construct an amplification-free long-read sequencing library using the newest v14 library prep chemistry, including end-ligation of the linear input DNA. Your DNA is not fragmented.
We sequence the library with a primer-free protocol using the most accurate R10.4.1 flow cells (raw data is delivered in .fastq format).
We generate a high-accuracy linear consensus sequence from the raw reads for the most abundant molecular species.

Premium PCR samples are sequenced WITHOUT primers or amplification. Please do not ship any primers with your samples or mix primers into your samples.

In the vast majority of cases, we deliver Premium PCR sequencing results within one business day of receipt of your samples.

Premium PCR costs more to run ($30 per sample) than the regular Linear/PCR service, but is ideal in the following scenarios:

If your linear DNA sample is NOT clonal, but rather contains a mixture of different molecules (e.g. barcode or variant libraries) that you would like to fully characterize.
If you require full-length, end-to-end reads that are sequenced without any fragmentation.
If you require a larger number of sequencing reads beyond the typical yield from the regular Linear/PCR Service. Standard Premium PCR typically yields about 3,000 raw reads per sample, Big Premium PCR typically yields about 6,000 raw reads per sample, and Huge Premium PCR typically yields about 12,000 raw reads per sample, for DNA up to 25 kb in length.
If you need ALL raw reads produced by your sample, rather than just the raw reads that align to your consensus as with the regular Linear/PCR Service. (Please note that returning ALL raw reads means there is a small chance of demultiplexing error, so a few reads from your sample might be returned to another customer on the same sequencing run.)

Yes! Just note that by default, we will generate only one high-accuracy linear consensus sequence for the single most abundant molecular species in your sample. If your sample contains multiple molecular species, you may perform your own analyses on the raw reads that are included with your results, or email support@plasmidsaurus.com to ask for additional consensus sequences to be generated.

If you need more than 12,000 reads to fully characterize your molecular mixture, you can submit instead to our Custom Sequencing Service where we can obtain as much data as you specifically require.

As per Oxford Nanopore's specs for the chemistry and flowcells we currently use for Premium PCR sequencing, the consensus accuracy is typically >99.99%.

For samples sent at the correct concentration, we typically collect about 3,000 raw sequencing reads for Standard Premium PCR sequencing, about 6,000 raw sequencing reads for Big Premium PCR sequencing, and about 12,000 raw sequencing reads for Huge Premium PCR sequencing.

When your results are ready, you will receive an email notification. Once you sign in to your account, you can download these results from your Dashboard.

Consensus sequence (.fasta file): Polished consensus sequence of the most abundant molecular species in your sample.
Consensus sequence (.gbk file): Polished consensus sequence of the most abundant molecular species in your sample, with a feature map and annotations. Annotations are generated using the excellent pLannotate tool from the Barrick Lab.
Molecular map (.html file): An interactive version of the feature map. Note that this map will be depicted as circular, but the bold black bar at position 1 indicates that it is indeed linear.
Read length histogram (.png file): Displays the read length distribution of the raw reads produced by your sample. This distribution provides insight into the relative abundance and length of different DNA molecules in your sample.
Virtual gel (.png file): Displays the raw read lengths from all samples in the same order in a virtual gel format.
Chromatogram (.ab1 file): Displays the relative abundance of each nucleotide (A, T, G, C) for all raw reads that align to the consensus at each position of the sequence. See more details about how to interpret your chromatograms below.
Coverage plot (.png file): Displays the relative sequencing coverage at each position of the consensus sequence. A region with a large gap or a sudden large increase suggests either an assembly issue or a mixture of multiple DNA species.
Per-base data (.tsv files): Indicates how well the raw reads agree with the consensus sequence at each position.
Summary file (.txt file): Includes (a) the % distribution of the various concatemer forms of the consensus sequence (monomer, dimer, trimer, etc.) by moles and mass and (b) the % E. coli genomic DNA contamination for the consensus sequence.

Raw read sequences (.fastq.gz file): Provides the sequences of all raw reads produced by your sample. Please note that returning all raw reads means there is a small chance of demultiplexing error, so a few reads from your sample might be returned to another customer on the same sequencing run.
FASTQ is a sequence format similar to FASTA, with additional Phred quality score information that can be displayed graphically by most modern sequence viewers.

For Premium PCR samples, "failure" means that your sample did not produce data of sufficient quality and quantity for the pipeline to generate any consensus sequence from your mixture.

Our low sequencing prices and fast turnaround times do not include extensive QC to determine why your Premium PCR samples failed (or had low coverage). Although we do not provide definitive reasons for failure, by far the most common reasons are:

You submitted unpurified DNA.
Unpurified samples (with DNA still intermixed with the original reaction reagents) are more likely to fail. To save you time, Plasmidsaurus can purify your samples for a small additional fee - simply select the cleanup option on the order page.
Samples are not prepared at the required DNA concentration. The most common cause of this is using a Nanodrop to quantify DNA concentration. We strongly recommend using a Qubit or equivalent. You may see evidence of this failure mode in the low amount of total data reported in the read length histogram or in the small number of raw .fastq reads received.
Samples contain a mixture of DNA species, but none of them obtained sufficient coverage on its own to produce a consensus.

To achieve optimal sequencing results, please follow our recommended Premium PCR sample prep instructions.

It is relatively rare that we cannot return a consensus sequence, but some rate of failure is unavoidable. We do still charge for failed samples.

If your purified DNA sample failed to produce a consensus, but you received MORE THAN 2,000 full-length raw reads, this suggests that your sample contains a diverse mixture of products and/or the DNA is fragmented. You can email us at support@plasmidsaurus.com to request assembly of a different molecule in the mixture, with a limit of up to 3 additional assemblies per sample. We may also ask you to provide a reference sequence.
If your purified DNA sample failed to produce a consensus and you received FEWER THAN 2,000 full-length raw reads, you are welcome to submit a rerun request through your Order Info page. We will evaluate whether your sample quality and quantity permits rerunning your sample (we may also ask you to provide a reference sequence).
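If you want to check which case applies before contacting support, you can count full-length reads in the delivered .fastq.gz file. The Python sketch below assumes standard 4-line FASTQ records and defines “full-length” as within ±10% of your expected product length; that tolerance is an illustrative assumption, not the service's definition. The guide does not address exactly 2,000 reads, so this sketch groups that case with the rerun branch.

```python
import gzip


def read_lengths_from_fastq(path):
    """Yield read lengths from a plain or gzipped 4-line-per-record FASTQ."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 == 1:  # second line of each record is the sequence
                yield len(line.strip())


def count_full_length(lengths, expected_length, tolerance=0.10):
    """Count reads within +/- tolerance of the expected length."""
    low = expected_length * (1 - tolerance)
    high = expected_length * (1 + tolerance)
    return sum(1 for n in lengths if low <= n <= high)


def suggested_next_step(full_length_reads):
    """Apply the 2,000-read rule described above."""
    if full_length_reads > 2_000:
        return "email support to request assembly of another molecule"
    return "submit a rerun request"


# Usage with a hypothetical file name and a 3 kb expected product:
# n = count_full_length(read_lengths_from_fastq("SAMPLE.fastq.gz"), 3_000)
# print(n, suggested_next_step(n))
```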

Bacterial Genome Sequencing Service

We sequence each sample with Oxford Nanopore long reads to very high depth before generating a consensus/assembly using the latest basecalling and polishing software:

We construct an amplification-free long-read sequencing library using the newest v14 library prep chemistry, including minimal fragmentation of the input genomic DNA in a sequence-independent manner.
We sequence the library with a primer-free protocol using the most accurate R10.4.1 flow cells (raw data is delivered in .fastq format).
We produce a high-quality genome assembly (see How is the bacterial genome assembly generated?).
We produce a set of bacterial genome annotations with Bakta (delivered in various file formats).

Bacterial DNA samples are sequenced WITHOUT primers or amplification. Please do not ship any primers with your samples or mix primers into your samples.

When you send pre-extracted gDNA, we deliver bacterial genome sequencing results within 1-2 business days of receipt of your samples. When our genomic DNA extraction option is selected, we deliver bacterial genome sequencing results within 3-5 business days of receipt of your samples. When the Hybrid sequencing option is selected, we deliver hybrid results within 6-8 business days (or 8-10 business days if you also select the extraction option).

We require a minimum raw read Qscore of 10 (90% accuracy) during sequencing, although most raw reads are above Q20 (99% accuracy). We are also able to use a higher-accuracy basecalling model on these raw reads than with our Whole Plasmid and Linear/PCR services.

During assembly, we filter the reads for quality as described below. If sufficient coverage to meet our target is obtained, we typically see assembled contigs with ~Q40 (99.99%) accuracy.
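The Q-score figures above follow the Phred scale, where a score Q corresponds to an error probability of 10^(-Q/10). This Python sketch converts scores to accuracy; the mean_read_qscore helper shows one common convention for summarizing a whole read (averaging error probabilities rather than Q values), and is an assumption rather than a description of the service's filter.

```python
import math


def phred_to_accuracy(q):
    """Per-base accuracy for Phred score q (error probability 10**(-q/10))."""
    return 1 - 10 ** (-q / 10)


def mean_read_qscore(qualities):
    """Average per-base error probabilities, then convert back to Phred."""
    mean_error = sum(10 ** (-q / 10) for q in qualities) / len(qualities)
    return -10 * math.log10(mean_error)


for q in (10, 20, 40):
    print(f"Q{q} = {phred_to_accuracy(q):.2%} accuracy")
# Output:
# Q10 = 90.00% accuracy
# Q20 = 99.00% accuracy
# Q40 = 99.99% accuracy
```

Under this convention, a read with per-base scores of Q10 and Q30 averages to about Q13, not Q20, because the low-quality bases dominate the error rate.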

We can obtain even higher accuracy in the known error-prone homopolymers and methylated motifs with our Hybrid sequencing option, which polishes with Illumina data.

The most common error modes for Oxford Nanopore are deletions in homopolymer stretches (especially if longer than 8 bp), errors at the Dam methylation site GATC, and errors at the middle position of the Dcm methylation site CCTGG or CCAGG. These limitations are expected to improve with future updates to ONT sequencing chemistry and basecalling software. If you know that you need single-nucleotide accuracy in your assembly for these regions, please consider submitting to the Hybrid sequencing option to polish out those errors with Illumina data.
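To see whether your genome has regions that might benefit from the Hybrid option, you could scan its sequence for the error-prone patterns listed above. This Python sketch is an illustration under simple assumptions: it searches one strand only (sufficient here because the GATC and CCAGG/CCTGG motif sets are palindromic), reports 0-based, end-exclusive coordinates, and treats runs of 9 or more identical bases as error-prone homopolymers per the “longer than 8 bp” guideline.

```python
import re


def error_prone_sites(seq, min_homopolymer=9):
    """Return sorted (start, end, label) tuples for ONT error-prone regions."""
    seq = seq.upper()
    sites = []
    # Dam methylation site GATC (lookahead allows overlapping matches).
    for m in re.finditer(r"(?=GATC)", seq):
        sites.append((m.start(), m.start() + 4, "Dam GATC"))
    # Dcm methylation site CCAGG/CCTGG; errors tend to hit the middle base.
    for m in re.finditer(r"(?=CC[AT]GG)", seq):
        sites.append((m.start(), m.start() + 5, "Dcm CCWGG"))
    # Homopolymers: one base followed by at least (min_homopolymer - 1) copies.
    pattern = r"([ACGT])\1{%d,}" % (min_homopolymer - 1)
    for m in re.finditer(pattern, seq):
        run = m.group(0)
        sites.append((m.start(), m.end(), f"homopolymer {run[0]}{len(run)}"))
    return sorted(sites)


example = "TT" + "GATC" + "A" * 10 + "G" + "CCAGG" + "T"
for site in error_prone_sites(example):
    print(site)
# (2, 6, 'Dam GATC')
# (6, 16, 'homopolymer A10')
# (17, 22, 'Dcm CCWGG')
```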

Successful sequencing is defined by achieving at least one of the following deliverables:

A high-quality genome assembly

Target amount of raw data

210 Mb of raw sequencing data for the "standard" service (i.e. 30x genome coverage of a single 7 Mb genome)
360 Mb of raw sequencing data for the "big" service (i.e. 30x genome coverage of a single 12 Mb genome)

If you select the Hybrid sequencing option we target the same amount of ONT data listed above, PLUS an equal amount of Illumina short-read data:

Target amount of raw Illumina data

210 Mb of raw Illumina data for the "standard" service (i.e. 30x genome coverage of a single 7 Mb genome)
360 Mb of raw Illumina data for the "big" service (i.e. 30x genome coverage of a single 12 Mb genome)
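The targets above are simply target depth multiplied by genome size. This small Python sketch shows that arithmetic and how to estimate the depth the standard target would give for a smaller genome; the 5 Mb genome is a hypothetical example, and actual yield depends on sample quality and is not guaranteed.

```python
def target_raw_data_mb(genome_size_mb, depth=30):
    """Raw data (Mb) needed to reach `depth`x coverage of one genome."""
    return genome_size_mb * depth


def expected_depth(raw_data_mb, genome_size_mb):
    """Average coverage a given amount of raw data would provide."""
    return raw_data_mb / genome_size_mb


print(target_raw_data_mb(7))   # standard target -> 210
print(target_raw_data_mb(12))  # big target -> 360
print(expected_depth(210, 5))  # 210 Mb on a 5 Mb genome -> 42.0
```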

However, our ability to deliver these target outputs is directly dependent on the quantity, quality, and purity of the gDNA that is sent to us. We do not guarantee any specific output.

1. Remove the worst 5% of fastq reads via Filtlong v0.2.1 (default parameters).
2. Downsample the reads to 250 Mb via Filtlong to create a rough sketch of the assembly with Miniasm v0.3.
3. Using information acquired from the Miniasm assembly, re-downsample the reads to ~100x coverage (skipped if there isn't at least 100x coverage), with heavy weight applied to removing low-quality reads (this helps small plasmids stick around).
4. Run a Flye v2.9.1 assembly with parameters selected for high-quality ONT reads.
5. Polish the Flye assembly via Medaka v1.8.0 using the reads generated in step 3.
6. Run several analyses:

   - annotation: Bakta v1.6.1
   - contig analysis: Bandage v0.8.1
   - genome completeness and contamination: CheckM v1.2.2
   - species / plasmid identification: Mash v2.3 against RefSeq genomes+plasmids; Sourmash v4.6.1 against GenBank; CheckM v1.2.2
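Step 3 of the assembly pipeline sizes its Filtlong target from the rough Miniasm assembly. The Python sketch below shows that logic as a dry run that only builds an argument list. `--target_bases` and `--mean_q_weight` are real Filtlong options, but the weight of 10, the input file name, and the use of the sketch assembly's total length as the genome-size estimate are assumptions; the service's exact parameters are not given here.

```python
def step3_filtlong_args(total_read_bases, sketch_assembly_bases,
                        target_depth=100, mean_q_weight=10,
                        reads_path="reads.fastq.gz"):
    """Return a Filtlong argument list for ~100x downsampling, or None.

    None means the reads hold less than target_depth x coverage, so the
    step is skipped. mean_q_weight and reads_path are illustrative values.
    """
    target_bases = target_depth * sketch_assembly_bases
    if total_read_bases < target_bases:
        return None
    return [
        "filtlong",
        "--target_bases", str(target_bases),
        # Weighting mean quality heavily (relative to Filtlong's length
        # weight) makes read length matter less when ranking reads, one way
        # to keep short plasmid reads as the step description suggests.
        "--mean_q_weight", str(mean_q_weight),
        reads_path,
    ]


print(step3_filtlong_args(1_500_000_000, 5_000_000))
print(step3_filtlong_args(300_000_000, 5_000_000))  # -> None (below 100x)
```

To actually run it, the list could be passed to subprocess.run with standard output redirected to a file, since Filtlong writes the filtered reads to standard output.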

Polish the ONT .fna assembly with Illumina .fastq reads using Polypolish v0.6.0, which yields a new .fasta polished hybrid assembly file

FASTA files:

.fna (contig nucleotide sequences) = polished consensus sequence of the genome
.faa (protein amino acid sequences)
.ffn (gene nucleotide sequences)

GenBank files:

.gbff (annotated contig sequences) = polished and annotated consensus sequence of the genome

FASTQ file:

.fastq.gz = a compressed file of all the raw ONT sequencing reads

Report:

.html = An analytical report of the key metrics for the assembly (including completeness of the assembly based on CheckM v1.2.2 and general species identification of the contigs)

Various other Bakta annotation files

If you select the Hybrid sequencing option, we deliver the same files listed above for your ONT-only assembly, PLUS we deliver the following additional files:

FASTQ file:

.fastq.gz = a compressed file of all the raw Illumina sequencing reads

FASTA file:

.fasta polished hybrid assembly file, made by polishing the .fna ONT assembly with Illumina .fastq sequence reads

For the Hybrid service, we do not repeat genome assembly or contig identification after Illumina polishing, as Illumina reads are only used for resolving SNPs and other errors (they are too short to affect contiguity). The contiguity of the assembly is entirely determined by the longer ONT reads, and therefore the quality of your gDNA.

If we are not able to achieve at least one of the target deliverables, then we will repeat sequencing as per our bacterial failure repeat policy.

Even when a high-quality assembly cannot be generated, we still provide the raw data and the report, and you may also still receive some of the other file types.

Although we do not provide definitive reasons on why each specific sample failed (or had low coverage), by far the most common reasons are:

Your samples are not shipped at the required DNA concentration of 50 ng/uL.
The most common cause of this is using a Nanodrop to quantify DNA concentration. We strongly recommend using a Qubit or equivalent fluorometric assay.
The gDNA in your samples is degraded or fragmented.
At least 50% of the DNA should be above 15 kb in length, and samples should be handled with utmost care:

Pipetting with wide-bore tips
Minimal freeze/thaw cycles
No vortexing
No extreme temperature/pH
No intercalating dyes
No UV radiation
Not over-dried

Your samples contain inhibitors, such as:
- RNA
- Denaturants (guanidinium salts, phenol, etc.)
- Detergents (SDS, Triton X-100, etc.)
- Residual contaminants from the organism/tissue (heme, humic acid, polyphenols, polysaccharides, lipids, etc.)
- Insoluble, colored, or cloudy material
- Other inhibitors (EDTA, etc.)
The DNA you sent is not from a single bacterial isolate.
This service is intended for a clonal population (single species) of bacteria. If your sample contains a mixture of different bacterial species, it may fail to produce an assembly. Please refer to our FAQ on sequencing mixtures for more information.
For the extraction option: You did not ship us the required number of bacterial cells, or the cells you shipped did not come from a single bacterial isolate.
This service is intended for a clonal population (single species) of bacteria. If your sample contains a mixture of different bacterial species, it may fail to produce an assembly. Please refer to our FAQ on sequencing mixtures for more information. Additionally, please perform a cell count while preparing your preserved cells to confirm that you are sending the number of cells we require.

To increase chances of successful sequencing on the first attempt, please adhere closely to our sample prep guidelines and cell pellet guidelines.

If we are not able to achieve at least one of the target deliverables on the 1st sequencing attempt, we will evaluate the results of the initial sequencing attempt to determine whether additional sequencing may produce a more successful outcome, and if so we will repeat the sequencing (with possible protocol adjustments) at no additional charge. We will also combine the data from the two runs together to increase chances of success on the repeat attempt.

If we are not able to achieve at least one of the target deliverables after the 2nd attempt, we will not perform further repeats. We do still charge for failed samples, since we spend more time and resources on them than we do on successes.

If you wish to sequence the sample again, please prepare new samples that meet all the QC requirements before submitting a new sequencing request.

This service is intended for a clonal population (single species) of bacteria. You can send mixtures of different bacterial species for sequencing, but since we can't predict the assembly outcome, it's at your own risk.

The total amount of raw data obtained for your sample will be divided up between however many species are present in your sample, thereby reducing each species’ own genome coverage and possibly inhibiting assembly of particular species in the sample. Re-sequencing mixtures won't change the relative proportions of the species, but you can submit multiple aliquots if you need higher total coverage. Ultimately, which species end up producing an assembly will vary depending on overall sample quality, coverage, and relative abundance/degradation of each species.
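To see how a mixture dilutes coverage, here is a rough sketch. It assumes that sequenced bases are shared in proportion to each species' share of the DNA mass, and it ignores quality and degradation differences. Under that assumption, each species' coverage is its share of the bases divided by its genome size. The 250 Mb yield, genome sizes, and abundances below are hypothetical values for illustration only.

```python
def per_species_coverage(total_bases, species):
    """Estimate fold-coverage per species in a mixed sample.

    species: dict mapping name -> (genome_size_bp, mass_fraction).
    Assumes reads are drawn in proportion to each species' DNA mass
    fraction (a simplification: quality and degradation also matter).
    """
    total_fraction = sum(frac for _, frac in species.values())
    coverage = {}
    for name, (genome_size, frac) in species.items():
        bases_for_species = total_bases * frac / total_fraction
        coverage[name] = bases_for_species / genome_size
    return coverage


# Hypothetical mixture: 250 Mb of data, two ~5 Mb genomes at 90:10
cov = per_species_coverage(
    250e6,
    {"species_A": (5.0e6, 0.9), "species_B": (5.0e6, 0.1)},
)
for name, c in cov.items():
    print(f"{name}: ~{c:.0f}x")
```

Doubling `total_bases` (for example, by combining data from extra aliquots) doubles every species' coverage but leaves the ratio between species unchanged. This matches the point above that re-sequencing does not change relative proportions.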

If you require even larger amounts of data for metagenomic applications, please consider submitting instead to our Custom Sequencing Service.

We sequence all molecules in the received sample without primers, so if your extracted bacterial DNA also contains plasmid DNA, then yes you will probably receive some plasmid reads. Most of the sequenced DNA fragments < 3kb are omitted during data processing, but otherwise we do not select against or omit plasmid-sized reads during sequencing or assembly.

The number of raw reads produced by each type of DNA will vary based on their relative abundance and quality. As for assembly outcomes, we do usually see that plasmid contigs are produced along with the gDNA chromosome contig(s) during assembly. However, since this bacterial genome sequencing service is optimized for assembly of the chromosomal genome (not for plasmids), we cannot guarantee that the raw plasmid reads will always yield an assembled plasmid contig. If you do need assemblies for the plasmids, you may need to isolate reads that align to your expected plasmids and assemble them yourself with a different pipeline.
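One way to isolate plasmid reads is to align all raw reads to your expected plasmid sequence with a long-read aligner such as minimap2, for example `minimap2 -x map-ont plasmid.fasta reads.fastq.gz > aln.paf`, which writes PAF. You would then pull the matching reads out of the .fastq.gz. The sketch below covers only that extraction step, under stated assumptions. The file names, the reference name `my_plasmid`, and the 0.5 aligned-fraction cut-off are hypothetical. The FASTQ is assumed to use unwrapped 4-line records. You would still assemble the selected reads with a pipeline of your choice.

```python
import gzip
from collections import defaultdict


def reads_hitting_targets(paf_lines, targets, min_aligned_fraction=0.5):
    """Return IDs of reads aligning to any sequence named in `targets`.

    PAF columns used: 1 = read name, 2 = read length, 3/4 = aligned start/end
    on the read, 6 = reference name. Aligned spans are summed per read,
    because a read crossing the start/end of a circular plasmid reference
    is typically split into more than one alignment. The 0.5 default is a
    hypothetical cut-off.
    """
    aligned = defaultdict(int)
    read_len = {}
    for line in paf_lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 12 or cols[5] not in targets:
            continue  # malformed line or alignment to another reference
        qname, qlen = cols[0], int(cols[1])
        aligned[qname] += int(cols[3]) - int(cols[2])
        read_len[qname] = qlen
    return {q for q, span in aligned.items()
            if span / read_len[q] >= min_aligned_fraction}


def filter_fastq(in_handle, out_handle, keep_ids):
    """Copy 4-line FASTQ records whose read ID is in keep_ids."""
    while True:
        record = [in_handle.readline() for _ in range(4)]
        if not record[0]:
            break  # end of file
        read_id = record[0][1:].split()[0]
        if read_id in keep_ids:
            out_handle.writelines(record)


def extract_reads(paf_path, fastq_gz_path, out_path, targets):
    """Hypothetical helper: PAF + gzipped FASTQ -> FASTQ of matching reads."""
    with open(paf_path) as paf:
        keep = reads_hitting_targets(paf, targets)
    with gzip.open(fastq_gz_path, "rt") as fin, open(out_path, "w") as fout:
        filter_fastq(fin, fout, keep)
    return len(keep)
```

A call such as `extract_reads("aln.paf", "reads.fastq.gz", "plasmid_reads.fastq", {"my_plasmid"})` returns the number of reads kept. Here `my_plasmid` stands for the header name in your own reference FASTA.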

Ultimately, when submitting mixtures, which types of DNA in your sample end up producing an assembled contig will vary depending on overall sample quality, coverage, and relative abundance/degradation of each type.

Yes, we can provide yeast sequencing & assembly through this service! You can submit your purified yeast gDNA (not preserved or live cells, as we are not currently offering yeast extractions) under the "big bacteria" service, then email us at support@plasmidsaurus.com with your 6-character order ID and expected yeast species. We will manually generate yeast annotations and send them to you via email; you should ignore the default bacterial annotations provided by the pipeline.

Yes, any species can technically be sequenced and assembled with this method, but submitting samples for non-microbial applications is at your own risk since we have not optimized the amount of data required for each specimen type, and our assembly/annotation pipeline is targeted for microbes. Further, you might need to submit multiple aliquots of each sample in order to get enough genome coverage, and you would need to combine the data from all your aliquots prior to running your own assembly pipeline.

When larger amounts of data are needed (more than 1 Gb, and up to several Tb), we can sequence your eukaryotic genomes instead through our Custom Sequencing Service! With our Custom service, we can also:

Obtain as much data as you specifically require
Optionally add on Illumina data if you need it for known error-prone motifs
Optionally perform custom genome assembly & annotation for your particular species

If this sounds like a good fit for your project, please review all the information provided on the Custom Sequencing Service, then email us with all the details at support@plasmidsaurus.com to set up your custom project!

Yeast Genome Sequencing Service

We sequence each sample with Oxford Nanopore long reads to very high depth before generating a consensus/assembly using the latest basecalling and polishing software:

We construct an amplification-free long-read sequencing library using the newest v14 library prep chemistry, including minimal fragmentation of the input genomic DNA in a sequence-independent manner.
We sequence the library with a primer-free protocol using the most accurate R10.4.1 flow cells (raw data is delivered in .fastq format).
We produce a high-quality genome assembly (see How is the yeast genome assembly generated?).
We produce a set of yeast genome annotations with Augustus (delivered in various file formats).

Yeast DNA samples are sequenced WITHOUT primers or amplification. Please do not ship any primers with your samples or mix primers into your samples.

When you send pre-extracted gDNA, we deliver yeast genome sequencing results within 1-2 business days of receipt of your samples. When our genomic DNA extraction option is selected, we deliver yeast genome sequencing results within 3-5 business days of receipt of your samples.

During assembly, we filter the reads for quality as described below. If sufficient coverage to meet our target is obtained, we typically see assembled contigs with ~Q40 (99.99%) accuracy.

Successful sequencing is defined by achieving at least one of the following deliverables:

A high-quality genome assembly
600 Mb of raw ONT sequencing data (i.e. 30x genome coverage of a single 20 Mb genome)
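The 30x figure in the second deliverable is the raw data yield divided by the genome size:

```latex
\text{coverage} = \frac{\text{total sequenced bases}}{\text{genome size}}
= \frac{600\ \text{Mb}}{20\ \text{Mb}} = 30\times
```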

1. Remove the worst 5% of fastq reads with Filtlong v0.2.1 (with heavy weight applied to removing low-quality reads, --qual_weight 10)
2. Run a Flye v2.9.1 assembly with parameters selected for high-quality ONT reads
3. Polish the Flye assembly with Medaka v1.8.0 using the filtered reads from step 1
4. Run several analyses:
- Annotation:
  - Augustus v3.5.0 (best gene model automatically based on closest reference)
  - BLAST v2.15.0 (align ORFs against UniProt database v2024_04, use top hit if evalue <0.05)
- Contig analysis:
  - Bandage v0.8.1
- Genome completeness and contamination:
  - Busco v5.7.1

.fastq.gz = a compressed file of all the raw ONT sequencing reads
.fasta = polished consensus sequence of the genome (may contain multiple contigs)
.gff = gene annotations for the polished genome
.html = A summary report compiling the assembly metrics, including completeness of the assembly based on Busco v5.7.1 and general species identification of the contigs. The metrics summarized in the report are also delivered as discrete files:
- reads.png = histogram of all raw reads (indicating read length vs. Phred score), including coloration to distinguish reads that are retained for assembly vs. reads that are rejected
- stats.tsv = metrics assessing the quality and size of the polished genome assembly and the raw reads that were used for assembly
- busco-short-summary.txt = metrics assessing the completeness of the polished genome assembly
- contigs.png = graph of the contig topology and their connections in the assembly
- contigs.txt = metrics assessing the quantity and lengths of the contigs
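contigs.txt and stats.tsv summarise contig counts and lengths for you. If you want to recompute basic numbers from the delivered .fasta yourself (for example, after removing or trimming contigs), a minimal sketch is below. It reports contig count, total length, the longest contig, and N50: the length L such that contigs of length L or longer hold at least half of the assembled bases. This is a generic calculation, not necessarily how our report derives its metrics.

```python
def read_fasta_lengths(handle):
    """Return a dict of contig name -> sequence length from a FASTA handle."""
    lengths, name = {}, None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]  # name = header text up to first space
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)  # sequences may span several lines
    return lengths


def assembly_stats(lengths):
    """Contig count, total length, longest contig and N50 from contig lengths."""
    sizes = sorted(lengths, reverse=True)
    total = sum(sizes)
    running, n50 = 0, 0
    for size in sizes:
        running += size
        if running * 2 >= total:  # reached half of all assembled bases
            n50 = size
            break
    return {"contigs": len(sizes), "total_bp": total,
            "longest_bp": sizes[0] if sizes else 0, "n50_bp": n50}
```

Typical use is `with open("assembly.fasta") as fh: print(assembly_stats(read_fasta_lengths(fh).values()))`, where the file name is hypothetical.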

If we are not able to achieve at least one of the target deliverables, then we will repeat sequencing as per our yeast failure repeat policy.

Even when a high-quality assembly cannot be generated, we still provide the raw data and the report, and you may also still receive some of the other file types.

Although we do not provide definitive reasons for why each specific sample failed (or had low coverage), by far the most common reasons are:

Your samples are not shipped at the required DNA concentration of 50 ng/uL.
The most common cause of this is using a Nanodrop to quantify DNA concentration. We strongly recommend using a Qubit or equivalent fluorometric assay.
The gDNA in your samples is degraded or fragmented.
At least 50% of the DNA should be above 15kb in length, and samples should be handled with utmost care:

Pipetting with wide-bore tips
Minimal freeze/thaw cycles
No vortexing
No extreme temperature/pH
No intercalating dyes
No UV radiation
Not over-dried

Your samples contain inhibitors, such as:
- RNA
- Denaturants (guanidinium salts, phenol, etc.)
- Detergents (SDS, Triton-X100, etc.)
- Residual contaminants from the organism/tissue (heme, humic acid, polyphenols, polysaccharides, lipids, etc.)
- Insoluble, colored, or cloudy material
- Other inhibitors (EDTA, etc.)
The DNA you sent is not from a single yeast isolate.
This service is intended for a clonal population (single species) of yeast. If your sample contains a mixture of different species, it may fail to produce an assembly. Please refer to our FAQ on sequencing mixtures for more information.
For the extraction option: Your species is not on our approved list of yeast genera.
The yeast extraction service uses a rapid, high-throughput protocol that is ONLY COMPATIBLE with the above approved list of yeast genera. If you send cells for extraction from any genus that is not listed above, or cells of any other types of microbe (such as filamentous fungi), they are likely to fail and will not be eligible for rerun.
For the extraction option: You did not ship us the required number of yeast cells, or the cells you shipped did not come from a single yeast isolate.
Please perform a cell count while preparing your preserved cells to confirm that you are sending the number of cells we require. Additionally, this service is intended for a clonal population (single species) of yeast. If your sample contains a mixture of different yeast species, it may fail to produce an assembly. Please refer to our FAQ on sequencing mixtures for more information.

To increase chances of successful sequencing on the first attempt, please adhere closely to our sample prep guidelines and yeast cell pellet guidelines.

If we are not able to achieve at least one of the target deliverables on the 1st sequencing attempt, we will evaluate the results of the initial sequencing attempt and the quality/quantity of your DNA to determine whether additional sequencing may produce a more successful outcome, and if so we will repeat the sequencing (with possible protocol adjustments) at no additional charge. We will also combine the data from the two runs together to increase chances of success on the repeat attempt.

If you wish to sequence the sample again, please prepare new samples that meet all the QC requirements before submitting a new sequencing request.

This service is intended for a clonal population (single species) of yeast. You can send mixtures of different yeast species for sequencing, but since we can't predict the assembly outcome, it's at your own risk.

If you require even larger amounts of data for metagenomic applications, please consider submitting instead to our Custom Sequencing Service.

We sequence all molecules in the received sample without primers, so if your extracted yeast DNA also contains plasmids or YACs, then yes you will probably receive some raw sequencing reads for those molecules. Most of the sequenced DNA fragments < 3kb are omitted during data processing, but otherwise we do not select against or omit plasmid-sized reads during sequencing or assembly.

The number of raw reads produced by each type of DNA will vary based on their relative abundance and quality. As for assembly outcomes, we do usually see that plasmid or YAC contigs are produced along with the gDNA chromosome contigs during assembly. However, since this yeast genome sequencing service is optimized for assembly of the chromosomal genome (not for plasmids or YACs), we cannot guarantee that these raw reads will always yield an assembled plasmid or YAC contig. If you do need assemblies for the plasmids or YACs, you may need to isolate raw reads that align to your expected references and assemble them yourself with a different pipeline.

Yes, any species can technically be sequenced and assembled with this method, but submitting samples for non-yeast applications is at your own risk since we have not optimized the amount of data required for each specimen type, and our assembly/annotation pipeline is targeted for yeast. Further, you might need to submit multiple aliquots of each sample in order to get enough genome coverage, and you would need to combine the data from all your aliquots prior to running your own assembly pipeline.

When larger amounts of data are needed (e.g. more than 1 Gb per sample), we can sequence your large eukaryotic genomes instead through our Custom Sequencing Service! With our Custom service, we can also:

Obtain as much data as you specifically require
Optionally add on Illumina data if you need it for known error-prone motifs
Optionally perform custom genome assembly & annotation for your particular species

AAV Genome Sequencing Service

This service is performed using the newest long-read sequencing technology from Oxford Nanopore Technologies (ONT), and includes the following components:

We extract whole AAV genomes from your intact viral capsids.
We construct an amplification-free long-read sequencing library using our updated in-house protocol to capture ITR-containing linear ssAAV or scAAV DNA.
We sequence the library with a primer-free protocol using the most accurate R10.4.1 flow cells (all the raw data produced by your sample is delivered in .fastq format).
We identify and assemble subspecies from the raw sequencing reads to generate high-accuracy linear consensus sequences for all detectable AAV genome subspecies (full-length, truncations, etc.) that comprise at least 1-5% of the total subspecies, depending on the sample. We also deliver metrics on the relative quantification of each viral subspecies and histograms of genome size vs. read count.

AAV samples are sequenced WITHOUT primers or amplification. Please do not ship any primers with your samples or mix primers into your samples.

In the vast majority of cases, we deliver AAV sequencing results within 3 business days of receipt of your samples.

If your AAV genomes are contained within purified, intact viral capsids (cell-free encapsulated AAV genomes in either ssAAV or scAAV genome configuration), please submit them to this AAV service!

If your AAV genomes are cloned into dsDNA circular plasmids, those can be sequenced through our Whole Plasmid sequencing service instead.

Please contact support@plasmidsaurus.com if you are interested in sequencing pre-extracted AAV DNA rather than the intact capsids that are required for this AAV service.

Yes! We return high-accuracy linear consensus sequences (.fasta) for all detectable AAV genome subspecies (isoforms) that comprise at least 1-5% of the total subspecies. We also provide the .fastq sequences of all raw reads produced by your sample.

As per Oxford Nanopore's specs for the chemistry and flowcells we currently use for AAV sequencing, the consensus accuracy is typically >99.99%. The raw reads from this service are also more accurate than the raw reads from the regular Whole Plasmid sequencing service, with higher per-base confidence.

For intact viral capsids sent at the required concentration, we typically collect 500-1,000 reads.

Consensus sequences (.fasta files): Provides high-accuracy linear consensus sequences for all detectable AAV genome subspecies (full-length, truncations, etc.) that comprise at least 1-5% of the total subspecies, depending on the sample.
Read length histogram (.png file): Displays the read length distribution of all the raw reads produced by your sample, with read length (genome size) vs. number of reads (molecular counts).
Isoform quantifications (.tsv): Indicates relative quantification of each AAV genome subspecies (isoform) as a fraction of total reads obtained.
Raw read sequences (.fastq.gz file): Provides the sequences of all raw reads produced by your sample. Please note that returning all raw reads means there is a small chance of demultiplexing error, so a few reads from your sample might be returned to another customer on the same sequencing run.
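Isoform quantification is a fraction of total reads, so minor subspecies are represented by few reads. At the 500-1,000 reads typically collected, a subspecies making up 1% of the sample accounts for only about 5-10 reads. This helps explain why the reporting threshold ranges from 1% to 5% depending on the sample. The sketch below shows the arithmetic on hypothetical read counts. The isoform labels and the 0.01 threshold are illustrative, and this is not the pipeline that produces the .tsv.

```python
def isoform_fractions(read_counts, min_fraction=0.01):
    """Convert per-isoform read counts into fractions of total reads.

    read_counts: dict of isoform label -> number of reads assigned to it.
    Returns (reported, below_threshold) as dicts of label -> fraction.
    """
    total = sum(read_counts.values())
    if total == 0:
        return {}, {}
    reported, below = {}, {}
    for label, count in read_counts.items():
        frac = count / total
        (reported if frac >= min_fraction else below)[label] = frac
    return reported, below


# Hypothetical run with 800 reads in total
counts = {"full_length": 620, "truncated_A": 150, "truncated_B": 25, "other": 5}
reported, below = isoform_fractions(counts, min_fraction=0.01)
for label, frac in sorted(reported.items(), key=lambda kv: -kv[1]):
    print(f"{label}: {frac:.1%}")
print("below threshold:", sorted(below))
```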

Our ability to deliver these target outputs is directly dependent on the quantity, quality, and purity of the viral capsids sent to us, so we do not guarantee results.

For AAV samples, "failure" means that your sample did not produce at least one consensus/assembly.

Our low sequencing prices and fast turnaround times do not include extensive QC to determine why your AAV samples failed (or had low coverage). Although we do not provide definitive reasons for failure, by far the most common reasons are:

Viral capsids are not prepared at the required concentration, or all of the viral capsids we received are empty (contain no AAV genomes).
A commonly used method for AAV capsid quantification (titration) and verification of genome content is droplet digital PCR (ddPCR) (see protocol from Addgene). Please refer to published literature for additional capsid titration protocols.
The viral capsids we received were not sufficiently purified from cell culture.
Please verify that your purified AAV samples contain no remaining host cells or cell lysate. A commonly used method for AAV capsid purification is ultracentrifugation with a cesium chloride (CsCl) density gradient or iodixanol gradient (IOD) (see protocols in Lamla et al., 2015). Please refer to published literature for additional capsid purification protocols.
Your viral genomes are highly truncated. A population of highly truncated genomes makes it difficult to assemble a full-length consensus sequence for each truncated species. If there is little to no full-length genome present, it may not be possible to create a consensus assembly. The read length histogram can be useful in evaluating these types of problems.
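To get a quick numeric view of truncation alongside the read length histogram, you can compare raw read lengths from the .fastq.gz against the genome length you expect for your construct. The sketch below is a minimal illustration under stated assumptions. `full_length_bp` is whatever full-length size you expect, and the ±5% tolerance and 250 bp bin width are arbitrary, hypothetical choices.

```python
from collections import Counter


def truncation_profile(read_lengths, full_length_bp, tolerance=0.05, bin_bp=250):
    """Summarise AAV read lengths against an expected full-length genome.

    Returns (fraction within +/- tolerance of full length,
             fraction shorter than that window,
             histogram as {bin start in bp: read count}).
    """
    n = len(read_lengths)
    if n == 0:
        return 0.0, 0.0, {}
    lo = full_length_bp * (1 - tolerance)
    hi = full_length_bp * (1 + tolerance)
    full = sum(1 for length in read_lengths if lo <= length <= hi)
    short = sum(1 for length in read_lengths if length < lo)
    hist = Counter((length // bin_bp) * bin_bp for length in read_lengths)
    return full / n, short / n, dict(sorted(hist.items()))
```

A low full-length fraction together with a broad spread of shorter bins is the pattern that makes a full-length consensus hard to obtain.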

For best results, please carefully adhere to our AAV Sample Prep Instructions.

If your AAV sample fails (i.e. we are not able to generate at least one consensus/assembly from your sample), you can contact us at support@plasmidsaurus.com to inquire whether the extracted yield of AAV DNA was sufficient to repeat sequencing. Please note that because we extract your entire AAV sample on the first attempt, AAV samples are typically not eligible for reruns.

Eukaryotic Genome Sequencing Service

We sequence each sample with Oxford Nanopore (ONT) long reads to very high depth before generating a consensus/assembly using the latest basecalling and polishing software:

We construct an amplification-free long-read sequencing library using the newest v14 library prep chemistry, including minimal fragmentation of the gDNA in a sequence-independent manner (via tagmentation). DNA fragments <3kb are depleted using Large Fragment Buffer.
We sequence the library with a primer-free protocol using the most accurate R10.4.1 flow cells (raw data is delivered in .fastq format).
We produce a high-quality genome assembly (see How is the eukaryotic genome assembly generated?).
We produce a set of eukaryotic genome annotations with Augustus (delivered in various file formats).

Eukaryotic DNA samples are sequenced WITHOUT primers or amplification. Please do not ship any primers with your samples or mix primers into your samples.

For pre-extracted gDNA we deliver genome sequencing results within:

2-3 business days for the 1 Gb and 5 Gb Service Tiers
4-6 business days for the 15 Gb and 50-100 Gb Service Tiers

When our eukaryotic DNA extraction option is selected, we deliver genome sequencing results within:

4-6 business days for the 1 Gb and 5 Gb Service Tiers
6-9 business days for the 15 Gb and 50-100 Gb Service Tiers

Higher data targets have longer sequencing and assembly times.

The Eukaryotic Genome Sequencing service is intended for long-read whole-genome sequencing, assembly, and annotation of genomic DNA from any eukaryotic species with a genome size between 20 Mb and 3.3 Gb, with any ploidy. Eukaryotic species that are ideally suited to this genome assembly service include filamentous fungi, protozoans, nematodes, fruit flies, zebrafish, mouse, and human.

The Eukaryotic Genome Sequencing service offers four different Service Tiers (categories), which allow you to select the target amount of raw reads per sample, measured in gigabases (Gb) of .fastq data. We do not specify an expected number of reads at these Service Tiers, as this will vary depending on the quantity, quality, and purity of the input gDNA.

The Service Tier recommendations below are based on a target of approximately 30x genome coverage, which is typically sufficient to generate a high-quality de novo genome assembly with annotations. Depending on your application, you can submit your sample to a different service tier than is recommended in order to obtain a different amount of coverage. For example:

If you plan to use the raw reads to perform structural variant analysis against a reference genome, you may need less genome coverage, e.g. 10-20x. Please note that assembly may fail at lower coverage levels, but you can ignore the assembly and just use the raw reads to perform your analysis.
If you plan to use the raw reads to generate a haplotype-phased de novo genome assembly, you may need more genome coverage, e.g. 60-100x. Please note that you would need to re-assemble the raw reads on your end to obtain haplotype phasing.
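The arithmetic behind choosing a tier is coverage = sequenced bases / haploid genome size. As a rough aid, the sketch below picks the smallest Service Tier whose data target reaches a chosen coverage. It is an illustration, not our official recommendation table. The full-flow-cell tier is listed at its 100 Gb upper end, so real coverage at that tier may be up to 2x lower. Any tier's actual yield can also fall short of its target.

```python
# (tier label, data target in bases). The full-flow-cell tier yields
# 50-100 Gb; it is listed at 100 Gb here, so real coverage may be lower.
TIERS = [("1 Gb", 1e9), ("5 Gb", 5e9), ("15 Gb", 15e9), ("50-100 Gb", 100e9)]


def expected_coverage(data_bases, genome_size_bp):
    """Fold-coverage of the haploid genome if all data came from it."""
    return data_bases / genome_size_bp


def smallest_tier(genome_size_bp, target_coverage=30):
    """Smallest tier label reaching target_coverage, or None if none does."""
    for label, bases in TIERS:
        if expected_coverage(bases, genome_size_bp) >= target_coverage:
            return label
    return None


# A hypothetical 120 Mb genome at 30x (assembly), 15x (structural
# variants against a reference) and 80x (haplotype phasing)
for cov in (30, 15, 80):
    print(f"{cov}x -> {smallest_tier(120e6, cov)}")
```

When `None` is returned, the request exceeds the largest tier. Such projects are a case for the Custom Sequencing Service.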

We use ONT’s super-accurate basecalling model during the sequencing run. We require a minimum raw read Qscore of 10 (90% accuracy) during sequencing, although most raw reads are above Q20 (99% accuracy).
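Qscores are Phred-scaled, Q = -10·log10(P_error), so Q10, Q20 and Q40 correspond to 90%, 99% and 99.99% per-base accuracy. The small sketch below converts in both directions and also prints roughly how many bases separate expected errors at each level.

```python
import math


def q_to_accuracy(q):
    """Phred quality score -> per-base accuracy (1 - error probability)."""
    return 1 - 10 ** (-q / 10)


def accuracy_to_q(accuracy):
    """Per-base accuracy -> Phred quality score."""
    return -10 * math.log10(1 - accuracy)


for q in (10, 20, 40):
    print(f"Q{q}: {q_to_accuracy(q):.2%} accurate, "
          f"~{10 ** (q / 10):,.0f} bases per expected error")
```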

During assembly, we filter the reads for quality as described below. If sufficient coverage to meet the target is obtained, we typically see assembled contigs with ~Q40 (99.99%) accuracy.

1. Remove the worst 5% of fastq reads with Filtlong v0.2.1 (with heavy weight applied to removing low-quality reads, --qual_weight 10)
2. Run a Flye v2.9.1 assembly with parameters selected for high-quality ONT reads
3. Polish the Flye assembly with Medaka v1.8.0 using the filtered reads from step 1
4. Run several analyses:
- Annotation:
  - Augustus v3.5.0 (best gene model automatically based on closest reference)
  - BLAST v2.15.0 (align ORFs against UniProt database v2024_04, use top hit if evalue <0.05)
- Contig analysis:
  - Bandage v0.8.1
- Genome completeness and contamination:
  - Busco v5.7.1

.fastq.gz = a compressed file of all the raw ONT sequencing reads
.fasta = polished consensus sequence of the genome (may contain multiple contigs)
.gff = gene annotations for the polished genome
.html = A summary report compiling the assembly metrics, including completeness of the assembly based on Busco v5.7.1 and general species identification of the contigs. The metrics summarized in the report are also delivered as discrete files:
- reads.png = histogram of all raw reads (indicating read length vs. Phred score), including coloration to distinguish reads that are retained for assembly vs. reads that are rejected
- stats.tsv = metrics assessing the quality and size of the polished genome assembly and the raw reads that were used for assembly
- busco-short-summary.txt = metrics assessing the completeness of the polished genome assembly
- contigs.png = graph of the contig topology and their connections in the assembly
- contigs.txt = metrics assessing the quantity and lengths of the contigs
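The .gff file is tab-separated text with nine columns per feature: sequence ID, source, feature type, start, end, score, strand, phase, and attributes. Simple summaries therefore need no special tools. The sketch below counts features by type and counts one feature type per contig. Which types appear (for example gene, transcript, or CDS) depends on the annotation output, so treat the default `"gene"` as an assumption to adjust.

```python
from collections import Counter


def _feature_columns(gff_lines):
    """Yield the nine columns of each feature line, skipping comments."""
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) == 9:  # ignore anything that is not a feature line
            yield cols


def count_feature_types(gff_lines):
    """Count features by their type (column 3)."""
    return Counter(cols[2] for cols in _feature_columns(gff_lines))


def features_per_contig(gff_lines, feature_type="gene"):
    """Count features of one type per sequence ID (column 1)."""
    return Counter(cols[0] for cols in _feature_columns(gff_lines)
                   if cols[2] == feature_type)
```

For example, `with open("annotations.gff") as fh: print(count_feature_types(fh))` gives a quick tally, where the file name is hypothetical. Per-contig counts can help spot contigs that carry no predicted genes.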

If your input has low quality, purity, and/or concentration, thus not meeting our Sample Prep Instructions and Cell Prep Instructions, your samples may fail to assemble or produce low coverage. In cases of failure, you will still receive the raw reads that we can obtain, and potentially some of the other file types.

Although we do not provide definitive reasons for why each specific sample failed (or had low coverage), by far the most common reasons are:

Your samples are not shipped at the required DNA concentration of 50 ng/uL.
The most common cause of this is using a Nanodrop to quantify DNA concentration. We strongly recommend using a Qubit or equivalent fluorometric assay.
The gDNA in your samples is degraded or fragmented.
At least 50% of the DNA should be above 15kb in length, and samples should be handled with utmost care:

Pipetting with wide-bore tips
Minimal freeze/thaw cycles
No vortexing
No extreme temperature/pH
No intercalating dyes
No UV radiation
Not over-dried

Your samples contain inhibitors, such as:
- RNA
- Denaturants (guanidinium salts, phenol, etc.)
- Detergents (SDS, Triton-X100, etc.)
- Residual contaminants from the organism/tissue (heme, humic acid, polyphenols, polysaccharides, lipids, etc.)
- Insoluble, colored, or cloudy material
- Other inhibitors (EDTA, etc.)
The DNA you sent is not from a single eukaryotic isolate.
This service is intended for a clonal population (single species) of eukaryotes. If your sample contains a mixture of different species (metagenomic or non-clonal), it may fail to produce an assembly. Please refer to our FAQ on sequencing mixtures for more information.
For the extraction option:
- Your species is not on our approved list of eukaryotic input types. The eukaryotic extraction service uses a rapid, high-throughput protocol that is only compatible with the above list of input types. If you send tissues or cell types for extraction that are not listed above, they are likely to fail and will not be eligible for rerun.
- You did not ship us the required number of eukaryotic cells. Please perform a cell count (or use a pipet) while preparing your preserved cells to confirm that you are sending the number of cells we require.

To increase chances of successful sequencing on the first attempt, please adhere closely to our Eukaryotic Sample Prep Instructions and Cell Prep Instructions.

At the 1 Gb and 5 Gb Service Tiers, poorly performing samples are often rescued with our rerun procedure.

If a minimum of 75% of the data target (i.e., 750 Mb and 3.75 Gb, respectively) is not obtained on the initial sequencing attempt, samples at the 1 Gb and 5 Gb Service Tiers will be designated “fail” (regardless of assembly outcome) and will be automatically submitted for a free rerun to collect more data.
If the 75% minimum is still not met after two sequencing attempts, the sample will not be eligible for further repeats. If you’d like to collect more data, you would need to submit a new sample with higher quality on a new order.
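The rerun trigger for the 1 Gb and 5 Gb tiers is a simple yield threshold. The sketch below encodes that rule for illustration only. The function name is hypothetical, and it assumes `obtained_gb` is the yield counted towards the target on the given attempt. It covers only the 75% rule described here, not any other eligibility considerations.

```python
def rerun_decision(tier_gb, obtained_gb, attempt, min_fraction=0.75):
    """Apply the 75%-of-target rule for the 1 Gb and 5 Gb tiers.

    Returns "pass", "rerun" (first attempt short of 75%),
    or "no further reruns" (second attempt still short).
    """
    if tier_gb not in (1, 5):
        raise ValueError("rule described only for the 1 Gb and 5 Gb tiers")
    if obtained_gb >= min_fraction * tier_gb:
        return "pass"
    return "rerun" if attempt == 1 else "no further reruns"


# 3.6 Gb on the first attempt at the 5 Gb tier falls short of 3.75 Gb
print(rerun_decision(5, 3.6, attempt=1))
```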

At the 15 Gb and 50-100 Gb (full flow cell) Service Tiers, poorly performing samples are unlikely to be remedied even with our rerun procedure, so we save you money by stopping the run early.

We will stop the run early to preserve the lifespan of the flow cell. We will not attempt to perform a rerun on the sample.
Instead of the full $1750 flow cell price, you will be charged only $200 to cover our prep costs. If you’d like to try to collect more data, you can promptly request that we restart the run at your own risk (you would pay the full $1750 price regardless of outcome), OR you can submit a new sample with higher quality on a new order.

This service is intended for a clonal population (single species) of eukaryotes. You can send mixtures of different eukaryotic species for sequencing, but since we can't predict the assembly outcome, it's at your own risk.

If you require even larger amounts of data for metagenomic (non-clonal) applications, please consider submitting instead to our Custom Sequencing Service.

We sequence all molecules in the received sample and we do not use any primers, so if your extracted DNA also contains extrachromosomal molecules, then yes, you will probably also receive some raw sequencing reads for those molecules. Please note that DNA fragments <3kb are depleted from the sample during library prep.

The number of raw reads produced by each type of DNA will vary based on their relative abundance and quality in the sample. When assembling contigs from the total raw reads, we do usually see that extrachromosomal contigs are assembled along with the gDNA chromosome contigs. However, since this service is optimized for assembly of the chromosomal genome (not for extrachromosomal DNA), we cannot guarantee that the raw reads produced will always produce an assembled contig. If you do need assemblies, you may need to isolate raw reads that align to your extrachromosomal reference and assemble those reads with a different pipeline.

Ultimately, when submitting DNA mixtures, which types of DNA end up producing an assembled contig will vary depending on overall sample quality, coverage, and relative abundance/degradation of each type. Please note that single-stranded DNA is unlikely to produce sequencing data.

Yes! Our new eukaryotic extraction service uses a rapid, high-throughput protocol that is ONLY COMPATIBLE with cell culture from any animal species that do not have a cell wall or abundant connective tissue.

Please note that your input cells must be preserved in Zymo DNA/RNA Shield prior to shipping. See further details in our Cell Prep Instructions.

We can accept Zymo DNA/RNA Shield-preserved samples from both BSL1 and BSL2 sources of the above input types. We currently do not accept any other input types (e.g. intact tissues/organs from any animal, plants, insects, fungi, any type of blood) or preservation methods (e.g. flash freezing, ethanol, desiccation, RNAlater) for this extraction service. If you’d like to sequence these other types of samples, please perform extraction on your end and ship us only the purified gDNA for sequencing.

No, other input types are not supported with this eukaryotic genome service. This service is designed using lab workflows, sequencing conditions, and data analyses that are optimized for eukaryotic gDNA input, using long-read whole genome Oxford Nanopore sequencing.

Some examples of other input types that are not supported with this eukaryotic genome service include:

Other types of non-genomic, double-stranded DNA (e.g. plasmid or amplicon libraries)
RNA, single-stranded DNA, self-complementary AAV DNA, or other nucleic acids

No, due to the rapid, high-throughput nature of this eukaryotic genome service, we are not able to make adjustments to the service parameters. If you’d like to request any of the following adjustments:

Sequencing more than 50-100 Gb of data (1 full flow cell) per sample
Sequencing a genome that is larger than 3.3 Gb
Sequencing an intermediate amount of data that does not have a service tier, such as 7 Gb
Splitting the data target per sample between multiple smaller samples
Using ligation-based library prep chemistry (instead of tagmentation)
Sequencing metagenomic (non-clonal) DNA from mixed species samples
Identifying variants between the samples or against a reference

… then please submit your request instead to our Custom Sequencing Service, where as always, we can collect the amount of data you request with the parameters you require!

We hope to offer these eukaryotic genome add-on options in the future.

Yes, you can, as both the eukaryotic and yeast assembly services use the same analysis pipelines. The difference between the two is in how much data you need. For example, for a yeast genome with 20 Mb size:

Submit to Yeast Genome Sequencing Service for approx. 600 Mb of data (30x coverage)
Submit to Eukaryotic Genome Sequencing Service for approx. 1 Gb (50x coverage) or 5 Gb (250x coverage) of data
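The coverage figures above come from dividing the total raw data by the genome size. The following is a minimal Python sketch of that arithmetic using the 20 Mb yeast example. The function and constant names are illustrative only and are not part of our pipeline.

```python
MB = 1_000_000
GB = 1_000_000_000

def coverage(data_bases: float, genome_size_bases: float) -> float:
    """Average depth of coverage = total raw bases / genome size."""
    return data_bases / genome_size_bases

genome_size = 20 * MB  # 20 Mb yeast genome from the example above

for label, data in [
    ("Yeast Genome service, ~600 Mb", 600 * MB),
    ("Eukaryotic Genome service, ~1 Gb", 1 * GB),
    ("Eukaryotic Genome service, ~5 Gb", 5 * GB),
]:
    print(f"{label}: ~{coverage(data, genome_size):.0f}x")
```

These values are averages. The actual per-base depth varies along the genome.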

You could, as the sequencing method will still technically work, but our workflows will not be optimized for bacteria and may fail. We recommend that you submit to the Bacterial Genome Sequencing Service instead!

Our recommended data targets are based on ~30x coverage of the haploid genome size. The assembly pipeline will return a single haplotype sequence for each contig. For diploids and polyploids, the haplotypes may be collapsed into a single consensus assembly. The assembly results will not be haplotype-phased.

With higher levels of ploidy or heterozygosity in your species, you may need additional coverage (e.g. 60-100x or higher) to obtain high-quality assembly results. Please note that you would need to assemble the raw reads yourself to obtain haplotype-phased results.
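Data targets scale linearly with coverage of the haploid genome size, so doubling the coverage doubles the data you need. The sketch below illustrates this with a hypothetical 500 Mb haploid genome. The genome size and the helper name are assumptions chosen for illustration.

```python
def data_target_gb(haploid_size_mb: float, coverage: float) -> float:
    """Raw data target (Gb) = haploid genome size x desired coverage."""
    return haploid_size_mb * coverage / 1000  # 1 Gb = 1000 Mb

# Hypothetical 500 Mb haploid genome, for illustration only
for cov in (30, 60, 100):
    print(f"{cov}x coverage -> ~{data_target_gb(500, cov):.0f} Gb of raw data")
```

If the result exceeds the 50-100 Gb (1 full flow cell) per-sample ceiling of the eukaryotic service, the request belongs with the Custom Sequencing Service.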

Custom Sequencing Service

Yes, with our Custom Sequencing Service we can provide full-length sequencing of ANY linear or circular double-stranded DNA molecule between 100 bp and 300 kb in length! Custom Sequencing lets us collect the specific amount of data you need to achieve your experimental objectives, and we can also use a higher-accuracy basecalling model than with our other regular services. Custom Sequencing is ideal for expected mixtures of molecular species (such as barcode or variant libraries) or for eukaryotic genomes that require large amounts of data.

Single-stranded DNA is not currently a supported application for the Custom Sequencing Service. Some customers do send ssDNA to this service, but the results are highly variable and we cannot guarantee success. If you opt to submit ssDNA, please be aware that this is at your own risk, and let us know during order set-up.

Ready to get started? Email us at support@plasmidsaurus.com to provide all your sample details and set up your custom project.

We sequence each sample with Oxford Nanopore long reads to collect the amount of data that you specifically request:

We construct an amplification-free long-read sequencing library using v14 library prep chemistry:
For circular input DNA, we use sequence-independent linearization.
For linear input DNA, we use sequence-independent end-ligation.
For genomic input DNA, we use sequence-independent tagmentation that minimally fragments the DNA (unless you specifically request that we use end-ligation instead).
We sequence the library using R10.4.1 flow cells.

Custom sequencing samples are sequenced WITHOUT primers or amplification. Please do not ship any primers with your samples or mix primers into your samples.

In the vast majority of cases, we deliver custom sequencing results within 3-5 business days of receipt of your samples. Projects that require very large amounts of data may take longer, due to more instrument time needed for data collection and processing. Optionally adding DNA extraction, Illumina services, or bioinformatics will also extend turnaround times accordingly.

For most custom projects, we deliver only the raw reads in .fastq.gz format. Any analyses (demultiplexing the customer's internal barcodes, generating a consensus, binning or aligning variants, etc.) must typically be performed by the researcher, unless we specifically agree to perform the analysis during project set-up.

We will collect the specific amount of data that you request during project set-up. If your samples do not meet all of our QC requirements, this will reduce our ability to achieve the data target, but we may still need to charge you for the work performed.

We require a minimum raw read Qscore of 10 (90% accuracy) during sequencing, although most raw reads are above Q20 (99% accuracy). We are also able to use a higher-accuracy basecalling model than with our Whole Plasmid and Linear/PCR services.
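Qscores use the Phred scale, where the error probability is 10^(-Q/10). This is why Q10 corresponds to 90% accuracy and Q20 to 99%. The following is a minimal Python sketch of the conversion in both directions. The function names are illustrative.

```python
import math

def q_to_accuracy(q: float) -> float:
    """Phred scale: error probability = 10^(-Q/10); accuracy = 1 - error."""
    return 1 - 10 ** (-q / 10)

def accuracy_to_q(accuracy: float) -> float:
    """Inverse conversion: Q = -10 * log10(1 - accuracy)."""
    return -10 * math.log10(1 - accuracy)

for q in (10, 20, 30):
    print(f"Q{q}: {q_to_accuracy(q):.1%} accuracy")
```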

Since we typically do not perform any further analysis of the raw reads for custom sequencing, the final accuracy of your own analysis will depend on your analysis pipeline and quality filtering.
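Because quality filtering is left to you, the following is a minimal sketch of one way to filter a delivered .fastq.gz file by mean read Qscore. It assumes standard Phred+33 quality encoding and 4-line FASTQ records. It computes the mean Q by averaging per-base error probabilities rather than raw Q values, a common convention for Nanopore reads. The function names and the default `min_q = 15.0` threshold are hypothetical choices, not a recommendation from our pipeline.

```python
import gzip
import math
from typing import Iterator, TextIO, Tuple

def mean_qscore(quality: str) -> float:
    """Mean read Q computed in error-probability space (Phred+33 assumed)."""
    if not quality:
        return 0.0
    probs = [10 ** (-(ord(c) - 33) / 10) for c in quality]
    return -10 * math.log10(sum(probs) / len(probs))

def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (header, sequence, quality) from a 4-line-per-record FASTQ."""
    while True:
        header = handle.readline().rstrip("\r\n")
        if not header:  # end of file
            return
        seq = handle.readline().rstrip("\r\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\r\n")
        yield header, seq, qual

def filter_fastq_gz(in_path: str, out_path: str, min_q: float = 15.0) -> int:
    """Write reads with mean Q >= min_q to a new .fastq.gz; return the count kept."""
    kept = 0
    with gzip.open(in_path, "rt") as src, gzip.open(out_path, "wt") as dst:
        for header, seq, qual in read_fastq(src):
            if mean_qscore(qual) >= min_q:
                dst.write(f"{header}\n{seq}\n+\n{qual}\n")
                kept += 1
    return kept
```

Averaging in probability space keeps a few very low-quality bases from being masked by many high-quality ones, so the result is lower than a simple mean of Q values.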

Glad you asked! Email us at support@plasmidsaurus.com to provide all your sample details and set up your custom project.

The cost for each custom sequencing project starts at $500 for up to 1 Gb of total raw data, plus $50 for each additional 1 Gb. If you submit more than 1 sample in a project and they need to be multiplexed, we add a $50 per-sample barcoding surcharge. We calculate your project price as follows and will send you a price estimate (and a custom quote if you need one) when you email us to discuss your project:

Project Cost = $500 base price for 1st Gb data + $50 for each extra Gb data + $50 for barcoding each sample
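The pricing formula above can be sketched in Python as follows. This is an unofficial estimate only, and your emailed quote is what counts. It assumes that a partial Gb is rounded up to the next whole Gb. It also assumes that the barcoding surcharge applies only when more than 1 sample must be multiplexed, as described above. The names are illustrative.

```python
import math

BASE_PRICE = 500          # covers up to the first 1 Gb of raw data
PER_EXTRA_GB = 50         # each additional 1 Gb
BARCODE_PER_SAMPLE = 50   # only when >1 sample must be multiplexed

def project_cost(total_gb: float, n_samples: int = 1) -> int:
    """Estimated project cost in USD (assumption: partial Gb rounded up)."""
    extra_gb = max(0, math.ceil(total_gb) - 1)
    barcoding = BARCODE_PER_SAMPLE * n_samples if n_samples > 1 else 0
    return BASE_PRICE + PER_EXTRA_GB * extra_gb + barcoding

print(project_cost(1))       # 500
print(project_cost(10, 4))   # 500 + 9*50 + 4*50 = 1150
```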

FOR VARIANT LIBRARIES:
Total Data Required = Number of samples x Insert length x Number of variants (barcodes, mutants, etc.) x Coverage required per variant

FOR GENOMIC SEQUENCING:
Total Data Required = Number of samples x Expected genome size x Coverage required per genome
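The two data formulas above can be written as a short Python sketch that converts bases to Gb. The example inputs are hypothetical: 2 samples of a 3 kb insert with 10,000 variants at 20x each, and 3 samples of a 5 Mb genome at 50x. Lengths are assumed to be given in base pairs.

```python
def variant_library_gb(n_samples: int, insert_length_bp: int,
                       n_variants: int, coverage_per_variant: float) -> float:
    """Samples x insert length x variants x coverage per variant, in Gb."""
    bases = n_samples * insert_length_bp * n_variants * coverage_per_variant
    return bases / 1e9

def genomic_gb(n_samples: int, genome_size_bp: int, coverage: float) -> float:
    """Samples x expected genome size x coverage per genome, in Gb."""
    return n_samples * genome_size_bp * coverage / 1e9

# Hypothetical inputs, for illustration only
print(f"{variant_library_gb(2, 3_000, 10_000, 20):.2f} Gb")   # 1.20 Gb
print(f"{genomic_gb(3, 5_000_000, 50):.2f} Gb")               # 0.75 Gb
```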

If you need an official quote, just ask us for one during order set-up.

Results Interpretation Guide


Whole Plasmid, ZeroPrep, & RCA Sequencing Services

Linear/PCR Sequencing Service

Premium PCR Sequencing Service

Bacterial Genome Sequencing Service

Yeast Genome Sequencing Service

AAV Genome Sequencing Service

Eukaryotic Genome Sequencing Service

Custom Sequencing Service