Bioinformatics


Start here for a filtered list of the bioinformatics questions from the master FAQ document. Search it with CMD+F on a Mac or CTRL+F on a PC.

If I used two panels on one sample, how can I run the pipeline analysis?

There are two options: you can either use a combined reference file with duplicated reference sequences removed or specify both references in the pipeline run.
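One way to build the combined reference is to concatenate the two panel FASTA files and skip any record whose sequence was already written. The sketch below assumes that "duplicated" means an identical sequence (compared case-insensitively) and keeps the first header it encounters. The function names parse_fasta and combine_panels are hypothetical and are not part of the pipeline.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def combine_panels(*panel_texts):
    """Merge panel FASTA texts, keeping only the first record for each sequence."""
    seen = set()
    out = []
    for text in panel_texts:
        for header, seq in parse_fasta(text):
            key = seq.upper()  # assumption: duplicates are identical sequences
            if key in seen:
                continue  # already written from an earlier panel
            seen.add(key)
            out.append(f">{header}\n{seq}")
    return "\n".join(out) + "\n" if out else ""
```

To use it on files, read each panel with open(path).read() and write the returned string to a new .fasta file.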

If I am working with a species other than mouse or human, where can I download the reference file?

You can download reference FASTA and GTF files from Ensembl or NCBI. Make sure to generate a STAR index of the reference before running the pipeline.
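If you build the index yourself, STAR's genomeGenerate mode takes the downloaded FASTA and GTF files. The helper below only assembles the command line as an argument list. The function name and file names are hypothetical; check the BD Single-Cell Multiomics Analysis Setup User Guide for the STAR version and options the pipeline expects.

```python
def star_index_command(genome_dir, fasta, gtf, threads=8, sjdb_overhang=None):
    """Assemble a STAR genomeGenerate command as an argument list."""
    cmd = [
        "STAR",
        "--runMode", "genomeGenerate",
        "--genomeDir", genome_dir,          # output directory for the index
        "--genomeFastaFiles", fasta,        # genome FASTA from Ensembl or NCBI
        "--sjdbGTFfile", gtf,               # matching GTF annotation
        "--runThreadN", str(threads),
    ]
    if sjdb_overhang is not None:
        # STAR's manual suggests read length minus 1 for this value.
        cmd += ["--sjdbOverhang", str(sjdb_overhang)]
    return cmd
```

Run it with subprocess.run(star_index_command(...), check=True) on a machine that has STAR installed and enough memory for your genome.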

 

I knocked some genes into my organism. How can I modify my reference genome to run the pipeline?

The v1.9.1 pipeline supports supplemental reference FASTA files, which allow alignment to transgenes such as viral RNA or GFP. Add each transgene sequence, preceded by its own header line, to a FASTA file. On Seven Bridges, upload the transgene FASTA file and include it in the pipeline under Supplemental Reference. For a local run, add the path to the FASTA file under Supplemental_Reference in the YML file.
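A supplemental reference is an ordinary FASTA file: a ">" header line followed by the sequence for each transgene. This minimal sketch formats a dictionary of transgene names and sequences as FASTA text with wrapped sequence lines. The transgene name, the placeholder sequence, and the 60-character wrap width are assumptions for illustration, not pipeline requirements.

```python
def to_fasta(records, width=60):
    """Format {header: sequence} records as FASTA text with wrapped sequence lines."""
    lines = []
    for header, seq in records.items():
        lines.append(f">{header}")
        seq = "".join(seq.split()).upper()  # drop whitespace/newlines from pasted sequences
        for start in range(0, len(seq), width):
            lines.append(seq[start:start + width])
    return "\n".join(lines) + "\n"


# Hypothetical example: replace the placeholder with your real transgene sequence.
supplemental = to_fasta({"my_transgene": "ACGT" * 30})
```

Save the returned text as a .fasta file and reference it as described above.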

How can I get the .FASTA reference file for my BD® AbSeq Panel?

Refer to the BD Single-Cell Multiomics Analysis Setup User Guide or use our online BD® AbSeq Panel generator.

Why are there two subsample settings in the pipeline? What is the difference between these two settings?

The input field in Subsample_Settings subsamples the total reads, including Sample Tag, AbSeq, and mRNA reads. The one in Multiplex_Options applies only to multiplexed runs and is used when you want to subsample a specific fraction of the Sample Tag library.
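The difference between the two settings is which reads are eligible for subsampling. In this illustrative sketch, each read is a hypothetical (library, read_id) pair and is kept with probability fraction. Passing only_library="SampleTag" mimics the Multiplex_Options case, where only the Sample Tag library is thinned. This is not the pipeline's actual subsampling code.

```python
import random


def subsample(reads, fraction, only_library=None, seed=0):
    """Keep each read with probability `fraction`.

    reads: iterable of (library, read_id) pairs (a hypothetical representation).
    only_library: if set, only reads from that library are thinned and all
    other reads pass through unchanged.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    rng = random.Random(seed)  # fixed seed for reproducible subsampling
    kept = []
    for library, read_id in reads:
        if only_library is not None and library != only_library:
            kept.append((library, read_id))  # library not targeted: keep all
        elif rng.random() < fraction:
            kept.append((library, read_id))
    return kept
```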

How can I download the bioinformatics software from Seven Bridges for local use?

You cannot download the software from Seven Bridges. Please follow the BD Single-Cell Multiomics Analysis Setup User Guide on installing the pipeline for local use.

 

What are the computing requirements for running the pipeline locally?

Refer to the BD Single-Cell Multiomics Analysis Setup User Guide. The local installation is intended for bioinformatics cores with servers already set up.

How long does pipeline analysis usually take?

On Seven Bridges Genomics, an mRNA-only run of ~165 million reads takes about 3 hours and 40 minutes.

What are RSEC and DBEC?

RSEC (recursive substitution error correction) and DBEC (distribution-based error correction) are UMI adjustment algorithms that remove the effect of UMI errors on molecule counting; UMIs are also called molecular indices (MIs). RSEC identifies MI errors that are single-base substitutions and adjusts them to the parent MI barcode. DBEC then adjusts other MI errors, such as those introduced during library preparation or by sequencing base deletions. Refer to the BD Single-Cell Multiomics Bioinformatics Handbook for more information.
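The substitution step can be illustrated with a simplified, non-recursive collapse: within one cell and one gene, UMIs are visited from most to fewest reads, and a UMI that differs from an already accepted UMI by a single substitution is merged into it. This is a sketch of the general idea only, not BD's RSEC implementation, and the function names are hypothetical.

```python
from collections import Counter


def hamming(a, b):
    """Count positions at which two equal-length barcodes differ."""
    return sum(x != y for x, y in zip(a, b))


def collapse_substitutions(umi_reads):
    """Merge UMIs one substitution away from a more abundant UMI into it.

    umi_reads: {umi: read_count} for one cell label and one gene.
    Returns a Counter {parent_umi: total_reads}. No reads are discarded;
    they are credited to the parent, so only the molecule count shrinks.
    """
    parents = Counter()
    # Most-read UMIs first, ties broken alphabetically for determinism.
    for umi, count in sorted(umi_reads.items(), key=lambda kv: (-kv[1], kv[0])):
        parent = next(
            (p for p in parents if len(p) == len(umi) and hamming(p, umi) == 1),
            None,
        )
        if parent is None:
            parents[umi] = count       # a new molecule
        else:
            parents[parent] += count   # an error copy of an existing molecule
    return parents
```

With {"AAAA": 50, "AAAT": 2, "CCCC": 30, "CCCG": 1, "GGGG": 5}, five observed UMIs collapse to three molecules while the total read count stays at 88.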

What is meant by sequencing depth?

Reads with the same cell label, same UMI sequence, and same gene are collapsed into a single raw molecule. The number of reads associated with each raw molecule is reported as the raw adjusted sequencing depth.
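As a worked example of this definition, the sketch below treats each read as a hypothetical (cell_label, umi, gene) tuple, counts reads per unique tuple (one raw molecule each), and takes the mean reads per molecule. The pipeline's reported depth metrics may be computed differently.

```python
from collections import Counter


def molecule_depths(reads):
    """Map each raw molecule (cell_label, umi, gene) to its number of reads."""
    return Counter(reads)


def mean_depth(reads):
    """Mean reads per raw molecule; 0.0 if there are no reads."""
    depths = molecule_depths(reads)
    return sum(depths.values()) / len(depths) if depths else 0.0
```

For example, six reads spread over three distinct (cell_label, umi, gene) combinations give a mean depth of 2.0.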

When should RSEC and DBEC tables be used?

Which table to use depends on the RSEC sequencing depth.

Scenario 1: If the RSEC sequencing depth is very low (<2), it is likely that no genes pass the DBEC threshold (check the metrics summary .csv for the number of passing genes). If no genes pass, the DBEC table counts are the same as the RSEC table counts, so either table can be used. However, we recommend using the RSEC table.

Scenario 2: If the RSEC sequencing depth is between 2 and 6, some genes pass the DBEC threshold and some do not. We recommend either of the following options:

a. Use the DBEC table, but for cross-sample comparison remove the genes that do not pass the threshold.

b. Use the RSEC table, keeping in mind that it contains some noise molecules. If cross-sample comparison is necessary, compare only samples with similar sequencing depth; otherwise the level of noise molecules can differ greatly between samples, because it increases with depth and noise molecules remain until DBEC is applied.

Scenario 3: If the RSEC sequencing depth is >6, that is, sequencing approaches saturation and most genes pass DBEC, we recommend using the DBEC table.
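The three scenarios can be written as a small decision helper. This is a hypothetical function, not pipeline code. It assumes that a depth of exactly 6 falls under Scenario 2, and it treats zero passing genes as Scenario 1 at any depth, following the note that the two tables are then identical.

```python
def recommended_table(rsec_depth, genes_passing_dbec):
    """Suggest a molecule table from mean RSEC depth and the DBEC pass count.

    Assumptions: depth exactly 6 is treated as Scenario 2, and zero passing
    genes is treated as Scenario 1 at any depth.
    """
    if rsec_depth < 2 or genes_passing_dbec == 0:
        return "RSEC"  # Scenario 1
    if rsec_depth <= 6:
        # Scenario 2: DBEC minus non-passing genes, or RSEC with caution
        return "DBEC minus non-passing genes, or RSEC"
    return "DBEC"  # Scenario 3: near saturation
```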

For targeted assays, why do you recommend a depth of 6 for RSEC? 

We recommend an RSEC sequencing depth of ≥6 because this reaches the threshold of sequencing saturation, where most molecules in the library have been recovered.

How many reads and molecules are removed by RSEC?

RSEC does not filter out reads. Reads with an erroneous molecular index are collapsed into the parent molecular index, and each read contributes to that molecule's MI coverage.

RSEC reduces the number of observed molecular indices by collapsing indices that are an edit distance of 1 apart. The number of molecules removed by RSEC is reported per gene in the UMI_Adjusted_Stat table: compare the Raw_Molecules and RSEC_Adjusted_Molecules counts.

How many reads and molecules are removed by DBEC?

The proportion of reads kept by DBEC is reported in the metric Pct_Cellular_Reads_With_Amplicons_Retained_By_DBEC; the remainder of the cellular reads were removed by DBEC.

DBEC removes molecules with low depth. The number of molecules removed by DBEC is reported per gene in the UMI_Adjusted_Stat table: compare the Raw_Molecules, RSEC_Adjusted_Molecules, and DBEC_Adjusted_Molecules counts.
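To tabulate these comparisons, you could read UMI_Adjusted_Stat with Python's csv module. The three count column names come from this FAQ. The gene column name ("Gene") and the assumption that the file is a plain comma-separated table with a single header row are hypothetical, so adjust them to the file you have.

```python
import csv


def molecules_removed(rows, gene_column="Gene"):
    """Per-gene molecules removed by RSEC and by DBEC.

    rows: dicts with Raw_Molecules, RSEC_Adjusted_Molecules and
    DBEC_Adjusted_Molecules (names from the FAQ); gene_column is an assumed name.
    """
    removed = {}
    for row in rows:
        raw = int(row["Raw_Molecules"])
        rsec = int(row["RSEC_Adjusted_Molecules"])
        dbec = int(row["DBEC_Adjusted_Molecules"])
        removed[row[gene_column]] = {"by_RSEC": raw - rsec, "by_DBEC": rsec - dbec}
    return removed


def read_umi_adjusted_stat(path, gene_column="Gene"):
    """Read a UMI_Adjusted_Stat CSV (assumed: one header row, comma-separated)."""
    with open(path, newline="") as handle:
        return molecules_removed(csv.DictReader(handle), gene_column)
```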

I got an RSEC depth of <6. What should I do? Should I sequence further?

This depends on the application. For simple cell type clustering, additional sequencing may not be necessary. If comparison across multiple samples is necessary, consider sequencing further to reach an RSEC depth >6.

See the following recommendations:

Mean RSEC depth < 2:

No genes pass the threshold for DBEC. Counts are the same as in the RSEC (edit distance corrected) table. This is acceptable for simple cell type clustering, cell type proportion calculation and identifying significantly expressed genes.

2 < Mean RSEC depth < 6:

Because PCR efficiency differs between primers, some genes have a higher depth than others, so some reach the depth required for DBEC and some do not. If genes with inconsistent MI adjustment status account for a large number of reads, they can produce an apparent batch effect when plotted on tSNE. If comparison across samples is necessary, use the DBEC molecule tables but remove genes that have not passed in any of the samples. Alternatively, use the RSEC molecule tables.

Mean RSEC depth > 6:

Most genes either pass DBEC or are not detected, and almost 100% of reads are associated with passing genes. It is acceptable to compare expression profiles across samples. The number of molecules detected approaches saturation. Use the DBEC molecule tables.
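For the cross-sample filtering recommended for the 2 to 6 depth range and in Scenario 2a, one reading is to keep only genes that passed DBEC in every sample compared, so each retained gene has the same adjustment status everywhere. The sketch assumes you already have each sample's set of passing genes; the function names are hypothetical.

```python
def genes_for_cross_sample(passing_by_sample):
    """Genes that passed DBEC in every sample.

    passing_by_sample: {sample_name: set_of_passing_genes}.
    """
    gene_sets = list(passing_by_sample.values())
    if not gene_sets:
        return set()
    return set.intersection(*gene_sets)


def filter_counts(counts, keep):
    """Drop genes not in `keep` from one sample's {gene: molecule_count} table."""
    return {gene: n for gene, n in counts.items() if gene in keep}
```

Apply filter_counts to each sample's DBEC table with the same keep set before comparing samples.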

For Research Use Only. Not for use in diagnostic or therapeutic procedures.

23-22981-00

© Mac is a trademark of Apple Inc.