BD Rhapsody™ Analysis Pipeline Updates

v2.1 BD Rhapsody™ Sequence Analysis Pipeline (released Nov 10, 2023)
- Added TCR/BCR high-quality cell designation and associated metrics. This adds a new set of VDJ metrics based on a putative cell call made from the VDJ libraries themselves, separate from the cell call made from the associated gene expression libraries (similar to other products that make a putative cell call for VDJ libraries)
- Added UMAP dimensionality reduction coordinates as an output file and also built those coordinates into the pipeline report, Seurat, and Scanpy outputs
- Added an additional utility that only annotates the cell index and UMI from R1 and writes them into the R2 read header
- Added a Docker-free version of the pipeline, available as a tar.gz bundle for local server installs. Tested on Linux distributions: Ubuntu 16, 20, and 22; Red Hat 7; CentOS 7 and 9
- Updated Seurat output to separate mRNA and AbSeq data into the RNA and ADT assays respectively
- Updated Scanpy output to use Muon (.h5mu), storing mRNA and AbSeq data in separate AnnData objects, rna and prot, respectively
- Updated TCR/BCR dominant contigs file to include AIRR-compliant germline columns
- Updated TCR/BCR dominant contigs file to retain only cell-type-appropriate chains. All chains are still available in the unfiltered contigs file
- Updated TCR/BCR dominant contigs file to rename the column 'duplicate_count' to 'umi_count', in accordance with the AIRR definition update in v1.4.1
- Updated TCR/BCR dominant contig selection process, elevating the importance of a productive contig with high relative read count, and removing the CDR3 requirement
- Updated TCR/BCR DBEC algorithm to allow exceptions for CDR3 sequences not seen in any other cell, and CDR3 paired chains seen in other cells
- Updated TCR/BCR contig_id to correspond with annotated chain type
- Updated basic cell calling to scale better with datasets of both small and large cell numbers, and to prevent most inappropriately high cell calls driven by noise signatures
- Updated Alignment Category 'No_Feature_Pct' metric to include targeted mRNA reads that are filtered out due to an invalid alignment
- Updated cell label annotation to improve the speed of annotation for reads with cell label sequences that contain more than 1 error
- Updated RAM requirements for VDJ_preprocess_reads on local server runs
- Updated error handling and reporting in read processing steps
- Updated logging to capture errors during alignment with STAR
- Updated FASTQ handling to skip reads with empty sequence
- Updated cell type classification model selection to better select an appropriate model when not all bioproducts are found in any one model
- Updated pipeline report to show sub-sampled tSNE and UMAP plots when the putative cell count exceeds 100,000
- Updated pipeline report to show details of refined cell calling, when refined cell calling is selected
- Updated bead version detection and read trimming
- Other bug fixes
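
The 'duplicate_count' to 'umi_count' rename means scripts written against pre-v2.1 dominant contig files may fail on column lookup. A minimal compatibility sketch, assuming the file is a tab-separated AIRR rearrangement table (the helper name `read_airr_tsv` and the example values are hypothetical, not part of the pipeline), could normalize either header version to the v2.1 name:

```python
import csv
import io

# Hypothetical mapping from the pre-v2.1 column name to the v2.1 name.
RENAMED_COLUMNS = {"duplicate_count": "umi_count"}


def read_airr_tsv(handle):
    """Read a tab-separated AIRR-style table; return rows keyed by v2.1 column names."""
    reader = csv.DictReader(handle, delimiter="\t")
    rows = []
    for row in reader:
        # Rename old columns; columns already using new names pass through unchanged.
        rows.append({RENAMED_COLUMNS.get(key, key): value for key, value in row.items()})
    return rows


# Usage with a tiny in-memory table in the older layout (made-up values).
old_style = "cell_id\tlocus\tduplicate_count\n1234\tTRB\t17\n"
rows = read_airr_tsv(io.StringIO(old_style))
print(rows[0]["umi_count"])  # prints 17
```

Mapping old names onto new ones (rather than the reverse) keeps downstream code aligned with the current column names while still accepting older files.
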
v2.0 BD Rhapsody™ Sequence Analysis Pipeline
- Major rewrite of the pipeline's read processing steps, resulting in up to 7x faster performance and 2x less required disk space
- New cell-by-bioproduct data table output formats: MEX, for broad compatibility with downstream analysis tools; and Seurat RDS and Scanpy H5AD, each providing a single file that includes all cell metadata (e.g., Sample Tag, TCR/BCR)
- Consolidated previously separate WTA and Targeted pipelines into one pipeline
- New WTA reference format that combines the STAR index with its matching GTF
- Built-in support for creating a new WTA reference from a paired genome FASTA and GTF
- New Maximum_Threads parameter to limit CPU usage on local server runs
- Basic cell caller is now the default algorithm. Refined cell calling algorithm can still be used by setting the Enable_Refined_Cell_Call parameter
- New pipeline input: Expected_Cell_Count, which guides the basic putative cell calling algorithm with an estimate of the number of cells expected. Usually this is the number of cells loaded in the Rhapsody cartridge
- BAM files are not generated by default, but can be created using the Generate_Bam parameter
- Numerous other fixes and optimizations
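
MEX output is based on the Matrix Market exchange format: a sparse coordinate matrix file, usually accompanied by separate feature and barcode lists. These notes don't give the exact file names in the pipeline output, so the sketch below parses only the matrix itself and totals molecules per cell. It assumes integer counts, with rows as bioproducts and columns as cells (the common MEX convention, not confirmed here); `read_mex_matrix` and `molecules_per_cell` are hypothetical helpers.

```python
import io


def read_mex_matrix(handle):
    """Parse a Matrix Market coordinate file into ((n_rows, n_cols), {(row, col): value}).

    Indices are converted from the format's 1-based convention to 0-based.
    """
    header = handle.readline()
    if not header.startswith("%%MatrixMarket matrix coordinate"):
        raise ValueError("not a Matrix Market coordinate file")
    line = handle.readline()
    while line.startswith("%"):  # skip comment lines before the size line
        line = handle.readline()
    n_rows, n_cols, n_entries = (int(x) for x in line.split())
    counts = {}
    for _ in range(n_entries):
        row, col, value = handle.readline().split()
        counts[(int(row) - 1, int(col) - 1)] = int(value)
    return (n_rows, n_cols), counts


def molecules_per_cell(shape, counts):
    """Sum each column, i.e. total molecules per cell under the rows=bioproducts assumption."""
    totals = [0] * shape[1]
    for (_, col), value in counts.items():
        totals[col] += value
    return totals


# Made-up 3 bioproduct x 2 cell example.
example = """%%MatrixMarket matrix coordinate integer general
%example comment
3 2 4
1 1 5
2 1 1
3 2 7
1 2 2
"""
shape, counts = read_mex_matrix(io.StringIO(example))
print(molecules_per_cell(shape, counts))  # prints [6, 9]
```

For real analyses, an established reader (for example one bundled with Seurat or Scanpy) is preferable; the sketch only shows what the format stores.
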

v1.12.1 BD Rhapsody™ Targeted Analysis Pipeline and BD Rhapsody™ WTA Analysis Pipeline
- Fixed TCR pairing percent metrics

v1.12 BD Rhapsody™ Targeted Analysis Pipeline and BD Rhapsody™ WTA Analysis Pipeline
- Added support for Flex Sample Multiplexing Kit
- Added support for Rhapsody Enhanced bead V2 with an expanded cell label diversity
- New input option: Generate_Bam, which allows skipping the BAM file output
- Upgraded CWL to version 1.2 (local runs may require updating cwltool to a version newer than 3.0.20200807)
- Pipeline Report: Added cell label graph when an exact count is specified
- TCR/BCR: nodes are only executed when necessary
- TCR/BCR: Uses productive status to inform chain consolidation for the VDJ perCell output file
- TCR/BCR: Dominant Contigs AIRR file now has DBEC filtering applied and is uncompressed
- TCR/BCR: Both AIRR files have an additional column: cell_type_experimental
- TCR/BCR: The non-AIRR Dominant/Unfiltered contig files are no longer part of the pipeline output (content was redundant with AIRR files)
- TCR/BCR: R2 alignment analysis will prioritize IG/TR gene features when TCR/BCR is enabled
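
Because cwltool versions embed a date stamp, comparing them as plain strings can mislead. A minimal sketch of the version check implied by the CWL 1.2 upgrade note, assuming purely numeric dot-separated components (obtaining the string from your installation is left out, and the helper names are hypothetical):

```python
def parse_version(text):
    """Turn a dotted version string such as '3.1.20230201224320' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


MIN_CWLTOOL = parse_version("3.0.20200807")


def cwltool_is_new_enough(version_text):
    """True if the given cwltool version is strictly newer than 3.0.20200807."""
    return parse_version(version_text) > MIN_CWLTOOL


print(cwltool_is_new_enough("3.1.20230201224320"))  # prints True
print(cwltool_is_new_enough("3.0.20200807"))        # prints False
```

Tuples of integers compare element by element, so the date stamp is compared numerically rather than character by character. Versions with suffixes such as `.dev5` would need extra handling.
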
v1.11.1 BD Rhapsody™ Targeted Analysis Pipeline and BD Rhapsody™ WTA Analysis Pipeline
- Improved speed and disk usage of AnnotateReads step
- Updated pandas version to fix error: “ValueError: Unstacked DataFrame is too big, causing int32 overflow”
- Better prediction of RAM requirements
- Improved basic and refined putative cell calling algorithms
- Deleted unnecessary intermediate files to save disk space
- Seven Bridges deployment: Fix for error “Instance not available for automatic scheduling”
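
As we understand it, older pandas releases computed the size of an unstacked (reshaped) table as a 32-bit integer and raised the error above when rows times columns overflowed. A quick arithmetic check, using hypothetical dataset sizes, shows how a large cell-by-gene table can cross that limit:

```python
INT32_MAX = 2**31 - 1  # 2,147,483,647


def unstack_would_overflow(n_rows, n_columns):
    """True if an n_rows x n_columns reshaped table has more entries than int32 can hold."""
    return n_rows * n_columns > INT32_MAX


# Hypothetical sizes: 100,000 cell labels x 30,000 genes = 3,000,000,000 entries.
print(unstack_would_overflow(100_000, 30_000))  # prints True
# 20,000 x 30,000 = 600,000,000 entries, well under the limit.
print(unstack_would_overflow(20_000, 30_000))   # prints False
```
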
v1.11 BD Rhapsody™ Targeted Analysis Pipeline and BD Rhapsody™ WTA Analysis Pipeline
- Added an HTML pipeline report containing information about the analysis, including the metrics summary and graphs that visualize the results
- By default, reads aligned to exons and introns are now considered and represented in molecule counts. Added parameter to control this behavior.
- Added new "Alignment Categories" for TCR and BCR reads
- Added support for VDJ Adaptive Immune Receptor Repertoire (AIRR) standard format
- For pipeline runs where putative cells are determined from AbSeq (protein) counts, added an output file listing the cell IDs that correspond to suspected protein aggregates
- Updated CWL workflow on Seven Bridges to fix memory failures and dynamically allocate resources for large datasets
- Improved flexibility for FASTQ file naming
- Updated Picard to version 2.27.4
- Updated bead version detection
v1.10.1 BD Rhapsody™ Targeted Analysis Pipeline and BD Rhapsody™ WTA Analysis Pipeline
- Fixed issue with cell label detection on reads from TCR/BCR, when TCR/BCR libraries were combined with other library types (WTA, Targeted, AbSeq) in a single sequencing index
- Fixed issue with processing FASTQ files whose filenames end in “fq.gz”
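
The fix above concerns files named with the shorter "fq.gz" suffix. These notes don't describe how the pipeline recognizes FASTQ names, so the following is only a sketch of a suffix check that accepts both common gzipped spellings; the pattern and the helper `is_gzipped_fastq` are hypothetical and case-sensitive by assumption:

```python
import re

# Hypothetical pattern: gzipped FASTQ files ending in ".fastq.gz" or ".fq.gz".
FASTQ_GZ = re.compile(r"\.(fastq|fq)\.gz$")


def is_gzipped_fastq(filename):
    """True if the filename ends with .fastq.gz or .fq.gz."""
    return FASTQ_GZ.search(filename) is not None


for name in ["sample_R1.fastq.gz", "sample_R2.fq.gz", "sample_R1.fq", "notes.txt.gz"]:
    print(name, is_gzipped_fastq(name))  # True, True, False, False in that order
```
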
v1.10 BD Rhapsody™ Targeted Analysis Pipeline and BD Rhapsody™ WTA Analysis Pipeline
- Updated VDJ pipeline with support for full-length contig assembly, improved performance, new metrics, and new output files containing contig sequences
- Added support for Rhapsody Enhanced Beads, with automatic bead version detection
- Added option to call putative cells based on AbSeq read counts (for troubleshooting only)
- Added “Alignment Categories” section to metric summary which provides a breakdown of the types of alignments seen for each library
- Added separate metric summary files for each sample tag for experiments using BD Single-Cell Multiplexing kits
- Renamed various metrics in outputs to reflect the multi-omics nature of the data (Target Type → Bioproduct_Type, Gene/Target → Bioproduct)
- Added Pct_Read_Pair_Overlap and Median Reads Per Cell metrics to the metric summary
- Improved support for larger runs on Seven Bridges Genomics (SBG)
- Optimized pipeline metadata handling
- Improved checking of reference files
v1.9.1 BD Rhapsody™ Analysis Pipeline (improved WTA analysis)
- Improved putative cell calling algorithm to reduce overcalling of putative cells in high cell input experiments
- Updated alignment settings to improve AbSeq mapping when R2 read length is greater than 75 bases
v1.9 BD Rhapsody™ Targeted Analysis Pipeline and BD Rhapsody™ WTA Analysis Pipeline
- Improved FASTQ file pairing - filenames are flexible and pairing is now based on read sequence identifier
- Optimized pipeline in various steps for memory and storage usage
- Fixed bugs related to Sample Multiplexing Kit noise and DBEC mean molecule metric
- BD Rhapsody™ Targeted Analysis Pipeline:
  - Support for BD Rhapsody™ VDJ CDR3 protocol
  - Read and molecule counts for targets from same gene symbol are combined in the output tables
  - Updated Bowtie2 alignment parameters for improved sensitivity
- BD Rhapsody™ WTA Analysis Pipeline:
  - Updated Pct_Cellular metric calculations to match the descriptions in the Bioinformatics Handbook
  - Added support for supplemental reference FASTA files, which allow alignment to transgenes such as viral RNA or GFP
  - Updated STAR alignment parameters for improved sensitivity
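
Pairing FASTQ files by read sequence identifier, rather than by filename, works because mate files list the same reads in the same order, so their first records share an identifier. The notes don't describe the pipeline's actual procedure; this is a minimal sketch under that assumption, with hypothetical helper names and made-up headers and filenames:

```python
def read_id(header):
    """Return the read identifier from a FASTQ header line.

    Drops the leading '@', anything after the first whitespace (e.g. an
    Illumina '1:N:0:...' comment) and a trailing '/1' or '/2' mate suffix.
    """
    name = header[1:] if header.startswith("@") else header
    name = name.split()[0]
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def pair_files(first_headers):
    """Group files whose first records share a read identifier.

    first_headers maps filename -> header line of that file's first record.
    Returns (file, mate) tuples, each sorted by filename.
    """
    groups = {}
    for filename, header in first_headers.items():
        groups.setdefault(read_id(header), []).append(filename)
    return [tuple(sorted(names)) for names in groups.values() if len(names) == 2]


first_headers = {
    "first.fastq.gz": "@A00123:8:H2KJ:1:1101:1000:2000 1:N:0:ACGT",
    "second.fastq.gz": "@A00123:8:H2KJ:2:1101:1500:3000 1:N:0:ACGT",
    "third.fastq.gz": "@A00123:8:H2KJ:1:1101:1000:2000 2:N:0:ACGT",
    "fourth.fastq.gz": "@A00123:8:H2KJ:2:1101:1500:3000 2:N:0:ACGT",
}
print(pair_files(first_headers))
# prints [('first.fastq.gz', 'third.fastq.gz'), ('fourth.fastq.gz', 'second.fastq.gz')]
```

Because pairing relies only on header content, the filenames in the example are deliberately uninformative.
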