BD Rhapsody™ Analysis Pipeline Updates


  • v2.1 BD Rhapsody™ Sequence Analysis Pipeline (release date: Nov 10, 2023)

    • Added TCR/BCR high-quality cell designation and associated metrics. This creates a new set of VDJ metrics, similar to products that make a putative cell call for VDJ libraries separately from the cell call on the associated gene expression libraries
    • Added UMAP dimensionality reduction coordinates as an output file, and included those coordinates in the pipeline report and in the Seurat and Scanpy outputs
    • Added an extra utility that only annotates the cell index and UMI from R1 and writes them to the header of R2
    • Added a Docker-free version of the pipeline, available for local server installs as a tar.gz bundle. Tested on Linux: Ubuntu 16, 20, and 22; Red Hat 7; CentOS 7 and 9
    • Updated Seurat output to separate mRNA and AbSeq data into the RNA and ADT assays, respectively
    • Updated Scanpy output to use Muon (.h5mu), storing mRNA and AbSeq data in separate AnnData objects, rna and prot, respectively
    • Updated TCR/BCR dominant contigs file to include AIRR-compliant germline columns
    • Updated TCR/BCR dominant contigs file to retain only cell-type-appropriate chains. All chains are still available in the unfiltered contigs file.
    • Updated TCR/BCR dominant contigs file to rename the column 'duplicate_count' to 'umi_count', in accordance with the AIRR definition update in v1.4.1
    • Updated TCR/BCR dominant contig selection to give more weight to productive contigs with a high relative read count, and removed the CDR3 requirement
    • Updated TCR/BCR DBEC algorithm to allow exceptions for CDR3 sequences not seen in any other cell, and for CDR3-paired chains seen in other cells
    • Updated TCR/BCR contig_id to correspond to the annotated chain type
    • Updated basic cell calling to scale better with both small and large datasets, and to prevent most inappropriately high cell calls caused by noise signatures
    • Updated Alignment Category 'No_Feature_Pct' metric to include targeted mRNA reads that are filtered out due to an invalid alignment
    • Updated cell label annotation to speed up annotation of reads whose cell label sequences contain more than one error
    • Updated RAM requirements for VDJ_preprocess_reads on local server runs
    • Updated error handling and reporting in read processing steps
    • Updated logging to capture errors during alignment with STAR
    • Updated FASTQ handling to skip reads with empty sequence
    • Updated cell type classification model selection to choose an appropriate model when no single model contains all bioproducts
    • Updated pipeline report to show subsampled tSNE and UMAP plots when the putative cell count exceeds 100,000
    • Updated pipeline report to show details of refined cell calling, when refined cell calling is selected
    • Updated bead version detection and read trimming
    • Other bug fixes
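
Scripts that read dominant contigs files from both earlier runs and v2.1 runs will see the molecule count column under two names. A minimal sketch for normalizing older files, assuming a tab-separated AIRR table with a single header row; `harmonize_umi_column` is a hypothetical helper, not part of the pipeline.

```python
import csv
import io


def harmonize_umi_column(tsv_text: str) -> str:
    """Rename an AIRR 'duplicate_count' column to 'umi_count' (hypothetical helper).

    Assumes a tab-separated table with one header row. Tables that already
    have a 'umi_count' column keep their header as-is.
    """
    rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
    if not rows:
        return tsv_text
    header = rows[0]
    if "duplicate_count" in header and "umi_count" not in header:
        header[header.index("duplicate_count")] = "umi_count"
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)
    return out.getvalue()
```

Renaming only the header leaves every data row untouched, so the same code works regardless of which other AIRR columns a given file contains.
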
  • v2.0 BD Rhapsody™ Sequence Analysis Pipeline
    • Major rewrite of the pipeline's read processing steps, resulting in up to 7x faster performance and 2x lower disk space requirements
    • New cell-bioproduct data table output formats: MEX, for broad compatibility with downstream analysis tools, and Seurat RDS and Scanpy H5AD, which provide single files that include all cell metadata (e.g., Sample Tag, TCR/BCR)
    • Consolidated previously separate WTA and Targeted pipelines into one pipeline
    • The new WTA reference combines the STAR index and the matching GTF
    • Built-in support for creating a new WTA reference from a paired genome FASTA and GTF
    • New Maximum_Threads parameter to limit CPU usage on local server runs
    • The basic cell caller is now the default algorithm. The refined cell calling algorithm can still be used by setting the Enable_Refined_Cell_Call parameter
    • New pipeline input: Expected_Cell_Count, which guides the basic putative cell calling algorithm with an estimate of the number of cells expected. This can usually be set to the number of cells loaded in the Rhapsody cartridge
    • BAM files are not generated by default, but can be created using the Generate_Bam parameter
    • Numerous other fixes and optimizations
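
MEX stores a sparse count matrix in Matrix Market coordinate format alongside separate lists of feature names and cell barcodes. The sketch below parses such a matrix into a dictionary keyed by (feature, barcode). It assumes rows index bioproducts and columns index cells, as in the common 10x-style MEX layout, and that values are integer counts; `parse_mex` is a hypothetical name, and the exact file names in the pipeline output are not given here. For real data, `scipy.io.mmread` reads Matrix Market files directly.

```python
from typing import Dict, List, Tuple


def parse_mex(
    matrix_text: str, features: List[str], barcodes: List[str]
) -> Dict[Tuple[str, str], int]:
    """Parse a Matrix Market coordinate matrix into {(feature, barcode): count}.

    Assumes rows index features (bioproducts), columns index cell barcodes,
    and values are integer counts.
    """
    lines = iter(matrix_text.splitlines())
    banner = next(lines, "")
    if not banner.lower().startswith("%%matrixmarket matrix coordinate"):
        raise ValueError("not a Matrix Market coordinate file")

    # Skip '%' comment lines; the first data line holds rows, columns, entries.
    for line in lines:
        if line.strip() and not line.startswith("%"):
            n_rows, n_cols, n_entries = (int(x) for x in line.split())
            break
    else:
        raise ValueError("missing size line")
    if n_rows != len(features) or n_cols != len(barcodes):
        raise ValueError("matrix size does not match feature/barcode lists")

    counts: Dict[Tuple[str, str], int] = {}
    for line in lines:
        if not line.strip():
            continue
        row, col, value = line.split()
        # Matrix Market indices are 1-based; Python lists are 0-based.
        counts[(features[int(row) - 1], barcodes[int(col) - 1])] = int(value)
    if len(counts) != n_entries:
        raise ValueError("number of entries does not match the size line")
    return counts
```

Only nonzero entries are listed in a coordinate file, so any (feature, barcode) pair missing from the result has a count of zero.
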
  • v1.12.1 BD Rhapsody™ Targeted Analysis Pipeline and BD Rhapsody™ WTA Analysis Pipeline

    • Fixed TCR pairing percent metrics

  • v1.12 BD Rhapsody™ Targeted Analysis Pipeline and BD Rhapsody™ WTA Analysis Pipeline
    • Added support for Flex Sample Multiplexing Kit
    • Added support for Rhapsody Enhanced Bead V2, with expanded cell label diversity
    • New input option: Generate_Bam, which allows skipping BAM file output
    • Upgraded CWL to version 1.2 (for local runs, this may require updating cwltool to a version later than 3.0.20200807)
    • Pipeline Report: Added cell label graph when an exact count is specified
    • TCR/BCR: Nodes are executed only when necessary
    • TCR/BCR: Productive status is used to inform chain consolidation for the VDJ perCell output file
    • TCR/BCR: Dominant Contigs AIRR file now has DBEC filtering applied and is uncompressed
    • TCR/BCR: Both AIRR files have an additional column: cell_type_experimental
    • TCR/BCR: The non-AIRR Dominant/Unfiltered contig files are no longer part of the pipeline output (content was redundant with AIRR files)
    • TCR/BCR: R2 alignment analysis prioritizes IG/TR gene features when TCR/BCR is enabled
       
  • v1.11.1 BD Rhapsody™ Targeted Analysis Pipeline and BD Rhapsody™ WTA Analysis Pipeline
    • Improved speed and disk usage of AnnotateReads step
    • Updated pandas version to fix the error “ValueError: Unstacked DataFrame is too big, causing int32 overflow”
    • Improved prediction of RAM requirements
    • Improved basic and refined putative cell calling algorithms
    • Unnecessary intermediate files are now deleted to save disk space
    • Seven Bridges deployment: Fixed the error “Instance not available for automatic scheduling”
  • v1.11 BD Rhapsody™ Targeted Analysis Pipeline and BD Rhapsody™ WTA Analysis Pipeline Released:

    • Added an HTML pipeline report with information about the analysis, including the metrics summary and graphs that visualize the results
    • By default, reads aligned to both exons and introns are now included in molecule counts. Added a parameter to control this behavior.
    • Added new "Alignment Categories" for TCR and BCR reads
    • Added support for VDJ Adaptive Immune Receptor Repertoire (AIRR) standard format
    • For pipeline runs where putative cells are called from AbSeq (protein) counts, added an output file listing the cell IDs of suspected protein aggregates
    • Updated CWL workflow on Seven Bridges to fix memory failures and dynamically allocate resources for large datasets
    • Improved flexibility for FASTQ file naming
    • Updated Picard to version 2.27.4
    • Updated bead version detection
  • v1.10.1 BD Rhapsody™ Targeted Analysis Pipeline and BD Rhapsody™ WTA Analysis Pipeline Released:
    • Fixed an issue with cell label detection on TCR/BCR reads when TCR/BCR libraries were combined with other library types (WTA, Targeted, AbSeq) in a single sequencing index
    • Fixed an issue with processing FASTQ files whose filenames end in “fq.gz”
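
The v1.10.1 fix concerns FASTQ files named with the shorter “fq.gz” suffix. As an illustration only, a suffix check like the one below accepts both spellings; the suffix list and the `is_fastq_filename` name are assumptions, not the pipeline's actual rule.

```python
# Hypothetical suffix list; the pipeline's own accepted names are not documented here.
FASTQ_SUFFIXES = (".fastq.gz", ".fq.gz", ".fastq", ".fq")


def is_fastq_filename(name: str) -> bool:
    """Return True if `name` ends with one of the assumed FASTQ suffixes.

    The comparison is case-insensitive, so 'Sample_R1.FQ.GZ' also matches.
    A check that only looked for '.fastq.gz' would miss '.fq.gz' files.
    """
    return name.lower().endswith(FASTQ_SUFFIXES)
```

For example, `is_fastq_filename("Sample_R2_001.fq.gz")` returns `True`, while `is_fastq_filename("Sample.bam")` returns `False`.
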
  • v1.10 BD Rhapsody™ Targeted Analysis Pipeline and BD Rhapsody™ WTA Analysis Pipeline Released:
    • Updated VDJ pipeline with support for full-length contig assembly, improved performance, new metrics, and new output files containing contig sequences
    • Added support for Rhapsody Enhanced Beads, with automatic bead version detection
    • Added option to call putative cells based on AbSeq read counts (for troubleshooting only)
    • Added an “Alignment Categories” section to the metrics summary, which breaks down the types of alignments seen for each library
    • Added separate metric summary files for each sample tag for experiments using BD Single-Cell Multiplexing kits
    • Renamed various metrics in outputs to reflect multi-omics nature of data (Target Type → Bioproduct_Type, Gene/Target → Bioproduct)
    • Added the Pct_Read_Pair_Overlap and Median Reads Per Cell metrics to the metrics summary
    • Improved support for larger runs on Seven Bridges (SBG)
    • Optimized pipeline metadata handling
    • Improved checking of reference files
  • v1.9.1 of the BD Rhapsody™ Analysis Pipeline for improved WTA Analysis Released:
    • Improved putative cell calling algorithm to reduce overcalling of putative cells in high cell input experiments
    • Updated alignment settings to improve AbSeq mapping when R2 read length is greater than 75 bases
  • v1.9 BD Rhapsody™ Targeted Analysis Pipeline and BD Rhapsody™ WTA Analysis Pipeline Released:
    • Improved FASTQ file pairing: filenames are now flexible, and pairing is based on the read sequence identifier
    • Optimized memory and storage usage in various pipeline steps
    • Fixed bugs related to Sample Multiplexing Kit noise and DBEC mean molecule metric
    • BD Rhapsody™ Targeted Analysis Pipeline:
      • Support for BD Rhapsody™ VDJ CDR3 protocol
      • Read and molecule counts for targets from the same gene symbol are combined in the output tables
      • Updated Bowtie2 alignment parameters for improved sensitivity
    • BD Rhapsody™ WTA Analysis Pipeline:
      • Updated Pct_Cellular metrics calculations to match the descriptions in the Bioinformatics Handbook
      • Added support for supplemental reference FASTA files, which allow alignment to transgenes such as viral RNA or GFP
      • Updated STAR alignment parameters for improved sensitivity
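
The v1.9 change pairs FASTQ files by read sequence identifier instead of by filename. A minimal sketch of that idea, assuming Illumina-style headers (`@<id> 1:N:0:<index>` for CASAVA 1.8+, or `@<id>/1` in the older format) and looking only at each file's first record: mates share an identifier, and the read number distinguishes R1 from R2. The function names are hypothetical, and the pipeline's actual matching rules are not described in these notes.

```python
from typing import Dict, List, Tuple


def parse_header(header_line: str) -> Tuple[str, int]:
    """Split a FASTQ header into (read identifier, read number 1 or 2).

    Assumes Illumina-style headers: '@<id> 1:N:0:<index>' (CASAVA 1.8+)
    or '@<id>/1' (older format).
    """
    fields = header_line[1:].split() if header_line.startswith("@") else []
    if not fields:
        raise ValueError(f"not a FASTQ header: {header_line!r}")
    ident = fields[0]
    if ident.endswith(("/1", "/2")):
        return ident[:-2], int(ident[-1])
    if len(fields) > 1 and fields[1][:2] in ("1:", "2:"):
        return ident, int(fields[1][0])
    raise ValueError(f"cannot determine read number: {header_line!r}")


def pair_fastqs(first_headers: Dict[str, str]) -> List[Tuple[str, str]]:
    """Pair R1/R2 files whose first records share a read identifier.

    `first_headers` maps each filename to the header line of its first
    record. Returns sorted (R1 file, R2 file) tuples; files without a
    matching mate are omitted.
    """
    mates: Dict[str, Dict[int, str]] = {}
    for filename, header in first_headers.items():
        ident, read_number = parse_header(header)
        mates.setdefault(ident, {})[read_number] = filename
    return sorted(
        (by_number[1], by_number[2])
        for by_number in mates.values()
        if 1 in by_number and 2 in by_number
    )
```

Because matching uses header content rather than names, files such as `lib_A.fastq.gz` and `lib_B.fastq.gz` can still be paired, which is what makes flexible filenames possible.
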