Release Notes

v2.3 - March 10, 2025

Added

New .CELLISMO output file for use in the BD Cellismo™ Data Visualization Tool (renamed from, and replaces, the .H5MU output file)
Support for indexing BAM files with chromosomes longer than 500 Mb, using a .bam.csi index file
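Background on the .bam.csi change: according to the SAM/BAM specification, the standard BAI index can only address reference sequences up to 2^29 bases (about 537 million bases, or 512 Mb in binary units). Longer chromosomes need the CSI index format instead. The sketch below shows one way a tool could choose the index suffix from the reference lengths. The function name and the exact cutoff rule are assumptions based on the BAI limit, not the pipeline's own logic.

```python
# Largest reference sequence length a BAI index can cover, per the SAM/BAM
# specification (2**29 bases). This constant is used here as an assumption
# about where a tool would switch to CSI.
BAI_MAX_REF_LENGTH = 2 ** 29


def choose_bam_index_suffix(reference_lengths):
    """Return the index file suffix for a BAM with the given reference lengths.

    Hypothetical helper: reference_lengths maps sequence name -> length in bases.
    Falls back to CSI when any reference sequence is too long for BAI.
    """
    if any(length > BAI_MAX_REF_LENGTH for length in reference_lengths.values()):
        return ".bam.csi"
    return ".bam.bai"


if __name__ == "__main__":
    # GRCh38 chr1 (248,956,422 bases) fits within the BAI limit.
    print(choose_bam_index_suffix({"chr1": 248_956_422}))
    # A hypothetical 600 Mb chromosome requires a CSI index.
    print(choose_bam_index_suffix({"chr1": 248_956_422, "chrBig": 600_000_000}))
```

Outside the pipeline, `samtools index -c` writes a CSI index for a BAM file.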

Updated

ATAC index read minimum length changed from 43 to 35 bases
Renamed TCR/BCR metadata column from High_Quality_Cell to High_Qualty_Cell_TCR_BCR
Changed the maximum read start position for sample tag reads to match prior pipeline versions
The Make Rhapsody Reference tool now always includes a 'gene_biotype' attribute
Improved ATAC-Seq trimming of the custom capture sequence to resolve edge cases related to sequencing read length
Improved logic for automatic pairing of FASTQ files by filename when the header data is not formatted as expected
Seurat output file now includes additional metadata for sample tags and bioproduct stats

Fixed

The Pct CellLabel UMI Aligned Uniquely metric now correctly reports reads that are both aligned and unique; versions 2.0 to 2.2.1 reported only Pct CellLabel UMI Aligned.
Exact cell count parameter did not work for ATAC-Seq only or joint mRNA-ATAC cell calling
ATAC_Compile_Results node fails if the custom Rhapsody reference does not have a 'gene_biotype' GTF attribute
VDJ_Compile_Results node fails when there are zero cells detected
Custom Rhapsody Reference files for ATAC-Seq may fail if name ends with characters 'a' or 'n'
ATAC-Seq pipeline failure on certain AMD EPYC processors due to BWA-mem2 binary selection
ATAC-Seq pipeline failure in edge case where joint cell calling has no cell intersection
Pipeline report for ATAC-Seq run may report the wrong number of putative cells called
Pipeline failure caused by JSON parsing on certain OS locales
Generate_Seurat node fails if there is only one AbSeq reference input
Failure in ATAC-Only pipeline run with sample tag when cell calling data has not been set
Extra Utility AnnotateCellLabelUMI fails if the Run_Name parameter is not provided

v2.2.1 - June 05, 2024

Added

Support for MGI sequencer FASTQ read header and file names
For the TCR/BCR assay, new output of a compressed bundle of PNG images showing the per-chain VDJ DBEC algorithm thresholds
Long read support: added a new pipeline parameter to enable support for long R2 reads (>650 bp); by default, long or short reads are auto-detected

Updated

Ability to customize STAR and BWA-MEM2 alignment parameters, enabled on Seven Bridges and local runs

Fixed

Failures that could occur with MGI sequencer FASTQ files in the QualCLAlign or AlignmentAnalysis nodes
Failure caused by VDJ AssembleAndAnnotate node - Argument list too long
Failure caused by MergeBam node - Argument list too long
PhiX aligned reads could incorrectly be counted as targeted mRNA reads when using a targeted panel (usually these are already removed during FASTQ generation)

v2.2 - April 19, 2024

Added

Added support for ATAC-Seq Assay and Multiomic ATAC-Seq Assay (WTA+ATAC-Seq)
Added ability to customize STAR alignment parameters

Updated

Updated the immune cell type classifier to be more lenient in the percentage of bioproducts required to run
Updated TCR/BCR annotation software IgBLAST to version 1.22
Updated TCR/BCR annotation to IMGT release 202349-3 (12-06-2023)
Updated bead version detection

Fixed

Fixed error in dimensionality reduction when zero variable genes are found due to very sparse data

v2.1 - Nov 10, 2023 (Internal and early access release only)

Added

Added TCR/BCR high-quality cell designation and associated metrics. This creates a new set of VDJ metrics, similar to products that make a putative cell call for VDJ libraries, separate from the cell call made from the associated gene expression libraries
Added UMAP dimensionality reduction coordinates as an output file and included those coordinates in the pipeline report, Seurat, and Scanpy outputs
Added an extra utility that only annotates the cell index and UMI from R1 and adds them to the header of R2
Added support for Enhanced Cell Capture Beads V3

Updated

Updated Seurat output to separate mRNA and AbSeq data into the RNA and ADT assays respectively
Updated Scanpy output to use Muon (.h5mu) and store mRNA and AbSeq data in separate AnnData objects, rna and prot, respectively
Updated TCR/BCR dominant contigs file to include AIRR compliant germline columns
Updated TCR/BCR dominant contigs file to only retain cell type appropriate chains. All chains are still available in the unfiltered contigs file.
Updated TCR/BCR dominant contigs file to rename the column 'duplicate_count' to 'umi_count', in accordance with AIRR’s definition update in v1.4.1
Updated TCR/BCR dominant contig selection process, elevating the importance of a productive contig with high relative read count, and removing the CDR3 requirement
Updated TCR/BCR DBEC algorithm to allow exceptions for CDR3 sequences not seen in any other cell, and CDR3 paired chains seen in other cells
Updated TCR/BCR contig_id to correspond with annotated chain type
Updated basic cell calling to scale better with small and large cell datasets, and prevent most inappropriately high cell calls derived from noise signatures
Updated Alignment Category 'No_Feature_Pct' metric to include targeted mRNA reads that are filtered out due to an invalid alignment
Updated cell label annotation to improve the speed of annotation for reads with cell label sequences that contain more than 1 error
Updated RAM requirements for VDJ_preprocess_reads on local server runs
Updated error handling and reporting in read processing steps
Updated logging to capture errors during alignment with STAR
Updated FASTQ handling to skip reads with empty sequence
Updated cell type classification model selection to better select an appropriate model when not all bioproducts are found in any one model
Updated pipeline report to show sub-sampled tSNE and UMAP plots, in the case where the putative cell count exceeds 100,000
Updated pipeline report to show details of refined cell calling, when refined cell calling is selected
Updated bead version detection and read trimming

Fixed

Fixed issue that caused failure when a gene symbol was named 'nan'
Fixed issue with a quote mark in a gene symbol causing a failure in the Seurat output file generation
Fixed rare division by zero issue in DBEC algorithm
Fixed rare issue caused by including "SampleTag" in the Run_Name parameter

Experimental

Added a Docker-free version of the pipeline, available for local server installs as a tar.gz bundle. Tested on Linux distributions: Ubuntu 16, 20, and 22; Red Hat 7; CentOS 7 and 9

make_rhapsody_reference tool:

Added an 'Extra_STAR_params' input to enable passing parameters to the STAR genomeGenerate process
Updated to automatically generate a GTF for sequences added in the 'Extra_sequences' FASTA input (useful for transgenes)

v2.0 - June 14, 2023

Major rewrite of the pipeline's read processing steps, resulting in up to 7x faster performance and 2x less disk space required
New cell-bioproduct data table output file formats: MEX, for broad compatibility with downstream analysis tools; and Seurat RDS and Scanpy H5AD, for single files that include all cell metadata (e.g., Sample Tag, TCR/BCR)
Consolidated previously separate WTA and Targeted pipelines into one pipeline
New WTA reference format combines the STAR index and matching GTF
Built-in support for creating a new WTA reference with paired genome FASTA and GTF
New Maximum_Threads parameter to limit the CPU usage on local server runs
Basic cell caller is now the default algorithm. Refined cell calling algorithm can still be used by setting the Enable_Refined_Cell_Call parameter
New pipeline input: Expected_Cell_Count - Guide the basic putative cell calling algorithm by providing an estimate of the number of cells expected. Usually this can be the number of cells loaded in the Rhapsody cartridge
BAM files are not generated by default, but can be created using the Generate_Bam parameter
Numerous other fixes and optimizations

v1.12.1 - March 14, 2023

Fix TCR pairing percent metrics

v1.12 - Feb 21, 2023

Added support for Rhapsody Enhanced bead V2 with an expanded cell label diversity
Added support for Flex SMK (Sample Multiplexing Kit), providing 24 species-agnostic and cell type-agnostic sample tags
Upgraded CWL to version 1.2
VDJ nodes are only executed when necessary
Pipeline Report: Added cell label graph when an exact count is specified
Added option to skip creating BAM file output
Use productive status when collapsing chains for the VDJ perCell output file
Dominant Contigs AIRR file now has DBEC filtering applied and is uncompressed. Both AIRR files have an additional column, cell_type_experimental. The non-AIRR Dominant/Unfiltered files are no longer part of the pipeline output.
Prioritize IG/TR gene features when annotating reads from a VDJ assay

v1.11.1 - Dec 15, 2022

Improved speed and disk usage of AnnotateReads step
Update Pandas version to fix error: ValueError: Unstacked DataFrame is too big, causing int32 overflow
Better prediction of RAM requirements
Improved basic and refined putative cell calling algorithms
Deletion of unnecessary intermediate files to save disk space
Seven Bridges deployment: Fix for error Instance not available for automatic scheduling

v1.11 - Aug 18, 2022

BD Rhapsody™ Targeted Analysis Pipeline and BD Rhapsody™ WTA Analysis Pipeline:

Added a pipeline report HTML that contains information about the analysis including the metrics summary and graphs to visualize the results
By default, reads aligned to exons and introns are now considered and represented in molecule counts. Added parameter to control this behavior.
Added new "Alignment Categories" for TCR and BCR reads
Added support for VDJ Adaptive Immune Receptor Repertoire (AIRR) standard format
For pipeline runs where putative cells are determined based on AbSeq (protein) counts, added file output of cell IDs corresponding to suspected protein aggregates
Updated CWL workflow on Seven Bridges to fix memory failures and dynamically allocate resources for large datasets
Improved flexibility for FASTQ file naming
Updated Picard to version 2.27.4
Updated bead version detection

v1.10.1 - April 14, 2022

Fixed issue with cell label detection on reads from TCR/BCR, when TCR/BCR libraries were combined with other library types (WTA, Targeted, AbSeq) in a single sequencing index.
Fixed issue with processing FASTQ files whose filenames end in fq.gz

v1.10 - January 24, 2022

BD Rhapsody Targeted Analysis Pipeline and BD Rhapsody WTA Analysis Pipeline:

Updated VDJ pipeline with improved performance, new assembly algorithm, new metrics and new output files containing all available contig sequences
Added support for Rhapsody Enhanced Beads, with automatic bead version detection
Added option to call putative cells based on AbSeq read counts (for troubleshooting only)
Added Alignment Categories section to metric summary which provides a breakdown of alignments for read pairs with a valid cell label and UMI
Added separate metric summary files for each sample tag for experiments using BD Single-Cell Multiplexing kits
Renamed various metrics in outputs to reflect the multiomic nature of the data (Target Type -> Bioproduct_Type, Gene/Target -> Bioproduct)
Added Pct_Read_Pair_Overlap and Median Reads Per Cell metrics to the metric summary
Improved support for larger runs on SBG
Updated workflow on SBG to improve editing of resource requirements
Optimized pipeline metadata handling
Improved checking of reference files

v1.9.1 - October 6, 2020

BD Rhapsody™ WTA Analysis Pipeline:

Improved putative cell calling algorithm to reduce overcalling of putative cells in high cell input experiments
Updated alignment settings to improve AbSeq mapping when R2 read length is greater than 75 bases

v1.9 - July 29, 2020

BD Rhapsody™ Targeted Analysis Pipeline and BD Rhapsody™ WTA Analysis Pipeline:

Improved FASTQ file pairing - filenames are flexible and pairing is now based on read sequence identifier
Optimized pipeline in various steps for memory and storage usage
Fixed bugs related to Sample Multiplexing Kit noise and DBEC mean molecule metric
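Pairing by read sequence identifier means the two mates of a read pair are matched by the identifier in their FASTQ headers rather than by filename. The sketch below illustrates the idea under stated assumptions: the function names are hypothetical, only the first record of each file is compared, the identifier is taken as the header text up to the first whitespace, and a trailing /1 or /2 mate suffix (older Illumina and MGI style) is stripped. It does not reproduce the pipeline's actual pairing code.

```python
import gzip


def first_read_id(path):
    """Return the read identifier from the first header line of a FASTQ file.

    Assumes the header looks like '@<identifier>[ <comment>]' and that mate
    identifiers may end in '/1' or '/2', which are removed before comparison.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        header = handle.readline().strip()
    if not header.startswith("@") or len(header) < 2:
        raise ValueError(f"{path}: first line is not a FASTQ header")
    identifier = header[1:].split()[0]
    if identifier.endswith(("/1", "/2")):
        identifier = identifier[:-2]
    return identifier


def pair_fastqs(paths):
    """Group FASTQ files whose first reads share the same identifier.

    Hypothetical sketch: returns a list of (file_a, file_b) tuples sorted by
    path. It assumes each identifier appears in exactly two files and does not
    decide which file holds R1 versus R2.
    """
    by_id = {}
    for path in paths:
        by_id.setdefault(first_read_id(path), []).append(path)
    pairs = []
    for identifier, files in by_id.items():
        if len(files) != 2:
            raise ValueError(f"identifier {identifier!r} found in {len(files)} files")
        pairs.append(tuple(sorted(files)))
    return pairs
```

Because the files are matched on header content, arbitrary filenames still pair correctly as long as both mates start with the same read.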

BD Rhapsody™ Targeted Analysis Pipeline:

Support for BD Rhapsody™ VDJ CDR3 protocol
Read and molecule counts for targets from same gene symbol are combined in the output tables
Updated Bowtie2 alignment parameters for improved sensitivity

BD Rhapsody™ WTA Analysis Pipeline:

Updated Pct_Cellular Metrics calculations to match Bioinformatics handbook descriptions
Added support for supplemental reference FASTA files, which allow alignment to transgenes, such as viral RNA or GFP
Updated STAR alignment parameters for improved sensitivity

v1.8 - Oct 4, 2019

BD Rhapsody™ Targeted Analysis Pipeline and BD Rhapsody™ WTA Analysis Pipeline:

Added Sample_Tag_ReadsPerCell.csv to Multiplex Output
Optimized pipeline in various steps for memory usage
Fixed bug in status determination for UMI_Adjusted_Stats.csv file

BD Rhapsody™ Targeted Analysis Pipeline:

Updated Targets section in Metrics_Summary.csv to calculate metrics based on targets detected in putative cells only
Removed Clustering Analysis and outputs

BD Rhapsody™ WTA Analysis Pipeline:

Added support for BD™ AbSeq libraries
Removed Targets section in Metrics_Summary.csv for WTA only libraries
Removed Pct_Error_Reads and Error_Depth in UMI_Adjusted_Stats.csv, which are not applicable to WTA only libraries

v1.7.1 - August 7, 2019

Added BD Rhapsody™ WTA Analysis Pipeline
Fixed bug that can cause stalling when zero putative cells were identified
Fixed bug that affected runs using Disable Refined Putative Cell Calling option

v1.6.1 - July 2, 2019

Increased memory limits for GetDataTable and Metrics
Fixed bug associated with "No Multiplex" option on SBG
Uses fewer resources in AddToSam step.

v1.6 - June 10, 2019

Added new options for putative cell determination:
- Exact Cell Count: Set a specific number of cells as putative, based on those with the highest error-corrected read count
- Disable Refined Putative Cell Calling: Determine putative cells using only the basic algorithm
Updated to Python 3
Updated alignment defaults (minor molecule count changes expected)
Local install only - CWL files are bundled into one file
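The Exact Cell Count option described above amounts to a top-N selection on error-corrected read counts. A minimal sketch, assuming the per-cell error-corrected read counts are already available as a dictionary; the function name and the tie-breaking rule (by cell label, for a deterministic result) are hypothetical and not taken from the pipeline.

```python
def exact_cell_count(read_counts, n_cells):
    """Return the n_cells cell labels with the highest error-corrected read counts.

    read_counts maps a cell label to its error-corrected read count.
    Ties are broken by cell label so the result is deterministic; this
    tie-break rule is an assumption for illustration.
    """
    if n_cells < 0:
        raise ValueError("n_cells must be non-negative")
    ranked = sorted(read_counts.items(), key=lambda item: (-item[1], item[0]))
    return [cell for cell, _ in ranked[:n_cells]]


counts = {"cell_A": 1200, "cell_B": 50, "cell_C": 980, "cell_D": 980}
print(exact_cell_count(counts, 3))  # ['cell_A', 'cell_C', 'cell_D']
```

Unlike the basic and refined algorithms, this selection does not look for an inflection point in the read count distribution; it simply keeps the requested number of top-ranked cells.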

v1.5 - March 14, 2019

Added support for BD Single-cell multiplexing kit: Mouse Immune
Updated various filtering thresholds to support sequencing runs with shorter read length
Deprecated pipeline input: BAM input
Fixed bug in Quality Filter (minor metrics changes expected)
Optimized pipeline (computationally faster, more scalable to support larger input data size, and better logging)

v1.3 - July 31, 2018

Added support for BD™ AbSeq assay
Added support for BD™ single-cell multiplexing kit - Mouse Immune
New pipeline input - AbSeq Reference
New pipeline outputs - Unfiltered cell-gene data tables
Updated Metrics_Summary.csv to support metrics from multiple sequencing libraries
Updated Recursive Substitution Error Correction (RSEC) algorithm (minor molecule count changes expected)
Optimized pipeline to run faster

v1.02 - Nov 27, 2017

Added support for BD Single-cell multiplexing kit - Human
Improved pipeline speed by deleting large temp files
Removed network requirement when running locally
Bug fix for the wrong Docker image name (Dec 13, 2017)

BD Rhapsody™ Sequence Analysis Pipeline 2.3