Release Notes


v3.0 - Oct 29, 2025

Added

  • ATAC: Gene Activity output - new modality in the Cellismo output file and also a separate MEX output file. Gene activity is a Gene-by-Cell matrix, where each count is the number of transposase cut sites in the gene body or within 2000 bases upstream of the gene start position.
  • ATAC: Transcription factor motif output - new modality in the Cellismo output file and also a separate MEX output file. This is a TFmotif-by-cell matrix, where values are z-scores of the enrichment of each TF motif
  • VDJ: New assembly algorithm improves speed of this step by up to 23-fold (range 7x-23x), enabling the processing of billions of TCR/BCR reads. Metrics are generally equivalent or slightly better.
  • VDJ: VDJ-only pipeline - provide only TCR and/or BCR FASTQs and get a cell call and VDJ results. Sample multiplexing with VDJ only is also supported. VDJ in combination with an mRNA assay is still recommended for better cell calling and identification
  • New pipeline node to downsample data to calculate a sequencing saturation curve and median genes per cell curve, which are output on the pipeline report
  • Make Rhapsody Reference tool: added an optional input for Transcription Factor Motif PFM file
  • Make Rhapsody Reference tool: will now filter out readthrough transcripts and genes with only readthrough transcripts. Added optional parameter to turn off this filtering
  • Make Rhapsody Reference tool: added optional parameter to filter out Y chromosome Pseudo-Autosomal Regions from Human reference build 38
  • Pipeline Report: new Read Flow diagram, showing a sankey diagram of read filtering steps for each library and for each of the RNA and/or ATAC modalities
  • Pipeline Report: new Sequencing Saturation calculator to enable calculation of required total reads to achieve a target saturation value
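
As a rough illustration of the Gene Activity definition in the list above, the following sketch counts cut sites per gene for a single cell. The interval layout, the strand-aware reading of "upstream", and all names (Gene, activity_window, gene_activity) are assumptions made for this example and do not describe the pipeline's implementation.

```python
from collections import Counter
from dataclasses import dataclass

UPSTREAM_BASES = 2000  # window size taken from the release note


@dataclass(frozen=True)
class Gene:
    name: str
    contig: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"


def activity_window(gene):
    """Return (lo, hi) covering the gene body plus 2000 bases upstream.

    Assumption: "upstream" is strand-aware, so the window extends before
    `start` on the + strand and after `end` on the - strand.
    """
    if gene.strand == "+":
        return max(1, gene.start - UPSTREAM_BASES), gene.end
    return gene.start, gene.end + UPSTREAM_BASES


def gene_activity(genes, cut_sites):
    """Count cut sites per gene for one cell.

    `cut_sites` is an iterable of (contig, position) tuples. A cut site that
    falls inside overlapping windows is counted once for each gene.
    """
    counts = Counter()
    for contig, pos in cut_sites:
        for gene in genes:
            lo, hi = activity_window(gene)
            if gene.contig == contig and lo <= pos <= hi:
                counts[gene.name] += 1
    return counts
```

Repeating this per cell and stacking the results gives a Gene-by-Cell matrix. The nested loop is fine for a handful of genes; a real implementation would more likely use an interval index.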

Updated

  • VDJ: _VDJ_perCell.csv file CDR3 columns are updated to use CDR3 junction instead of CDR3 alone, resulting in the inclusion of canonical amino acids
  • VDJ: _VDJ_perCell.csv file added full length pairing columns
  • VDJ: New column in AIRR outputs "junction_anchored_aa" - A direct translation of only the CDR3 nucleotide sequence, not influenced by upstream frameshifts
  • VDJ: Update constant region gene identification to prevent mismatched chain types
  • VDJ: Removed PyIR wrapper; IgBLAST is now called directly
  • Basic putative cell calling algorithm updated to fix several edge cases and get more precise cell calls. Increase in putative cell number of ~1% is typical. Use of the Expected Cell Count parameter is highly encouraged
  • Pipeline Report: various metric alert updates
  • Pipeline Report: Mean bioproducts per cell added to summary section
  • Gene expression _MolsPerCell MEX output now contains Ensembl IDs as well as Gene symbols
  • Improved library name determination from FASTQ file names
  • More aggressive cleanup of polyA sequence in reads to prevent spurious alignments
  • Make Rhapsody Reference tool: Extra Sequence input is now included in the BWA-Mem index
  • Seven Bridges CWL: Instance types updated to be more performant, and increase size of instances for ATAC related nodes
  • ATAC peak annotation now uses transcript features rather than gene features, which better classifies peaks when a gene has multiple transcription start sites
  • Cellismo output file now contains GTF data for genes
  • Dimensionality reduction threshold updates: Below 100,000 cells, both t-SNE and UMAP coordinates are generated. Between 100,000 and 300,000 cells, only UMAP coordinates. Above 300,000 cells, a sub-sample of 300,000 cells will be selected and UMAP coordinates generated
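
The dimensionality reduction thresholds in the last item above can be sketched as a small decision function. This is only an illustration: the function and constant names are hypothetical, and the behavior at exactly 100,000 and 300,000 cells is an assumption, since the note does not say which side of each boundary those counts fall on.

```python
TSNE_AND_UMAP_MAX_CELLS = 100_000  # below this, both embeddings are generated
UMAP_MAX_CELLS = 300_000           # above this, cells are sub-sampled


def plan_dimensionality_reduction(n_cells):
    """Return (methods, n_cells_used) for a given putative cell count.

    Assumption: exactly 100,000 cells is treated as UMAP-only and exactly
    300,000 cells is processed without sub-sampling.
    """
    if n_cells < TSNE_AND_UMAP_MAX_CELLS:
        return ("t-SNE", "UMAP"), n_cells
    if n_cells <= UMAP_MAX_CELLS:
        return ("UMAP",), n_cells
    # Too many cells: UMAP is computed on a 300,000-cell sub-sample
    return ("UMAP",), UMAP_MAX_CELLS
```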

Fixed

  • AlignmentAnalysis node was not getting an early cell count estimate, which could cause downstream node scaling issues
  • TCR/BCR node failure when the number of valid TCR or BCR reads exceeded 2,147,483,647 reads
  • Pipeline Report error when exact cell count parameter specified
  • Pipeline Report error when a CITE-seq/AbSeq-only dataset is run
  • Targeted RNA pipeline did not output a DBEC MEX file
  • ATAC pipeline could get stuck in QualCLAlign_ATAC for some reference genomes with large numbers of contigs
  • Rare issue where an ATAC peak could exceed the length of the contig on which it resides
  • Improved handling of chromosome names with unexpected characters
  • Failure in GenerateSeurat node when there is only 1 AbSeq input
  • Rare failure caused by poor quality read 1 data creating a race condition
  • Rare failure in ATAC node caused by incorrect BWA-MEM2 binary selection
  • ATAC pipeline failure when more than one ATAC library was present in the pipeline inputs
  • ATAC pipeline failure when using sample tags or an "Extra seqs" input
  • ATAC pipeline discrepancy in putative cell numbers in different output files

v2.3 - March 10, 2025

Added

  • New .CELLISMO output file - for use in BD Cellismo™ Data Visualization Tool. (Renamed and replaces the .H5MU output file)
  • Support for BAM file index for chromosomes longer than 500Mb, with .bam.csi
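
For context on the .bam.csi item above: the BAI index format can only address contigs up to 2^29 (536,870,912) bases, which is why longer chromosomes need a CSI index. The sketch below shows one way such a choice could be made; the function name and the ".bam.bai" default are assumptions for illustration, not the pipeline's actual logic.

```python
# The BAI binning scheme cannot index positions beyond 2**29 bases (512 Mbp)
BAI_MAX_CONTIG_LENGTH = 2**29


def index_extension(contig_lengths):
    """Pick a BAM index extension from a mapping of contig name -> length."""
    if any(length > BAI_MAX_CONTIG_LENGTH for length in contig_lengths.values()):
        return ".bam.csi"
    return ".bam.bai"  # assumed default when every contig fits in a BAI index
```

For example, a human GRCh38 reference (longest chromosome about 249 Mb) would keep the default, while a genome with a 600 Mb chromosome would get ".bam.csi".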

Updated

  • ATAC index read minimum length changed from 43 to 35 bases
  • Renamed TCR/BCR metadata column from High_Quality_Cell to High_Qualty_Cell_TCR_BCR
  • Sample tag read start maximum position value updated to match prior pipeline versions
  • Make Rhapsody Reference tool to always include a 'gene_biotype' attribute
  • ATAC-Seq trimming of custom capture sequence improved to resolve edge cases in sequencing length
  • Improved logic for automatic pairing of FASTQ filenames when header data is not formatted as expected
  • Seurat output file now includes additional metadata for sample tags and bioproduct stats

Fixed

  • Pct CellLabel UMI Aligned Uniquely metric now correctly reports reads that are both aligned and unique. Versions 2.0 through 2.2.1 reported only Pct CellLabel UMI Aligned.
  • Exact cell count parameter did not work for ATAC-Seq only or joint mRNA-ATAC cell calling
  • ATAC_Compile_Results node fails if custom Rhapsody reference does not have 'gene_biotype' GTF attribute
  • VDJ_Compile_Results node fails when there are zero cells detected
  • Custom Rhapsody Reference files for ATAC-Seq may fail if name ends with characters 'a' or 'n'
  • ATAC-Seq pipeline failure on certain AMD EPYC processors due to BWA-mem2 binary selection
  • ATAC-Seq pipeline failure in edge case where joint cell calling has no cell intersection
  • Pipeline report for ATAC-Seq run may report the wrong number of putative cells called
  • Pipeline failure caused by JSON parse on certain OS locales
  • Generate_Seurat node fails if there is only one AbSeq reference input
  • Failure in ATAC-Only pipeline run with sample tag when cell calling data has not been set
  • Extra Utility AnnotateCellLabelUMI fails if Run_Name parameter not provided

v2.2.1 - June 05, 2024

Added

  • Support for MGI sequencer FASTQ read header and file names
  • For TCR/BCR assay, new output of a compressed bundle of PNG images showing the per chain VDJ DBEC algorithm thresholds
  • Long read support - added new pipeline parameter for enabling support for long R2 reads (>650 bp) - default is to auto-detect long or short reads

Updated

  • Ability to customize STAR and BWA-MEM2 alignment parameters, enabled on Seven Bridges and local runs

Fixed

  • Failures that could occur with MGI sequencer FASTQ files in the QualCLAlign or AlignmentAnalysis nodes
  • Failure caused by VDJ AssembleAndAnnotate node - Argument list too long
  • Failure caused by MergeBam node - Argument list too long
  • PhiX aligned reads could incorrectly be counted as targeted mRNA reads when using a targeted panel (usually these are already removed during FASTQ generation)

v2.2 - April 19, 2024

Added

  • Added support for ATAC-Seq Assay and Multiomic ATAC-Seq Assay (WTA+ATAC-Seq)
  • Added ability to customize STAR alignment parameters

Updated

  • Updated the immune cell type classifier to be more lenient in the percentage of bioproducts required to run
  • Updated TCR BCR annotation software IgBLAST to version 1.22
  • Updated TCR BCR annotation to IMGT release 202349-3 (12-06-2023)
  • Updated bead version detection

Fixed

  • Fixed error in dimensionality reduction when zero variable genes are found due to very sparse data

v2.1 - Nov 10, 2023 (Internal and early access release only)

Added

  • Added TCR/BCR high-quality cell designation and associated metrics. This creates a new set of VDJ metrics similar to products where there is a putative cell call for VDJ libraries, separate from the cell call from associated gene expression libraries
  • Added UMAP dimensionality reduction coordinates as an output file and also built those coordinates into the pipeline report, Seurat, and Scanpy outputs
  • Added extra utility for only annotating the cell index and UMI of R1 and putting it in the header of R2
  • Added support for Enhanced Cell Capture Beads V3

Updated

  • Updated Seurat output to separate mRNA and AbSeq data into the RNA and ADT assays respectively
  • Updated Scanpy output to use Muon (.h5mu) and create mRNA and AbSeq data in separate anndata objects, rna and prot respectively

  • Updated TCR/BCR dominant contigs file to include AIRR compliant germline columns
  • Updated TCR/BCR dominant contigs file to only retain cell type appropriate chains. All chains are still available in the unfiltered contigs file.
  • Updated TCR/BCR dominant contigs file to rename the column 'duplicate_count' to 'umi_count', in accordance with AIRR’s definition update in v1.4.1
  • Updated TCR/BCR dominant contig selection process, elevating the importance of a productive contig with high relative read count, and removing the CDR3 requirement
  • Updated TCR/BCR DBEC algorithm to allow exceptions for CDR3 sequences not seen in any other cell, and CDR3 paired chains seen in other cells
  • Updated TCR/BCR contig_id to correspond with annotated chain type

  • Updated basic cell calling to scale better with small and large cell datasets, and prevent most inappropriately high cell calls derived from noise signatures
  • Updated Alignment Category 'No_Feature_Pct' metric to include targeted mRNA reads that are filtered out due to an invalid alignment
  • Updated cell label annotation to improve the speed of annotation for reads with cell label sequences that contain more than 1 error
  • Updated RAM requirements for VDJ_preprocess_reads on local server runs
  • Updated error handling and reporting in read processing steps
  • Updated logging to capture errors during alignment with STAR
  • Updated FASTQ handling to skip reads with empty sequence
  • Updated cell type classification model selection to better select an appropriate model when not all bioproducts are found in any one model
  • Updated pipeline report to show sub-sampled tSNE and UMAP plots, in the case where the putative cell count exceeds 100,000
  • Updated pipeline report to show details of refined cell calling, when refined cell calling is selected
  • Updated bead version detection and read trimming

Fixed

  • Fixed issue that caused failure when a gene symbol was named 'nan'
  • Fixed issue with a quote mark in a gene symbol causing a failure in the Seurat output file generation
  • Fixed rare division by zero issue in DBEC algorithm
  • Fixed rare issue caused by including "SampleTag" in the Run_Name parameter

Experimental

  • Added docker-free version of the pipeline, available for local server installs as a tar.gz bundle. Tested on Linux versions: Ubuntu 16 / 20 / 22 - Red Hat 7 - CentOS 7 / 9

make_rhapsody_reference tool:

  • Added an 'Extra_STAR_params' input to enable passing parameters to the STAR genomeGenerate process
  • Updated to automatically generate a GTF for sequences added in the 'Extra_sequences' FASTA input -- useful for transgenes

v2.0 - June 14, 2023

  • Major rewrite to read processing steps of the pipeline results in up to 7x faster performance and 2x less disk space required
  • New cell-bioproduct datatable output file formats: MEX, for broad compatibility with downstream analysis tools;
    Seurat RDS and Scanpy H5AD, for single files that include all cell metadata (e.g. Sample Tag, TCR/BCR)
  • Consolidated previously separate WTA and Targeted pipelines into one pipeline
  • New updated WTA reference combines STAR index and matching GTF
  • Built-in support for creating a new WTA reference with paired genome FASTA and GTF
  • New Maximum_Threads parameter to limit the CPU usage on local server runs
  • Basic cell caller is now the default algorithm. Refined cell calling algorithm can still be used by setting the Enable_Refined_Cell_Call parameter
  • New pipeline input: Expected_Cell_Count - Guide the basic putative cell calling algorithm by providing an estimate of the number of cells expected. Usually this can be the number of cells loaded in the Rhapsody cartridge
  • BAM files are not generated by default, but can be created using the Generate_Bam parameter
  • Numerous other fixes and optimizations

v1.12.1 - March 14, 2023

  • Fix TCR pairing percent metrics

v1.12 - Feb 21, 2023

  • Added support for Rhapsody Enhanced bead V2 with an expanded cell label diversity
  • Added support for Flex SMK (Sample Multiplexing Kit) allowing 24 species and cell type agnostic sample tags
  • Upgraded CWL to version 1.2
  • VDJ nodes are only executed when necessary
  • Pipeline Report: Added cell label graph when an exact count is specified
  • Added option to skip creating BAM file output
  • Use productive status when collapsing chains for the VDJ perCell output file
  • Dominant Contigs AIRR file now has DBEC filtering applied and is uncompressed. Both AIRR files have an additional column cell_type_experimental. The non-AIRR Dominant/Unfiltered files are no longer part of the pipeline output.
  • Prioritize IG/TR gene features when annotating reads from a VDJ assay

v1.11.1 - Dec 15, 2022

  • Improved speed and disk usage of AnnotateReads step
  • Update Pandas version to fix error: ValueError: Unstacked DataFrame is too big, causing int32 overflow
  • Better prediction of RAM requirements
  • Improved basic and refined putative cell calling algorithms
  • Deletion of unnecessary intermediate files to save disk space
  • Seven Bridges deployment: Fix for error Instance not available for automatic scheduling

v1.11 - Aug 18, 2022

BD Rhapsody™ Targeted Analysis Pipeline and BD Rhapsody™ WTA Analysis Pipeline:

  • Added a pipeline report HTML that contains information about the analysis including the metrics summary and graphs to visualize the results
  • By default, reads aligned to exons and introns are now considered and represented in molecule counts. Added parameter to control this behavior.
  • Added new "Alignment Categories" for TCR and BCR reads
  • Added support for VDJ Adaptive Immune Receptor Repertoire (AIRR) standard format
  • For pipeline run where putative cells are determined based on AbSeq (protein) counts, added file output of cell IDs corresponding to suspected protein aggregates
  • Updated CWL workflow on Seven Bridges to fix memory failures and dynamically allocate resources for large datasets
  • Improved flexibility for FASTQ file naming
  • Updated Picard to version 2.27.4
  • Updated bead version detection

v1.10.1 - April 14, 2022

  • Fixed issue with cell label detection on reads from TCR/BCR, when TCR/BCR libraries were combined with other library types (WTA, Targeted, AbSeq) in a single sequencing index.
  • Fixed issue with processing FASTQ files whose filenames end in fq.gz

v1.10 - January 24, 2022

BD Rhapsody Targeted Analysis Pipeline and BD Rhapsody WTA Analysis Pipeline:

  • Updated VDJ pipeline with improved performance, new assembly algorithm, new metrics and new output files containing all available contig sequences
  • Added support for Rhapsody Enhanced Beads, with automatic bead version detection
  • Added option to call putative cells based on AbSeq read counts (for troubleshooting only)
  • Added Alignment Categories section to metric summary which provides a breakdown of alignments for read pairs with a valid cell label and UMI
  • Added separate metric summary files for each sample tag for experiments using BD Single-Cell Multiplexing kits
  • Renamed various metrics in outputs to reflect multiomics nature of data (Target Type -> Bioproduct_Type, Gene/Target -> Bioproduct)
  • Added Pct_Read_Pair_Overlap and Median Reads Per Cell metrics to metric summary
  • Improved support for larger runs on SBG
  • Updated workflow on SBG to improve editing of resource requirements
  • Optimized pipeline metadata handling
  • Improved checking of reference files

v1.9.1 - October 6, 2020

BD Rhapsody™ WTA Analysis Pipeline:

  • Improved putative cell calling algorithm to reduce overcalling of putative cells in high cell input experiments
  • Updated alignment settings to improve AbSeq mapping when R2 read length is greater than 75 bases

v1.9 - July 29, 2020

BD Rhapsody™ Targeted Analysis Pipeline and BD Rhapsody™ WTA Analysis Pipeline:

  • Improved FASTQ file pairing - filenames are flexible and pairing is now based on read sequence identifier
  • Optimized pipeline in various steps for memory and storage usage
  • Fixed bugs related to Sample Multiplexing Kit noise and DBEC mean molecule metric
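
Pairing FASTQ files by read sequence identifier, as described in the first item above, could look roughly like the following. This is a minimal sketch under stated assumptions: it compares only the first read of each file, assumes Illumina-style or "/1" and "/2" style headers, and all function names are hypothetical.

```python
import gzip
from collections import defaultdict


def first_header(path):
    """Read the first header line of a FASTQ file (gzip-compressed or plain)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return handle.readline().rstrip("\n")


def read_key(header):
    """Reduce a header to the part shared by both mates.

    Drops the leading '@', anything after the first whitespace
    (such as '1:N:0:...' comments), and a trailing '/1' or '/2'.
    """
    parts = header[1:].split()
    name = parts[0] if parts else ""
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def pair_fastqs(headers_by_file):
    """Group file names whose first read shares the same identifier.

    `headers_by_file` maps file name -> first header line. Only groups of
    exactly two files are returned; which file is R1 is not decided here.
    """
    groups = defaultdict(list)
    for filename, header in headers_by_file.items():
        groups[read_key(header)].append(filename)
    return [tuple(sorted(files)) for files in groups.values() if len(files) == 2]
```

In practice the mapping would be built with something like `{path: first_header(path) for path in paths}`, so the file names themselves never influence the pairing.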

BD Rhapsody™ Targeted Analysis Pipeline:

  • Support for BD Rhapsody™ VDJ CDR3 protocol
  • Read and molecule counts for targets from same gene symbol are combined in the output tables
  • Updated Bowtie2 alignment parameters for improved sensitivity

BD Rhapsody™ WTA Analysis Pipeline:

  • Updated Pct_Cellular Metrics calculations to match Bioinformatics handbook descriptions
  • Added support for supplemental reference fasta files, which allow alignment to transgenes, like viral RNA or GFP
  • Updated STAR alignment parameters for improved sensitivity

v1.8 - Oct 4, 2019

BD Rhapsody™ Targeted Analysis Pipeline and BD Rhapsody™ WTA Analysis Pipeline:

  • Added Sample_Tag_ReadsPerCell.csv to Multiplex Output
  • Optimized pipeline in various steps for memory usage
  • Fixed bug in status determination for UMI_Adjusted_Stats.csv file

BD Rhapsody™ Targeted Analysis Pipeline:

  • Updated Targets section in Metrics_Summary.csv to calculate metrics based on targets detected in putative cells only
  • Removed Clustering Analysis and outputs

BD Rhapsody™ WTA Analysis Pipeline:

  • Added support for BD™ AbSeq libraries
  • Removed Targets section in Metrics_Summary.csv for WTA only libraries
  • Removed Pct_Error_Reads and Error_Depth in UMI_Adjusted_Stats.csv, which are not applicable to WTA only libraries

v1.7.1 - August 7, 2019

  • Added BD Rhapsody™ WTA Analysis Pipeline
  • Fixed bug that can cause stalling when zero putative cells were identified
  • Fixed bug that affected runs using Disable Refined Putative Cell Calling option

v1.6.1 - July 2, 2019

  • Increased memory limits for GetDataTable and Metrics
  • Fixed bug associated with "No Multiplex" option on SBG
  • Uses fewer resources in AddToSam step.

v1.6 - June 10, 2019

  • Added new options for putative cell determination:
    • Exact Cell Count: Set a specific number of cells as putative, based on those with the highest error-corrected read count
    • Disable Refined Putative Cell Calling: Determine putative cells using only the basic algorithm
  • Updated to Python 3
  • Updated alignment defaults (minor molecule count changes expected)
  • Local install only - CWL files are bundled into one file
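
The Exact Cell Count option described above amounts to a top-N selection. A minimal sketch, assuming the error-corrected read counts are available as a mapping from cell label to count (the function name and the tie-breaking rule are assumptions for illustration):

```python
def select_exact_cell_count(read_counts, exact_cell_count):
    """Return the cell labels with the highest error-corrected read counts.

    Ties are broken by cell label so the result is deterministic; this rule
    is an assumption and is not taken from the pipeline.
    """
    ranked = sorted(read_counts.items(), key=lambda item: (-item[1], item[0]))
    return [label for label, _ in ranked[:exact_cell_count]]
```

If fewer cell labels exist than the requested count, all of them are returned.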

v1.5 - March 14, 2019

  • Added support for BD Single-cell multiplexing kit: Mouse Immune
  • Updated various filtering thresholds to support sequencing runs with shorter read length
  • Deprecated pipeline input: BAM input
  • Fixed bug in Quality Filter (minor metrics changes expected)
  • Optimized pipeline (computationally faster, more scalable to support larger input data size, and better logging)

v1.3 - July 31, 2018

  • Added support for BD™ AbSeq assay
  • Added support for BD™ single-cell multiplexing kit - Mouse Immune
  • New pipeline input - AbSeq Reference
  • New pipeline outputs - Unfiltered cell-gene data tables
  • Updated Metrics_Summary.csv to support metrics from multiple sequencing libraries
  • Updated Recursive Substitution Error Correction (RSEC) algorithm (minor molecule count changes expected)
  • Optimized pipeline to run faster

v1.02 - Nov 27, 2017

  • Added support for BD Single-cell multiplexing kit - Human
  • Improved pipeline speed by deleting large temp files
  • Removed network requirement when running locally
  • Bug fix for the wrong Docker image name - Dec 13, 2017