Release Notes
v3.0 - Oct 29, 2025
Added
- ATAC: Gene Activity output - new modality in the Cellismo output file and also a separate MEX output file. Gene activity is a Gene-by-Cell matrix, where counts are number of transposase cut sites in the gene body or 2000 bases upstream of the gene start position.
- ATAC: Transcription factor motif output - new modality in the Cellismo output file and also a separate MEX output file. This is a TFmotif-by-cell matrix, where values are z-scores of the enrichment of each TF motif
- VDJ: New assembly algorithm improves speed of this step by up to 23 fold (range 7x-23x), enabling the processing of billions of TCR/BCR reads. Metrics are generally equivalent or slightly better.
- VDJ: VDJ only pipeline - able to provide only TCR and/or BCR FASTQs and get a cell call and VDJ results. Sample multiplexing with VDJ only is also supported. VDJ in combination with a mRNA assay is still recommended for better cell calling and identification
- New pipeline node to downsample data to calculate a sequencing saturation curve and median genes per cell curve, which are output on the pipeline report
- Make Rhapsody Reference tool: added an optional input for Transcription Factor Motif PFM file
- Make Rhapsody Reference tool: will now filter out readthrough transcripts and genes with only readthrough transcripts. Added optional parameter to turn off this filtering
- Make Rhapsody Reference tool: added optional parameter to filter out Y chromosome Pseudo-Autosomal Regions from Human reference build 38
- Pipeline Report: new Read Flow diagram, showing a sankey diagram of read filtering steps for each library and for each of the RNA and/or ATAC modalities
- Pipeline Report: new Sequencing Saturation calculator to enable calculation of required total reads to achieve a target saturation value
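
The Gene Activity definition above (cut sites in the gene body or 2000 bases upstream of the gene start) can be sketched as follows. This is a minimal illustration, not the pipeline's implementation: the function names, the input layout, and the strand handling (upstream means lower coordinates on "+" and higher coordinates on "-") are assumptions made for the example.

```python
from collections import Counter

UPSTREAM_BASES = 2000  # value stated in the release note


def gene_window(start, end, strand, upstream=UPSTREAM_BASES):
    """Return the 1-based inclusive interval: gene body plus upstream region.

    Assumption: 'upstream' is strand-aware.
    """
    if strand == "+":
        return max(1, start - upstream), end
    return start, end + upstream


def gene_activity(cut_sites, genes):
    """Count transposase cut sites per (gene, cell).

    cut_sites: iterable of (cell, chrom, position)   (hypothetical layout)
    genes: dict of gene -> (chrom, start, end, strand)
    """
    counts = Counter()
    for cell, chrom, pos in cut_sites:
        for gene, (g_chrom, start, end, strand) in genes.items():
            lo, hi = gene_window(start, end, strand)
            if chrom == g_chrom and lo <= pos <= hi:
                counts[(gene, cell)] += 1
    return counts


# Toy data, invented for illustration only.
genes = {
    "GENE_A": ("chr1", 5000, 8000, "+"),
    "GENE_B": ("chr1", 20000, 22000, "-"),
}
cut_sites = [
    ("cell1", "chr1", 3500),   # upstream of GENE_A
    ("cell1", "chr1", 6000),   # inside GENE_A body
    ("cell2", "chr1", 23500),  # upstream of GENE_B (minus strand)
    ("cell2", "chr1", 2000),   # outside every window
]
activity = gene_activity(cut_sites, genes)
```

The nested loop is fine for a toy example; a real Gene-by-Cell matrix would need an interval index and a sparse matrix.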
Updated
- VDJ: _VDJ_perCell.csv file CDR3 columns are updated to use CDR3 junction instead of CDR3 alone, resulting in the inclusion of canonical amino acids
- VDJ: _VDJ_perCell.csv file added full length pairing columns
- VDJ: New column in AIRR outputs "junction_anchored_aa" - A direct translation of only the CDR3 nucleotide sequence, not influenced by upstream frameshifts
- VDJ: Update constant region gene identification to prevent mismatched chain types
- VDJ: Removed PyIR wrapper and call IgBlast directly
- Basic putative cell calling algorithm updated to fix several edge cases and get more precise cell calls. Increase in putative cell number of ~1% is typical. Use of the Expected Cell Count parameter is highly encouraged
- Pipeline Report: various metric alert updates
- Pipeline Report: Mean bioproducts per cell added to summary section
- Gene expression _MolsPerCell MEX output now contains Ensembl IDs as well as Gene symbols
- Improved library name determination from FASTQ file names
- More aggressive cleanup of polyA sequence in reads to prevent spurious alignments
- Make Rhapsody Reference tool: Extra Sequence input is now included in the BWA-Mem index
- Seven Bridges CWL: Instance types updated to be more performant, and increase size of instances for ATAC related nodes
- ATAC peak annotation now uses transcript features rather than gene features, which better classifies peaks when a gene has multiple transcription start sites
- Cellismo output file now contains GTF data for genes
- Dimensionality reduction threshold updates: Below 100,000 cells, both t-SNE and UMAP coordinates are generated. Between 100,000 and 300,000 cells, only UMAP coordinates. Above 300,000 cells, a sub-sample of 300,000 cells will be selected and UMAP coordinates generated
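
The dimensionality reduction thresholds listed above can be read as a small decision rule. The sketch below is illustrative only: the function name and return layout are hypothetical, and the behavior at exactly 100,000 and 300,000 cells is an assumption, since the note does not say which side the boundaries fall on.

```python
TSNE_MAX_CELLS = 100_000   # below this, both t-SNE and UMAP are generated
UMAP_MAX_CELLS = 300_000   # above this, cells are sub-sampled before UMAP


def embedding_plan(n_cells):
    """Return which embeddings are produced and on how many cells.

    Assumption: exactly 100,000 and exactly 300,000 cells fall into
    the middle (UMAP only, no sub-sampling) tier.
    """
    if n_cells < TSNE_MAX_CELLS:
        return {"tsne": True, "umap": True, "cells_embedded": n_cells}
    if n_cells <= UMAP_MAX_CELLS:
        return {"tsne": False, "umap": True, "cells_embedded": n_cells}
    return {"tsne": False, "umap": True, "cells_embedded": UMAP_MAX_CELLS}
```

For example, `embedding_plan(450_000)` describes a run where only UMAP coordinates are produced, on a 300,000-cell sub-sample.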
Fixed
- AlignmentAnalysis node was not getting an early cell count estimate, which could cause downstream node scaling issues
- TCR/BCR node failure when the number of valid TCR or BCR reads exceeded 2,147,483,647 reads
- Pipeline Report error when exact cell count parameter specified
- Pipeline Report error when a CITE-seq/AbSeq only dataset is run
- Targeted RNA pipeline did not output a DBEC MEX file
- ATAC pipeline could get stuck in QualCLAlign_ATAC for some reference genomes with large numbers of contigs
- Rare issue where an ATAC peak could exceed the length of the contig on which it resides
- Improved handling of chromosome names with unexpected characters
- Failure in GenerateSeurat node when there is only 1 AbSeq input
- Rare failure caused by poor quality read 1 data creating a race condition
- Rare failure in ATAC node caused by incorrect BWA-MEM2 binary selection
- ATAC pipeline failure when more than one ATAC library was present in the pipeline inputs
- ATAC pipeline failure when using sample tags or an "Extra seqs" input
- ATAC pipeline discrepancy in putative cell numbers in different output files
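
The read limit quoted in the TCR/BCR fix above, 2,147,483,647, equals the largest value a signed 32-bit integer can hold (2^31 - 1). The release note does not state the root cause, so the snippet below only demonstrates that numeric boundary; the helper name is hypothetical.

```python
import struct

INT32_MAX = 2**31 - 1  # 2,147,483,647


def fits_int32(value):
    """Return True if value can be stored as a signed 32-bit integer."""
    try:
        struct.pack("<i", value)  # "<i" is a little-endian signed 32-bit int
        return True
    except struct.error:
        return False
```

A counter stored in such a type can represent `INT32_MAX` reads but not `INT32_MAX + 1`.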
v2.3 - March 10, 2025
Added
- New .CELLISMO output file - for use in BD Cellismo™ Data Visualization Tool. (Renamed and replaces the .H5MU output file)
- Support for BAM file index for chromosomes longer than 500Mb, with .bam.csi
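
For background on the CSI index: the traditional BAI index cannot address coordinates beyond 2^29 bases (about 512 Mb), which is why very long chromosomes need a .bam.csi index. The sketch below picks an index type from contig lengths. The function name is hypothetical, the ".bam.bai" suffix is the usual BAI naming rather than something this document states, and the pipeline's own cutoff (described above as 500Mb) may differ from the format limit used here.

```python
BAI_MAX_CONTIG_LENGTH = 2**29  # 536,870,912 bases, the BAI format limit


def bam_index_suffix(contig_lengths):
    """Choose an index suffix from a dict of contig name -> length.

    Assumption: CSI is used only when at least one contig exceeds
    the BAI limit; otherwise a conventional BAI index is enough.
    """
    needs_csi = any(
        length > BAI_MAX_CONTIG_LENGTH for length in contig_lengths.values()
    )
    return ".bam.csi" if needs_csi else ".bam.bai"
```

With a human-sized contig the function returns `.bam.bai`; a reference containing a contig longer than 536,870,912 bases returns `.bam.csi`.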
Updated
- ATAC index read minimum length changed from 43 to 35 bases
- Renamed TCR/BCR metadata column from High_Quality_Cell to High_Qualty_Cell_TCR_BCR
- Sample tag read start maximum position value to match prior pipeline versions
- Make Rhapsody Reference tool to always include a 'gene_biotype' attribute
- ATAC-Seq trimming of custom capture sequence improved to resolve edge cases in sequencing length
- Improved logic for automatic pairing of FASTQ filenames when header data is not formatted as expected
- Seurat output file now includes additional metadata for sample tags and bioproduct stats
Fixed
- Pct CellLabel UMI Aligned Uniquely metric now correctly reports reads that are both aligned and unique. Versions 2.0 through 2.2.1 reported only Pct CellLabel UMI Aligned.
- Exact cell count parameter did not work for ATAC-Seq only or joint mRNA-ATAC cell calling
- ATAC_Compile_Results node fails if custom Rhapsody reference did not have 'gene_biotype' GTF attribute
- VDJ_Compile_Results node fails when there are zero cells detected
- Custom Rhapsody Reference files for ATAC-Seq may fail if name ends with characters 'a' or 'n'
- ATAC-Seq pipeline failure on certain AMD EPYC processors due to BWA-mem2 binary selection
- ATAC-Seq pipeline failure in edge case where joint cell calling has no cell intersection
- Pipeline report for ATAC-Seq run may report the wrong number of putative cells called
- Pipeline failure caused by JSON parse on certain OS locales
- Generate_Seurat node fails if there is only one AbSeq reference input
- Failure in ATAC-Only pipeline run with sample tag when cell calling data has not been set
- Extra Utility AnnotateCellLabelUMI fails if Run_Name parameter not provided
v2.2.1 - June 05, 2024
Added
- Support for MGI sequencer FASTQ read header and file names
- For TCR/BCR assay, new output of a compressed bundle of PNG images showing the per chain VDJ DBEC algorithm thresholds
- Long read support - added new pipeline parameter for enabling support for long R2 reads (>650 bp) - default is to auto-detect long or short reads
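
One way to picture auto-detection of long R2 reads is to sample the first records of a FASTQ file and compare the longest read to the 650 bp threshold. This is a sketch under stated assumptions, not the pipeline's detection logic: the function name, the sample size, and the use of the maximum read length are all hypothetical.

```python
import io
from itertools import islice

LONG_READ_THRESHOLD = 650  # long R2 reads are described as > 650 bp


def is_long_read_fastq(handle, sample_size=1000):
    """Inspect up to sample_size FASTQ records from an open text handle.

    Assumption: the file is treated as long-read if any sampled
    sequence is longer than the threshold.
    """
    longest = 0
    for i, line in enumerate(islice(handle, sample_size * 4)):
        if i % 4 == 1:  # the second line of each 4-line record is the sequence
            longest = max(longest, len(line.strip()))
    return longest > LONG_READ_THRESHOLD


# Toy in-memory FASTQ data, invented for illustration.
short_fastq = io.StringIO("@r1\n" + "A" * 150 + "\n+\n" + "I" * 150 + "\n")
long_fastq = io.StringIO(
    "@r1\n" + "A" * 150 + "\n+\n" + "I" * 150 + "\n"
    "@r2\n" + "C" * 900 + "\n+\n" + "I" * 900 + "\n"
)
short_result = is_long_read_fastq(short_fastq)
long_result = is_long_read_fastq(long_fastq)
```

For gzipped files, the same function works on a handle opened with `gzip.open(path, "rt")`.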
Updated
- Ability to customize STAR and BWA-MEM2 alignment parameters, enabled on Seven Bridges and local runs
Fixed
- Failures that could occur with MGI sequencer FASTQ files in the QualCLAlign or AlignmentAnalysis nodes
- Failure caused by VDJ AssembleAndAnnotate node - Argument list too long
- Failure caused by MergeBam node - Argument list too long
- PhiX aligned reads could incorrectly be counted as targeted mRNA reads when using a targeted panel (usually these are already removed during FASTQ generation)
v2.2 - April 19, 2024
Added
- Added support for ATAC-Seq Assay and Multiomic ATAC-Seq Assay (WTA+ATAC-Seq)
- Added ability to customize STAR alignment parameters
Updated
- Updated the immune cell type classifier to be more lenient in the percentage of bioproducts required to run
- Updated TCR BCR annotation software IGBlast to version 1.22
- Updated TCR BCR annotation to IMGT release 202349-3 (12-06-2023)
- Updated bead version detection
Fixed
- Fixed error in dimensionality reduction when zero variable genes are found due to very sparse data
v2.1 - Nov 10, 2023 (Internal and early access release only)
Added
- Added TCR/BCR high-quality cell designation and associated metrics. This creates a new set of VDJ metrics similar to products where there is a putative cell call for VDJ libraries, separate from the cell call from associated gene expression libraries
- Added UMAP dimensionality reduction coordinates as an output file and also built those coordinates into the pipeline report, Seurat, and Scanpy outputs
- Added extra utility for only annotating the cell index and UMI of R1 and putting it in the header of R2
- Added support for Enhanced Cell Capture Beads V3
Updated
- Updated Seurat output to separate mRNA and AbSeq data into the RNA and ADT assays respectively
- Updated Scanpy output to use Muon (.h5mu) and create mRNA and AbSeq data in separate anndata objects, rna and prot respectively
- Updated TCR/BCR dominant contigs file to include AIRR compliant germline columns
- Updated TCR/BCR dominant contigs file to only retain cell type appropriate chains. All chains are still available in the unfiltered contigs file.
- Updated TCR/BCR dominant contigs file to rename the column 'duplicate_count' to 'umi_count', in accordance with AIRR’s definition update in v1.4.1
- Updated TCR/BCR dominant contig selection process, elevating the importance of a productive contig with high relative read count, and removing the CDR3 requirement
- Updated TCR/BCR DBEC algorithm to allow exceptions for CDR3 sequences not seen in any other cell, and CDR3 paired chains seen in other cells
- Updated TCR/BCR contig_id to correspond with annotated chain type
- Updated basic cell calling to scale better with small and large cell datasets, and prevent most inappropriately high cell calls derived from noise signatures
- Updated Alignment Category 'No_Feature_Pct' metric to include targeted mRNA reads that are filtered out due to an invalid alignment
- Updated cell label annotation to improve the speed of annotation for reads with cell label sequences that contain more than 1 error
- Updated RAM requirements for VDJ_preprocess_reads on local server runs
- Updated error handling and reporting in read processing steps
- Updated logging to capture errors during alignment with STAR
- Updated FASTQ handling to skip reads with empty sequence
- Updated cell type classification model selection to better select an appropriate model when not all bioproducts are found in any one model
- Updated pipeline report to show sub-sampled tSNE and UMAP plots, in the case where the putative cell count exceeds 100,000
- Updated pipeline report to show details of refined cell calling, when refined cell calling is selected
- Updated bead version detection and read trimming
Fixed
- Fixed issue that caused failure when a gene symbol was named 'nan'
- Fixed issue with a quote mark in a gene symbol causing a failure in the Seurat output file generation
- Fixed rare division by zero issue in DBEC algorithm
- Fixed rare issue caused by including "SampleTag" in the Run_Name parameter
Experimental
- Added docker-free version of the pipeline, available for local server installs as a tar.gz bundle. Tested on Linux versions: Ubuntu 16 / 20 / 22, Red Hat 7, and CentOS 7 / 9
make_rhapsody_reference tool:
- Added an 'Extra_STAR_params' input to enable passing parameters to the STAR genomeGenerate process
- Updated to automatically generate a GTF for sequences added in the 'Extra_sequences' FASTA input -- useful for transgenes
v2.0 - June 14, 2023
- Major rewrite to read processing steps of the pipeline results in up to 7x faster performance and 2x less disk space required
- New cell-bioproduct datatable output file formats: MEX, for broad compatibility with downstream analysis tools; Seurat RDS and ScanPy H5AD for single files that include all cell metadata (e.g. Sample Tag, TCR/BCR)
- Consolidated previously separate WTA and Targeted pipelines into one pipeline
- New updated WTA reference combines STAR index and matching GTF
- Built-in support for creating a new WTA reference with paired genome FASTA and GTF
- New Maximum_Threads parameter to limit the CPU usage on local server runs
- Basic cell caller is now the default algorithm. Refined cell calling algorithm can still be used by setting the Enable_Refined_Cell_Call parameter
- New pipeline input: Expected_Cell_Count - Guide the basic putative cell calling algorithm by providing an estimate of the number of cells expected. Usually this can be the number of cells loaded in the Rhapsody cartridge
- BAM files are not generated by default, but can be created using the Generate_Bam parameter
- Numerous other fixes and optimizations
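
The parameters introduced in v2.0 could be collected into a job inputs file along these lines. Only the four parameter names come from the notes above; the value types, the example values, and the helper function are assumptions, and a real run also needs read and reference inputs that are omitted here.

```python
import json


def build_job_inputs(expected_cell_count, maximum_threads,
                     generate_bam=False, refined_cell_call=False):
    """Assemble a partial inputs object (hypothetical helper).

    Assumption: counts are integers and switches are booleans.
    """
    return {
        "Expected_Cell_Count": expected_cell_count,
        "Maximum_Threads": maximum_threads,
        "Generate_Bam": generate_bam,
        "Enable_Refined_Cell_Call": refined_cell_call,
    }


# Example values are invented: 20,000 cells loaded, 16 CPU threads allowed.
job = build_job_inputs(expected_cell_count=20000, maximum_threads=16)
job_json = json.dumps(job, indent=2, sort_keys=True)
```

The defaults mirror the notes: BAM output is off and the basic cell caller is used unless the corresponding parameter is set.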
v1.12.1 - March 14, 2023
- Fix TCR pairing percent metrics
v1.12 - Feb 21, 2023
- Added support for Rhapsody Enhanced bead V2 with an expanded cell label diversity
- Added support for Flex SMK (Sample Multiplexing Kit) allowing 24 species and cell type agnostic sample tags
- Upgraded CWL to version 1.2
- VDJ nodes are only executed when necessary
- Pipeline Report: Added cell label graph when an exact count is specified
- Added option to skip creating BAM file output
- Use productive status when collapsing chains for the VDJ perCell output file
- Dominant Contigs AIRR file now has DBEC filtering applied and is uncompressed. Both AIRR files have an additional column cell_type_experimental. The non-AIRR Dominant/Unfiltered files are no longer part of the pipeline output.
- Prioritize IG/TR gene features when annotating reads from a VDJ assay
v1.11.1 - Dec 15, 2022
- Improved speed and disk usage of AnnotateReads step
- Update Pandas version to fix error: ValueError: Unstacked DataFrame is too big, causing int32 overflow
- Better prediction of RAM requirements
- Improved basic and refined putative cell calling algorithms
- Deletion of unnecessary intermediate files to save disk space
- Seven Bridges deployment: Fix for error Instance not available for automatic scheduling
v1.11 - Aug 18, 2022
BD Rhapsody™ Targeted Analysis Pipeline and BD Rhapsody™ WTA Analysis Pipeline:
- Added a pipeline report HTML that contains information about the analysis including the metrics summary and graphs to visualize the results
- By default, reads aligned to exons and introns are now considered and represented in molecule counts. Added parameter to control this behavior.
- Added new "Alignment Categories" for TCR and BCR reads
- Added support for VDJ Adaptive Immune Receptor Repertoire (AIRR) standard format
- For pipeline run where putative cells are determined based on AbSeq (protein) counts, added file output of cell IDs corresponding to suspected protein aggregates
- Updated CWL workflow on Seven Bridges to fix memory failures and dynamically allocate resources for large datasets
- Improved flexibility for FASTQ file naming
- Updated Picard to version 2.27.4
- Updated bead version detection
v1.10.1 - April 14, 2022
- Fixed issue with cell label detection on reads from TCR/BCR, when TCR/BCR libraries were combined with other library types (WTA, Targeted, AbSeq) in a single sequencing index.
- Fixed issue with processing FASTQ files whose filenames end in fq.gz
v1.10 - January 24, 2022
BD Rhapsody Targeted Analysis Pipeline and BD Rhapsody WTA Analysis Pipeline:
- Updated VDJ pipeline with improved performance, new assembly algorithm, new metrics and new output files containing all available contig sequences
- Added support for Rhapsody Enhanced Beads, with automatic bead version detection
- Added option to call putative cells based on AbSeq read counts (for troubleshooting only)
- Added Alignment Categories section to metric summary which provides a breakdown of alignments for read pairs with a valid cell label and UMI
- Added separate metric summary files for each sample tag for experiments using BD Single-Cell Multiplexing kits
- Renamed various metrics in outputs to reflect multiomics nature of data (Target Type -> Bioproduct_Type, Gene/Target -> Bioproduct)
- Added Pct_Read_Pair_Overlap and Median Reads Per cell metric to metric summary
- Improved support for larger runs on SBG
- Updated workflow on SBG to improve editing of resource requirements
- Optimized pipeline metadata handling
- Improved checking of reference files
v1.9.1 - October 6, 2020
BD Rhapsody™ WTA Analysis Pipeline:
- Improved putative cell calling algorithm to reduce overcalling of putative cells in high cell input experiments
- Updated alignment settings to improve AbSeq mapping when R2 read length is greater than 75 bases
v1.9 - July 29, 2020
BD Rhapsody™ Targeted Analysis Pipeline and BD Rhapsody™ WTA Analysis Pipeline:
- Improved FASTQ file pairing - filenames are flexible and pairing is now based on read sequence identifier
- Optimized pipeline in various steps for memory and storage usage
- Fixed bugs related to Sample Multiplexing Kit noise and DBEC mean molecule metric
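
Pairing FASTQ files by read sequence identifier rather than by filename can be sketched as below. The example assumes Illumina-style headers ("@name mate:filter:control:index") and compares only the first header of each file. The function names and input layout are hypothetical, and the pipeline's actual matching rules are not described in these notes.

```python
def parse_header(header):
    """Split a FASTQ header line into (read name, mate number)."""
    name, _, comment = header.strip()[1:].partition(" ")
    mate = comment.split(":")[0] if comment else ""
    return name, mate


def pair_fastqs(first_headers):
    """Pair files whose first read shares a name and has mates 1 and 2.

    first_headers: dict of filename -> first header line of that file.
    Returns a sorted list of (R1 file, R2 file) tuples.
    """
    by_read = {}
    for filename, header in first_headers.items():
        name, mate = parse_header(header)
        by_read.setdefault(name, {})[mate] = filename
    pairs = []
    for mates in by_read.values():
        if "1" in mates and "2" in mates:
            pairs.append((mates["1"], mates["2"]))
    return sorted(pairs)


# Filenames deliberately carry no R1/R2 hint; headers are invented.
headers = {
    "libA_foo.fastq.gz": "@M1:7:FC:1:1101:1000:2000 1:N:0:ACGT",
    "libA_bar.fastq.gz": "@M1:7:FC:1:1101:1000:2000 2:N:0:ACGT",
    "orphan.fastq.gz": "@M1:7:FC:2:1101:5:9 1:N:0:ACGT",
}
pairs = pair_fastqs(headers)
```

Files without a matching mate, such as `orphan.fastq.gz` here, are left unpaired.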
BD Rhapsody™ Targeted Analysis Pipeline:
- Support for BD Rhapsody™ VDJ CDR3 protocol
- Read and molecule counts for targets from same gene symbol are combined in the output tables
- Updated Bowtie2 alignment parameters for improved sensitivity
BD Rhapsody™ WTA Analysis Pipeline:
- Updated Pct_Cellular Metrics calculations to match Bioinformatics handbook descriptions
- Added support for supplemental reference fasta files, which allow alignment to transgenes, like viral RNA or GFP
- Updated STAR alignment parameters for improved sensitivity
v1.8 - Oct 4, 2019
BD Rhapsody™ Targeted Analysis Pipeline and BD Rhapsody™ WTA Analysis Pipeline:
- Added Sample_Tag_ReadsPerCell.csv to Multiplex Output
- Optimized pipeline in various steps for memory usage
- Fixed bug in status determination for UMI_Adjusted_Stats.csv file
BD Rhapsody™ Targeted Analysis Pipeline:
- Updated Targets section in Metrics_Summary.csv to calculate metrics based on targets detected in putative cells only
- Removed Clustering Analysis and outputs
BD Rhapsody™ WTA Analysis Pipeline:
- Added support for BD™ AbSeq libraries
- Removed Targets section in Metrics_Summary.csv for WTA only libraries
- Removed Pct_Error_Reads and Error_Depth in UMI_Adjusted_Stats.csv, which are not applicable to WTA only libraries
v1.7.1 - August 7, 2019
- Added BD Rhapsody™ WTA Analysis Pipeline
- Fixed bug that can cause stalling when zero putative cells were identified
- Fixed bug that affected runs using Disable Refined Putative Cell Calling option
v1.6.1 - July 2, 2019
- Increased memory limits for GetDataTable and Metrics
- Fixed bug associated with "No Multiplex" option on SBG
- Uses fewer resources in AddToSam step.
v1.6 - June 10, 2019
- Added new options for putative cell determination:
  - Exact Cell Count: Set a specific number of cells as putative, based on those with the highest error-corrected read count
  - Disable Refined Putative Cell Calling: Determine putative cells using only the basic algorithm
- Updated to Python 3
- Updated alignment defaults (minor molecule count changes expected)
- Local install only - CWL files are bundled into one file
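
The Exact Cell Count option described above amounts to ranking cell labels by error-corrected read count and keeping the top N. A minimal sketch follows; the function name, the input layout, and the tie-breaking by cell label are assumptions for illustration.

```python
def exact_cell_call(read_counts, exact_cell_count):
    """Return the cell labels with the highest error-corrected read counts.

    read_counts: dict of cell label -> error-corrected read count.
    Assumption: ties are broken by cell label so the result is deterministic.
    """
    ranked = sorted(read_counts.items(), key=lambda item: (-item[1], item[0]))
    return [cell for cell, _ in ranked[:exact_cell_count]]


# Toy counts, invented for illustration.
counts = {"cellA": 1200, "cellB": 15, "cellC": 980, "cellD": 980, "cellE": 3}
putative = exact_cell_call(counts, 3)
```

If N exceeds the number of cell labels, the slice returns every label.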
v1.5 - March 14, 2019
- Added support for BD Single-cell multiplexing kit: Mouse Immune
- Updated various filtering thresholds to support sequencing runs with shorter read length
- Deprecated pipeline input: BAM input
- Fixed bug in Quality Filter (minor metrics changes expected)
- Optimized pipeline (computationally faster, more scalable to support larger input data size, and better logging)
v1.3 - July 31, 2018
- Added support for BD™ AbSeq assay
- Added support for BD™ single-cell multiplexing kit - Mouse Immune
- New pipeline input - AbSeq Reference
- New pipeline outputs - Unfiltered cell-gene data tables
- Updated Metrics_Summary.csv to support metrics from multiple sequencing libraries
- Updated Recursive Substitution Error Correction (RSEC) algorithm (minor molecule count changes expected)
- Optimized pipeline to run faster
v1.02 - Nov 27, 2017
- Added support for BD Single-cell multiplexing kit - Human
- Improved pipeline speed by deleting large temp files
- Removed network requirement when running locally
- Bug fix for the wrong docker image name (Dec 13, 2017)