Release Notes


v2.2.1 - June 05, 2024

Added

  • Support for MGI sequencer FASTQ read headers and file names
  • For TCR/BCR assays, new output: a compressed bundle of PNG images showing the per-chain VDJ DBEC algorithm thresholds
  • Long read support: added a new pipeline parameter to enable support for long R2 reads (>650 bp); by default, the pipeline auto-detects long or short reads

Updated

  • Ability to customize STAR and BWA-MEM2 alignment parameters, enabled on Seven Bridges and local runs

Fixed

  • Failures that could occur with MGI sequencer FASTQ files in the QualCLAlign or AlignmentAnalysis nodes
  • Failure in the VDJ AssembleAndAnnotate node: "Argument list too long"
  • Failure in the MergeBam node: "Argument list too long"
  • PhiX-aligned reads could incorrectly be counted as targeted mRNA reads when using a targeted panel (these reads are usually already removed during FASTQ generation)

v2.2 - April 19, 2024

Added

  • Added support for ATAC-Seq Assay and Multiomic ATAC-Seq Assay (WTA+ATAC-Seq)
  • Added ability to customize STAR alignment parameters

Updated

  • Updated the immune cell type classifier to be more lenient in the percentage of bioproducts required to run
  • Updated TCR/BCR annotation software IgBLAST to version 1.22
  • Updated TCR/BCR annotation to IMGT release 202349-3 (12-06-2023)
  • Updated bead version detection

Fixed

  • Fixed error in dimensionality reduction when zero variable genes are found due to very sparse data

v2.1 - Nov 10, 2023 (Internal and early access release only)

Added

  • Added TCR/BCR high-quality cell designation and associated metrics. This creates a new set of VDJ metrics based on a putative cell call made from the VDJ libraries (similar to other products that make such a call), separate from the cell call from the associated gene expression libraries
  • Added UMAP dimensionality reduction coordinates as an output file, and included those coordinates in the pipeline report, Seurat, and Scanpy outputs
  • Added an extra utility that only annotates the cell index and UMI from R1 and writes them into the header of R2
  • Added support for Enhanced Cell Capture Beads V3

Updated

  • Updated Seurat output to separate mRNA and AbSeq data into the RNA and ADT assays, respectively
  • Updated Scanpy output to use Muon (.h5mu) and store mRNA and AbSeq data in separate AnnData objects, 'rna' and 'prot', respectively

  • Updated TCR/BCR dominant contigs file to include AIRR compliant germline columns
  • Updated TCR/BCR dominant contigs file to only retain cell type appropriate chains. All chains are still available in the unfiltered contigs file.
  • Updated TCR/BCR dominant contigs file to rename the column 'duplicate_count' to 'umi_count', in accordance with AIRR’s definition update in v1.4.1
  • Updated TCR/BCR dominant contig selection process, elevating the importance of a productive contig with high relative read count, and removing the CDR3 requirement
  • Updated TCR/BCR DBEC algorithm to allow exceptions for CDR3 sequences not seen in any other cell, and CDR3 paired chains seen in other cells
  • Updated TCR/BCR contig_id to correspond with annotated chain type

  • Updated basic cell calling to scale better with small and large cell datasets, and prevent most inappropriately high cell calls derived from noise signatures
  • Updated Alignment Category 'No_Feature_Pct' metric to include targeted mRNA reads that are filtered out due to an invalid alignment
  • Updated cell label annotation to improve the speed of annotation for reads with cell label sequences that contain more than 1 error
  • Updated RAM requirements for VDJ_preprocess_reads on local server runs
  • Updated error handling and reporting in read processing steps
  • Updated logging to capture errors during alignment with STAR
  • Updated FASTQ handling to skip reads with empty sequence
  • Updated cell type classification model selection to better select an appropriate model when not all bioproducts are found in any one model
  • Updated pipeline report to show sub-sampled tSNE and UMAP plots when the putative cell count exceeds 100,000
  • Updated pipeline report to show details of refined cell calling, when refined cell calling is selected
  • Updated bead version detection and read trimming
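
Because the dominant contigs file renamed 'duplicate_count' to 'umi_count' in this release, downstream scripts may need to read output from both older and newer runs. A minimal sketch, assuming the file is a tab-separated AIRR rearrangement table with a header row and a 'sequence_id' column; the function name read_umi_counts is hypothetical and not part of the pipeline:

```python
import csv
from typing import Iterable, Iterator, Tuple


def read_umi_counts(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Yield (sequence_id, UMI count) pairs from an AIRR-style TSV.

    Hypothetical helper: prefers the v2.1+ column 'umi_count' and falls
    back to the pre-v2.1 column 'duplicate_count'.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    fields = reader.fieldnames or []
    if "umi_count" in fields:
        count_col = "umi_count"
    elif "duplicate_count" in fields:
        count_col = "duplicate_count"
    else:
        raise ValueError("no umi_count or duplicate_count column found")
    for row in reader:
        # An empty count cell is treated as zero.
        yield row["sequence_id"], int(row[count_col] or 0)
```

For example, passing open("contigs.tsv") (a hypothetical path) yields the same pairs whether the file came from a v2.0 or a v2.1 run.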

Fixed

  • Fixed an issue that caused a failure when a gene symbol was named 'nan'
  • Fixed issue with a quote mark in a gene symbol causing a failure in the Seurat output file generation
  • Fixed rare division by zero issue in DBEC algorithm
  • Fixed rare issue caused by including "SampleTag" in the Run_Name parameter

Experimental

  • Added a Docker-free version of the pipeline, available for local server installs as a tar.gz bundle. Tested on Linux: Ubuntu 16/20/22, Red Hat 7, and CentOS 7/9

make_rhapsody_reference tool:

  • Added an 'Extra_STAR_params' input to enable passing parameters to the STAR genomeGenerate process
  • Updated to automatically generate a GTF for sequences added in the 'Extra_sequences' FASTA input, which is useful for transgenes

v2.0 - June 14, 2023

  • Major rewrite of the pipeline's read processing steps, resulting in up to 7x faster performance and half the required disk space
  • New cell-bioproduct data table output file formats: MEX, for broad compatibility with downstream analysis tools, and Seurat RDS and Scanpy H5AD,
    which are single files that include all cell metadata (e.g., Sample Tag, TCR/BCR)
  • Consolidated previously separate WTA and Targeted pipelines into one pipeline
  • New WTA reference format combines the STAR index and matching GTF
  • Built-in support for creating a new WTA reference with paired genome FASTA and GTF
  • New Maximum_Threads parameter to limit the CPU usage on local server runs
  • Basic cell caller is now the default algorithm. Refined cell calling algorithm can still be used by setting the Enable_Refined_Cell_Call parameter
  • New pipeline input: Expected_Cell_Count, which guides the basic putative cell calling algorithm with an estimate of the number of cells expected. This is usually the number of cells loaded into the Rhapsody cartridge
  • BAM files are not generated by default, but can be created using the Generate_Bam parameter
  • Numerous other fixes and optimizations

v1.12.1 - March 14, 2023

  • Fixed TCR pairing percentage metrics

v1.12 - Feb 21, 2023

  • Added support for Rhapsody Enhanced bead V2 with expanded cell label diversity
  • Added support for Flex SMK (Sample Multiplexing Kit), providing 24 species- and cell type-agnostic sample tags
  • Upgraded CWL to version 1.2
  • VDJ nodes are only executed when necessary
  • Pipeline Report: Added cell label graph when an exact count is specified
  • Added option to skip creating BAM file output
  • Productive status is now used when collapsing chains for the VDJ perCell output file
  • Dominant Contigs AIRR file now has DBEC filtering applied and is uncompressed. Both AIRR files have an additional column, cell_type_experimental. The non-AIRR Dominant/Unfiltered files are no longer part of the pipeline output.
  • Prioritize IG/TR gene features when annotating reads from a VDJ assay

v1.11.1 - Dec 15, 2022

  • Improved speed and disk usage of AnnotateReads step
  • Updated pandas version to fix the error: ValueError: Unstacked DataFrame is too big, causing int32 overflow
  • Better prediction of RAM requirements
  • Improved basic and refined putative cell calling algorithms
  • Deletion of unnecessary intermediate files to save disk space
  • Seven Bridges deployment: fixed the error "Instance not available for automatic scheduling"

v1.11 - Aug 18, 2022

BD Rhapsody™ Targeted Analysis Pipeline and BD Rhapsody™ WTA Analysis Pipeline:

  • Added an HTML pipeline report that contains information about the analysis, including the metrics summary and graphs to visualize the results
  • By default, reads aligned to exons and introns are now considered and represented in molecule counts. Added parameter to control this behavior.
  • Added new "Alignment Categories" for TCR and BCR reads
  • Added support for VDJ Adaptive Immune Receptor Repertoire (AIRR) standard format
  • For pipeline runs where putative cells are determined based on AbSeq (protein) counts, added a file output of cell IDs corresponding to suspected protein aggregates
  • Updated CWL workflow on Seven Bridges to fix memory failures and dynamically allocate resources for large datasets
  • Improved flexibility for FASTQ file naming
  • Updated Picard to version 2.27.4
  • Updated bead version detection

v1.10.1 - April 14, 2022

  • Fixed an issue with cell label detection on TCR/BCR reads when TCR/BCR libraries were combined with other library types (WTA, Targeted, AbSeq) in a single sequencing index.
  • Fixed issue with processing FASTQ files whose filenames end in fq.gz

v1.10 - January 24, 2022

BD Rhapsody Targeted Analysis Pipeline and BD Rhapsody WTA Analysis Pipeline:

  • Updated VDJ pipeline with improved performance, new assembly algorithm, new metrics and new output files containing all available contig sequences
  • Added support for Rhapsody Enhanced Beads, with automatic bead version detection
  • Added option to call putative cells based on AbSeq read counts (for troubleshooting only)
  • Added an Alignment Categories section to the metrics summary, which provides a breakdown of alignments for read pairs with a valid cell label and UMI
  • Added separate metric summary files for each sample tag for experiments using BD Single-Cell Multiplexing kits
  • Renamed various metrics in outputs to reflect multiomics nature of data (Target Type -> Bioproduct_Type, Gene/Target -> Bioproduct)
  • Added Pct_Read_Pair_Overlap and Median Reads Per Cell metrics to the metrics summary
  • Improved support for larger runs on SBG
  • Updated workflow on SBG to improve editing of resource requirements
  • Optimized pipeline metadata handling
  • Improved checking of reference files

v1.9.1 - October 6, 2020

BD Rhapsody™ WTA Analysis Pipeline:

  • Improved putative cell calling algorithm to reduce overcalling of putative cells in high cell input experiments
  • Updated alignment settings to improve AbSeq mapping when R2 read length is greater than 75 bases

v1.9 - July 29, 2020

BD Rhapsody™ Targeted Analysis Pipeline and BD Rhapsody™ WTA Analysis Pipeline:

  • Improved FASTQ file pairing - filenames are flexible and pairing is now based on read sequence identifier
  • Optimized pipeline in various steps for memory and storage usage
  • Fixed bugs related to Sample Multiplexing Kit noise and DBEC mean molecule metric
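
To illustrate what pairing by read sequence identifier (rather than by filename) means, the sketch below compares the identifier portion of FASTQ header lines. It assumes Illumina-style headers: an ID token, an optional whitespace-separated comment, and an optional legacy /1 or /2 mate suffix. The function names are hypothetical, and this is a generic technique, not the pipeline's actual implementation:

```python
from collections import defaultdict
from typing import Dict, List


def read_id(header: str) -> str:
    """Return the read identifier from a FASTQ header line.

    Drops the leading '@', any comment after the first whitespace
    (e.g. '1:N:0:ACGT'), and a trailing '/1' or '/2' mate suffix.
    """
    if header.startswith("@"):
        header = header[1:]
    token = header.split(maxsplit=1)[0]
    if token.endswith(("/1", "/2")):
        token = token[:-2]
    return token


def headers_pair(r1_header: str, r2_header: str) -> bool:
    """True if two header lines describe mates of the same read."""
    return read_id(r1_header) == read_id(r2_header)


def group_by_first_read(first_headers: Dict[str, str]) -> Dict[str, List[str]]:
    """Group filenames whose first FASTQ record shares a read identifier.

    first_headers maps each filename to the header of its first record;
    files in the same group are candidate R1/R2 mates, whatever their names.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for filename, header in sorted(first_headers.items()):
        groups[read_id(header)].append(filename)
    return dict(groups)
```

Because only the first header of each file is inspected, a real implementation would likely also verify that later records stay in sync.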

BD Rhapsody™ Targeted Analysis Pipeline:

  • Support for BD Rhapsody™ VDJ CDR3 protocol
  • Read and molecule counts for targets from same gene symbol are combined in the output tables
  • Updated Bowtie2 alignment parameters for improved sensitivity

BD Rhapsody™ WTA Analysis Pipeline:

  • Updated Pct_Cellular metrics calculations to match the descriptions in the Bioinformatics Handbook
  • Added support for supplemental reference FASTA files, which allow alignment to transgenes such as viral RNA or GFP
  • Updated STAR alignment parameters for improved sensitivity

v1.8 - Oct 4, 2019

BD Rhapsody™ Targeted Analysis Pipeline and BD Rhapsody™ WTA Analysis Pipeline:

  • Added Sample_Tag_ReadsPerCell.csv to Multiplex Output
  • Optimized pipeline in various steps for memory usage
  • Fixed bug in status determination for UMI_Adjusted_Stats.csv file

BD Rhapsody™ Targeted Analysis Pipeline:

  • Updated Targets section in Metrics_Summary.csv to calculate metrics based on targets detected in putative cells only
  • Removed Clustering Analysis and outputs

BD Rhapsody™ WTA Analysis Pipeline:

  • Added support for BD™ AbSeq libraries
  • Removed Targets section in Metrics_Summary.csv for WTA only libraries
  • Removed Pct_Error_Reads and Error_Depth in UMI_Adjusted_Stats.csv, which are not applicable to WTA only libraries

v1.7.1 - August 7, 2019

  • Added BD Rhapsody™ WTA Analysis Pipeline
  • Fixed bug that could cause stalling when zero putative cells were identified
  • Fixed bug that affected runs using Disable Refined Putative Cell Calling option

v1.6.1 - July 2, 2019

  • Increased memory limits for GetDataTable and Metrics
  • Fixed bug associated with "No Multiplex" option on SBG
  • Reduced resource usage in the AddToSam step

v1.6 - June 10, 2019

  • Added new options for putative cell determination:
    • Exact Cell Count: Set a specific number of cells as putative, based on those with the highest error-corrected read count
    • Disable Refined Putative Cell Calling: Determine putative cells using only the basic algorithm
  • Updated to Python 3
  • Updated alignment defaults (minor molecule count changes expected)
  • Local install only: CWL files are bundled into a single file

v1.5 - March 14, 2019

  • Added support for BD Single-cell multiplexing kit: Mouse Immune
  • Updated various filtering thresholds to support sequencing runs with shorter read length
  • Deprecated pipeline input: BAM input
  • Fixed bug in Quality Filter (minor metrics changes expected)
  • Optimized pipeline (computationally faster, more scalable to support larger input data size, and better logging)

v1.3 - July 31, 2018

  • Added support for BD™ AbSeq assay
  • Added support for BD™ single-cell multiplexing kit - Mouse Immune
  • New pipeline input - AbSeq Reference
  • New pipeline outputs - Unfiltered cell-gene data tables
  • Updated Metrics_Summary.csv to support metrics from multiple sequencing libraries
  • Updated Recursive Substitution Error Correction (RSEC) algorithm (minor molecule count changes expected)
  • Optimized pipeline to run faster

v1.02 - Nov 27, 2017

  • Added support for BD Single-cell multiplexing kit - Human
  • Improved pipeline speed by deleting large temp files
  • Removed network requirement when running locally
  • Fixed the wrong Docker image name (Dec 13, 2017)