Release Notes
v2.2.1 - June 05, 2024
Added
- Support for MGI sequencer FASTQ read header and file names
- For the TCR/BCR assay, new output: a compressed bundle of PNG images showing the per-chain VDJ DBEC algorithm thresholds
- Long read support: added new pipeline parameter to enable support for long R2 reads (>650 bp). The default is to auto-detect long or short reads
Updated
- Ability to customize STAR and BWA-MEM2 alignment parameters, enabled on Seven Bridges and local runs
Fixed
- Failures that could occur with MGI sequencer FASTQ files in the QualCLAlign or AlignmentAnalysis nodes
- Failure caused by VDJ AssembleAndAnnotate node - Argument list too long
- Failure caused by MergeBam node - Argument list too long
- PhiX aligned reads could incorrectly be counted as targeted mRNA reads when using a targeted panel (usually these are already removed during FASTQ generation)
v2.2 - April 19, 2024
Added
- Added support for ATAC-Seq Assay and Multiomic ATAC-Seq Assay (WTA+ATAC-Seq)
- Added ability to customize STAR alignment parameters
Updated
- Updated the immune cell type classifier to be more lenient in the percentage of bioproducts required to run
- Updated TCR BCR annotation software IGBlast to version 1.22
- Updated TCR BCR annotation to IMGT release 202349-3 (12-06-2023)
- Updated bead version detection
Fixed
- Fixed error in dimensionality reduction when zero variable genes are found due to very sparse data
v2.1 - Nov 10, 2023 (Internal and early access release only)
Added
- Added TCR/BCR high-quality cell designation and associated metrics. This creates a new set of VDJ metrics similar to products where there is a putative cell call for VDJ libraries, separate from the cell call from associated gene expression libraries
- Added UMAP dimensionality reduction coordinates as an output file and also built those coordinates into the pipeline report, Seurat, and Scanpy outputs
- Added extra utility for only annotating the cell index and UMI of R1 and putting it in the header of R2
- Added support for Enhanced Cell Capture Beads V3
Updated
- Updated Seurat output to separate mRNA and AbSeq data into the RNA and ADT assays respectively
- Updated Scanpy output to use Muon (.h5mu) and create mRNA and AbSeq data in separate anndata objects, rna and prot respectively
- Updated TCR/BCR dominant contigs file to include AIRR compliant germline columns
- Updated TCR/BCR dominant contigs file to only retain cell type appropriate chains. All chains are still available in the unfiltered contigs file.
- Updated TCR/BCR dominant contigs file to rename the column 'duplicate_count' to 'umi_count', in accordance with AIRR’s definition update in v1.4.1
- Updated TCR/BCR dominant contig selection process, elevating the importance of a productive contig with high relative read count, and removing the CDR3 requirement
- Updated TCR/BCR DBEC algorithm to allow exceptions for CDR3 sequences not seen in any other cell, and CDR3 paired chains seen in other cells
- Updated TCR/BCR contig_id to correspond with annotated chain type
- Updated basic cell calling to scale better with small and large cell datasets, and prevent most inappropriately high cell calls derived from noise signatures
- Updated Alignment Category 'No_Feature_Pct' metric to include targeted mRNA reads that are filtered out due to an invalid alignment
- Updated cell label annotation to improve the speed of annotation for reads with cell label sequences that contain more than 1 error
- Updated RAM requirements for VDJ_preprocess_reads on local server runs
- Updated error handling and reporting in read processing steps
- Updated logging to capture errors during alignment with STAR
- Updated FASTQ handling to skip reads with empty sequence
- Updated cell type classification model selection to better select an appropriate model when not all bioproducts are found in any one model
- Updated pipeline report to show sub-sampled tSNE and UMAP plots, in the case where the putative cell count exceeds 100,000
- Updated pipeline report to show details of refined cell calling, when refined cell calling is selected
- Updated bead version detection and read trimming
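For downstream scripts that need to read dominant contigs files produced both before and after the 'duplicate_count' to 'umi_count' rename noted above, one option is to normalize the header on load. The following is a minimal sketch using only the Python standard library. The function names and the two-row sample are hypothetical and are not part of the pipeline.

```python
import csv
import io


def normalize_airr_header(fieldnames):
    """Map the older column name 'duplicate_count' to 'umi_count'."""
    return ["umi_count" if name == "duplicate_count" else name for name in fieldnames]


def read_contigs(handle):
    """Read a tab-separated AIRR file and return rows keyed by normalized column names."""
    reader = csv.reader(handle, delimiter="\t")
    header = normalize_airr_header(next(reader))
    return [dict(zip(header, row)) for row in reader]


# Hypothetical two-row file in the older layout (not real pipeline output)
old_style = io.StringIO("sequence_id\tduplicate_count\ncontig_1\t12\ncontig_2\t3\n")
rows = read_contigs(old_style)
print(rows[0]["umi_count"])
```

Files that already use 'umi_count' pass through unchanged, so the same loader can be applied to output from either version.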
Fixed
- Fixed issue that caused failure when a gene symbol was named 'nan'
- Fixed issue with a quote mark in a gene symbol causing a failure in the Seurat output file generation
- Fixed rare division by zero issue in DBEC algorithm
- Fixed rare issue caused by including "SampleTag" in the Run_Name parameter
Experimental
- Added docker-free version of the pipeline, available for local server installs as a tar.gz bundle. Tested on Linux versions: Ubuntu 16 / 20 / 22, Red Hat 7, CentOS 7 / 9
make_rhapsody_reference tool:
- Added an 'Extra_STAR_params' input to enable passing parameters to the STAR genomeGenerate process
- Updated to automatically generate a GTF for sequences added in the 'Extra_sequences' FASTA input -- useful for transgenes
v2.0 - June 14, 2023
- Major rewrite to read processing steps of the pipeline results in up to 7x faster performance and 2x less disk space required
- New cell-bioproduct datatable output file formats: MEX, for broad compatibility with downstream analysis tools; Seurat RDS and ScanPy H5AD, for single files that include all cell metadata (e.g. Sample Tag, TCR/BCR)
- Consolidated previously separate WTA and Targeted pipelines into one pipeline
- New updated WTA reference combines STAR index and matching GTF
- Built-in support for creating a new WTA reference with paired genome FASTA and GTF
- New Maximum_Threads parameter to limit the CPU usage on local server runs
- Basic cell caller is now the default algorithm. Refined cell calling algorithm can still be used by setting the Enable_Refined_Cell_Call parameter
- New pipeline input: Expected_Cell_Count - Guide the basic putative cell calling algorithm by providing an estimate of the number of cells expected. Usually this can be the number of cells loaded in the Rhapsody cartridge
- BAM files are not generated by default, but can be created using the Generate_Bam parameter
- Numerous other fixes and optimizations
v1.12.1 - March 14, 2023
- Fix TCR pairing percent metrics
v1.12 - Feb 21, 2023
- Added support for Rhapsody Enhanced bead V2 with an expanded cell label diversity
- Added support for Flex SMK (Sample Multiplexing Kit), allowing 24 species- and cell-type-agnostic sample tags
- Upgraded CWL to version 1.2
- VDJ nodes are only executed when necessary
- Pipeline Report: Added cell label graph when an exact count is specified
- Added option to skip creating BAM file output
- Use productive status when collapsing chains for the VDJ perCell output file
- The Dominant Contigs AIRR file now has DBEC filtering applied and is uncompressed. Both AIRR files have an additional column, cell_type_experimental. The non-AIRR Dominant/Unfiltered files are no longer part of the pipeline output.
- Prioritize IG/TR gene features when annotating reads from a VDJ assay
v1.11.1 - Dec 15, 2022
- Improved speed and disk usage of AnnotateReads step
- Update Pandas version to fix error: ValueError: Unstacked DataFrame is too big, causing int32 overflow
- Better prediction of RAM requirements
- Improved basic and refined putative cell calling algorithms
- Deletion of unnecessary intermediate files to save disk space
- Seven Bridges deployment: Fix for error Instance not available for automatic scheduling
v1.11 - Aug 18, 2022
BD Rhapsody™ Targeted Analysis Pipeline and BD Rhapsody™ WTA Analysis Pipeline:
- Added a pipeline report HTML that contains information about the analysis including the metrics summary and graphs to visualize the results
- By default, reads aligned to exons and introns are now considered and represented in molecule counts. Added parameter to control this behavior.
- Added new "Alignment Categories" for TCR and BCR reads
- Added support for VDJ Adaptive Immune Receptor Repertoire (AIRR) standard format
- For pipeline runs where putative cells are determined based on AbSeq (protein) counts, added file output of cell IDs corresponding to suspected protein aggregates
- Updated CWL workflow on Seven Bridges to fix memory failures and dynamically allocate resources for large datasets
- Improved flexibility for FASTQ file naming
- Updated Picard to version 2.27.4
- Updated bead version detection
v1.10.1 - April 14, 2022
- Fixed issue with cell label detection on reads from TCR/BCR, when TCR/BCR libraries were combined with other library types (WTA, Targeted, AbSeq) in a single sequencing index.
- Fixed issue with processing FASTQ files whose filenames end in fq.gz
v1.10 - January 24, 2022
BD Rhapsody Targeted Analysis Pipeline and BD Rhapsody WTA Analysis Pipeline:
- Updated VDJ pipeline with improved performance, new assembly algorithm, new metrics and new output files containing all available contig sequences
- Added support for Rhapsody Enhanced Beads, with automatic bead version detection
- Added option to call putative cells based on AbSeq read counts (for troubleshooting only)
- Added Alignment Categories section to metric summary which provides a breakdown of alignments for read pairs with a valid cell label and UMI
- Added separate metric summary files for each sample tag for experiments using BD Single-Cell Multiplexing kits
- Renamed various metrics in outputs to reflect multiomics nature of data (Target Type -> Bioproduct_Type, Gene/Target -> Bioproduct)
- Added Pct_Read_Pair_Overlap and Median Reads Per Cell metrics to metric summary
- Improved support for larger runs on SBG
- Updated workflow on SBG to improve editing of resource requirements
- Optimized pipeline metadata handling
- Improved checking of reference files
v1.9.1 - October 6, 2020
BD Rhapsody™ WTA Analysis Pipeline:
- Improved putative cell calling algorithm to reduce overcalling of putative cells in high cell input experiments
- Updated alignment settings to improve AbSeq mapping when R2 read length is greater than 75 bases
v1.9 - July 29, 2020
BD Rhapsody™ Targeted Analysis Pipeline and BD Rhapsody™ WTA Analysis Pipeline:
- Improved FASTQ file pairing - filenames are flexible and pairing is now based on read sequence identifier
- Optimized pipeline in various steps for memory and storage usage
- Fixed bugs related to Sample Multiplexing Kit noise and DBEC mean molecule metric
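The idea of pairing FASTQ files by read sequence identifier rather than by filename can be illustrated with a short sketch. This is not the pipeline's implementation. The function names, file names, and read headers below are hypothetical, and the sketch assumes that the first read of each file is enough to identify its mate file.

```python
import gzip
import os
import tempfile


def first_read_id(path):
    """Return the identifier of the first read, without comment field or mate suffix."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        header = handle.readline().strip()
    read_id = header[1:].split()[0]       # drop '@' and any comment after whitespace
    if read_id.endswith(("/1", "/2")):    # drop an older-style mate suffix
        read_id = read_id[:-2]
    return read_id


def pair_fastqs(paths):
    """Group files whose first read shares an identifier; keep groups of exactly two."""
    groups = {}
    for path in sorted(paths):
        groups.setdefault(first_read_id(path), []).append(path)
    return [tuple(files) for files in groups.values() if len(files) == 2]


# Hypothetical files whose names give no hint about which ones belong together
with tempfile.TemporaryDirectory() as tmp:
    contents = {
        "libA_read1.fastq": "@SEQ:1:1101 1:N:0\nACGT\n+\nIIII\n",
        "mate_of_libA.fastq": "@SEQ:1:1101 2:N:0\nTTGA\n+\nIIII\n",
        "unpaired.fastq": "@SEQ:9:2202 1:N:0\nGGCC\n+\nIIII\n",
    }
    for name, text in contents.items():
        with open(os.path.join(tmp, name), "w") as handle:
            handle.write(text)
    pairs = pair_fastqs(os.path.join(tmp, name) for name in contents)
    pair_names = [tuple(os.path.basename(p) for p in pair) for pair in pairs]
    print(pair_names)
```

The sketch only groups files. Deciding which member of a pair is R1 and which is R2 would need an extra step, for example inspecting the comment field of the header.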
BD Rhapsody™ Targeted Analysis Pipeline:
- Support for BD Rhapsody™ VDJ CDR3 protocol
- Read and molecule counts for targets from same gene symbol are combined in the output tables
- Updated Bowtie2 alignment parameters for improved sensitivity
BD Rhapsody™ WTA Analysis Pipeline:
- Updated Pct_Cellular Metrics calculations to match Bioinformatics handbook descriptions
- Added support for supplemental reference fasta files, which allow alignment to transgenes, like viral RNA or GFP
- Updated STAR alignment parameters for improved sensitivity
v1.8 - Oct 4, 2019
BD Rhapsody™ Targeted Analysis Pipeline and BD Rhapsody™ WTA Analysis Pipeline:
- Added Sample_Tag_ReadsPerCell.csv to Multiplex Output
- Optimized pipeline in various steps for memory usage
- Fixed bug in status determination for UMI_Adjusted_Stats.csv file
BD Rhapsody™ Targeted Analysis Pipeline:
- Updated Targets section in Metrics_Summary.csv to calculate metrics based on targets detected in putative cells only
- Removed Clustering Analysis and outputs
BD Rhapsody™ WTA Analysis Pipeline:
- Added support for BD™ AbSeq libraries
- Removed Targets section in Metrics_Summary.csv for WTA only libraries
- Removed Pct_Error_Reads and Error_Depth in UMI_Adjusted_Stats.csv, which are not applicable to WTA only libraries
v1.7.1 - August 7, 2019
- Added BD Rhapsody™ WTA Analysis Pipeline
- Fixed bug that could cause stalling when zero putative cells were identified
- Fixed bug that affected runs using Disable Refined Putative Cell Calling option
v1.6.1 - July 2, 2019
- Increased memory limits for GetDataTable and Metrics
- Fixed bug associated with "No Multiplex" option on SBG
- Uses fewer resources in AddToSam step
v1.6 - June 10, 2019
- Added new options for putative cell determination:
  - Exact Cell Count: Set a specific number of cells as putative, based on those with the highest error-corrected read count
  - Disable Refined Putative Cell Calling: Determine putative cells using only the basic algorithm
- Updated to Python 3
- Updated alignment defaults (minor molecule count changes expected)
- Local install only - CWL files are bundled into one file
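The Exact Cell Count option described above amounts to ranking cell labels by read count and keeping the top N. A minimal sketch of that selection follows. It is an illustration only, not the pipeline's code: the function name, the tie-breaking rule (alphabetical by label), and the sample counts are assumptions.

```python
def exact_cell_call(read_counts, exact_cell_count):
    """Return the labels with the highest read counts, at most exact_cell_count of them."""
    # Sort by descending count; ties are broken by label so the result is deterministic
    ranked = sorted(read_counts.items(), key=lambda item: (-item[1], item[0]))
    return [label for label, _ in ranked[:exact_cell_count]]


# Hypothetical error-corrected read counts per cell label
counts = {"cell_a": 950, "cell_b": 12, "cell_c": 4100, "cell_d": 870, "cell_e": 3}
putative = exact_cell_call(counts, 3)
print(putative)
```

If the requested count exceeds the number of cell labels, the sketch returns every label.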
v1.5 - March 14, 2019
- Added support for BD Single-cell multiplexing kit: Mouse Immune
- Updated various filtering thresholds to support sequencing runs with shorter read length
- Deprecated pipeline input: BAM input
- Fixed bug in Quality Filter (minor metrics changes expected)
- Optimized pipeline (computationally faster, more scalable to support larger input data size, and better logging)
v1.3 - July 31, 2018
- Added support for BD™ AbSeq assay
- Added support for BD™ single-cell multiplexing kit - Mouse Immune
- New pipeline input - AbSeq Reference
- New pipeline outputs - Unfiltered cell-gene data tables
- Updated Metrics_Summary.csv to support metrics from multiple sequencing libraries
- Updated Recursive Substitution Error Correction (RSEC) algorithm (minor molecule count changes expected)
- Optimized pipeline to run faster
v1.02 - Nov 27, 2017
- Added support for BD Single-cell multiplexing kit - Human
- Improved pipeline speed by deleting large temp files
- Removed network requirement when running locally
- Bug fix for the wrong Docker image name (Dec 13, 2017)