Pipeline Parameters

The following table describes the input files and optional parameters that can be set when running the Sequence Analysis Pipeline. These parameters apply whether the pipeline is run on Seven Bridges (where they are set in the graphical user interface) or on a local server (where they are set in the input specification YML file).

Required and optional inputs and parameters

Input field	Description	Required?
AbSeq_Reference	File input: FASTA AbSeq reference file as described in the Input files section. Ensure that the AbSeq reference file contains only the BD AbSeq Ab-Oligos that were used in the experiment.	Optional
Cell_Calling_ATAC_Algorithm	Default: Basic. Specify the putative cell calling algorithm for ATAC-Seq: Basic, Refined.	Optional
Cell_Calling_Bioproduct_Algorithm	Default: Basic. Specify the putative cell calling algorithm for bioproducts: Basic, Refined.	Optional
Cell_Calling_Data	Default: mRNA. Specify the data to be used for putative cell calling: mRNA, AbSeq, ATAC, mRNA_and_ATAC.	Optional
Custom_STAR_Params	Default: pipeline defaults. Advanced. Modify STAR alignment parameters. Set this parameter to fully override the default STAR mapping parameters used in the pipeline. This applies to FASTQ files provided in the Reads input.	Optional
Custom_bwa_mem2_Params	Default: pipeline defaults. Advanced. Modify bwa-mem2 alignment parameters. Set this parameter to fully override the default bwa-mem2 mapping parameters used in the pipeline. This applies to FASTQ files provided in the Reads_ATAC input.	Optional
Exact_Cell_Count	Set a specific number (>=1) of cells as putative, based on those with the highest error-corrected read count.	Optional
Exclude_Intronic_Reads	Default: False. By default, reads aligned to exons and introns are considered and represented in molecule counts. Including intronic reads may increase sensitivity, resulting in an increase in molecule counts and the number of genes per cell for both cellular and nuclei samples. Intronic reads may indicate unspliced mRNAs and are also useful, for example, in the study of nuclei and RNA velocity. When set to True, intronic reads will be excluded.	Optional
Expected_Cell_Count	Guide the basic putative cell calling algorithm by providing an estimate of the number of cells expected. Usually this can be the number of cells loaded into the Rhapsody cartridge.	Optional
Generate_Bam	Default: False. A BAM read alignment file contains reads from all the input libraries, but creating it can consume substantial compute and disk resources. Set this field to True to create the BAM file.	Optional
Long_Reads	Default: Auto. Specify if the STARlong aligner should be used instead of STAR. By default, when this parameter is not set, the pipeline will attempt to autodetect long reads. Set to True to force use of STARlong. Set to False to force use of STAR.	Optional
Predefined_ATAC_Peaks	File input: An optional BED file (such as the ATAC-Seq peaks file output by the Rhapsody pipeline) containing pre-established chromatin accessibility peak regions for generating the ATAC-Seq cell-by-peak matrix. Useful if a direct comparison of chromatin accessibility between two or more ATAC-Seq samples is desired.	Optional for ATAC-Seq assay
Reads	File input: R1 reads and R2 reads. Be sure to include all FASTQ sequencing data from the experiment, including R1 and R2 files for the targeted or WTA RNA library, and, if applicable, the Sample Tag, TCR, BCR, and BD® AbSeq libraries.	Required for applicable libraries
Reads_ATAC	File input: R1, R2, and I2 reads. Be sure to include all FASTQ sequencing data from the experiment, including R1, R2, and I2 files for the ATAC-Seq library.	Required for ATAC-Seq libraries
Reference_Archive (WTA or WTA+ATAC-Seq)	File input: A TAR.GZ file that includes a STAR (and possibly a bwa-mem2) indexed reference genome file, along with a GTF gene annotation file.	Yes
Run_Name	Specify a run name to be used as the base output filename. Use only letters, numbers, hyphens, or underscores. If any other special characters are included, they will be corrected to hyphens.	Optional
Sample_Tags_Version	For a multiplexed samples run only. Specify the Sample Tag kit used: human (hs), mouse (mm), flex, nuclei_includes_mrna, or nuclei_atac_only.	Required for multiplexed samples
Supplemental_Reference	File input: This is a FASTA file that contains additional transgene sequences.	Optional
Tag_Names	For a multiplexed samples run only. Associate a name with each Sample Tag, which will appear in the output files. Within square brackets, enter a comma-separated list of Sample Tag numbers and associated names. For each sample, join the Sample Tag number and the sample name with a hyphen (Sample Tag number-sample name). Spaces and forward slashes are not allowed in the sample name. Example: Tag_Names: [3-Ramos, 4-BT549]	Optional for multiplexed samples
Targeted_Reference (Targeted only)	File input: FASTA file containing the sequences amplified by the primers of the Targeted assay. This can be a pre-designed, supplemental, or custom panel. Ensure that the reference matches the species and panel used for the experiment. Otherwise, reads will not be mapped correctly.	Yes
VDJ_Version	For experiments with VDJ libraries. Specify the species and/or chain types. A species-only selection includes both BCR and TCR. Options: human, mouse, humanBCR, humanTCR, mouseBCR, mouseTCR.	Required for TCR/BCR assay
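
The Run_Name and Tag_Names rules in the table can be checked before a run is submitted. The Python sketch below is illustrative only and is not part of the pipeline: the function names sanitize_run_name and parse_tag_names are hypothetical, and the character-for-character replacement with hyphens is an assumption based on the Run_Name description above. The pipeline's own correction may differ in detail.

```python
import re

# Hypothetical helper: mimics the Run_Name rule described in the table.
# Assumption: each disallowed character is replaced by a single hyphen.
def sanitize_run_name(name: str) -> str:
    return re.sub(r"[^A-Za-z0-9_-]", "-", name)

# One Tag_Names entry: Sample Tag number, a hyphen, then a sample name
# that contains no whitespace and no forward slashes.
_TAG_ENTRY = re.compile(r"(\d+)-([^\s/]+)")

# Hypothetical helper: validates entries such as "3-Ramos" and returns
# a mapping from Sample Tag number to sample name.
def parse_tag_names(entries):
    names = {}
    for entry in entries:
        match = _TAG_ENTRY.fullmatch(entry)
        if match is None:
            raise ValueError(f"Invalid Tag_Names entry: {entry!r}")
        names[int(match.group(1))] = match.group(2)
    return names

if __name__ == "__main__":
    print(sanitize_run_name("my run#1"))          # my-run-1
    print(parse_tag_names(["3-Ramos", "4-BT549"]))  # {3: 'Ramos', 4: 'BT549'}
```

Checking these values up front surfaces formatting problems, such as a space inside a sample name, before compute time is spent on the run.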