Pipeline Parameters


The following table describes the input files and optional parameters that can be set when running the Sequence Analysis Pipeline. These parameters apply whether the pipeline is run on Seven Bridges (where they are set in the graphical user interface) or on a local server (where they are set in the input specification YML file).

Required and optional inputs and parameters

| Input field | Input | Required? |
|---|---|---|
| AbSeq_Reference | File input: FASTA AbSeq reference file as described in the Input files section. Ensure that the AbSeq reference file contains only the BD AbSeq Ab-Oligos that were used in the experiment. | Optional |
| Cell_Calling_ATAC_Algorithm | Default: Basic. Specify the putative cell calling algorithm for ATAC-Seq: Basic or Refined. | Optional |
| Cell_Calling_Bioproduct_Algorithm | Default: Basic. Specify the putative cell calling algorithm for bioproducts: Basic or Refined. | Optional |
| Cell_Calling_Data | Default: mRNA. Specify the data to be used for putative cell calling: mRNA, AbSeq, ATAC, or mRNA_and_ATAC. | Optional |
| Custom_STAR_Params | Default: pipeline defaults. Advanced. Set this parameter to fully override the default STAR mapping parameters used in the pipeline. Applies to the FASTQ files provided in the Reads input. | Optional |
| Custom_bwa_mem2_Params | Default: pipeline defaults. Advanced. Set this parameter to fully override the default bwa-mem2 mapping parameters used in the pipeline. Applies to the FASTQ files provided in the Reads_ATAC input. | Optional |
| Exact_Cell_Count | Set a specific number (>=1) of cells as putative, based on those with the highest error-corrected read count. | Optional |
| Exclude_Intronic_Reads | Default: False. By default, reads aligned to exons and introns are both considered and represented in molecule counts. Including intronic reads may increase sensitivity, resulting in higher molecule counts and more genes per cell for both cellular and nuclei samples. Intronic reads may indicate unspliced mRNAs and are also useful, for example, in the study of nuclei and RNA velocity. When set to True, intronic reads are excluded. | Optional |
| Expected_Cell_Count | Guide the basic putative cell calling algorithm by providing an estimate of the number of cells expected. Usually this can be the number of cells loaded into the Rhapsody cartridge. | Optional |
| Generate_Bam | Default: False. A BAM read alignment file contains reads from all the input libraries, but creating it can consume substantial compute and disk resources. Set this field to True to create the BAM file. | Optional |
| Long_Reads | Default: Auto. Specify whether the STARlong aligner should be used instead of STAR. When this parameter is not set, the pipeline attempts to autodetect long reads. Set to True to force use of STARlong. Set to False to force use of STAR. | Optional |
| Predefined_ATAC_Peaks | File input: An optional BED file (such as the ATAC-Seq peaks file output by the Rhapsody pipeline) containing pre-established chromatin accessibility peak regions for generating the ATAC-Seq cell-by-peak matrix. Useful if a direct comparison of chromatin accessibility between two or more ATAC-Seq samples is desired. | Optional for ATAC-Seq assay |
| Reads | File input: R1 reads and R2 reads. Be sure to include all FASTQ sequencing data from the experiment, including R1 and R2 files for the targeted or WTA RNA library and, if applicable, the Sample Tag, TCR, BCR, and BD® AbSeq libraries. | Required for applicable libraries |
| Reads_ATAC | File input: R1, R2, and I2 reads. Be sure to include all FASTQ sequencing data from the experiment, including R1, R2, and I2 files for the ATAC-Seq library. | Required for ATAC-Seq libraries |
| Reference_Archive (WTA or WTA+ATAC-Seq) | File input: A TAR.GZ file that includes a STAR (and possibly a bwa-mem2) indexed reference genome file, along with a GTF gene annotation file. | Yes |
| Run_Name | Specify a run name to be used as the base output filename. Use only letters, numbers, hyphens, or underscores. Any other characters are replaced with hyphens. | Optional |
| Sample_Tags_Version | For a multiplexed samples run only. Specify the Sample Tag kit used: human (hs), mouse (mm), flex, nuclei_includes_mrna, or nuclei_atac_only. | Required for multiplexed samples |
| Supplemental_Reference | File input: A FASTA file that contains additional transgene sequences. | Optional |
| Tag_Names | For a multiplexed samples run only. Associate a name with each Sample Tag, which will appear in the output files. Within square brackets, enter a comma-separated list of Sample Tag numbers and associated names. For each sample, use the format Sample Tag number-sample name, joined by a hyphen; spaces and forward slashes are not allowed. Example: Tag_Names: [3-Ramos, 4-BT549] | Optional for multiplexed samples |
| Targeted_Reference (Targeted only) | File input: FASTA file containing the sequences amplified by the primers of the Targeted assay. This can be a pre-designed, supplemental, or custom panel. Ensure that the reference matches the species and panel used for the experiment. Otherwise, reads will not be mapped correctly. | Yes |
| VDJ_Version | For experiments with VDJ libraries. Specify the species and/or chain types. A species-only selection includes both BCR and TCR. Options: human, mouse, humanBCR, humanTCR, mouseBCR, mouseTCR. | Required for TCR/BCR assay |
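
Several of the rules in the table above are mechanical enough to check before launching a run. The following Python sketch is one possible pre-flight check and is not part of the pipeline: the function names and the plain-dict layout of the settings are hypothetical. The allowed values are copied from the table. The Run_Name helper only mimics the documented rule (any character other than a letter, number, hyphen, or underscore becomes a hyphen), so the pipeline's own handling may differ in detail.

```python
import re

# Allowed values copied from the parameter table; the dict layout is illustrative only.
ALLOWED_VALUES = {
    "Cell_Calling_ATAC_Algorithm": {"Basic", "Refined"},
    "Cell_Calling_Bioproduct_Algorithm": {"Basic", "Refined"},
    "Cell_Calling_Data": {"mRNA", "AbSeq", "ATAC", "mRNA_and_ATAC"},
    "VDJ_Version": {"human", "mouse", "humanBCR", "humanTCR", "mouseBCR", "mouseTCR"},
}


def check_choices(params):
    """Return error messages for settings that fall outside the documented options."""
    errors = []
    for name, allowed in ALLOWED_VALUES.items():
        if name in params and params[name] not in allowed:
            errors.append(f"{name}: {params[name]!r} is not one of {sorted(allowed)}")
    exact = params.get("Exact_Cell_Count")
    if exact is not None:
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(exact, bool) or not isinstance(exact, int) or exact < 1:
            errors.append("Exact_Cell_Count must be an integer >= 1")
    return errors


def sanitize_run_name(name):
    """Replace every character that is not a letter, digit, hyphen, or underscore with a hyphen."""
    return re.sub(r"[^A-Za-z0-9_-]", "-", name)


def format_tag_names(tags):
    """Build a Tag_Names value from a {sample_tag_number: sample_name} mapping."""
    entries = []
    for number, sample in sorted(tags.items()):
        if " " in sample or "/" in sample:
            raise ValueError(f"Sample name {sample!r} must not contain spaces or forward slashes")
        entries.append(f"{number}-{sample}")
    return "[" + ", ".join(entries) + "]"


if __name__ == "__main__":
    print(check_choices({"Cell_Calling_Data": "mRNA", "VDJ_Version": "humanTCR"}))  # []
    print(sanitize_run_name("Exp 1/rep2"))  # Exp-1-rep2
    print(format_tag_names({3: "Ramos", 4: "BT549"}))  # [3-Ramos, 4-BT549]
```

Sample_Tags_Version is left out of the check on purpose, because the table lists both long and short forms (for example, human and hs) without stating which spelling the input field accepts.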