Align and Annotate R2 Reads
Alignment to Reference
For read pairs that pass quality filters and have a valid cell label and UMI on R1, the R2 read is aligned to a reference using STAR (Spliced Transcripts Alignment to a Reference © Alexander Dobin, 2009-2022). For Targeted assays, the STAR reference is generated at run time from the input Targeted_Reference FASTA file and any other provided FASTA files. For WTA assays, the STAR reference is prebuilt and supplied in the Reference_Archive input, and any additional FASTA files are added as genome FASTA files for alignment.
Criteria for a valid R2 read
Targeted assays:
For targeted assays, an R2 read is a valid alignment if all of these criteria are met:
- The R2 alignment begins within the first five nucleotides for mRNA, first 15 nucleotides for AbSeq, and first 25 nucleotides for Sample Tags. This criterion ensures that the R2 read originates from an actual PCR priming event.
- The length of the alignment match (M, which can be a sequence match or mismatch) in the CIGAR string is >=37 for mRNA, >=25 for AbSeq, and >=40 for Sample Tags. A CIGAR (Compact Idiosyncratic Gapped Alignment Report) string encodes an alignment as a series of lengths and operations, such as matches, insertions, and deletions, relative to the reference sequence.
- The read does not align to phiX174.
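The targeted criteria above can be sketched as a single check on one R2 alignment record. This is a minimal illustration, not the pipeline's implementation, and it rests on stated assumptions: "begins within the first N nucleotides" is read as the 1-based leftmost reference position (SAM POS) on the target sequence being at most N; phiX174 is recognized by a reference name starting with "phix"; the bioproduct labels "mRNA", "AbSeq", and "SampleTag" and all function names are hypothetical. The match length counts M, plus the = and X operations that some aligners emit instead of M.

```python
import re

# A CIGAR string is one or more (length, operation) pairs, e.g. "5S40M2I10M".
CIGAR_FULL_RE = re.compile(r"(?:\d+[MIDNSHP=X])+")
CIGAR_OP_RE = re.compile(r"(\d+)([MIDNSHP=X])")

# Thresholds from the targeted-assay criteria; the dictionary keys are hypothetical labels.
MAX_START = {"mRNA": 5, "AbSeq": 15, "SampleTag": 25}
MIN_MATCH = {"mRNA": 37, "AbSeq": 25, "SampleTag": 40}


def parse_cigar(cigar):
    """Split a CIGAR string into (length, operation) pairs."""
    if not CIGAR_FULL_RE.fullmatch(cigar):
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    return [(int(n), op) for n, op in CIGAR_OP_RE.findall(cigar)]


def match_length(cigar):
    """Total length of alignment-match operations (M, plus = and X)."""
    return sum(n for n, op in parse_cigar(cigar) if op in ("M", "=", "X"))


def is_valid_targeted_r2(ref_name, pos, cigar, bioproduct_type):
    """Apply the three targeted-assay criteria to one R2 alignment.

    Assumptions: pos is the 1-based leftmost reference position (SAM POS)
    on the target sequence, and phiX174 is recognized by its reference name.
    """
    if ref_name.lower().startswith("phix"):
        return False  # aligns to phiX174
    if pos > MAX_START[bioproduct_type]:
        return False  # alignment does not start close enough to the priming site
    return match_length(cigar) >= MIN_MATCH[bioproduct_type]
```

For example, with a hypothetical target named "CD3E", a read with CIGAR "30M20S" starting at position 1 fails the mRNA check (30 < 37) but passes the AbSeq check (30 >= 25, start 1 <= 15). Soft-clipped bases (S) are not counted toward the match length.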
WTA assays:
By default, alignments to both exons and introns are used. Including reads that align to introns may increase sensitivity, resulting in higher molecule counts and more genes per cell for both whole-cell and nuclei samples. Reads that align to introns may indicate the presence of unspliced mRNAs and are also useful in the study of nuclei and RNA velocity.
An R2 read is a valid gene alignment if all of these criteria are met:
- The sum of the CIGAR alignment match lengths must be >=25.
- The read aligns uniquely to an exon or intron of a bioproduct in the reference.
- The read does not align to phiX174.
- If the "Exclude Intronic Reads" option is selected, the read must align to an exon.
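The WTA criteria can likewise be sketched as a filter over one alignment record. This is a minimal sketch under stated assumptions, not the pipeline's code: "aligns uniquely" is approximated by STAR's NH tag (number of loci) being 1; the `region` field is a hypothetical annotation, already resolved against a bioproduct's exons and introns, that is "exon", "intron", or None; phiX174 is recognized by reference name. The `R2Alignment` class and function names are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional

CIGAR_OP_RE = re.compile(r"(\d+)([MIDNSHP=X])")
MIN_WTA_MATCH = 25  # minimum summed alignment-match length for WTA


@dataclass
class R2Alignment:
    ref_name: str          # reference sequence name, e.g. a chromosome
    cigar: str             # CIGAR string from the SAM/BAM record
    num_hits: int          # number of loci the read maps to (assumed: STAR's NH tag)
    region: Optional[str]  # hypothetical annotation: "exon", "intron", or None


def aligned_match_length(cigar):
    """Sum the lengths of alignment-match operations (M, = and X).

    N (a skipped reference region, such as an intron spanned by a spliced
    read) is not counted.
    """
    return sum(int(n) for n, op in CIGAR_OP_RE.findall(cigar) if op in ("M", "=", "X"))


def is_valid_wta_r2(aln, exclude_intronic=False):
    """Apply the WTA gene-alignment criteria to one R2 alignment."""
    if aln.ref_name.lower().startswith("phix"):
        return False  # aligns to phiX174
    if aligned_match_length(aln.cigar) < MIN_WTA_MATCH:
        return False  # too few aligned bases
    if aln.num_hits != 1 or aln.region not in ("exon", "intron"):
        return False  # not a unique alignment to an exon or intron
    if exclude_intronic and aln.region != "exon":
        return False  # "Exclude Intronic Reads" keeps exonic alignments only
    return True
```

A spliced read such as "30M500N20M" contributes 50 matched bases, because the 500-base skipped region is not counted. An intronic alignment passes by default but is rejected when `exclude_intronic=True`.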