BAM and BAM Index
Files:
[sample_name]_Bioproduct.bam
[sample_name]_Bioproduct.bam.(bai|csi)
[sample_name]_ATAC.bam
[sample_name]_ATAC.bam.(bai|csi)
Note:
- The *_Bioproduct.bam consists of reads arising from AbSeq, SampleTag, Targeted, VDJ or WTA assays.
- A Combined_ prefix is added to the BAM and BAI/CSI files when Sample Tags are present in the experiment, to signify that the resulting BAM contains reads from all the samples.
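The naming convention above can be expressed as a small helper. This is a hypothetical sketch, not part of the pipeline: the function name bam_file_names is invented for illustration, and it assumes the Combined_ prefix is prepended to the whole file name (before [sample_name]) for whichever BAM it is applied to.

```python
from typing import Tuple


def bam_file_names(sample_name: str, kind: str, combined: bool,
                   index_ext: str = "bai") -> Tuple[str, str]:
    """Hypothetical helper: return (bam_name, index_name) for one output BAM."""
    if kind not in ("Bioproduct", "ATAC"):
        raise ValueError("kind must be 'Bioproduct' or 'ATAC'")
    if index_ext not in ("bai", "csi"):
        raise ValueError("index_ext must be 'bai' or 'csi'")
    # Assumption: "Combined_" is prepended to the full name when Sample Tags are used.
    prefix = "Combined_" if combined else ""
    bam = f"{prefix}{sample_name}_{kind}.bam"
    # The index keeps the BAM name and appends .bai or .csi.
    return bam, f"{bam}.{index_ext}"


print(bam_file_names("Demo", "Bioproduct", combined=True))
```

Check which index extension was actually produced for your run, because it depends on the reference contig lengths.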
BAM is a binary alignment format. The BAM files are generated by the aligner and contain tags related to alignment quality. The Bioproduct BAM contains alignments of the R2 reads only, whereas the ATAC BAM contains alignments of both the R1 and R2 reads. Each BAM file is sorted by alignment coordinate within each chromosome, using the R2 read (Bioproduct) or both the R1 and R2 reads (ATAC). The BAM index is the index file associated with the coordinate-sorted BAM file. A BAI index is generated by default; however, if any contig in the reference genome exceeds 500 Mb in length, a CSI index is created instead, because the BAI format cannot address positions beyond 2^29 - 1 bp (about 537 Mb), whereas CSI supports longer contigs.
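The BAI/CSI decision rule can be sketched as follows, assuming only the 500 Mb threshold stated above. The function names are hypothetical, and the second function merely builds a samtools index command (the -c flag requests a CSI index); it does not claim to be how the pipeline itself creates the index.

```python
from typing import Dict, List

# Threshold stated in the documentation (500 Mb).
CSI_THRESHOLD_BP = 500_000_000


def choose_index_type(contig_lengths: Dict[str, int],
                      threshold_bp: int = CSI_THRESHOLD_BP) -> str:
    """Return "csi" if any contig is longer than threshold_bp, otherwise "bai"."""
    if any(length > threshold_bp for length in contig_lengths.values()):
        return "csi"
    return "bai"


def samtools_index_command(bam_path: str, index_type: str) -> List[str]:
    """Build (but do not run) a samtools index command for the chosen index type."""
    if index_type not in ("bai", "csi"):
        raise ValueError("index_type must be 'bai' or 'csi'")
    cmd = ["samtools", "index"]
    if index_type == "csi":
        cmd.append("-c")  # -c asks samtools to write a CSI index instead of BAI
    cmd.append(bam_path)
    return cmd


lengths = {"chr1": 248_956_422, "long_contig": 600_000_000}  # illustrative lengths
kind = choose_index_type(lengths)
print(kind, samtools_index_command("Demo_Bioproduct.bam", kind))
```

Contig lengths can be read from the @SQ LN fields of the BAM header. The command is returned as a list so it can be passed directly to subprocess.run when samtools is installed.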
The BD Rhapsody™ Sequence Analysis Pipeline further annotates the BAM files with the tags described below. For the Bioproduct BAM, if a read has multiple alignments (NH tag > 1), only the first alignment (HI tag is 0 or 1) is annotated with the 'MA' and 'CN' tags. BAM alignment tags may include the following:
Tag | Definition |
---|---|
NH | Number of reported alignments that contain the query in the current record. |
HI | Query hit index. |
AS | Alignment score assigned by the aligner. |
NM | Edit distance to the reference. |
CB | A number between 1 and 384³ (56,623,104) representing a unique cell label sequence (CB = 0 when no cell label sequence is detected). |
MR | Raw molecular identifier sequence. (Bioproduct BAM only) |
MA | RSEC-adjusted molecular identifier sequence. If the read is not from a putative cell, the raw UMI (as in MR) is repeated in this tag. (Bioproduct BAM only) |
CN | Indicates if a sequence is derived from a putative cell, as determined by the cell label filtering algorithm (T: putative cell; x: invalid cell label or noise cell). Note: You can distinguish between an invalid cell label and a noise cell with the CB tag (invalid cell labels are 0). |
ST | Sample Tag of the called putative cell: 1–24 for the Sample Tag number, M for multiplet, or x for undetermined. |
XF | Name of the Gene/AbSeq/SampleTag that a particular read was annotated to. (Bioproduct BAM only) |
MQ | Mapping quality of the mate/next segment. |
MC | CIGAR string for the mate/next segment. |
MD | Mismatching positions/bases. |
XS | Suboptimal alignment score. |
ms | Mate score, added by samtools fixmate (-m option); used by samtools markdup and other tools when identifying duplicate reads. |
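As an illustration of how these tags fit together, the sketch below parses the optional fields of SAM text records (for example, output of samtools view), applies the first-alignment rule for multi-mapped reads, interprets CN together with CB, and counts distinct MA values per cell (CB) and feature (XF). All function names are hypothetical; this is not the pipeline's own molecule-counting logic and is not expected to reproduce its count matrices. It assumes CB may be stored as either an integer or a string, so it compares CB as text.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple, Union

TagValue = Union[int, float, str]


def parse_tags(sam_line: str) -> Dict[str, TagValue]:
    """Parse the optional TAG:TYPE:VALUE fields (column 12 onward) of a SAM record."""
    tags: Dict[str, TagValue] = {}
    for field in sam_line.rstrip("\n").split("\t")[11:]:
        if not field:
            continue  # tolerate a trailing tab
        tag, typ, value = field.split(":", 2)
        if typ == "i":
            tags[tag] = int(value)
        elif typ == "f":
            tags[tag] = float(value)
        else:
            tags[tag] = value  # A, Z, H and B values are kept as strings
    return tags


def is_first_alignment(tags: Dict[str, TagValue]) -> bool:
    """True for a uniquely aligned read (NH <= 1) or the first hit (HI 0 or 1) of a multi-mapper."""
    if int(tags.get("NH", 1)) <= 1:
        return True
    return int(tags.get("HI", -1)) in (0, 1)


def cell_status(tags: Dict[str, TagValue]) -> str:
    """Interpret CN, using CB to separate invalid cell labels (CB = 0) from noise cells."""
    if "CN" not in tags:
        return "unannotated"  # e.g. later hits of multi-mapped reads
    if tags["CN"] == "T":
        return "putative cell"
    if str(tags.get("CB")) == "0":
        return "invalid cell label"
    return "noise cell"


def count_molecules(sam_lines: Iterable[str]) -> Dict[Tuple[str, str], int]:
    """Count distinct RSEC-adjusted UMIs (MA) per (CB, XF) pair among putative cells."""
    molecules: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip SAM header lines
        tags = parse_tags(line)
        if not is_first_alignment(tags) or cell_status(tags) != "putative cell":
            continue
        if "MA" not in tags or "XF" not in tags:
            continue
        molecules[(str(tags["CB"]), str(tags["XF"]))].add(str(tags["MA"]))
    return {key: len(umis) for key, umis in molecules.items()}
```

Using a set per (CB, XF) pair means reads sharing the same corrected UMI collapse into one molecule, which is why MA rather than MR is used here.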
Note: A BAM file can be converted to a tab-delimited text file (SAM format) with Samtools, for example with samtools view -h (see htslib.org).
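A minimal sketch of that conversion from Python, assuming samtools is installed and on PATH. The helper names are hypothetical; the command uses the standard samtools view options -h (include the header) and -o (output file), and SAM text is the default output format.

```python
import shutil
import subprocess
from typing import List


def bam_to_sam_command(bam_path: str, sam_path: str,
                       include_header: bool = True) -> List[str]:
    """Build a samtools view command that writes a BAM file out as SAM text."""
    cmd = ["samtools", "view"]
    if include_header:
        cmd.append("-h")  # keep the header lines (@HD, @SQ, @PG, ...)
    cmd += ["-o", sam_path, bam_path]
    return cmd


def bam_to_sam(bam_path: str, sam_path: str) -> None:
    """Run the conversion; raises if samtools is missing or exits with an error."""
    if shutil.which("samtools") is None:
        raise RuntimeError("samtools not found on PATH")
    subprocess.run(bam_to_sam_command(bam_path, sam_path), check=True)
```

SAM text is considerably larger than the BAM, so for quick inspection it is often enough to stream the output of samtools view through a pager rather than write a full SAM file.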