BAM and BAM Index


Files:

[sample_name]_Bioproduct.bam
[sample_name]_Bioproduct.bam.(bai|csi)
[sample_name]_ATAC.bam
[sample_name]_ATAC.bam.(bai|csi)

Note:

  1. The *_Bioproduct.bam file contains reads arising from the AbSeq, SampleTag, Targeted, VDJ, or WTA assays.

  2. A Combined_ prefix is added to the BAM and BAI/CSI file names when Sample Tags are present in the experiment, signifying that the BAM contains reads from all samples.

BAM is an alignment file in binary format that is generated by the aligner and contains tags related to alignment quality. The Bioproduct BAM consists of alignments from the R2 reads only, whereas the ATAC BAM contains alignments from both the R1 and R2 reads. The BAM files are sorted according to the alignment coordinates of either the R2 read (Bioproduct) or both the R1 and R2 reads (ATAC) on each chromosome. The BAM Index is the index file associated with the coordinate-sorted BAM file. A BAI index is typically generated; however, if any contig in the reference genome exceeds 500 Mb in length, a CSI index is created instead, as it supports larger contig sizes beyond the BAI index specifications.
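
The index choice described above amounts to a length check over the reference contigs. The following is a minimal sketch of that rule, not the pipeline's own code: the function name `choose_index_format` and the constant are hypothetical, and "500 Mb" is read here as 500,000,000 bp. For context, the BAI binning scheme itself addresses coordinates up to 2^29 bases (about 512 Mbp).

```python
# Hypothetical helper mirroring the rule stated in the text; not pipeline code.
# Assumption: "500 Mb" is interpreted as 500,000,000 base pairs.
BAI_CONTIG_THRESHOLD_BP = 500_000_000


def choose_index_format(contig_lengths):
    """Return "csi" if any contig is longer than the threshold, else "bai".

    contig_lengths maps contig name -> length in base pairs.
    """
    if any(length > BAI_CONTIG_THRESHOLD_BP for length in contig_lengths.values()):
        return "csi"
    return "bai"


# Made-up contig names and lengths, for illustration only.
small_reference = {"contigA": 120_000_000, "contigB": 16_000}
large_reference = {"contigA": 120_000_000, "contigC": 650_000_000}

print(choose_index_format(small_reference))
print(choose_index_format(large_reference))
```

A single oversized contig is enough to switch the whole file to CSI, because one index file covers every contig in the BAM.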

The BD Rhapsody™ Sequence Analysis Pipeline further annotates the BAM files with the tags described below. For the Bioproduct BAM, if a read has multiple alignments (NH tag > 1), only the first alignment (HI tag of 0 or 1) is annotated with the MA and CN tags. BAM alignment tags may include the following:
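
A downstream script that relies on MA and CN may want to skip secondary hits of multi-mapped reads in the Bioproduct BAM. The sketch below encodes the rule from the paragraph above; the function name `is_annotated_alignment` is hypothetical, and it assumes the NH and HI values have already been read from the record as integers.

```python
def is_annotated_alignment(nh, hi):
    """Return True if a Bioproduct BAM record is expected to carry MA and CN.

    nh: value of the NH tag (number of reported alignments for the read).
    hi: value of the HI tag (query hit index).
    """
    if nh <= 1:
        # Uniquely aligned read: the multi-alignment rule does not apply.
        return True
    # Multi-mapped read: only the first hit is annotated. The text allows
    # HI to be 0 or 1 for the first hit, so both are accepted here.
    return hi in (0, 1)


# Illustrative (NH, HI) pairs, not taken from real data.
for nh, hi in [(1, 1), (3, 1), (3, 2), (2, 0)]:
    print(nh, hi, is_annotated_alignment(nh, hi))
```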

Tag  Definition
NH   Number of reported alignments that contain the query in the current record.
HI   Query hit index.
AS   Alignment score assigned by the aligner.
NM   Edit distance to the reference.
CB   A number between 1 and 384^3 representing a unique cell label sequence (CB = 0 when no cell label sequence is detected).
MR   Raw molecular identifier sequence. (Bioproduct BAM only)
MA   RSEC-adjusted molecular identifier sequence. If not a true cell, the raw UMI is repeated in this tag. (Bioproduct BAM only)
CN   Indicates if a sequence is derived from a putative cell, as determined by the cell label filtering algorithm (T: putative cell; x: invalid cell label or noise cell).
     Note: You can distinguish between an invalid cell label and a noise cell with the CB tag (invalid cell labels are 0).
ST   The value is 1–24, indicating the Sample Tag of the called putative cell, or M for multiplet, or x for undetermined.
XF   Name of the Gene/AbSeq/SampleTag that a particular read was annotated to. (Bioproduct BAM only)
MQ   Mapping quality of the mate/next segment.
MC   CIGAR string for the mate/next segment.
MD   Mismatching positions/bases.
XS   Suboptimal alignment score.
ms   Produced by samtools fixmate; can be used by samtools markdup and other tools to identify duplicate reads.
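
To illustrate how the CN and CB tags combine, here is a small sketch that reads the optional fields of one SAM-formatted record and classifies it. It uses only the Python standard library. The function names, the example record, and the tag type codes in it (for example `CB:Z`) are assumptions made for illustration; check them against your own files.

```python
def parse_optional_tags(sam_line):
    """Parse the TAG:TYPE:VALUE fields that follow the 11 mandatory SAM columns."""
    fields = sam_line.rstrip("\n").split("\t")
    tags = {}
    for field in fields[11:]:
        tag, type_code, value = field.split(":", 2)
        # Integer-typed tags (type code "i") are converted; others stay strings.
        tags[tag] = int(value) if type_code == "i" else value
    return tags


def cell_status(tags):
    """Classify a record using CN, falling back to CB when CN is "x"."""
    cn = tags.get("CN")
    if cn == "T":
        return "putative cell"
    if cn == "x":
        # Per the CN note: a CB of 0 marks an invalid cell label;
        # any other CB value with CN = x is treated as a noise cell.
        if str(tags.get("CB")) == "0":
            return "invalid cell label"
        return "noise cell"
    return "not annotated"


# Invented record for illustration; not real pipeline output.
example = "\t".join([
    "read1", "0", "chr1", "100", "255", "4M", "*", "0", "0", "ACGT", "IIII",
    "NH:i:1", "HI:i:1", "CB:Z:0", "CN:Z:x",
])

tags = parse_optional_tags(example)
print(tags)
print(cell_status(tags))
```

Comparing CB as a string keeps the check valid whether the tag is stored as text or as an integer.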

Note: A BAM file can be converted to a tab-delimited text file (SAM format) by using Samtools (see htslib.org).
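
As a rough sketch of that conversion, the snippet below builds a `samtools view` command and, optionally, runs it through Python's `subprocess` module. The helper names and the file names are hypothetical; the sketch assumes `samtools` is installed and on the PATH. The `-h` option keeps the header in the output and `-o` names the output file.

```python
import shlex
import subprocess


def samtools_view_command(bam_path, sam_path, include_header=True):
    """Build the argument list for converting a BAM file to SAM text."""
    cmd = ["samtools", "view"]
    if include_header:
        cmd.append("-h")  # include the SAM header in the output
    cmd += ["-o", sam_path, bam_path]
    return cmd


def bam_to_sam(bam_path, sam_path):
    """Run the conversion; raises CalledProcessError if samtools fails."""
    subprocess.run(samtools_view_command(bam_path, sam_path), check=True)


# Hypothetical file names; only the command is printed here, nothing is run.
print(shlex.join(samtools_view_command("sample_Bioproduct.bam", "sample_Bioproduct.sam")))
```

Passing the command as a list rather than a shell string avoids quoting problems with file names that contain spaces.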