Reference Files
Introduction
For targeted mRNA assays, FASTA reference files are used to store the sequences of gene targets.
For whole transcriptome assays (WTA), the reference files archive is a compressed tarball that contains the STAR index files and the GTF transcriptome annotation corresponding to the species of cells used in the BD® WTA experiment.
For ATAC-Seq or Multiomic ATAC-Seq (WTA+ATAC-Seq) assays, the reference files archive is a compressed tarball that contains all the contents as described above for a WTA assay, an additional index for bwa-mem2, and a text file containing the mitochondrial contig names.
The AbSeq Reference is a FASTA file for BD® AbSeq Ab-Oligos used in a BD Rhapsody™ experiment.
If additional transgene sequences are used in the experiment, an additional FASTA file containing the sequences can be used as the Supplemental Reference.
Obtaining pre-designed targeted mRNA panels, WTA, or Multiomic WTA+ATAC-Seq reference files
Obtain the targeted FASTA references from the Seven Bridges demo project, or by contacting BD Biosciences customer support at scomix@bdscomix.bd.com.
For WTA assays, obtain a pre-built reference genome archive file for human or mouse from the Seven Bridges demo project, or by downloading from the following link: bd-rhapsody-public.s3-website-us-east-1.amazonaws.com/Rhapsody-WTA/
For ATAC-Seq and Multiomic WTA+ATAC-Seq assays, obtain a pre-built reference genome archive file for human or mouse from the Seven Bridges demo project, or by downloading from the following link: bd-rhapsody-public.s3-website-us-east-1.amazonaws.com/Rhapsody-WTA-ATAC/
Pre-built WTA reference gene biotypes
The GTF file in the pre-built WTA reference archive has been preprocessed to contain only the following gene types:
protein_coding, lncRNA, lincRNA, antisense, IG_LV_gene, IG_V_gene, IG_V_pseudogene, IG_D_gene, IG_J_gene, IG_J_pseudogene, IG_C_gene, IG_C_pseudogene, TR_V_gene, TR_V_pseudogene, TR_D_gene, TR_J_gene, TR_J_pseudogene, and TR_C_gene
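To apply a similar restriction to a custom GTF, the gene-type list above can be used as a filter. The following is a minimal sketch, not the preprocessing BD Biosciences actually runs. It assumes the gene type is stored in a GENCODE-style gene_type attribute or an Ensembl-style gene_biotype attribute, and the function name filter_gtf_lines is hypothetical.

```python
import re

# Gene types listed above for the pre-built WTA reference.
ALLOWED_GENE_TYPES = {
    "protein_coding", "lncRNA", "lincRNA", "antisense",
    "IG_LV_gene", "IG_V_gene", "IG_V_pseudogene", "IG_D_gene",
    "IG_J_gene", "IG_J_pseudogene", "IG_C_gene", "IG_C_pseudogene",
    "TR_V_gene", "TR_V_pseudogene", "TR_D_gene", "TR_J_gene",
    "TR_J_pseudogene", "TR_C_gene",
}

# Assumption: the gene type lives in a `gene_type` (GENCODE) or
# `gene_biotype` (Ensembl) attribute in column 9 of the GTF.
_GENE_TYPE_RE = re.compile(r'\b(?:gene_type|gene_biotype) "([^"]+)"')


def filter_gtf_lines(lines, allowed=ALLOWED_GENE_TYPES):
    """Yield comment lines and GTF records whose gene type is in `allowed`."""
    for line in lines:
        if line.startswith("#"):
            yield line  # keep header/comment lines unchanged
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # skip lines that are not 9-column GTF records
        match = _GENE_TYPE_RE.search(fields[8])
        if match and match.group(1) in allowed:
            yield line
```

For example, opening a GTF with open(path) and writing every line yielded by filter_gtf_lines(handle) to a new file would produce a GTF limited to these gene types.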
Designing custom Targeted mRNA panels
BD Biosciences can design custom targeted mRNA panels from a list of genes that you provide. To request a panel, contact BD Biosciences customer support at scomix@bdscomix.bd.com.
AbSeq reference files
If your experiment contains BD® AbSeq Ab-Oligos, you are required to have an AbSeq reference file. To prepare the AbSeq reference file, you can use the BD AbSeq Panel Generator (abseq-ref-gen.genomics.bd.com) or follow the instructions below.
- Download the FASTA file containing all of the BD Ab-Oligo (AbO) sequences from bd-rhapsody-public.s3-website-us-east-1.amazonaws.com/AbSeq-references/BDAbSeq_allReference_latest.fasta.
- Use a text editor such as Microsoft® Notepad or TextEdit to delete the sequence header and sequence pairs that will not be used in the experiment. Do not use a word processor such as Microsoft® Word, which can add unintended special characters to the file.
- Ensure that the AbSeq reference file follows these rules:
  - File extension is .fa or .fasta
  - Two-line FASTA format. Format example:
    >CD103|ITGAE|AHS0001|pAbO
    AAATAGTATCGAGCGTAGTTAAGTTGCGTAGCCGTT
    >CD161:DX12|KLRB1|AHS0002|pAbO
    GTTATGGTTGTCGGTAGAGTATCGTGTTGCGTTAGT
Note: BD Biosciences uses this format for its sequence header: <AntibodyName>|<GeneSymbol>|<SeqID>|pAbO.
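The manual editing and rule checks above can also be scripted. The sketch below is illustrative only and is not part of any BD tool: the function names select_entries and validate_abseq_reference are hypothetical, it assumes every header follows the BD convention shown in the note, and it assumes sequences contain only A, C, G, T, or N.

```python
import re

# Header convention from the note above: <AntibodyName>|<GeneSymbol>|<SeqID>|pAbO
_HEADER_RE = re.compile(r"^>[^|]+\|[^|]+\|[^|]+\|pAbO$")
# Assumption: sequences use only the nucleotide letters A, C, G, T, N.
_SEQUENCE_RE = re.compile(r"^[ACGTNacgtn]+$")


def select_entries(text, wanted_seq_ids):
    """Keep only header/sequence pairs whose SeqID (third field) is wanted."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    kept = []
    for header, sequence in zip(lines[0::2], lines[1::2]):
        fields = header[1:].split("|")
        if len(fields) == 4 and fields[2] in wanted_seq_ids:
            kept.extend([header, sequence])
    return "\n".join(kept) + "\n" if kept else ""


def validate_abseq_reference(filename, text):
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    if not filename.endswith((".fa", ".fasta")):
        problems.append("file extension must be .fa or .fasta")
    expect_header = True
    for number, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if expect_header and not _HEADER_RE.match(line):
            problems.append(f"line {number}: expected a header like >Name|Gene|SeqID|pAbO")
        elif not expect_header and not _SEQUENCE_RE.match(line):
            problems.append(f"line {number}: expected a single-line sequence")
        expect_header = not expect_header
    if not expect_header:
        problems.append("last header has no sequence line")
    return problems
```

A typical use would be to read BDAbSeq_allReference_latest.fasta, call select_entries with the SeqIDs used in the experiment (for example {"AHS0001", "AHS0002"}), write the result to a .fasta file, and confirm that validate_abseq_reference returns an empty list.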
Building a custom WTA only or Multiomic WTA+ATAC-Seq reference archive
The WTA reference archive is a tar.gz file with the following internal structure:
BD_Rhapsody_Reference_Files/ # top level folder
    star_index/ # sub-folder containing STAR index
        [files created with STAR --runMode genomeGenerate]
    [GTF for gene-transcript annotation, e.g. "gencode.v43.primary_assembly.annotation.gtf"]
The WTA+ATAC-Seq reference archive is a tar.gz file with the following internal structure:
BD_Rhapsody_Reference_Files/ # top level folder
    star_index/ # sub-folder containing STAR index
        [files created with STAR --runMode genomeGenerate]
    [GTF for gene-transcript annotation, e.g. "gencode.v43.primary_assembly.annotation.gtf"]
    mitochondrial_contigs.txt # mitochondrial contigs in the reference genome, one contig name per line, e.g. chrMT or chrM
    bwa-mem2_index/ # sub-folder containing bwa-mem2 index
        [files created with bwa-mem2 index]
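Before passing a custom archive to the pipeline, it can help to confirm that it matches the layout described above. The following is a minimal sketch using the Python standard library. The function name check_reference_archive is hypothetical, and the sketch assumes that everything sits under a single top-level folder and that the GTF file name ends in .gtf.

```python
import tarfile
from pathlib import PurePosixPath


def check_reference_archive(path, wta_only=False):
    """Return the expected items that are missing from a reference tar.gz."""
    with tarfile.open(path, "r:gz") as archive:
        names = [PurePosixPath(name) for name in archive.getnames()]

    # Paths relative to the top-level folder (assumed to be the first component).
    relative = [PurePosixPath(*p.parts[1:]) for p in names if len(p.parts) > 1]

    def has_files_in(folder):
        # True if at least one member lives inside `folder`.
        return any(p.parts[0] == folder and len(p.parts) > 1 for p in relative)

    missing = []
    if not has_files_in("star_index"):
        missing.append("star_index/")
    # Assumption: the annotation file sits directly in the top folder as *.gtf.
    if not any(len(p.parts) == 1 and p.suffix == ".gtf" for p in relative):
        missing.append("GTF file")
    if not wta_only:
        if PurePosixPath("mitochondrial_contigs.txt") not in relative:
            missing.append("mitochondrial_contigs.txt")
        if not has_files_in("bwa-mem2_index"):
            missing.append("bwa-mem2_index/")
    return missing
```

An empty list means every expected item was found. The check looks only at member names and does not verify that the index files themselves are valid.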
The same Docker image used for running the BD Rhapsody™ Sequence Analysis Pipeline can be used to generate a WTA-only or WTA+ATAC-Seq reference archive with the following steps:
- Go to bitbucket.org/CRSwDev/cwl and download the Extra_Utilities file: make_rhap_reference_<version>.cwl
- Gather a matching genome sequence in FASTA format and a GTF file with gene, transcript, and exon annotations, for example, from gencodegenes.org.
- Run cwl-runner as in the following example:
  cwl-runner make_rhap_reference_2.0.cwl --Genome_fasta GRCh38.primary_assembly.genome.fa --Gtf gencode.v43.primary_assembly.annotation.gtf --Archive_prefix testrefhuman43
The resulting testrefhuman43.tar.gz file can be used for the Reference_Archive input of the BD Rhapsody™ Sequence Analysis Pipeline. By default, the combined WTA+ATAC-Seq reference is created. To create a WTA-only reference, pass the --WTA_only flag, for example:
cwl-runner make_rhap_reference_2.0.cwl --Genome_fasta GRCh38.primary_assembly.genome.fa --Gtf gencode.v43.primary_assembly.annotation.gtf --Archive_prefix testrefhuman43 --WTA_only
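When references are built for several genomes, the command line can be assembled programmatically. The sketch below only builds the argument list and uses only the flags shown in the examples above. The function name make_reference_command is hypothetical.

```python
def make_reference_command(cwl_file, genome_fasta, gtf, archive_prefix, wta_only=False):
    """Build a cwl-runner argument list from the flags shown in the examples."""
    command = [
        "cwl-runner", cwl_file,
        "--Genome_fasta", genome_fasta,
        "--Gtf", gtf,
        "--Archive_prefix", archive_prefix,
    ]
    if wta_only:
        # Skips the bwa-mem2 index and mitochondrial contig list.
        command.append("--WTA_only")
    return command
```

The returned list can be passed to subprocess.run(command, check=True) on a machine where cwl-runner and Docker are available.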