Reference Files

Introduction

For targeted mRNA assays, FASTA reference files are used to store the sequences of gene targets.

For whole transcriptome assays (WTA), the reference files archive is a compressed tarball that contains the STAR index files and the GTF transcriptome annotation corresponding to the species of cells used in the BD® WTA experiment.

For ATAC-Seq or Multiomic ATAC-Seq (WTA+ATAC-Seq) assays, the reference files archive is a compressed tarball that contains all the contents as described above for a WTA assay, an additional index for bwa-mem2, and a text file containing the mitochondrial contig names.

The AbSeq Reference is a FASTA file for BD® AbSeq Ab-Oligos used in a BD Rhapsody™ experiment.

If additional transgene sequences are used in the experiment, an additional FASTA file containing the sequences can be used as the Supplemental Reference.

Obtaining pre-designed targeted mRNA panels, WTA, or Multiomic WTA+ATAC-Seq reference files

Obtain the targeted FASTA references from the Seven Bridges demo project, or by contacting BD Biosciences customer support at scomix@bdscomix.bd.com.

For WTA assays, obtain a pre-built reference genome archive file for human or mouse from the Seven Bridges demo project, or by downloading from the following link: bd-rhapsody-public.s3-website-us-east-1.amazonaws.com/Rhapsody-WTA/

For ATAC-Seq and Multiomic WTA+ATAC-Seq assays, obtain a pre-built reference genome archive file for human or mouse from the Seven Bridges demo project, or by downloading from the following link: [bd-rhapsody-public.s3-website-us-east-1.amazonaws.com/Rhapsody-WTA-ATAC/](http://bd-rhapsody-public.s3-website-us-east-1.amazonaws.com/Rhapsody-WTA-ATAC/)

Pre-built WTA reference gene biotypes

The GTF file in the pre-built WTA reference archive has been preprocessed to contain only the following gene types: protein_coding, lncRNA, lincRNA, antisense, IG_LV_gene, IG_V_gene, IG_V_pseudogene, IG_D_gene, IG_J_gene, IG_J_pseudogene, IG_C_gene, IG_C_pseudogene, TR_V_gene, TR_V_pseudogene, TR_D_gene, TR_J_gene, TR_J_pseudogene, and TR_C_gene.

Designing custom Targeted mRNA panels

BD Biosciences can design custom targeted mRNA panels from a list of genes that you provide. To request one, contact BD Biosciences customer support at scomix@bdscomix.bd.com.

AbSeq reference files

If your experiment contains BD® AbSeq Ab-Oligos, you are required to have an AbSeq reference file. To prepare the AbSeq reference file, you can use the BD AbSeq Panel Generator (abseq-ref-gen.genomics.bd.com) or follow the instructions below.

Download the FASTA file containing all of the BD Ab-Oligo (AbO) sequences. Go to bd-rhapsody-public.s3-website-us-east-1.amazonaws.com/AbSeq-references/BDAbSeq_allReference_latest.fasta.
Use a text editor such as Microsoft® Notepad or TextEdit to delete the header and sequence pairs that will not be used in the experiment.

Do not use a word processor such as Microsoft® Word, which can add unintended special characters to the file.
Ensure that the AbSeq reference file follows these rules:

File extension is .fa or .fasta
Two-line FASTA format. Format example:
```
>CD103|ITGAE|AHS0001|pAbO
AAATAGTATCGAGCGTAGTTAAGTTGCGTAGCCGTT
>CD161:DX12|KLRB1|AHS0002|pAbO
GTTATGGTTGTCGGTAGAGTATCGTGTTGCGTTAGT
```
Note: BD Biosciences uses this format for its sequence header: <AntibodyName>|<GeneSymbol>|<SeqID>|pAbO.
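The manual editing steps above can also be scripted. The following is a minimal Python sketch, not part of any BD tool: the function names `parse_two_line_fasta`, `subset_abseq`, and `has_valid_extension` are hypothetical, and the sketch assumes that the SeqID is always the third `|`-separated field of the header, as in the format shown in the note above.

```python
from pathlib import Path


def parse_two_line_fasta(text):
    """Return (header, sequence) pairs; raise ValueError if the text is
    not strict two-line FASTA (one header line, then one sequence line)."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if len(lines) % 2 != 0:
        raise ValueError("expected header/sequence line pairs")
    records = []
    for header, seq in zip(lines[0::2], lines[1::2]):
        if not header.startswith(">"):
            raise ValueError(f"expected a header line, got: {header!r}")
        if seq.startswith(">"):
            raise ValueError(f"header {header!r} has no sequence line")
        records.append((header[1:], seq))
    return records


def subset_abseq(text, keep_seq_ids):
    """Keep only records whose SeqID is in keep_seq_ids.

    Assumption: the SeqID is the third '|'-separated header field,
    e.g. AHS0001 in 'CD103|ITGAE|AHS0001|pAbO'.
    """
    kept = []
    for header, seq in parse_two_line_fasta(text):
        fields = header.split("|")
        if len(fields) >= 3 and fields[2] in keep_seq_ids:
            kept.append(f">{header}\n{seq}\n")
    return "".join(kept)


def has_valid_extension(path):
    """Check the documented rule that the file ends in .fa or .fasta."""
    return Path(path).suffix in {".fa", ".fasta"}
```

To use it, read the downloaded FASTA file as text, pass it to `subset_abseq` with the set of SeqIDs used in the experiment, and write the result to a file ending in `.fa` or `.fasta`. Writing plain text from a script avoids the special characters that a word processor can introduce.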

Building a custom WTA only or Multiomic WTA+ATAC-Seq reference archive

The WTA reference archive is a tar.gz file with the following internal structure:

BD_Rhapsody_Reference_Files/ # top level folder

   star_index/ # sub-folder containing STAR index

      [files created with STAR --runMode genomeGenerate]

   GTF for gene-transcript-annotation e.g. "gencode.v43.primary_assembly.annotation.gtf"

The WTA+ATAC-Seq reference archive is a tar.gz file with the following internal structure:

BD_Rhapsody_Reference_Files/ # top level folder

   star_index/ # sub-folder containing STAR index

      [files created with STAR --runMode genomeGenerate]

   GTF for gene-transcript-annotation e.g. "gencode.v43.primary_assembly.annotation.gtf"

   mitochondrial_contigs.txt # mitochondrial contigs in the reference genome, one contig name per line, e.g. chrMT or chrM

   bwa-mem2_index/ # sub-folder containing bwa-mem2 index
   
      [files created with bwa-mem2 index]
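
A hand-assembled archive can be checked against the layouts above before it is passed to the pipeline. The following Python sketch uses only the standard `tarfile` module. The function name `check_reference_archive` is hypothetical, and the sketch assumes that the GTF file name ends in `.gtf` and sits directly under the top-level folder; it does not check the contents of the index folders.

```python
import tarfile

TOP = "BD_Rhapsody_Reference_Files"


def check_reference_archive(path, wta_only=False):
    """Return a list of layout problems found in a tar.gz reference
    archive. An empty list means the expected entries are present."""
    with tarfile.open(path, "r:gz") as tar:
        names = [m.name.rstrip("/") for m in tar.getmembers()]
    # Some tar invocations prefix member names with "./"; strip it.
    names = [n[2:] if n.startswith("./") else n for n in names]

    def has_files_under(sub):
        prefix = f"{TOP}/{sub}/"
        return any(n.startswith(prefix) for n in names)

    problems = []
    if not any(n == TOP or n.startswith(TOP + "/") for n in names):
        problems.append(f"missing top-level folder {TOP}/")
    if not has_files_under("star_index"):
        problems.append("missing or empty star_index/")
    # Assumption: the GTF is named *.gtf and sits directly under TOP.
    if not any(
        n.startswith(TOP + "/") and n.count("/") == 1 and n.endswith(".gtf")
        for n in names
    ):
        problems.append("no .gtf file directly under the top-level folder")
    if not wta_only:
        if not has_files_under("bwa-mem2_index"):
            problems.append("missing or empty bwa-mem2_index/")
        if f"{TOP}/mitochondrial_contigs.txt" not in names:
            problems.append("missing mitochondrial_contigs.txt")
    return problems
```

Call `check_reference_archive("my_reference.tar.gz")` for a WTA+ATAC-Seq archive, or pass `wta_only=True` for a WTA only archive; the file name here is a placeholder.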

The same docker image used for running the BD Rhapsody™ Sequence Analysis Pipeline can be used for generating a WTA only or WTA+ATAC-Seq reference archive with the following steps:

Go to bitbucket.org/CRSwDev/cwl and download the Extra_Utilities file: make_rhap_reference_<version>.cwl
Gather a matching pair of files: a genome sequence in FASTA format and a GTF with gene, transcript, and exon annotations, for example from gencodegenes.org. Chromosome names need to match exactly between the FASTA file and the GTF. These features of the GTF are important:
- Each gene and exon line in the GTF must have a "gene_name" or "gene_id" attribute. ("gene_name" will be used if both exist)
- The value of the "strand" column of these lines should be "+" or "-", and must not be '.'
- These lines should include a "gene_type" or a "gene_biotype" attribute, or they will be filtered out by the pipeline. Some GTF files do not include these attributes on non-gene features, so be sure to check the input.
- By default, the make_rhap_reference tool will remove any gene lines in which the "gene_type" or "gene_biotype" attribute is not in the following list: "protein_coding", "lncRNA", "lincRNA", "antisense", "IG_LV_gene", "IG_V_gene", "IG_V_pseudogene", "IG_D_gene", "IG_J_gene", "IG_J_pseudogene", "IG_C_gene", "IG_C_pseudogene", "TR_V_gene", "TR_V_pseudogene", "TR_D_gene", "TR_J_gene", "TR_J_pseudogene", or "TR_C_gene". This filtering can be turned off with the Filtering_off parameter.
Run cwl-runner as in the following example:

cwl-runner make_rhap_reference_2.0.cwl --Genome_fasta GRCh38.primary_assembly.genome.fa --Gtf gencode.v43.primary_assembly.annotation.gtf --Archive_prefix testrefhuman43

The resulting testrefhuman43.tar.gz file can be used for the Reference_Archive input of the BD Rhapsody™ Sequence Analysis Pipeline. By default, the combined WTA+ATAC-Seq reference is created.

To create a WTA only reference archive instead, pass the --WTA_only flag, for example: cwl-runner make_rhap_reference_2.0.cwl --Genome_fasta GRCh38.primary_assembly.genome.fa --Gtf gencode.v43.primary_assembly.annotation.gtf --Archive_prefix testrefhuman43 --WTA_only
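
Because building the indexes takes time, it can help to check the GTF requirements listed in the steps above before running the tool. The following Python sketch is an illustration only and is not how make_rhap_reference itself validates input: the function names `fasta_contig_names` and `check_gtf_lines` are hypothetical, and the attribute parsing assumes the usual `key "value";` GTF style. It reports gene and exon lines that lack a gene name or ID, have an unusable strand, lack a biotype attribute, or use a contig name that is absent from the FASTA file, and it flags gene lines that the default filtering would remove.

```python
import re

# Biotypes kept by the default filtering, as listed in this section.
ALLOWED_BIOTYPES = {
    "protein_coding", "lncRNA", "lincRNA", "antisense",
    "IG_LV_gene", "IG_V_gene", "IG_V_pseudogene", "IG_D_gene",
    "IG_J_gene", "IG_J_pseudogene", "IG_C_gene", "IG_C_pseudogene",
    "TR_V_gene", "TR_V_pseudogene", "TR_D_gene", "TR_J_gene",
    "TR_J_pseudogene", "TR_C_gene",
}

# Assumption: attributes are written as key "value"; pairs.
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def fasta_contig_names(lines):
    """Collect contig names: the first token after '>' on header lines."""
    return {
        ln[1:].split()[0]
        for ln in lines
        if ln.startswith(">") and ln[1:].split()
    }


def check_gtf_lines(lines, fasta_contigs=None):
    """Yield (line_number, message) for gene and exon lines that do not
    meet the requirements listed in this section."""
    for num, line in enumerate(lines, start=1):
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            yield num, "fewer than 9 tab-separated columns"
            continue
        contig, feature, strand = cols[0], cols[2], cols[6]
        if feature not in ("gene", "exon"):
            continue
        attrs = dict(ATTR_RE.findall(cols[8]))
        if fasta_contigs is not None and contig not in fasta_contigs:
            yield num, f"contig {contig!r} not found in FASTA"
        if "gene_name" not in attrs and "gene_id" not in attrs:
            yield num, "no gene_name or gene_id attribute"
        if strand not in ("+", "-"):
            yield num, f"strand is {strand!r}, expected '+' or '-'"
        biotype = attrs.get("gene_type", attrs.get("gene_biotype"))
        if biotype is None:
            yield num, "no gene_type or gene_biotype attribute"
        elif feature == "gene" and biotype not in ALLOWED_BIOTYPES:
            yield num, f"biotype {biotype!r} is removed by default filtering"
```

Both functions accept any iterable of lines, so open file handles can be passed directly and a large GTF is processed one line at a time.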