Reference Files


Introduction

For targeted mRNA assays, FASTA reference files are used to store the sequences of gene targets.

For whole transcriptome assays (WTA), the reference files archive is a compressed tarball that contains the STAR index files and the GTF transcriptome annotation corresponding to the species of cells used in the BD® WTA experiment.

For ATAC-Seq or Multiomic ATAC-Seq (WTA+ATAC-Seq) assays, the reference files archive is a compressed tarball that contains everything in the WTA archive described above, plus an index for bwa-mem2 and a text file listing the mitochondrial contig names.

The AbSeq Reference is a FASTA file for BD® AbSeq Ab-Oligos used in a BD Rhapsody™ experiment.

If the experiment includes transgene sequences, provide a FASTA file containing those sequences as the Supplemental Reference.

Obtaining pre-designed targeted mRNA panels, WTA, or Multiomic WTA+ATAC-Seq reference files

Obtain the targeted FASTA references from the Seven Bridges demo project, or by contacting BD Biosciences customer support at scomix@bdscomix.bd.com.

For WTA assays, obtain a pre-built reference genome archive file for human or mouse from the Seven Bridges demo project, or by downloading from the following link: bd-rhapsody-public.s3-website-us-east-1.amazonaws.com/Rhapsody-WTA/

For ATAC-Seq and Multiomic WTA+ATAC-Seq assays, obtain a pre-built reference genome archive file for human or mouse from the Seven Bridges demo project, or by downloading from the following link: bd-rhapsody-public.s3-website-us-east-1.amazonaws.com/Rhapsody-WTA-ATAC/

Pre-built WTA reference gene biotypes

The GTF file in the pre-built WTA reference archive has been preprocessed to contain only the following gene types: protein_coding, lncRNA, lincRNA, antisense, IG_LV_gene, IG_V_gene, IG_V_pseudogene, IG_D_gene, IG_J_gene, IG_J_pseudogene, IG_C_gene, IG_C_pseudogene, TR_V_gene, TR_V_pseudogene, TR_D_gene, TR_J_gene, TR_J_pseudogene, and TR_C_gene.
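If you build a custom reference and want a comparable annotation, the kind of filtering described above can be sketched in Python as follows. This is a minimal sketch, not BD's preprocessing script: it assumes GENCODE-style `gene_type "..."` attributes (with a fallback to Ensembl-style `gene_biotype "..."`), and `filter_gtf_lines` and `filter_gtf` are hypothetical names.

```python
import re

# Gene types kept in the pre-built WTA reference, as listed in this section.
ALLOWED_GENE_TYPES = {
    "protein_coding", "lncRNA", "lincRNA", "antisense",
    "IG_LV_gene", "IG_V_gene", "IG_V_pseudogene", "IG_D_gene",
    "IG_J_gene", "IG_J_pseudogene", "IG_C_gene", "IG_C_pseudogene",
    "TR_V_gene", "TR_V_pseudogene", "TR_D_gene", "TR_J_gene",
    "TR_J_pseudogene", "TR_C_gene",
}

# Assumption: GENCODE writes gene_type "x"; Ensembl writes gene_biotype "x".
GENE_TYPE_RE = re.compile(r'\bgene_(?:type|biotype) "([^"]+)"')


def filter_gtf_lines(lines, allowed=ALLOWED_GENE_TYPES):
    """Yield comment lines plus every record whose gene type is in `allowed`."""
    for line in lines:
        if line.startswith("#"):
            yield line  # keep header/comment lines unchanged
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # skip blank or malformed lines
        match = GENE_TYPE_RE.search(fields[8])  # column 9 holds the attributes
        if match and match.group(1) in allowed:
            yield line


def filter_gtf(in_path, out_path, allowed=ALLOWED_GENE_TYPES):
    """Stream a GTF file and write only the allowed gene types to out_path."""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(filter_gtf_lines(src, allowed))
```

For example, `filter_gtf("gencode.v43.primary_assembly.annotation.gtf", "filtered.gtf")`. Because GENCODE repeats `gene_type` on every gene, transcript, and exon line, the whole feature hierarchy of a kept gene is retained.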

Designing custom targeted mRNA panels

BD Biosciences can design custom targeted mRNA panels from a list of genes that you provide. Contact BD Biosciences customer support at scomix@bdscomix.bd.com.

AbSeq reference files

If your experiment includes BD® AbSeq Ab-Oligos, you need an AbSeq reference file. To prepare it, use the BD AbSeq Panel Generator (abseq-ref-gen.genomics.bd.com) or follow the instructions below.

  1. Download the FASTA file containing all of the BD Ab-Oligo (AbO) sequences from bd-rhapsody-public.s3-website-us-east-1.amazonaws.com/AbSeq-references/BDAbSeq_allReference_latest.fasta.

  2. Use a plain-text editor such as Microsoft® Notepad or TextEdit to delete the header and sequence line pairs for any Ab-Oligos that will not be used in the experiment.

    Do not use a word processor such as Microsoft® Word, which can add unintended special characters to the file.

  3. Ensure that the AbSeq reference file follows these rules:

  • File extension is .fa or .fasta

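  • Two-line FASTA format: each header line is followed by exactly one sequence line, with no line wrapping within a sequence. Format example: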

    >CD103|ITGAE|AHS0001|pAbO
    AAATAGTATCGAGCGTAGTTAAGTTGCGTAGCCGTT
    >CD161:DX12|KLRB1|AHS0002|pAbO
    GTTATGGTTGTCGGTAGAGTATCGTGTTGCGTTAGT
    

    Note: BD Biosciences uses this format for its sequence header: <AntibodyName>|<GeneSymbol>|<SeqID>|pAbO.
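Steps 2 and 3 can also be done with a short script instead of a text editor. The sketch below is illustrative, not a BD tool: `subset_fasta` and `validate_abseq_fasta` are hypothetical names, the header check encodes the BD naming convention from the note above, and the assumption that sequence lines contain only A, C, G, T, or N is ours.

```python
import re

# BD header convention from the note above: <AntibodyName>|<GeneSymbol>|<SeqID>|pAbO
HEADER_RE = re.compile(r">[^|]+\|[^|]*\|[^|]+\|pAbO")
# Assumption: sequence lines contain only nucleotide letters (N allowed).
SEQ_RE = re.compile(r"[ACGTNacgtn]+")


def subset_fasta(lines, keep_names):
    """Keep header/sequence pairs whose antibody name (text before the first '|')
    is in keep_names. Assumes the input is already two-line FASTA."""
    rows = [ln.strip() for ln in lines if ln.strip()]
    kept = []
    for header, seq in zip(rows[0::2], rows[1::2]):
        if header[1:].split("|", 1)[0] in keep_names:
            kept += [header + "\n", seq + "\n"]
    return kept


def validate_abseq_fasta(filename, lines, check_header=True):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if not filename.endswith((".fa", ".fasta")):
        problems.append("file extension must be .fa or .fasta")
    rows = [ln.strip() for ln in lines if ln.strip()]  # blank lines are ignored
    seen = set()
    for i in range(0, len(rows), 2):
        record = i // 2 + 1
        header = rows[i]
        if not header.startswith(">"):
            # A wrapped sequence shows up here as an extra non-header line.
            problems.append(f"record {record}: expected a '>' header line, got {header[:30]!r}")
            break  # header/sequence pairing is lost; later checks would be noise
        if i + 1 >= len(rows) or not SEQ_RE.fullmatch(rows[i + 1]):
            problems.append(f"record {record}: {header} is not followed by a valid sequence line")
            break
        if check_header and not HEADER_RE.fullmatch(header):
            problems.append(f"record {record}: {header} does not follow Name|Gene|SeqID|pAbO")
        if header in seen:
            problems.append(f"record {record}: duplicate header {header}")
        seen.add(header)
    return problems
```

For example, read BDAbSeq_allReference_latest.fasta, pass its lines and the set of antibody names in your panel to `subset_fasta`, write the result to a .fasta file, and confirm that `validate_abseq_fasta` returns an empty list for it.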

Building a custom WTA only or Multiomic WTA+ATAC-Seq reference archive

The WTA reference archive is a tar.gz file with the following internal structure:

BD_Rhapsody_Reference_Files/   # top-level folder
   star_index/   # sub-folder containing the STAR index
      [files created with STAR --runMode genomeGenerate]
   [GTF gene/transcript annotation file, e.g. "gencode.v43.primary_assembly.annotation.gtf"]
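Before using a hand-built archive, a quick layout check can catch a misplaced or misnamed folder. The Python sketch below is an illustration, not a BD tool: `missing_entries` and `check_reference_archive` are hypothetical names, and the check only inspects member names inside the tarball; it does not verify that the STAR index files are valid.

```python
import tarfile

TOP_LEVEL = "BD_Rhapsody_Reference_Files"


def missing_entries(member_names, extra_required=()):
    """Return the required entries that are absent directly under the top-level folder."""
    children = set()
    for name in member_names:
        if name.startswith("./"):
            name = name[2:]  # tolerate archives whose member names start with "./"
        parts = name.strip("/").split("/")
        if len(parts) >= 2 and parts[0] == TOP_LEVEL:
            children.add(parts[1])  # direct child of the top-level folder
    missing = []
    if "star_index" not in children:
        missing.append("star_index/")
    if not any(child.endswith(".gtf") for child in children):
        missing.append("<annotation>.gtf")
    missing.extend(r for r in extra_required if r not in children)
    return missing


def check_reference_archive(path, extra_required=()):
    """Open a .tar.gz reference archive and list any missing required entries."""
    with tarfile.open(path, "r:gz") as tar:
        return missing_entries(tar.getnames(), extra_required)
```

For the WTA+ATAC-Seq layout described next, the same check applies with `extra_required=("bwa-mem2_index", "mitochondrial_contigs.txt")`.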

The WTA+ATAC-Seq reference archive is a tar.gz file with the following internal structure:

BD_Rhapsody_Reference_Files/   # top-level folder
   star_index/   # sub-folder containing the STAR index
      [files created with STAR --runMode genomeGenerate]
   [GTF gene/transcript annotation file, e.g. "gencode.v43.primary_assembly.annotation.gtf"]
   mitochondrial_contigs.txt   # mitochondrial contigs in the reference genome, one contig name per line, e.g. chrMT or chrM
   bwa-mem2_index/   # sub-folder containing the bwa-mem2 index
      [files created with bwa-mem2 index]
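If you assemble this archive by hand, mitochondrial_contigs.txt can be derived from the genome FASTA headers. This is a minimal sketch under stated assumptions: contig names are taken as the first whitespace-delimited token of each '>' header, and the candidate names (chrM, chrMT, MT, M) are our assumption covering common UCSC/GENCODE and Ensembl naming. `write_mito_file` and the other function names are hypothetical; review the output against your genome before using it.

```python
# Assumption: common mitochondrial contig names, compared case-insensitively.
MITO_NAME_CANDIDATES = {"chrm", "chrmt", "mt", "m"}


def fasta_contig_names(lines):
    """Yield contig names (first token after '>') from FASTA header lines."""
    for line in lines:
        if line.startswith(">"):
            parts = line[1:].split()
            if parts:  # ignore an empty '>' line
                yield parts[0]


def mitochondrial_contigs(lines, candidates=MITO_NAME_CANDIDATES):
    """Return contig names whose lower-cased form is a known mitochondrial name."""
    return [name for name in fasta_contig_names(lines) if name.lower() in candidates]


def write_mito_file(fasta_path, out_path="mitochondrial_contigs.txt"):
    """Scan a genome FASTA and write one mitochondrial contig name per line."""
    with open(fasta_path) as fasta:
        names = mitochondrial_contigs(fasta)
    if not names:
        raise ValueError("no mitochondrial contig found; adjust the candidates or write the file by hand")
    with open(out_path, "w") as out:
        out.write("".join(name + "\n" for name in names))
    return names
```

The function streams the FASTA line by line rather than loading whole sequences, so memory use stays small even for a full genome.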

The same Docker image used for running the BD Rhapsody™ Sequence Analysis Pipeline can also generate a WTA only or WTA+ATAC-Seq reference archive, using the following steps:

  1. Go to bitbucket.org/CRSwDev/cwl and download the Extra_Utilities file: make_rhap_reference_<version>.cwl

  2. Gather a genome sequence in FASTA format and a GTF annotation with gene, transcript, and exon records that match each other (same genome assembly and contig naming), for example, from gencodegenes.org.

  3. Run cwl-runner as in the following example:

    cwl-runner make_rhap_reference_2.0.cwl \
        --Genome_fasta GRCh38.primary_assembly.genome.fa \
        --Gtf gencode.v43.primary_assembly.annotation.gtf \
        --Archive_prefix testrefhuman43

    The resulting testrefhuman43.tar.gz file can be used as the Reference_Archive input of the BD Rhapsody™ Sequence Analysis Pipeline. By default, the combined WTA+ATAC-Seq reference is created.

    To create a WTA only reference archive, add the --WTA_only flag, for example:

    cwl-runner make_rhap_reference_2.0.cwl \
        --Genome_fasta GRCh38.primary_assembly.genome.fa \
        --Gtf gencode.v43.primary_assembly.annotation.gtf \
        --Archive_prefix testrefhuman43 \
        --WTA_only
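Step 2 asks for a FASTA and GTF that match. One way to sanity-check a pair before running cwl-runner is sketched below. It is an illustration, not part of make_rhap_reference: it assumes "matching" means every sequence name in GTF column 1 is also a contig in the FASTA, and it also checks that gene, transcript, and exon feature types are present. `preflight` and the helper names are hypothetical.

```python
REQUIRED_FEATURES = {"gene", "transcript", "exon"}


def fasta_contigs(fasta_lines):
    """Collect contig names (first token after '>') from a FASTA file's lines."""
    names = set()
    for line in fasta_lines:
        if line.startswith(">"):
            parts = line[1:].split()
            if parts:
                names.add(parts[0])
    return names


def gtf_summary(gtf_lines):
    """Return (sequence names from column 1, feature types from column 3)."""
    seqnames, features = set(), set()
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        fields = line.split("\t")
        if len(fields) < 9:
            continue  # skip malformed records
        seqnames.add(fields[0])
        features.add(fields[2])
    return seqnames, features


def preflight(fasta_lines, gtf_lines):
    """Report GTF sequence names absent from the FASTA and missing feature types."""
    seqnames, features = gtf_summary(gtf_lines)
    return {
        "seqnames_missing_from_fasta": sorted(seqnames - fasta_contigs(fasta_lines)),
        "missing_feature_types": sorted(REQUIRED_FEATURES - features),
    }
```

For example, open GRCh38.primary_assembly.genome.fa and gencode.v43.primary_assembly.annotation.gtf and pass the two file objects to `preflight`; two empty lists suggest the pair is consistent. A non-empty first list (for example "1" versus "chr1") usually means the files come from different providers or releases.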