System and Software Installation


Introduction

The analysis pipeline depends on specific software tools and library versions. To ensure that these tools are always available, the analysis runs in a self-contained environment called a Docker container. You obtain the container by “pulling” (downloading) a Docker image to your local computer. The container includes all of the libraries and settings that the pipeline requires, so the analysis runs reproducibly wherever it is deployed, whether on a local installation or on the Seven Bridges Genomics platform.

CWL-runner is the tool that manages the Docker containers to complete a pipeline run. It takes two inputs: a CWL workflow file and a YML input specification file. The CWL workflow file describes each step in the pipeline and how each Docker container should run to complete that step. The YML file tells CWL-runner where to find the pipeline inputs, such as the sequencer read files (FASTQ files) and the reference. When the pipeline run finishes, CWL-runner collects the final outputs from the Docker containers and copies them to a designated output folder on your computer.
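As a rough illustration of how the two inputs fit together, the sketch below wraps a cwl-runner call in a small shell function. The function name run_pipeline and the file and folder names in the usage note are hypothetical placeholders, not names from the pipeline distribution. The --outdir option is the one provided by the reference CWL runner (cwltool), which the cwl-runner command is assumed to point to here.

```shell
#!/bin/sh
# Sketch only: run a CWL workflow with cwl-runner.
# run_pipeline is a hypothetical helper name, not part of the pipeline.
run_pipeline() {
    workflow=$1   # CWL workflow file
    inputs=$2     # YML input specification file
    outdir=$3     # folder that receives the final outputs

    # Stop early if either input file is missing.
    for f in "$workflow" "$inputs"; do
        if [ ! -f "$f" ]; then
            echo "missing file: $f" >&2
            return 1
        fi
    done

    mkdir -p "$outdir" || return 1

    # Assumes cwl-runner is the reference runner (cwltool), which accepts --outdir.
    cwl-runner --outdir "$outdir" "$workflow" "$inputs"
}
```

A call might look like run_pipeline my_workflow.cwl my_inputs.yml results, where all three arguments are placeholders for your own files and folder.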

Minimum system requirements

  • Operating system: macOS® or Linux® (Microsoft® Windows® is not supported)
  • 8-core processor (>16-core recommended)
  • RAM:
    • Targeted assays: 32 GB (>128 GB recommended)
    • WTA and ATAC-Seq assays: 96 GB (>192 GB recommended)
  • 250 GB free disk space (>1 TB recommended)
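
The minimums above can be checked from the command line. The following is a minimal sketch, not an official tool: the helper names (cpu_cores, ram_gb, free_disk_gb, meets_minimum, check) are hypothetical, and the RAM threshold shown is the one for targeted assays. The operating system usually reports slightly less memory than is physically installed, so a machine with exactly 32 GB may show 31 here.

```shell
#!/bin/sh
# Sketch: compare this machine against the minimum requirements listed above.

# Number of online CPU cores (tries getconf, then nproc, then macOS sysctl).
cpu_cores() {
    getconf _NPROCESSORS_ONLN 2>/dev/null || nproc 2>/dev/null || sysctl -n hw.ncpu
}

# Total RAM in whole GB, rounded down.
ram_gb() {
    if [ -r /proc/meminfo ]; then
        # Linux: MemTotal is reported in kB.
        awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 / 1024 }' /proc/meminfo
    else
        # macOS: hw.memsize is reported in bytes.
        echo $(( $(sysctl -n hw.memsize) / 1024 / 1024 / 1024 ))
    fi
}

# Free disk space in whole GB for the given folder (default: current folder).
free_disk_gb() {
    df -Pk "${1:-.}" | awk 'NR == 2 { printf "%d\n", $4 / 1024 / 1024 }'
}

# Succeeds when the first integer is at least the second.
meets_minimum() {
    [ "$1" -ge "$2" ]
}

# Print one report line: label, measured value, minimum, and status.
check() {
    if meets_minimum "$2" "$3"; then status="ok"; else status="BELOW MINIMUM"; fi
    printf '%-16s %6s (minimum %s): %s\n' "$1" "$2" "$3" "$status"
}

check "CPU cores" "$(cpu_cores)" 8
check "RAM (GB)" "$(ram_gb)" 32        # use 96 for WTA and ATAC-Seq assays
check "Free disk (GB)" "$(free_disk_gb .)" 250
```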

Software requirements

Docker

Install Docker Engine by following the instructions at docs.docker.com/engine/install/.

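Confirm that Docker is installed by entering docker at the command line. The Docker usage summary should print to the terminal screen. To confirm that the Docker daemon is also running, enter docker info, which reports an error if it cannot reach the daemon.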
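
The two conditions, Docker being installed and the Docker daemon being reachable, can be told apart in a script. The sketch below is one possible way to do it; docker_ready is a hypothetical helper name, and the return codes 1 and 2 are arbitrary choices made for this example.

```shell
#!/bin/sh
# Sketch: distinguish "docker not installed" from "daemon not running".
# docker_ready is a hypothetical helper; return codes are arbitrary.
docker_ready() {
    # Is the docker client on PATH at all?
    if ! command -v docker >/dev/null 2>&1; then
        echo "docker: not installed or not on PATH"
        return 1
    fi
    # docker info must contact the daemon, so it fails when the daemon is down.
    if ! docker info >/dev/null 2>&1; then
        echo "docker: installed, but the daemon is not reachable"
        return 2
    fi
    echo "docker: running"
}

# Usage:
#   docker_ready
```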

Python 3

  1. Check to see if a version of Python 3 is already installed by running at the command line:

    $ python3 --version

  2. Ensure that you are using a local installation of Python and not a system version. Run:

    $ which python3

    This should return the path to a local installation and not to a system path (usually /usr/bin/python3).

    Using a system installation of Python might not give you sufficient permissions to install the required packages.

  3. If a version of Python 3 is not installed, download and install it from python.org/downloads.

  4. Update pip before installing cwlref-runner by using the command:

    $ pip install -U pip
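
The local-versus-system check in step 2 can also be scripted. This is a small sketch under stated assumptions: is_system_path is a hypothetical helper, and the list of directories treated as system locations is a guess based on common macOS and Linux layouts rather than anything the pipeline defines.

```shell
#!/bin/sh
# Sketch: report whether python3 resolves to a system or a local installation.

# Succeeds when the given path sits in a typical system directory.
# The directory list is an assumption, not a definitive rule.
is_system_path() {
    case "$1" in
        /usr/bin/*|/bin/*|/usr/sbin/*|/sbin/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Find python3 on PATH; leave py empty if it is not found.
py=$(command -v python3 || true)

if [ -z "$py" ]; then
    echo "python3 not found on PATH"
elif is_system_path "$py"; then
    echo "python3 is a system installation: $py"
else
    echo "python3 is a local installation: $py"
fi
```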

CWL-runner

  1. Install the package from PyPI. Enter:

    $ pip install cwlref-runner

  2. Ensure that cwl-runner is in your path. Type:

    $ cwl-runner

  3. If the command is not found, add the install location of the pip packages to $PATH.

    a. Find where cwlref-runner is installed by entering:

    $ pip show cwlref-runner

    b. Add the above path to $PATH. For example:

    $ export PATH=$PATH:/Library/Frameworks/Python.framework/Versions/3.6/lib/python3

    c. Restart the command line utility.
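
When adding a folder to $PATH, it can help to check whether the folder is already listed so that it is not appended twice. The sketch below shows one way to do that; path_contains and add_to_path are hypothetical helper names, and the folder in the usage comment is an assumption that only applies if pip installed the package in the per-user location.

```shell
#!/bin/sh
# Sketch: add a folder to PATH only when it is not already there.

# Succeeds when the folder appears as a complete entry in PATH.
path_contains() {
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Append the folder to PATH once; repeated calls change nothing.
add_to_path() {
    path_contains "$1" || PATH="$PATH:$1"
    export PATH
}

# Usage (assumes a per-user pip install, whose scripts go under the
# user base folder reported by Python):
#   add_to_path "$(python3 -m site --user-base)/bin"
```

A change made this way lasts only for the current shell session. To keep it, the same line would need to go in your shell startup file.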

CWL and YML files

Ensure that you are using the correct CWL files with your pipeline, or the analysis might fail.

  1. Go to bitbucket.org/CRSwDev/cwl.
  2. In the left pane, click Downloads > Download Repository. The CWL and example YML files are downloaded.
  3. Unzip the archive. Each folder within the archive is named after the pipeline version it corresponds to.

Pipeline image

  1. Ensure that docker is running.

  2. Download (pull) the docker image by entering:

    $ docker pull bdgenomics/rhapsody

    Note: The pull command automatically downloads the most current pipeline version. To download an earlier version, specify the version number. For example:

    $ docker pull bdgenomics/rhapsody:v1.0

  3. Confirm the pipeline image by entering:

    $ docker images

Note:

  • bdgenomics/rhapsody appears under the repository column.
  • The pipeline version number appears under the tag column.
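
To check for the image in a script rather than by eye, the repository and tag columns can be printed in a fixed format and filtered. This is a minimal sketch: tags_for_repo is a hypothetical helper, and it assumes the repository name itself contains no colon, which holds for bdgenomics/rhapsody.

```shell
#!/bin/sh
# Sketch: list the tags present for one repository.

# Reads "repository:tag" lines on stdin and prints the tags that
# belong to the repository named in the first argument.
tags_for_repo() {
    awk -F: -v repo="$1" '$1 == repo { print $2 }'
}

# Usage (requires a running Docker daemon):
#   docker images --format '{{.Repository}}:{{.Tag}}' | tags_for_repo bdgenomics/rhapsody
```

If the command prints nothing, the image has not been pulled yet.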