Setup

This section outlines the steps required to download and configure the CDPP (CRS4 Digital Pathology Platform) using Docker Compose. The platform is composed of multiple containerized services—including the Slides Manager, Annotations Manager, and CWL-based Workflow Engine—designed to work together within a shared environment. You will begin by cloning the deployment repository and generating the necessary configuration files, which define environment variables and service-specific parameters.

Download

You can download the Docker Compose setup from GitHub.

To begin, clone the repository to your local machine:

git clone https://github.com/crs4/cdpp-workflows.git

Configuration

The first step is to generate the default configuration files. Navigate into the cloned repository directory and run the create_env.sh script:

cd cdpp-workflows
./create_env.sh

After the script completes, the following configuration files will be created:

  • .env
  • promort_config/config.yaml
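
As a quick sanity check after running the script, the presence of both files can be verified from the shell. The following is a minimal sketch; the function name check_config_files is hypothetical and not part of the repository.

```shell
# check_config_files [REPO_DIR]
# Report whether the two configuration files exist under REPO_DIR
# (default: the current directory). Returns non-zero if any is missing.
check_config_files() {
  repo_dir=${1:-.}
  rc=0
  for f in .env promort_config/config.yaml; do
    if [ -f "$repo_dir/$f" ]; then
      echo "found: $f"
    else
      echo "missing: $f"
      rc=1
    fi
  done
  return "$rc"
}
```

Calling check_config_files with no arguments from inside the cdpp-workflows directory checks the files in place.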

While promort_config/config.yaml is used specifically by the CDPP Annotations Manager, the main configuration for the entire platform is in the .env file. This file includes default values for all services, which you can adjust based on your setup.

The configuration for each component is described in detail below:

Slides Manager and Virtual Microscope

The Slides Manager component (which also provides the Virtual Microscope features) is configured by editing the following variables in the .env file:

  • CWL_INPUTS_FOLDER: This folder contains all the whole slide images (WSI) managed by OMERO, including those in MIRAX format.
  • PREDICTIONS_DIR: This folder stores the output datasets generated by the computational pipelines managed by the Workflow Engine. These outputs are automatically registered in the OMERO server database. Some outputs are post-processed to create artifacts, such as ROIs for review, while others are rendered on-the-fly as visual layers, such as cancer heatmaps, during slide viewing.
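
The variables can be edited by hand in any text editor. As an alternative, the sketch below shows one way to script the change. It assumes the generated .env file consists of plain KEY=value lines, which is an assumption about the output of create_env.sh. The helper name set_env_var and the example paths are hypothetical.

```shell
# set_env_var KEY VALUE [FILE]
# Replace the KEY=... line in FILE (default: .env), or append it if absent.
set_env_var() {
  key=$1
  value=$2
  file=${3:-.env}
  if grep -q "^${key}=" "$file"; then
    tmp=$(mktemp)
    # Pass key and value through the environment so awk does not
    # interpret backslashes or other special characters in them.
    KEY="$key" VAL="$value" awk '
      index($0, ENVIRON["KEY"] "=") == 1 { print ENVIRON["KEY"] "=" ENVIRON["VAL"]; next }
      { print }
    ' "$file" > "$tmp"
    # Write back through the original file so its permissions are kept.
    cat "$tmp" > "$file"
    rm -f "$tmp"
  else
    printf '%s=%s\n' "$key" "$value" >> "$file"
  fi
}

# Example usage (hypothetical paths):
#   set_env_var CWL_INPUTS_FOLDER /data/cdpp/slides
#   set_env_var PREDICTIONS_DIR /data/cdpp/predictions
```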

Annotations Manager

To configure the Annotations Manager, edit the following variables in the .env file:

  • PROMORT_IMG: Specifies the Docker image used for the Annotations Manager. You can browse available image versions on Docker Hub.
  • PROMORT_PORT: Port used to access the Annotations Manager’s web user interface
  • PROMORT_DB_NAME: Name of the database that will store the data for the Annotations Manager
  • PROMORT_DB_USER: Username for connecting to the database named by PROMORT_DB_NAME
  • PROMORT_DB_PASSWORD: Password for the PROMORT_DB_USER
  • PROMORT_SESSION_ID: Session ID used for the Django session cookie

The system automatically creates a user in the Annotations Manager. The Workflow Engine’s tools use this account whenever they need to read data from or write data to the Annotations Manager. To set up this user, edit the following variables:

  • PROMORT_USER: Username of the account used by the Workflow Engine’s tools
  • PROMORT_PASSWORD: Password for the PROMORT_USER to access the Annotations Manager API
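
If you prefer not to keep the default passwords, random values can be generated from the shell. The following is only a sketch: the function name random_secret is hypothetical, and it assumes /dev/urandom is available, as it is on Linux and macOS. The printed lines are candidates to copy into the .env file, and the script does not modify any file itself.

```shell
# Print a 32-character hexadecimal string read from the kernel's random source.
random_secret() {
  od -An -N16 -tx1 /dev/urandom | tr -d '[:space:]'
}

# Emit candidate assignments for the two password variables.
for var in PROMORT_DB_PASSWORD PROMORT_PASSWORD; do
  printf '%s=%s\n' "$var" "$(random_secret)"
done
```

Hexadecimal values contain only the characters 0-9 and a-f, so they need no quoting or escaping in a KEY=value line.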

Workflow Engine

To configure the Workflow Engine, edit the following variables in the .env file:

  • AIRFLOW_HOME: Base directory for Airflow
  • CWL_TMP_FOLDER: Temporary directory for CWL-based workflow execution
  • CWL_INPUTS_FOLDER: Directory for CWL inputs (shared with the Slides Manager, if both systems are running on the same host)
  • CWL_OUTPUTS_FOLDER: Directory for CWL outputs
  • CWL_PICKLE_FOLDER: Directory for CWL pickled files
  • AIRFLOW_WEBSERVER_PORT: Port to access Airflow web interface (default: 8080)
  • CWL_AIRFLOW_API_PORT: Port to contact the Airflow API (default: 8081)
  • AIRFLOW_USER: Admin username for Airflow
  • AIRFLOW_PASSWORD: Admin password for Airflow
  • INPUT_DIR: Directory for workflow inputs
  • FAILED_DIR: Directory for storing data from failed workflows
  • BACKUP_DIR: Directory for storing backups of workflow-processed data
  • PREDICTIONS_DIR: Directory for model outputs (shared with the Slides Manager, if both systems are running on the same host)
  • CWLDOCKER_GPUS: GPU IDs for running inference (if available)
  • MYSQL_ROOT_PASSWORD: Root password for the Workflow Engine’s database
  • MYSQL_DATABASE: Name of the workflow database
  • MYSQL_USER: Workflow database user
  • MYSQL_PASSWORD: Password for the workflow database user
  • MYSQL_PORT: Database port
  • MYSQL_DATA: Volume for the database
  • OME_SEADRAGON_URL: URL of the Slides Manager web application
  • PROMORT_HOST: Hostname of the Annotations Manager
  • PROMORT_CONN_TYPE: Protocol for connecting to the Annotations Manager
  • PROMORT_PORT: Port used by the Annotations Manager
  • PROMORT_USER: Username of the user to interact with the Annotations Manager
  • PROMORT_PASSWORD: Password for the PROMORT_USER
  • PROMORT_SESSION_ID: Session ID for the Django session cookie
  • PROMORT_TOOLS_IMG: Specifies the Docker image used for the Annotations Manager auxiliary tools. You can explore available versions on Docker Hub.
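
Several of the variables above name directories. The sketch below reports, for each of them, whether it is set and whether the path exists. It assumes that the values are host paths written as plain KEY=value lines without quotes, which may not hold for every setup. The function name check_workflow_dirs is hypothetical.

```shell
# check_workflow_dirs [FILE]
# For each directory variable, report whether it is set in FILE
# (default: .env) and whether the path it names exists on this machine.
check_workflow_dirs() {
  env_file=${1:-.env}
  for var in CWL_TMP_FOLDER CWL_INPUTS_FOLDER CWL_OUTPUTS_FOLDER \
             CWL_PICKLE_FOLDER INPUT_DIR FAILED_DIR BACKUP_DIR PREDICTIONS_DIR; do
    # If the variable appears more than once, this sketch uses the last line.
    dir=$(sed -n "s/^${var}=//p" "$env_file" | tail -n 1)
    if [ -z "$dir" ]; then
      echo "$var: not set"
    elif [ -d "$dir" ]; then
      echo "$var: ok ($dir)"
    else
      echo "$var: directory not found ($dir)"
    fi
  done
}
```

Relative paths are resolved against the current directory, so the function is best called from the directory that contains the .env file.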

Misc

  • DOCKER_NETWORK: Docker Compose network name
  • PROJECT: Docker Compose project name
  • PROXY_PORT: Proxy port
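
Because several services expose ports, two variables set to the same value may conflict. The sketch below lists port values shared by more than one variable. It assumes that each of these variables corresponds to a port published on the host, which depends on the Compose files and is not confirmed here. The function name find_duplicate_ports is hypothetical.

```shell
# find_duplicate_ports [FILE]
# Print one line for every port value in FILE (default: .env) that is
# assigned to more than one port variable. No output means no duplicates.
find_duplicate_ports() {
  env_file=${1:-.env}
  for var in PROMORT_PORT AIRFLOW_WEBSERVER_PORT CWL_AIRFLOW_API_PORT \
             MYSQL_PORT PROXY_PORT; do
    value=$(sed -n "s/^${var}=//p" "$env_file" | tail -n 1)
    if [ -n "$value" ]; then
      printf '%s %s\n' "$value" "$var"
    fi
  done | awk '
    { names[$1] = (names[$1] ? names[$1] " " : "") $2; count[$1]++ }
    END {
      for (p in count)
        if (count[p] > 1) print "port " p " is used by: " names[p]
    }
  '
}
```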