Cluster & Docker deployment
Anduril can be flexibly deployed on a local machine (laptop / desktop) or in a cluster. The default settings are suitable for local deployment: parallel component execution is capped at four threads, and component invocation is done locally. Cluster deployment is needed when analyzing large data sets. Docker is useful to encapsulate external dependencies of components.
The main configuration options for deployment are the following flags to anduril run:
- --threads N: Maximum number of parallel component invocations (local threads or jobs in a cluster).
- --wrapper SCRIPT: A program or script that is called for each component invocation, and blocks until the component is finished.
Scala code that constructs workflows can pass custom key-value pairs to the wrapper script (e.g., preferred execution host or resource usage). The Scala statement
myComponent._custom("key") = "value"
exposes the environment variable $ANDURIL_CUSTOM_KEY in the wrapper.
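A wrapper can branch on such a variable to decide how to run a component. The sketch below is illustrative, not Anduril code; the attribute name "host" matches the Slurm example later in this document, and the dispatch logic is an assumption for demonstration:

```shell
# Sketch: a wrapper inspecting a user-set _custom pair. The attribute
# name "host" and the dispatch below are illustrative, not part of Anduril.
choose_target() {
    # component._custom("host") = "local" in Scala appears here as
    # the environment variable ANDURIL_CUSTOM_HOST=local.
    if [ "${ANDURIL_CUSTOM_HOST:-}" = "local" ]; then
        echo "run locally"
    else
        echo "submit to cluster"
    fi
}

choose_target
```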
For convenience, the deployment flags can be inserted into the workflow Scala code as special comments in the header of the file:
#!/usr/bin/env anduril
//$OPT --wrapper my-wrapper.sh
//$OPT --threads 16
import anduril.builtin._
import anduril.tools._
import org.anduril.runtime._
object DeploymentDefaults {
// Code goes here
}
Deployment configurations
Below are ready-made configurations for common deployments. Ensure the wrapper scripts can be found by adding $ANDURIL_HOME/bin to PATH. You can customize the scripts to your environment.
Slurm
Command line arguments: --wrapper anduril-wrapper-slurm --threads 99
The Slurm wrapper uses srun. By default it submits all components to Slurm; individual components can bypass Slurm (executing locally) and request CPU and memory resources through custom attributes. For more fine-grained control, you can add more custom attributes to anduril-wrapper-slurm.
Customizing:
val component = MyComponent()
component._custom("cpu") = "4" // --cpus-per-task
component._custom("memory") = "2048" // --mem=MB
val local = MyComponent()
local._custom("host") = "local" // Bypasses Slurm and executes locally
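The effect of these attributes can be sketched as follows. This is an illustrative reconstruction of the dispatch logic, not the shipped anduril-wrapper-slurm script; the function name and the default values are assumptions:

```shell
# Illustrative sketch of how a Slurm wrapper could map custom attributes
# to an srun command line. Defaults (1 CPU, 1024 MB) are assumptions.
build_command() {
    if [ "${ANDURIL_CUSTOM_HOST:-}" = "local" ]; then
        echo "$*"                        # host=local: bypass Slurm
        return
    fi
    cpu="${ANDURIL_CUSTOM_CPU:-1}"       # _custom("cpu")    -> --cpus-per-task
    mem="${ANDURIL_CUSTOM_MEMORY:-1024}" # _custom("memory") -> --mem (MB)
    echo "srun --cpus-per-task=$cpu --mem=$mem $*"
}

# Demonstration: print the command line for a component invocation.
build_command python implementation.py _command
```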
SGE
Command line arguments: --wrapper "qrsh -now no" --threads 99
Docker
Command line arguments: --wrapper anduril-wrapper-docker
This wrapper runs components inside a selected Docker container. The image is specified with a _custom("docker") annotation; components without that annotation are executed on the host without Docker. The container should have an Anduril installation available and have ANDURIL_HOME set.
The anduril-wrapper-docker wrapper sets the USER_ID environment variable to the current user ID to handle file permissions correctly. This requires that the image has an entry point that switches to a local user with that USER_ID. The anduril/core image (and images derived from it) supports this feature.
Scala workflow:
val component = MyComponent()
component._custom("docker") = "repository/dockerimage"
To use your own or third-party Docker images that do not have Anduril installed, you can map the $ANDURIL_HOME folder from the host to the container and set the ANDURIL_HOME environment variable in a custom wrapper script.
Cluster topology
There are two types of executables in Anduril: the core workflow engine, which is invoked using anduril, and components, which can be implemented in a variety of languages (R, Python, Java, etc.). The core is only needed on the node where anduril is interactively invoked (e.g., a head node). Wrapper scripts then forward execution to worker nodes.
The interactive head node requires the following software and files:
- Anduril installation ($ANDURIL_HOME) with the anduril executable
- Component bundle code ($ANDURIL_HOME/bundles or custom location set with $ANDURIL_BUNDLES)
- Execution folder
Worker nodes require the following software:
- Anduril installation ($ANDURIL_HOME)
- $ANDURIL_HOME must be set
- Component bundle code ($ANDURIL_HOME/bundles or custom location set with $ANDURIL_BUNDLES)
- Execution folder
- External dependencies of the components (e.g., R, Bioconductor, command line tools)
Since the Anduril installation, component bundles and execution folder are required both on the interactive and worker nodes, it is convenient to share them using a shared or distributed file system (e.g., NFS). Mount points should be the same on all hosts, so that file names work portably.
The interactive anduril process needs to remain running during workflow execution, so it should be run inside screen or tmux for long jobs.
Writing custom wrappers
If you want to integrate Anduril with your specific cluster environment, and the scripts above are not suitable, you can write your own wrapper script. The interface of a wrapper is:
- Wrapper is an executable program or shell script that is on PATH or is specified using a full path.
- Receives the component executable and arguments as parameters. Example: python /opt/bundle/Component/implementation.py ~/results/component/_command
- Executes the component from the context of the current working directory (CWD). A remote executor should change to the directory that corresponds to the host CWD.
- Blocks until the component is finished.
- Forwards the exit code from the component: zero on success or non-zero on failure.
- Forwards the standard output and error of the component to the corresponding streams of the wrapper.
The following environment variables are available in the wrapper:
- $ANDURIL_HOME: Anduril installation location on the host.
- $ANDURIL_COMPONENT_BUNDLE: Name of the bundle for the current component. Example: tools.
- $ANDURIL_COMPONENT_NAME: Name of the current component. Example: BashEvaluate.
- $ANDURIL_COMPONENT_DIRECTORY: Location of the component implementation on the host.
- $ANDURIL_EXECUTION_DIRECTORY: Current execution folder on the host.
- $ANDURIL_COMPONENT_EXECUTION_DIRECTORY: Execution folder of the component. This is a sub-folder of $ANDURIL_EXECUTION_DIRECTORY.
- $ANDURIL_CUSTOM_*: Key-value pairs set by the user in the Scala code.
Since a wrapper blocks until the component has finished, wrappers control workflow concurrency in conjunction with --threads N. For example, if the cluster environment controls job queue length, --threads can be set to a large value (such as 99) to let the cluster limit concurrency.
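Putting the interface together, the core of a custom wrapper can be sketched as follows; the run_wrapped function and its logging line are illustrative additions, not part of Anduril:

```shell
# Sketch of a minimal custom wrapper body: run the component in the
# current working directory, block until it finishes, and forward its
# streams and exit code. The log message to stderr is illustrative.
run_wrapped() {
    echo "[$ANDURIL_COMPONENT_BUNDLE/$ANDURIL_COMPONENT_NAME] in $PWD" >&2
    "$@"              # blocks; stdout/stderr pass through untouched
    status=$?
    return $status    # forward the component's exit code
}

# In a real wrapper script the last line would be: run_wrapped "$@"
run_wrapped true
```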
Synchronizing file systems
Distributed and parallel file systems may implement caching that does not ensure a synchronized view of the file system on the interactive and worker nodes. In this case, you may need to implement explicit I/O synchronization commands in the wrapper script, or modify file system caching parameters.
A symptom of an unsynchronized file system is that anduril running on the interactive node reports missing output files of a component, even though the component execution finished successfully on the worker node. When you manually view the contents of the execution folder using ls, the files appear to be there.
This issue can be fixed using the following methods:
- In a custom wrapper script, execute a refresh on the interactive node after the component finishes execution: ls -R $ANDURIL_COMPONENT_EXECUTION_DIRECTORY > /dev/null
- In a custom wrapper script, execute an explicit synchronization command (sync) on the worker nodes after the component finishes execution.
- If the above do not resolve synchronization issues, you may need to reduce I/O caching or implement delay polling on the host. Note that these generally reduce performance.
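The first method can be sketched as the tail of a custom wrapper; the function name and structure below are illustrative, not part of Anduril:

```shell
# Sketch: after the component command finishes, force the interactive
# node's view of the component's output directory to refresh, then
# forward the component's original exit code.
run_and_refresh() {
    "$@"
    status=$?
    # Reading the directory tree refreshes attribute caches on many
    # distributed file systems (e.g., NFS); the listing is discarded.
    ls -R "$ANDURIL_COMPONENT_EXECUTION_DIRECTORY" > /dev/null 2>&1
    return $status
}

# Demonstration with a trivial command and the current directory:
ANDURIL_COMPONENT_EXECUTION_DIRECTORY=. run_and_refresh true
```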