8. Annotations

The facilities introduced so far are sufficient to construct workflows for most needs. In some cases, however, you might need to customize the workflow logic beyond controlling the topology. Annotations are a way to modify selected parts of the workflow: they are properties or methods of the components placed in the workflow. By convention, their names start with an underscore (_) to separate them from regular output values.

Summary of all defined annotations:

Name                        Values                       Description
_bind(from)                 component or port            Non-data dependencies
_custom(key) = value        string                       Pass values to environment
_enabled                    true, false                  Enable/disable component
_execute                    "changed", "always", "once"  Volatile or fixed components
_filename(port, filename)   string                       Rename output
_name                       string                       Set explicit name
_priority                   integer                      Workflow priority
_keep                       true, false                  Keep output files of a component

Example using most of the annotations (_keep is not shown):

#!/usr/bin/env anduril

import anduril.builtin._
import anduril.tools._
import org.anduril.runtime._

object Annotations {
  val input1 = INPUT(path = "data1.csv")
  input1._enabled = false

  val input2 = INPUT(path = "data2.csv")

  val sorted = CSVSort(input1.out, types = "Gene=string", _name = "mySorted")
  sorted._bind(input2)
  sorted._custom("cpu") = "4"
  sorted._execute = "once"
  sorted._filename("out", "mySortedValue.csv")
  sorted._priority = 1
}

_bind

Binding sets a non-data dependency from the component given as from to the current component. Regular workflow dependencies are based on passing data between components; this annotation allows pure control dependencies to be defined. In the example above, sorted cannot be executed before input2 has been executed successfully, even though there is no data dependency between these components. For example, if data2.csv does not exist, sorted is not executed.
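
The scheduling rule described above can be pictured with a small standalone Scala model. This is only a sketch of the rule, not Anduril's scheduler: the names BindModel, Node, dataDeps, boundDeps and canRun are hypothetical and do not belong to the Anduril API.

```scala
object BindModel {
  // Toy model, not Anduril code: a component may run only when every
  // data dependency AND every bound (control) dependency has succeeded.
  final case class Node(name: String,
                        dataDeps: Set[String] = Set.empty,
                        boundDeps: Set[String] = Set.empty)

  def canRun(node: Node, succeeded: Set[String]): Boolean =
    (node.dataDeps ++ node.boundDeps).subsetOf(succeeded)

  // Mirrors the example: sorted reads from input1 and is bound to input2.
  val sorted: Node =
    Node("sorted", dataDeps = Set("input1"), boundDeps = Set("input2"))
}
```

In this model, BindModel.canRun(BindModel.sorted, Set("input1")) is false, because the bound component input2 has not yet succeeded.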

_custom

_custom is a key-value map that can contain arbitrary values. These are passed to the execution environment, in particular to wrapper scripts. In our example, we might have deployed Anduril on a cluster and want to notify the environment that CSV sorting uses four threads.

_enabled

Setting _enabled = false disables the current component, and recursively any components that depend on the current component through non-optional port dependencies. This turns off parts of the workflow, but allows keeping the code in place. Notice that if the current component is connected to an optional input port of another component, the connection is removed but the other component is still enabled.
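
The propagation rule can be illustrated with a standalone Scala sketch. It is a toy model of the behaviour described above, not Anduril's implementation; EnabledModel, Edge and disabledBy are hypothetical names introduced only for this illustration.

```scala
object EnabledModel {
  // Toy model, not Anduril code. An edge says that `to` reads an output of
  // `from`; `optional` marks a connection to an optional input port.
  final case class Edge(from: String, to: String, optional: Boolean)

  // Returns every component that ends up disabled when `start` is disabled.
  // Disabling spreads only along non-optional connections.
  def disabledBy(start: String, edges: Seq[Edge]): Set[String] = {
    @annotation.tailrec
    def loop(frontier: List[String], seen: Set[String]): Set[String] =
      frontier match {
        case Nil => seen
        case head :: rest =>
          val next = edges.collect {
            case Edge(`head`, to, false) if !seen(to) => to
          }.distinct
          loop(rest ++ next, seen ++ next)
      }
    loop(List(start), Set(start))
  }
}
```

A component that is reached only through an optional connection stays out of the returned set, which corresponds to the case where the connection is removed but the other component remains enabled.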

_execute

This annotation has three legal values: “changed” (default), “always”, “once”. They control when a component is re-executed if it has already been executed in a previous run. The value “changed” executes the component if its configuration has changed (or one of its dependencies has changed). The value “always” executes the component on every run; this can lead to greatly increased running times. The value “once” does not re-execute the component even if the configuration has changed, as long as the component has been successfully executed once. This can be useful to avoid long running times after a minimal change to the component configuration.
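
The three values can be summarised as a decision rule. The following standalone Scala sketch is a toy model, not Anduril code; ExecuteModel and shouldExecute are hypothetical names, and the model assumes that a component that has never been executed successfully is executed under every policy.

```scala
object ExecuteModel {
  // Toy decision rule, not Anduril code.
  // executedBefore: the component completed successfully in an earlier run.
  // changed: its configuration or one of its dependencies has changed since.
  def shouldExecute(policy: String,
                    executedBefore: Boolean,
                    changed: Boolean): Boolean =
    policy match {
      case "always"  => true
      case "once"    => !executedBefore
      case "changed" => !executedBefore || changed
      case other =>
        throw new IllegalArgumentException(s"Unknown _execute value: $other")
    }
}
```

With "once", a component that has already succeeded is skipped even when changed is true, which is the trade-off described above: shorter running times at the cost of possibly stale output.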

_filename

Renames the output file of the given port. In the example above, the file produced on the out port of sorted is named mySortedValue.csv.

_name

Explicitly assigns the component name in the workflow. This can be useful when constructing anonymous components that do not get names from the Scala code structure. See component naming for details. This annotation is used in the constructor of the component.

_priority

Modifies the execution priority of the component relative to other components in the workflow. The default priority is 0, and higher values mean higher priority and thus earlier execution. Note that execution order is still constrained by data dependencies. This annotation can be used, for example, to prioritize certain samples in a data set.
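
How priority interacts with dependencies can be sketched with a standalone Scala model. It is a simplified, strictly sequential toy scheduler, not Anduril's engine (which may run components in parallel); PriorityModel, Task and order are hypothetical names.

```scala
object PriorityModel {
  // Toy model, not Anduril code: deps lists the tasks that must finish first.
  final case class Task(name: String,
                        priority: Int = 0,
                        deps: Set[String] = Set.empty)

  // Repeatedly pick, among the tasks whose dependencies are all done,
  // the one with the highest priority. Dependencies always win over priority.
  def order(tasks: Seq[Task]): List[String] = {
    @annotation.tailrec
    def loop(pending: Seq[Task], done: List[String]): List[String] = {
      val ready = pending.filter(t => t.deps.forall(d => done.contains(d)))
      if (ready.isEmpty) done.reverse
      else {
        val next = ready.maxBy(_.priority)
        loop(pending.filterNot(_ == next), next.name :: done)
      }
    }
    loop(tasks, Nil)
  }
}
```

In this model, a task that depends on two samples runs last even if it has the highest priority, while the sample with the higher priority is processed before the other one.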

_keep

When _keep = false, the output files of a component are deleted after the successful execution of the component and all of its downstream dependent components. This is useful for saving space by removing large intermediate files. If a new dependent component is added downstream after the files have been removed, this triggers re-execution.
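
The clean-up condition can be written down as a small standalone Scala sketch. It is a toy model of the rule described above, not Anduril code; KeepModel, canDelete and needsRerun are hypothetical names, and the set of downstream dependents is simply passed in rather than derived from a workflow.

```scala
object KeepModel {
  // Toy rule, not Anduril code: with _keep = false, the outputs may be removed
  // once the component itself and all of its downstream dependents succeeded.
  def canDelete(keep: Boolean,
                selfSucceeded: Boolean,
                dependents: Set[String],
                succeeded: Set[String]): Boolean =
    !keep && selfSucceeded && dependents.subsetOf(succeeded)

  // If the files are already gone and a dependent has not succeeded yet
  // (for example, one that was added later), the outputs are needed again.
  def needsRerun(filesPresent: Boolean,
                 dependents: Set[String],
                 succeeded: Set[String]): Boolean =
    !filesPresent && !dependents.subsetOf(succeeded)
}
```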