External scripts
Component | Shorthand | Bundle | Description |
---|---|---|---|
BashEvaluate | bash"..." | tools | Invoke Bash script |
PythonEvaluate | python"..." | tools | Invoke Python script |
MatlabEvaluate | | tools | Invoke Matlab script |
QuickBash | | tools | Invoke Bash script with fewer inputs |
REvaluate | R"..." | tools | Invoke R script |
ScalaEvaluate | scala"..." | tools | Invoke Scala script |
TableQuery | sql"..." | tools | Invoke SQL query on CSV files |
Overview
Anduril provides facilities to invoke scripts written in external languages. The components that provide this all have similar interfaces: they take a script and a number of data files as input, and produce a number of data files as output.
There are two ways to call these components. First, they can be invoked as regular components using the function call syntax of Scala. Second, they have a shorthand syntax based on Scala string interpolation that makes common use cases more convenient. Regular invocation is needed if you want to place the external script in its own file or need to modify parameters; otherwise, the shorthand syntax can be used.
The general form of the shorthand syntax is prefix"script", or prefix"""long script""" for multi-line strings. Here, prefix is a language-dependent identifier such as bash or python. You can use Scala variables inside the strings with ${variable}: they are expanded to the value of the variable. Scala expressions such as ${component.port} are also supported. When expanding output ports of components, Anduril also inserts the corresponding dependencies into the workflow.
These components are in the tools bundle, so remember to put import anduril.tools._ in your scripts.
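To make the expansion rule concrete, here is a small Python model of what the shorthand does conceptually: plain values are pasted into the script text, while references to output ports are replaced by file paths and recorded as dependencies. This is an illustrative sketch only. PortRef, expand and the example path are hypothetical names, not part of Anduril's API, and Anduril itself relies on Scala string interpolation rather than regular expressions.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class PortRef:
    """Hypothetical stand-in for a component output port (not Anduril's API)."""
    component: str  # name of the component that produces the file
    path: str       # file the port resolves to at run time


# Matches ${name} or ${component.port}
PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_.]*)\}")


def expand(template, bindings):
    """Expand ${...} placeholders; return the script and the set of upstream components."""
    deps = set()

    def substitute(match):
        value = bindings[match.group(1)]
        if isinstance(value, PortRef):
            deps.add(value.component)  # port references become workflow dependencies
            return value.path
        return str(value)  # plain values are inserted as text

    return PLACEHOLDER.sub(substitute, template), deps


script, deps = expand(
    "grep ${grepArguments} ${data.out}",
    {"grepArguments": "gene01",
     "data.out": PortRef("data", "/home/user/data/data.csv")},
)
print(script)  # grep gene01 /home/user/data/data.csv
print(deps)    # {'data'}
```

The point of the model is the asymmetry: grepArguments only changes the text, whereas data.out both changes the text and creates an edge from data to the new component.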
Bash
The script below shows three equivalent ways of filtering a CSV file with grep to obtain the lines that contain “gene01”. The shorthand syntax bash"..." expands the script inside the quotes using the variables grepArguments and data to produce a final command such as grep gene01 /home/user/data/data.csv. Dependencies are properly configured: filtered1 depends on data in the workflow.
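For readers more familiar with Python, the filtering that grep performs in this example can be sketched as follows. The function name grep and the sample rows are hypothetical. Python's re syntax differs from grep's basic regular expressions, but for a literal pattern such as gene01 the two agree. As with grep, the header row is dropped (it does not contain gene01) and gene010 also matches, because the pattern is matched as a substring.

```python
import re


def grep(pattern, lines):
    """Keep the lines in which `pattern` matches somewhere, like `grep pattern file`."""
    regex = re.compile(pattern)
    return [line for line in lines if regex.search(line)]


sample = [
    "Gene\tValue",
    "gene01\t3.5",
    "gene02\t1.0",
    "gene010\t2.0",
]
print(grep("gene01", sample))  # ['gene01\t3.5', 'gene010\t2.0']
```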
QuickBash is a simplified interface with fewer ports and is suitable when you have one input (visible in Bash as $in) and one output ($out). BashEvaluate is the verbose version of the shorthand syntax, in which parameter substitution is done using templates of the form @arg@.
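The @arg@ substitution used by BashEvaluate can be pictured as a simple token replacement. The sketch below is a hypothetical model (fill_template is not an Anduril function) that replaces each @name@ with the value bound to name, mirroring the filtered3 component; the path is an example value only.

```python
import re


def fill_template(script, values):
    """Replace every @name@ token with values[name]; an unknown name raises KeyError."""
    return re.sub(r"@([A-Za-z0-9_]+)@", lambda m: str(values[m.group(1)]), script)


cmd = fill_template("grep @param1@ @var1@",
                    {"param1": "gene01", "var1": "/home/user/data/data.csv"})
print(cmd)  # grep gene01 /home/user/data/data.csv
```

Raising on unknown names, rather than leaving the token in place, is a design choice of this sketch so that typos in port names surface immediately.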
The multiple component demonstrates chaining Bash calls together, having multiple statements in the script, and writing to multiple output ports.
#!/usr/bin/env anduril
import anduril.builtin._
import anduril.tools._
import org.anduril.runtime._
object Bash {
    val data = INPUT("data.csv")
    val grepArguments = "gene01"
    val filtered1 = bash"grep ${grepArguments} ${data.out}"
    val filtered2 = QuickBash(script = "grep " + grepArguments + " $in > $out",
                              in = data)
    val filtered3 = BashEvaluate(script = "grep @param1@ @var1@",
                                 var1 = data,
                                 param1 = grepArguments)
    val multiple = bash"""
        cat ${filtered1.stdOut} ${filtered2.out} ${filtered3.stdOut} > @out1@
        echo OK > @out2@"""
}
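What the multiple component does with its two output ports can be sketched in Python: concatenate the three upstream results into one file and write a status line into a second file. The file names and contents here are hypothetical placeholders created in a temporary directory; in the workflow, Anduril supplies the actual paths behind @out1@ and @out2@.

```python
import tempfile
from pathlib import Path


def concatenate(inputs, out1, out2):
    """Mimic `cat a b c > out1` followed by `echo OK > out2`."""
    with open(out1, "w") as dst:
        for path in inputs:
            dst.write(Path(path).read_text())  # append each input in order, like cat
    Path(out2).write_text("OK\n")  # echo adds a trailing newline


with tempfile.TemporaryDirectory() as tmp:
    tmp_dir = Path(tmp)
    parts = []
    for i, text in enumerate(["a\n", "b\n", "c\n"], start=1):
        part = tmp_dir / f"filtered{i}.txt"  # hypothetical upstream outputs
        part.write_text(text)
        parts.append(part)
    concatenate(parts, tmp_dir / "out1", tmp_dir / "out2")
    combined = (tmp_dir / "out1").read_text()
    status = (tmp_dir / "out2").read_text()

print(repr(combined), repr(status))
```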
SQL
In the following example, we have two CSV files with the columns Gene, Value and QualityOK, and we want to compute the mean Value for genes that are present in both files and have QualityOK = 1. TableQuery and sql"..." construct a temporary in-memory database in which the query is executed. We use the multi-line form of string interpolation with triple quotation marks. The sql shorthand syntax expands file references such as data1 into table names such as table1; we rename them with AS to be explicit. The default SQL engine is HSQLDB.
#!/usr/bin/env anduril
import anduril.builtin._
import anduril.tools._
import org.anduril.runtime._
object SQL {
    val data1 = INPUT("data1.csv")
    val data2 = INPUT("data2.csv")
    val qualityCondition = 1
    val joined = sql"""
        SELECT data1."Gene",
               (data1."Value"+data2."Value")/2 AS "MeanValue"
        FROM ${data1} AS data1,
             ${data2} AS data2
        WHERE data1."Gene" = data2."Gene"
          AND data1."QualityOK" = ${qualityCondition}
          AND data2."QualityOK" = ${qualityCondition}"""
}
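To try the same query outside Anduril, the following sketch loads two small tables into an in-memory database and runs the join. It swaps in Python's built-in sqlite3 module for HSQLDB, so engine-specific behavior may differ. The table names table1 and table2 follow the naming described above, the sample rows are made up, and the quality threshold is passed as a bound ? parameter rather than being pasted into the query text as ${qualityCondition} is. An ORDER BY clause is added only to make the result deterministic.

```python
import sqlite3


def mean_values(rows1, rows2, quality=1):
    """Join two (Gene, Value, QualityOK) tables on Gene and average their Value columns."""
    con = sqlite3.connect(":memory:")
    for name, rows in (("table1", rows1), ("table2", rows2)):
        con.execute(f'CREATE TABLE {name} ("Gene" TEXT, "Value" REAL, "QualityOK" INTEGER)')
        con.executemany(f"INSERT INTO {name} VALUES (?, ?, ?)", rows)
    query = '''
        SELECT data1."Gene", (data1."Value" + data2."Value") / 2 AS "MeanValue"
        FROM table1 AS data1, table2 AS data2
        WHERE data1."Gene" = data2."Gene"
          AND data1."QualityOK" = ?
          AND data2."QualityOK" = ?
        ORDER BY data1."Gene"
    '''
    result = con.execute(query, (quality, quality)).fetchall()
    con.close()
    return result


rows1 = [("gene01", 2.0, 1), ("gene02", 4.0, 1), ("gene03", 1.0, 0)]
rows2 = [("gene01", 4.0, 1), ("gene02", 6.0, 0), ("gene03", 3.0, 1)]
print(mean_values(rows1, rows2))  # [('gene01', 3.0)]
```

Only gene01 survives: gene02 fails the quality check in the second table and gene03 fails it in the first.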
Python
In this example, we compute the sum of the Value column in a CSV file and write the result into a CSV file. PythonEvaluate provides access to an Anduril API that can be used for such tasks, and pre-populates certain variables, such as table1 and tableout in the script below. Alternatively, the Python standard library (such as csv) or external libraries can be used.
Here, we chose to place the Python script into an external file and invoke PythonEvaluate using the verbose syntax. The Python script is:
import anduril
value_sum = 0
for row in table1:
    value_sum += row['Value']
tableout.set_fieldnames(['ValueSum'])
tableout.writerow([value_sum])
Workflow configuration:
#!/usr/bin/env anduril
import anduril.builtin._
import anduril.tools._
import org.anduril.runtime._
object Python {
    val data = INPUT("data.csv")
    val script = INPUT("script.py")
    val result = PythonEvaluate(scriptIn = script, table1 = data)
}
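The standard-library alternative mentioned above could look roughly like this. It is a sketch, not a drop-in replacement for the Anduril API: the function name sum_values is hypothetical, the files are passed as open file objects, and the comma delimiter is an assumption that you should adjust to match your actual files. Unlike the Anduril example, values read with csv are strings, so they are converted with float().

```python
import csv
import io


def sum_values(in_file, out_file, column="Value", delimiter=","):
    """Sum one numeric column of a CSV table and write a one-row table with header ValueSum."""
    reader = csv.DictReader(in_file, delimiter=delimiter)
    value_sum = sum(float(row[column]) for row in reader)  # csv yields strings, so convert
    writer = csv.writer(out_file, delimiter=delimiter)
    writer.writerow(["ValueSum"])
    writer.writerow([value_sum])
    return value_sum


source = io.StringIO("Gene,Value\ngene01,1.5\ngene02,2.5\n")
target = io.StringIO()
sum_values(source, target)
print(target.getvalue())
```

With real files, open them with open(path, newline="") as the csv module documentation recommends.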
We can also use the shorthand syntax. Here, we have to take care of proper indentation, because Python uses whitespace to mark code structure. The python"..." syntax supports the Scala stripMargin feature, in which leading whitespace up to and including an initial | is removed from each line. Note that this feature is specific to the Python shorthand syntax. Also, the data file is imported as a generic file, not as a CSV table, so we construct a CSV iterator manually.
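The effect of stripMargin on the embedded script can be reproduced in Python to check that the indentation survives. This strip_margin function is our own simplified re-creation of Scala's String.stripMargin (it only treats spaces and tabs as margin whitespace), not code from Anduril. The example executes the stripped text to show that the loop body keeps its four-space indentation.

```python
def strip_margin(text, margin="|"):
    """Simplified re-creation of Scala's String.stripMargin for spaces and tabs."""
    out = []
    for line in text.split("\n"):
        stripped = line.lstrip(" \t")
        if stripped.startswith(margin):
            out.append(stripped[len(margin):])  # drop the margin, keep what follows it
        else:
            out.append(line)  # lines without a margin character are left unchanged
    return "\n".join(out)


script = strip_margin("""|
    |value_sum = 0
    |for value in [1.5, 2.5]:
    |    value_sum += value""")
namespace = {}
exec(script, namespace)
print(namespace["value_sum"])  # 4.0
```

Whatever follows the | is preserved exactly, which is why the loop body in the workflow below is written as |    value_sum += ....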
#!/usr/bin/env anduril
import anduril.builtin._
import anduril.tools._
import org.anduril.runtime._
object PythonInline {
    val data = INPUT("data.csv")
    val result = python"""|
        |import anduril
        |value_sum = 0
        |for row in anduril.TableReader(${data}):
        |    value_sum += row['Value']
        |tableout.set_fieldnames(['ValueSum'])
        |tableout.writerow([value_sum])"""
}