10. Reference: Component naming
When you use Scala to insert components into a workflow, follow the basic code patterns below so that each component gets a consistent, human-readable name in the workflow. Consistent names let you trace components between Scala code and workflow results (log messages and the execution folder), and Anduril requires them to determine correctly whether a component has changed and needs to be re-executed.
Summary of supported Scala patterns:
Code | Component name(s) |
---|---|
val simple = INPUT("data.csv") | simple |
val map = NamedMap[INPUT]("parent"); map("child") = INPUT("data.csv") | parent_child |
val seq = NamedSeq[INPUT]("parent"); seq += INPUT("data.csv") | parent_0 |
withName("parent") { val child = INPUT("data.csv") } | parent-child |
val parent = CSVSort(INPUT("data.csv")) | parent, parent_in |
def f() = { val child = INPUT("data.csv") }; val parent = f() | parent-child |
def f() = { val child = INPUT("data.csv") }; f() | child |
INPUT("data.csv", _name = "explicit") | explicit |
These patterns are illustrated in the following code:
#!/usr/bin/env anduril
import anduril.builtin._
import anduril.tools._
import org.anduril.runtime._
object ComponentNaming {
    // 1. Name: data
    val data = INPUT(path = "data.csv")

    val mySequence = NamedSeq[INPUT]("dataSeq")
    val myMap = NamedMap[INPUT]("dataMap")
    for (key <- Seq("data1", "data2")) {
        withName(key) {
            // 2. Names: data1-encapsulated, data2-encapsulated
            val encapsulated = INPUT(path = key+".csv")
        }
        // 3. (Seq) Names: dataSeq_0, dataSeq_1
        mySequence += INPUT(path = key+".csv")
        // 3. (Map) Names: dataMap_data1, dataMap_data2
        myMap(key) = INPUT(path = key+".csv")
    }

    // 4. Names: embedded (CSVSort), embedded-in (INPUT)
    val embedded = CSVSort(
        INPUT(path = "data.csv")
    )

    // 5. Name: explicitName
    CSVSort(data, _name = "explicitName")

    // 6. Name: fromFunction-sorted
    def subFunction() = {
        val sorted = CSVSort(data)
        // Or any other pattern from above
    }
    val fromFunction = subFunction()
}
Explanation of Scala patterns
Basic pattern: val
The most basic pattern is val. The opening of the component constructor call, val name = Component(, must be on the same line as val, but the argument list can extend over multiple lines.
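The rule above can be sketched as follows. This is a minimal illustration that assumes a working Anduril installation and reuses only the INPUT component and imports that appear elsewhere in this section; the object name ValPattern and the val names are made up for the example.

```scala
#!/usr/bin/env anduril
import anduril.builtin._
import anduril.tools._
import org.anduril.runtime._

object ValPattern {
    // Supported: the constructor opens on the same line as val,
    // and the argument list continues on the following lines.
    // Name: data
    val data = INPUT(
        path = "data.csv"
    )

    // Not a supported pattern: the constructor starts on a different
    // line than val, so the name cannot be derived consistently.
    // val misplaced =
    //     INPUT(path = "data.csv")
}
```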
Iteration: NamedMap, NamedSeq, withName
NamedMap, NamedSeq and withName are used in iterative structures.
Embedded calls
Embedded calls like f(g(x)) are supported to one level deep. g(x) does not have to be on the same line as f. f(g(h(x))) (two levels) is not supported.
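One way to stay within the one-level limit is to bind each component to its own val, so that every component falls back to the basic val pattern. The sketch below assumes that CSVSort accepts the result of another CSVSort in the same way it accepts an INPUT; the object and val names are illustrative only.

```scala
#!/usr/bin/env anduril
import anduril.builtin._
import anduril.tools._
import org.anduril.runtime._

object EmbeddedCallsFlattened {
    // Not supported (two levels of embedding):
    //   val sortedTwice = CSVSort(CSVSort(INPUT(path = "data.csv")))

    // Flattened alternative: one val per component.
    // Names: raw, sortedOnce, sortedTwice
    val raw = INPUT(path = "data.csv")
    val sortedOnce = CSVSort(raw)
    val sortedTwice = CSVSort(sortedOnce)
}
```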
Function calls
Arbitrarily nested function calls are supported, and result in names like parent1-parent2-child. Inside functions, the hierarchical prefix (such as parent1-parent2) is prepended to all generated names. All supported naming patterns are available in functions.
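As a hedged illustration of nesting, the following sketch chains two functions in the same style as subFunction in the main example. The function and object names are hypothetical, and the resulting name is the one the rule above predicts rather than a recorded output.

```scala
#!/usr/bin/env anduril
import anduril.builtin._
import anduril.tools._
import org.anduril.runtime._

object NestedFunctions {
    // Innermost function: creates the component with the val pattern.
    def inner() = {
        val child = INPUT(path = "data.csv")
    }

    // Middle function: the val name parent2 becomes part of the prefix.
    def outer() = {
        val parent2 = inner()
    }

    // Top level: the val name parent1 starts the prefix.
    // Name predicted by the rule above: parent1-parent2-child
    val parent1 = outer()
}
```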
Explicit naming
The _name annotation can be used in the component constructor to assign explicit names. This always gives proper names, but should be avoided due to the manual work involved.
Invalid solutions
The following code demonstrates some antipatterns that do not result in consistent names. The problems are:
- Orphan component that has no val or var, or _name annotation. Correction: use val or _name.
- val is on a different line than component definitions. Correction: use _name or fit on one line.
- Name is reused in a for loop. Correction: use withName, NamedSeq or NamedMap.
- Component is inserted into a plain Scala collection (Seq, Map, etc.). Correction: use NamedSeq or NamedMap.
#!/usr/bin/env anduril
import anduril.builtin._
import anduril.tools._
import org.anduril.runtime._
object ComponentNamingBad {
    // 1. BAD EXAMPLE: orphan
    INPUT(path = "data.csv")

    // 2. BAD EXAMPLE: different line
    val flag = true
    val conditional = if (flag) {
        INPUT(path = "data1.csv")
    } else {
        INPUT(path = "data2.csv")
    }

    // 3. BAD EXAMPLE: name reused
    for (key <- Seq("data1", "data2")) {
        val data = INPUT(path = key+".csv")
    }

    // 4. BAD EXAMPLE: plain collection
    // (ArrayBuffer is used here because mutable.Seq has no += operator)
    val plainSeq = scala.collection.mutable.ArrayBuffer[INPUT]()
    for (key <- Seq("data1", "data2")) {
        plainSeq += INPUT(path = key+".csv")
    }
}
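For comparison, the four antipatterns can be rewritten using the corrections listed above. This is a sketch rather than the only valid rewrite: the names ComponentNamingFixed, perKey and inputs are illustrative, and the component names in the comments follow the naming rules from the summary table.

```scala
#!/usr/bin/env anduril
import anduril.builtin._
import anduril.tools._
import org.anduril.runtime._

object ComponentNamingFixed {
    // 1. Orphan: bind the component to a val.
    // Name: data
    val data = INPUT(path = "data.csv")

    // 2. Different line: move the condition into the argument so the
    // constructor opens on the same line as val.
    // Name: conditional
    val flag = true
    val conditional = INPUT(path = if (flag) "data1.csv" else "data2.csv")

    // 3. Name reused in a loop: wrap the body in withName.
    // Names: data1-perKey, data2-perKey
    for (key <- Seq("data1", "data2")) {
        withName(key) {
            val perKey = INPUT(path = key + ".csv")
        }
    }

    // 4. Plain collection: use NamedSeq instead.
    // Names: inputs_0, inputs_1
    val inputs = NamedSeq[INPUT]("inputs")
    for (key <- Seq("data1", "data2")) {
        inputs += INPUT(path = key + ".csv")
    }
}
```

For the second case, adding an explicit _name to each branch of the original if/else is the other correction the list mentions; the one-line form is shown here because it avoids the manual naming.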