Component types

Before implementing a component, we first need to decide its type.

Atomic and composite components

There are two kinds of components: atomic and composite. Atomic components are written in a language such as R, Python, Bash, Java or Matlab, and roughly correspond to specialized versions of external scripts. In contrast, composite components are written in Scala and correspond to reusable Scala functions in workflows. When an atomic component is used in a workflow, it places exactly one component (namely, itself) in the workflow. A composite component can flexibly place one or more components in the workflow, some of which may in turn place multiple components of their own.
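The difference in how the two kinds place components can be illustrated with a toy model. This is only a conceptual sketch in Python, not the framework's API: the Workflow class and the atomic and composite functions are hypothetical names invented for the illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Workflow:
    """Toy stand-in for a workflow: an ordered list of placed component names."""
    placed: list = field(default_factory=list)


def atomic(workflow, name):
    """An atomic component places exactly one component: itself."""
    workflow.placed.append(name)


def composite(workflow, name, parts):
    """A composite places its parts.

    Each part is either a string (an atomic component) or a
    (name, parts) tuple (a nested composite that is expanded recursively).
    """
    for part in parts:
        if isinstance(part, tuple):
            sub_name, sub_parts = part
            composite(workflow, f"{name}.{sub_name}", sub_parts)
        else:
            atomic(workflow, f"{name}.{part}")
```

In this model, calling atomic once always grows the workflow by one entry, while a single composite call can grow it by any number of entries, depending on how deeply its parts are nested.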

Implement an atomic component when:

Implement a composite component when:

As a general rule, if you would embed the logic of the component into a workflow using a single external script, it is a candidate for an atomic component. If you would encapsulate the logic in a Scala function in the workflow, a composite component is probably suitable.

In the bundle folder structure, implementations of atomic components are under components/ and implementations of composite components are under functions/.

Selecting component type for our first component

To add content to the demo bundle, let’s implement a simplified CSV filter component that takes a CSV file as input, excludes certain columns, and writes a filtered CSV file as output. We name our component SimpleCSVFilter.

Evaluated against the criteria above, SimpleCSVFilter is an atomic component: it is logically a single unit and would not benefit from having its functionality divided or from workflow features.
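To make the "one logical unit" argument concrete, the core of such a filter fits in a single short function. The following is a minimal sketch using only the Python standard library. It shows the column-exclusion logic alone, not the component interface of the framework, and the function name filter_csv_columns and its parameters are assumptions made for this illustration.

```python
import csv
import io


def filter_csv_columns(source, target, exclude):
    """Copy CSV rows from source to target, dropping the columns named in exclude.

    source and target are text file objects; exclude is a collection of
    column names. The first row of source is treated as the header.
    """
    reader = csv.DictReader(source)
    # Keep the original column order, minus the excluded names.
    kept = [name for name in (reader.fieldnames or []) if name not in exclude]
    writer = csv.DictWriter(
        target,
        fieldnames=kept,
        extrasaction="ignore",  # silently skip values of excluded columns
        lineterminator="\n",
    )
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
```

A quick check with in-memory files: passing io.StringIO("id,name,score\n1,alpha,0.5\n") as source and {"score"} as exclude writes a CSV that contains only the id and name columns. Because the whole task is a single read-filter-write pass, there is nothing to gain from splitting it into several workflow components.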