CSVSummary

Summarises the values in the rows of a CSV file, either per group defined by a label column or across the whole file into a single row. Fully non-numerical columns return NA, Inf or -Inf, depending on which summary type is used.
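The grouping behaviour described above can be sketched in a few lines. This is only an illustration in Python, not the component's R source: the function names `to_number` and `summarise_mean`, the sample data, and the choice to report a column without any numeric cells as "NA" are assumptions made for the example.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

SAMPLE = "clusterId,x,name\na,1,foo\na,3,bar\nb,10,baz\n"


def to_number(text):
    """Return float(text), or None when the cell is not numeric."""
    try:
        return float(text)
    except ValueError:
        return None


def summarise_mean(csv_text, cluster_col=""):
    """Average each column per label, or over the whole file if cluster_col is empty."""
    reader = csv.DictReader(io.StringIO(csv_text))
    value_cols = [c for c in reader.fieldnames if c != cluster_col]
    groups = defaultdict(lambda: defaultdict(list))
    for row in reader:
        # With no label column, every row falls into one hypothetical group "all".
        label = row[cluster_col] if cluster_col else "all"
        cols = groups[label]
        for col in value_cols:
            number = to_number(row[col])
            if number is not None:
                cols[col].append(number)
    # Assumption: a column without numeric cells is reported as "NA".
    return {
        label: {c: (mean(cols[c]) if cols[c] else "NA") for c in value_cols}
        for label, cols in groups.items()
    }


per_label = summarise_mean(SAMPLE, "clusterId")  # one entry per unique label
whole_file = summarise_mean(SAMPLE)              # a single entry for all rows
```

With a label column the result has one entry per unique label; without one, all rows collapse into a single entry, and the label column is then treated like any other non-numerical column.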

Version 1.1
Bundle tools
Categories Analysis
Authors Ville Rantanen (ville.rantanen@helsinki.fi)
Issue tracker View/Report issues
Requires R
Source files component.xml Summarise.r
Usage Example with default values

Inputs

Name Type Mandatory Description
in CSV Mandatory CSV file with numerical columns and a cluster label column.

Outputs

Name Type Description
out CSV CSV file with one summarised row for each unique label.

Parameters

Name Type Default Description
clusterCol string "" The name of the cluster label column. If left empty, a single output line is created from all values.
counts string "Count" Include a count of rows, in a column named by this value. If the string is empty, counting is skipped.
postString string "" String to add to each column name end, e.g. "_mean".
stringMode boolean false When the numeric conversion of a column fails, return the most common string instead of NA.
summaryType string "mean" The summary type. Use mean, median, mode, var, sd, sum, IQR, mad, min, max, lhinge, uhinge or dimensions. The less common ones are: IQR (interquartile range), mad (median absolute deviation), lhinge/uhinge (lower/upper hinge, also known as Q1 and Q3), mode (most common value; the lowest one is returned) and dimensions (produces only the row and column counts).
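Two of the less common summary types can be made concrete with a short sketch. It is written in Python for illustration and is not the component's R code. The helper names `lowest_mode` and `hinges` are hypothetical, and the hinge computation assumes Tukey's definition as used by R's `fivenum()`; whether the component computes hinges that way is an assumption.

```python
import math
from collections import Counter


def lowest_mode(values):
    """Most common value; when several values tie, the lowest one wins."""
    counts = Counter(values)
    top = max(counts.values())
    return min(v for v, n in counts.items() if n == top)


def hinges(values):
    """Return (lower hinge, upper hinge) of a non-empty list, Tukey style."""
    xs = sorted(values)
    n = len(xs)
    depth = math.floor((n + 3) / 2) / 2  # 1-based depth of the hinges

    def at(pos):
        # Average the two neighbouring ranks when pos falls between them.
        return 0.5 * (xs[math.floor(pos) - 1] + xs[math.ceil(pos) - 1])

    return at(depth), at(n + 1 - depth)


mode_value = lowest_mode([3, 1, 3, 1, 2])          # 1 and 3 tie, so 1 is returned
lower, upper = hinges([1, 2, 3, 4, 5, 6, 7, 8])    # 2.5 and 6.5
```

Hinges can differ slightly from quantile-based quartiles on small samples, which is why the sketch spells out the definition it assumes.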

Test cases

Test case Parameters IN OUT
case1 summaryType=mean, clusterCol=clusterId in out
case2 summaryType=var, clusterCol=clusterId, stringMode=true, counts=CountColumn in out
case3 summaryType=mean, counts= in out
case4_dimensions summaryType=dimensions, clusterCol=clusterId, postString=_1 in out
case5_onecluster summaryType=mean, clusterCol=clusterId, stringMode=true in out
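
Several test cases set stringMode=true. The fallback this parameter describes can be sketched for a single column as below. This is an illustrative Python sketch, not the component's implementation: the function name `summarise_column` is hypothetical, and treating a column as failed only when none of its cells parses as a number is an assumption, as is the tie-breaking between equally common strings.

```python
from collections import Counter
from statistics import mean


def summarise_column(cells, string_mode=False):
    """Mean of the numeric cells; for a non-numeric column, "NA" or the most common string."""
    numbers = []
    for cell in cells:
        try:
            numbers.append(float(cell))
        except ValueError:
            pass  # non-numeric cell, ignored for the mean
    if numbers:
        return mean(numbers)
    if string_mode and cells:
        # Assumption: on ties, the value seen first is returned.
        return Counter(cells).most_common(1)[0][0]
    return "NA"


with_strings = summarise_column(["a", "b", "b"], string_mode=True)  # "b"
without_strings = summarise_column(["a", "b", "b"])                 # "NA"
```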


Generated 2019-02-08 07:42:16 by Anduril 2.0.0