Summarises the values of a file either per group of rows, as defined by a label column, or over the whole file into a single row. Columns that are entirely non-numerical return NA, Inf or -Inf, depending on the summary type used.
Version | 1.1 |
---|---|
Bundle | tools |
Categories | Analysis |
Authors | Ville Rantanen (ville.rantanen@helsinki.fi) |
Issue tracker | View/Report issues |
Requires | R |
Source files | component.xml Summarise.r |
Usage | Example with default values |
Name | Type | Mandatory | Description |
---|---|---|---|
in | CSV | Mandatory | CSV file with numerical columns and a clustering label column. |
Name | Type | Description |
---|---|---|
out | CSV | CSV file with one summarised row for each unique label. |
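
The input/output contract above can be illustrated with a minimal Python sketch. It is not the component's R implementation: the function `summarise`, its argument names (chosen to mirror the `clusterCol`, `counts` and `postString` parameters) and the use of `None` as a stand-in for NA are all assumptions made for illustration, and the real component may order its output columns differently.

```python
import csv
import io
import statistics
from collections import defaultdict


def to_numbers(values):
    """Return the values as floats, or None if any value is non-numerical."""
    try:
        return [float(v) for v in values]
    except ValueError:
        return None


def summarise(csv_text, cluster_col="", counts="Count", post_string="",
              summary=statistics.mean):
    """Hypothetical helper: one summarised row per unique label."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    groups = defaultdict(list)
    for row in rows:
        # An empty cluster_col puts every row into a single group.
        groups[row[cluster_col] if cluster_col else "all"].append(row)
    value_cols = [c for c in rows[0] if c != cluster_col]
    result = []
    for label, members in groups.items():
        out = {cluster_col: label} if cluster_col else {}
        for col in value_cols:
            numbers = to_numbers([m[col] for m in members])
            # None stands in for NA when a column is not numerical.
            out[col + post_string] = None if numbers is None else summary(numbers)
        if counts:  # an empty string skips the count column
            out[counts] = len(members)
        result.append(out)
    return result


DATA = "cluster,x,y\na,1,10\na,3,20\nb,5,30\n"
by_label = summarise(DATA, cluster_col="cluster", post_string="_mean")
whole_file = summarise(DATA, counts="")
print(by_label)
print(whole_file)
```

With a label column the sketch yields one row per label; with an empty label column it collapses the whole file into a single row, and the non-numerical `cluster` column then has no numeric summary.
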
Name | Type | Default | Description |
---|---|---|---|
clusterCol | string | "" | The name of the cluster label column. If left empty, a single output line is created from all values. |
counts | string | "Count" | Name of the column that holds the number of rows in each group. If the string is empty, no count column is added. |
postString | string | "" | String appended to the end of each column name, e.g. "_mean". |
stringMode | boolean | false | When numeric conversion of a column fails, return the most common string instead of NA. |
summaryType | string | "mean" | The summary type: one of mean, median, mode, var, sd, sum, IQR, mad, min, max, lhinge, uhinge or dimensions. The less common ones are IQR (interquartile range), mad (median absolute deviation), lhinge/uhinge (lower/upper hinge, a.k.a. Q1 and Q3), mode (most common value; ties return the lowest) and dimensions (produces only the row and column counts). |
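
The less common summary types can be sketched in plain Python as follows. The helper names are hypothetical, and the definitions are assumptions modelled on R's `fivenum`, `IQR` and `mad` defaults (including the 1.4826 scaling constant of `mad`); the page does not confirm that the component calls these functions.

```python
import statistics
from collections import Counter


def hinges(values):
    """Tukey hinges: medians of the lower and upper halves of the sorted data.

    The overall median is shared by both halves when the length is odd.
    """
    data = sorted(values)
    half = (len(data) + 1) // 2
    return statistics.median(data[:half]), statistics.median(data[-half:])


def iqr(values):
    """Interquartile range with linearly interpolated quartiles."""
    q1, _, q3 = statistics.quantiles(list(values), n=4, method="inclusive")
    return q3 - q1


def mad(values, constant=1.4826):
    """Median absolute deviation, scaled by the assumed R default constant."""
    data = list(values)
    centre = statistics.median(data)
    return constant * statistics.median([abs(v - centre) for v in data])


def lowest_mode(values):
    """Most common value; ties are resolved by returning the lowest."""
    counts = Counter(values)
    top = max(counts.values())
    return min(v for v, c in counts.items() if c == top)


sample = [1, 2, 3, 4, 5, 6, 7, 8]
print(hinges(sample), iqr(sample), mad(sample), lowest_mode([3, 1, 3, 1, 2]))
```

Under these assumptions the hinges need not coincide exactly with interpolated quartiles: for `sample` the hinges are 2.5 and 6.5, while the interpolated quartiles are 2.75 and 6.25.
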
Test case | Parameters | IN | OUT |
---|---|---|---|
case1 | summaryType=mean, | in | out |
case2 | summaryType=var, | in | out |
case3 | summaryType=mean, | in | out |
case4_dimensions | summaryType=dimensions, | in | out |
case5_onecluster | summaryType=mean, | in | out |