MMClustering

Normal, t, skew-normal, and skew-t mixture-model clustering.

Version 1.0
Bundle flowand
Categories FlowCytometry
Authors Ali Oghabian (ali.oghabian@helsinki.fi)
Requires R, Rmpi
Source files component.xml MMClustering.r
Usage Example with default values

Inputs

Name Type Mandatory Description
fcsFiles CSVList Mandatory Location of the CSV files to be clustered, e.g. preprocessed files in a flow cytometry analysis.

Outputs

Name Type Description
clusters CSVList CSV files containing the input data plus one additional column with the mixture-modeling cluster assignments. The number appended to each input filename is the g parameter used for that result.
postProb CSVList Posterior probabilities of the data. For each file in fcsFiles, a file with a similar name is generated, containing the posterior probability of each data point. The dimensions of each CSV file match those of the clustered data, i.e. (number of channelsToCluster) × (number of data rows).
report Latex Visualization of the mixture-modeling results.
mmSpecs CSV The mmSpecs.csv file contains the information and calculations from the mixture-modeling clustering in a form suitable for the OptimalClustering component. The first column holds the file names and the remaining columns hold the AIC, BIC and SWR calculations, which can be used to estimate the best clustering g parameter.
mmParams BinaryFolder .txt files containing the parameters of the mixture-model results. They are R objects written out as R script text.
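
The mmSpecs output is described as a table of file names with AIC, BIC and SWR columns that can guide the choice of g. A minimal sketch of that selection step follows, written in Python for illustration only. The column names ("file", "aic", "bic"), the "<name>_<g>.csv" file naming, and the numbers themselves are made up for this example and are not taken from the component; it also assumes the common convention that a lower AIC or BIC indicates a better model, which should be checked against the actual mmSpecs.csv before relying on it.

```python
import csv
import io

# Hypothetical mmSpecs.csv content. Column names, file naming and all
# values are assumptions made for this sketch, not real component output.
MM_SPECS = """file,aic,bic
sample1_3.csv,10250.4,10410.9
sample1_4.csv,10120.7,10334.2
sample1_5.csv,10118.3,10385.6
"""


def best_by_criterion(specs_text, criterion="bic"):
    """Return the row with the smallest value of the chosen criterion.

    Assumes a lower-is-better criterion (the usual AIC/BIC convention).
    """
    rows = list(csv.DictReader(io.StringIO(specs_text)))
    return min(rows, key=lambda row: float(row[criterion]))


best = best_by_criterion(MM_SPECS)
print(best["file"])
```

AIC and BIC may disagree, since BIC penalises extra clusters more heavily; in the made-up table above they pick different files. The OptimalClustering component mentioned above is the intended consumer of this file, so a hand-rolled selection like this is only useful for quick inspection.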

Parameters

Name Type Default Description
IDCol string "" The column holding the row numbers. This parameter is optional; if provided, the column is added to the "postProb" files.
NaRmove boolean true Whether to remove rows containing NA values from the data (true) or return an error (false).
channelsToCluster string (no default) A comma-separated list of column names or column numbers (e.g. channel names or numbers in flow cytometry analysis) to be clustered. For instance, the value could be "1, 2, 3, 4, 5" or "FSC.A, SSC, ERK1, STAT1, CD4".
channelsToRegularPlot string "" A comma-separated list of column names or column numbers (e.g. channel names or numbers in flow cytometry analysis) to be plotted. For instance, the value could be "1, 2, 3, 4, 5" or "FSC.A, SSC, ERK1, STAT1, CD4". This parameter is only used for the "regular2d" and "regular3d" plot types.
clusterIDColName string "cluster" The name of the column in the clusters output which contains the cluster IDs of the rows.
density string "skewt" Density distribution to be used for clustering: normal, t, skew normal, or skew t.
epsilon float 0.0001 The convergence threshold used in the mixture-model fitting process. The smaller the value, the more accurate the cluster results. If the fit does not satisfy the epsilon threshold within iterationMaximum iterations, an error is returned.
estimateMode boolean false Used only for skew distributions. Whether to estimate the mode for each cluster. The default value (false) is recommended.
gMax int 5 The maximum number of mixture-model clusters. Note that some clusters may end up empty. For instance, running MMClustering with a g parameter (i.e. expected cluster number) of 5 might group the data rows into only 4 clusters, labeled 1, 2, 4, and 5.
gMin int 3 The minimum number of mixture-model clusters. Note that some clusters may end up empty. For instance, running MMClustering with a g parameter (i.e. expected cluster number) of 5 might group the data rows into only 4 clusters, labeled 1, 2, 4, and 5.
includePosterior boolean true Whether to include the posterior probabilities. See the "postProb" output for more information.
iterationMaximum int 5000 The maximum number of iterations in the mixture-model fitting process.
nSlaves int 5 The number of slave processes used to parallelise the mixture modeling.
pagebreak boolean false Tells if the result document should start with a page break.
pch string "." The type and shape of plotting points. Numbers (e.g. "2") and characters (e.g. ".") are possible.
plotType string "boundry2d" The type of plot. The choices are regular2d, regular3d, contour2d, individualcontour2d, and boundry2d.
randomSeed int 1984 The seed value for semi-random processes
step float 0.5 Only used when density is a skew distribution and estimateMode is true. It is used to calculate the number of iterations of the Expectation-Maximization (EM) algorithm while computing the maximum likelihood (ML) estimate. The value must be greater than 0. The smaller the value, the more accurate the estimates.
writeImage boolean false Whether to include the plot images in the report directory.
writeParams boolean false Whether to write the mixture-model parameters.
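
The channelsToCluster and channelsToRegularPlot parameters accept either column names or column numbers in one comma-separated string. The sketch below, in Python for illustration only, shows one way such a value could be resolved against a file's header. It is not the component's own code (which is written in R); the function name resolve_channels is hypothetical, and treating the numbers as 1-based positions is an assumption suggested by the "1, 2, 3, 4, 5" example and by R's indexing.

```python
def resolve_channels(spec, header):
    """Map a comma-separated spec of names or numbers to column names.

    Numbers are treated as 1-based column positions (an assumption);
    anything else must match a column name in the header exactly.
    """
    resolved = []
    for token in spec.split(","):
        token = token.strip()
        if not token:
            continue  # tolerate empty entries such as a trailing comma
        if token.isdigit():
            index = int(token)
            if not 1 <= index <= len(header):
                raise ValueError(f"column number {index} is out of range")
            resolved.append(header[index - 1])
        elif token in header:
            resolved.append(token)
        else:
            raise ValueError(f"unknown column name: {token}")
    return resolved


# Header taken from the example channel names in the parameter description.
header = ["FSC.A", "SSC", "ERK1", "STAT1", "CD4"]
print(resolve_channels("1, 2, 5", header))
print(resolve_channels("FSC.A, ERK1", header))
```

Failing loudly on an unknown name or an out-of-range number is a design choice of this sketch; how the component itself reacts to an invalid entry is not stated above.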

Generated 2019-02-08 07:42:08 by Anduril 2.0.0