Component for running the PyClone variant clustering tool. The component has been tested with PyClone version 0.13.0.
See https://bitbucket.org/aroth85/pyclone/wiki/Home for more information. If you use PyClone, always cite Roth et al., "PyClone: statistical inference of clonal population structure in cancer" (PMID: 24633410).
A config file (config.yaml) is generated, the mutation files are built, and the PyClone analysis is run. Optionally, the clustering results are written out as tables.
Step 1: Mutation files are generated with the command PyClone build_mutations_file. The prior for this command is defined with the mutationPrior parameter.
Step 2: The PyClone analysis is run with the command PyClone run_analysis. The seed value for the run is defined with the seed parameter.
Step 3 (optional): Clustering results are generated in table format if clusterTables is true. The wanted output types are defined as a comma-separated list in the clusterTableType parameter.
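The three steps can be sketched as the command lines they would produce. This is a minimal illustration, not the component's actual pyclone_run.sh: the function name, the file names, and the exact flags (`--in_file`, `--out_file`, `--prior`, `--config_file`, `--seed`, `--table_type`, `--burnin`, `--thin`) are assumptions based on the PyClone 0.13 command-line interface, and the commands are only built, not executed.

```python
def pyclone_commands(samples, config="config.yaml", prior="major_copy_number",
                     seed="3", cluster_tables=False,
                     table_types="cluster,loci,old_style", burnin=5, thin=1):
    """Return the PyClone command lines for the three steps as argv lists.

    samples maps a sample key to a (variant table path, mutation file path)
    pair. All paths and output file names are hypothetical.
    """
    cmds = []
    # Step 1: build one mutation file per sample, using the chosen prior.
    for key in sorted(samples):
        tsv_in, yaml_out = samples[key]
        cmds.append(["PyClone", "build_mutations_file",
                     "--in_file", tsv_in, "--out_file", yaml_out,
                     "--prior", prior])
    # Step 2: run the analysis with a fixed seed.
    cmds.append(["PyClone", "run_analysis",
                 "--config_file", config, "--seed", str(seed)])
    # Step 3 (optional): one table per requested type.
    if cluster_tables:
        for table_type in (t.strip() for t in table_types.split(",")):
            if not table_type:
                continue
            cmds.append(["PyClone", "build_table",
                         "--config_file", config,
                         "--out_file", table_type + ".tsv",
                         "--table_type", table_type,
                         "--burnin", str(burnin), "--thin", str(thin)])
    return cmds


if __name__ == "__main__":
    example = {"sampleA": ("sampleA.tsv", "sampleA.yaml")}
    for cmd in pyclone_commands(example, cluster_tables=True):
        print(" ".join(cmd))
```

With clusterTables left at false, only the build_mutations_file and run_analysis commands are produced.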
Version | 0.1 |
---|---|
Bundle | sequencing |
Categories | Analysis |
Specialties | generic |
Authors | Mikko Kivikoski (mikko.kivikoski@helsinki.fi) |
Issue tracker | View/Report issues |
Requires | PyClone |
Source files | component.xml pyclone_run.sh |
Usage | Example with default values |
Name | Type | Mandatory | Description |
---|---|---|---|
in | Array<T1> (generic) | Mandatory | Array of variant tables in tab-separated format. |
purity | CSV | Optional | Tumor purity estimates for each sample in a two-column CSV file. Column 'Key' must match the keys of the input array. Column 'Purity' is the purity estimate for the sample. This input overrides the purityDefault parameter. |
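How the purity input and the purityDefault parameter interact can be sketched as follows. The function name is hypothetical, the tab delimiter is an assumption about the CSV dialect, and falling back to the default for samples missing from the purity file is an assumption rather than documented behaviour.

```python
import csv
import io


def resolve_purity(keys, purity_text=None, purity_default=1.0, delimiter="\t"):
    """Return a {sample key: purity} mapping for the given input array keys.

    purity_text is the content of the optional purity file with the columns
    'Key' and 'Purity'. The delimiter is an assumption; adjust as needed.
    """
    # Start from the default purity for every sample.
    purity = {key: purity_default for key in keys}
    if purity_text is not None:
        reader = csv.DictReader(io.StringIO(purity_text), delimiter=delimiter)
        for row in reader:
            # Rows whose Key is not in the input array are ignored.
            if row["Key"] in purity:
                purity[row["Key"]] = float(row["Purity"])
    return purity


if __name__ == "__main__":
    table = "Key\tPurity\nsampleA\t0.8\n"
    print(resolve_purity(["sampleA", "sampleB"], table))
```

In this sketch sampleA takes its purity from the file, while sampleB keeps the default.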
Name | Type | Description |
---|---|---|
config | YAML | Output port for the config file. |
trace | BinaryFolder | Output folder for trace files. |
mutationFiles | Array<YAML> | Mutation prior files in yaml format. |
clusteringResults | CSVList | Results of clustering, if executed. |
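As a rough guide to what the config output might contain, the sketch below maps the component parameters onto the key names used in the PyClone 0.13 example configuration. The mapping is an assumption: the source does not show the generated file, the function name and paths are hypothetical, and entries without a matching component parameter (for example the beta-binomial precision settings and the per-sample error rate) are left out, so the result is not a complete PyClone config.

```python
def build_config(samples, working_dir, iterations=50, alpha=1, beta=1,
                 concentration=1.0, shape=1.0, rate=0.001,
                 density="pyclone_beta_binomial", init_method="disconnected"):
    """Return a partial PyClone-style config as a nested dict.

    samples maps a sample key to a (mutation file path, purity) pair.
    Key names follow the PyClone 0.13 example config; which keys the
    component actually writes is an assumption.
    """
    return {
        "working_dir": working_dir,
        "trace_dir": "trace",
        "density": density,                       # densityFunction
        "num_iters": iterations,                  # iterations
        "init_method": init_method,               # initMethod
        "base_measure_params": {"alpha": alpha, "beta": beta},
        "concentration": {
            "value": concentration,               # concentration
            "prior": {"shape": shape, "rate": rate},
        },
        "samples": {
            key: {
                "mutations_file": path,
                "tumour_content": {"value": purity},
            }
            for key, (path, purity) in samples.items()
        },
    }


if __name__ == "__main__":
    cfg = build_config({"sampleA": ("sampleA.yaml", 0.8)}, "/tmp/pyclone")
    print(cfg["samples"]["sampleA"])
```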
Name | Type | Default | Description |
---|---|---|---|
alpha | int | 1 | Alpha value |
beta | int | 1 | Beta value |
burnin | int | 5 | Number of MCMC samples to discard from the beginning. Prior to convergence, the MCMC series features an initial transient which is controlled by the initial parameters and not the data. This initial transient should be specified such that it can be discarded from the subsequent analysis. |
clusterTableType | string | "cluster,loci,old_style" | Comma-separated list of clustering tables to produce. Possible options: cluster, loci and old_style. Default: all three. |
clusterTables | boolean | false | If true, PyClone clustering result tables are produced with the PyClone build_table command. Default: false. |
concentration | float | 1.0 | Concentration parameter |
densityFunction | string | "pyclone_beta_binomial" | Density function |
initMethod | string | "disconnected" | Initial clustering. The value "connected" starts with all loci in a single cluster, with subsequent iterations likely splitting the cluster, while "disconnected" starts with each locus in a separate cluster, with iterations likely merging them. The former can be beneficial for large datasets. Default: "disconnected". |
iterations | int | 50 | Number of iterations used. |
meshSize | int | 101 | Number of mesh points for density estimation. The default of 101 allows subdivision of 1% in cellular prevalences. |
mutationPrior | string | "major_copy_number" | Mutation prior for building mutation files |
oldFormat | string | "true" | Specifies whether the outputs should be converted to the old format when using PyClone 0.13.1 or newer. The cluster labels and trace files are off by one. Default: true |
purityDefault | float | 1.0 | Tumor purity estimate. The value is used as the tumor purity estimate for all samples. The purity input overrides this parameter. |
rate | float | 0.001 | Rate parameter |
seed | string | "3" | Seed value for analysis. |
shape | float | 1.0 | Shape parameter |
tableName | string | "" | String to be added as a prefix to the file names of the clustering results. The prefix is separated from the file name by a hyphen. Default: "" |
thin | int | 1 | Thinning ratio for the MCMC series in samples. In an MCMC series, the consecutive samples are correlated. These correlations will skew higher-order statistics such as variance estimates. This can be mitigated by specifying T such that each T-sample string is thinned into a single sample. |
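The combined effect of iterations, burnin and thin on the number of retained MCMC samples can be illustrated with plain slicing. This is a sketch of the general burn-in and thinning idea described in the table, with a hypothetical function name; how PyClone applies the two values internally is not shown in the source.

```python
def retained_samples(iterations=50, burnin=5, thin=1):
    """Return the indices of the MCMC samples kept after burn-in and thinning."""
    # Drop the first `burnin` samples, then keep every `thin`-th one.
    return list(range(iterations))[burnin::thin]


if __name__ == "__main__":
    print(len(retained_samples()))         # component defaults
    print(retained_samples(thin=5))        # stronger thinning
```

With the component defaults (50 iterations, burnin 5, thin 1) 45 samples remain; with thin set to 5, only 9 remain.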
Test case | Parameters | IN in | IN purity | OUT config | OUT trace | OUT mutationFiles | OUT clusteringResults |
---|---|---|---|---|---|---|---|
buildClusterTables | properties: seed=7, | in | (missing) | config | trace | mutationFiles | clusteringResults |
default | (missing) | in | (missing) | config | trace | mutationFiles | (missing) |