Calculate centroids from expression data with ClaNC
Usage
clanc(x, ...)
# Default S3 method
clanc(x, ...)
# S3 method for class 'data.frame'
clanc(x, classes, active, priors = "equal", ...)
# S3 method for class 'matrix'
clanc(x, classes, active, priors = "equal", ...)
# S3 method for class 'SummarizedExperiment'
clanc(x, classes, active, priors = "equal", assay = 1, ...)
# S3 method for class 'ExpressionSet'
clanc(x, classes, active, priors = "equal", ...)
# S3 method for class 'formula'
clanc(formula, data, active, priors = "equal", ...)
# S3 method for class 'recipe'
clanc(x, data, active, priors = "equal", ...)
Arguments
- x
Depending on the context:
A data frame of expression.
A matrix of expression.
A recipe specifying a set of preprocessing steps created from recipes::recipe().
An ExpressionSet.
A SummarizedExperiment with assay containing expression.
Expression should be library-size corrected, but not scaled.
If supplying a data frame, matrix, ExpressionSet, or SummarizedExperiment, the rows should represent genes and the columns should represent samples (as is standard for expression data). The column names should be sample IDs, and the row names should be gene IDs.
If a recipe is provided, the data should have genes as columns (to match the formula provided to the recipe).
- ...
Not currently used, but required for extensibility.
- classes
When x is a data frame or matrix, classes contains class labels in the form of either:
A data frame with 1 factor column.
A factor vector.
When x is an ExpressionSet or SummarizedExperiment, classes is the name of the column in pData(x) or colData(x) that contains classes as a factor.
- active
Either a single number or a numeric vector with one element per unique class label. Represents the number of class-specific genes that should be selected for a centroid. Note that different numbers of genes can be selected for each class. See Details.
When x is an ExpressionSet or SummarizedExperiment, active can additionally be the name of the column in pData(x) or colData(x) that contains the numeric vector.
- priors
Can take a variety of values:
"equal" - each class has an equal prior.
"class" - each class has a prior equal to its frequency in the training set.
A numeric vector with length equal to the number of classes.
When x is an ExpressionSet or SummarizedExperiment, priors can additionally be the name of the column in pData(x) or colData(x) that contains the numeric vector.
- assay
When a SummarizedExperiment is used, the index or name of the assay containing expression.
- formula
A formula specifying the classes on the left-hand side, and the predictor terms on the right-hand side.
- data
When a recipe or formula is used, data is specified as:
A data frame containing both expression and classes, where columns are the genes or class, and rows are the samples.
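The three forms of priors described in the arguments above can be pictured as one per-class numeric vector. The sketch below is only an illustration in base R: resolve_priors() is a hypothetical helper, not a function of this package, and it follows the argument text literally (one numeric value per class). How the package itself validates or recycles numeric priors is not specified here.

```r
# Hypothetical helper, for illustration only; not part of the package.
resolve_priors <- function(classes, priors = "equal") {
  classes <- as.factor(classes)
  n_classes <- nlevels(classes)
  if (is.numeric(priors)) {
    # Assumption: one prior per class, in the order of the factor levels.
    stopifnot(length(priors) == n_classes)
    return(stats::setNames(priors, levels(classes)))
  }
  switch(
    priors,
    # "equal": every class gets the same prior.
    equal = stats::setNames(rep(1 / n_classes, n_classes), levels(classes)),
    # "class": each prior is the class frequency in the training labels.
    class = {
      freq <- table(classes) / length(classes)
      stats::setNames(as.numeric(freq), names(freq))
    },
    stop("Unknown priors specification: ", priors)
  )
}

classes <- factor(c("A", "A", "A", "B"))
resolve_priors(classes, "equal")
resolve_priors(classes, "class")
resolve_priors(classes, c(0.9, 0.1))
```

With three samples of class A and one of class B, "equal" gives 0.5 for each class, while "class" gives 0.75 for A and 0.25 for B.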
Details
The original description of ClaNC can be found here
While active sets the number of class-specific genes, each centroid contains
the genes selected for every class, so it will have more than that number of
genes. For example, if active = 5 and there are 3 classes, each centroid will
have 15 genes, with 5 of those genes being particular to a given class. If
these genes are 'active' in that class, their values will be the mean of the
class. If the genes are not active in that given class, their values will be
the overall expression of the given gene across all classes.
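The way centroid values are filled in can be illustrated with a small base R sketch. It is not the package's implementation: the gene selection in active_genes is made up by hand (ClaNC chooses the genes itself), the data are simulated, and "overall expression" is assumed here to mean the gene's mean across all samples.

```r
# Illustrative sketch only; not the package's implementation.
set.seed(1)
# Toy expression: 6 genes (rows) x 9 samples (columns), 3 classes.
expr <- matrix(
  rnorm(6 * 9, mean = 5),
  nrow = 6,
  dimnames = list(paste0("gene", 1:6), paste0("sample", 1:9))
)
classes <- factor(rep(c("A", "B", "C"), each = 3))

# Hypothetical selection: one 'active' gene per class (as if active = 1).
active_genes <- c(A = "gene1", B = "gene3", C = "gene6")

# Every centroid covers the union of the genes selected for all classes.
selected <- unname(active_genes)
# Assumption: 'overall expression' is the mean across all samples.
overall_mean <- rowMeans(expr[selected, , drop = FALSE])

centroids <- sapply(levels(classes), function(cl) {
  values <- overall_mean                 # default: overall value
  gene <- active_genes[[cl]]
  values[gene] <- mean(expr[gene, classes == cl])  # active gene: class mean
  values
})
centroids
```

With active = 1 and 3 classes, each column of centroids (one per class) holds 3 genes, and only the gene that is active in that class differs from the overall value.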
Examples
expression_matrix <- synthetic_expression$expression
head(expression_matrix)
#> sample1 sample2 sample3 sample4 sample5 sample6 sample7 sample8
#> gene1 8.097529 7.119188 7.304400 7.554689 7.953206 7.714925 7.512700 8.597547
#> gene2 8.641837 9.400416 8.500865 8.878687 8.318438 8.728683 7.812591 7.638167
#> gene3 3.436236 4.317915 3.435193 3.515755 3.024976 4.762209 5.048956 2.006646
#> gene4 4.368008 5.212750 4.618249 4.201365 3.195294 4.707750 5.126769 6.178658
#> gene5 2.423974 3.563816 4.062362 2.163278 2.021435 2.813873 0.000000 4.652358
#> gene6 5.371205 5.919809 4.366915 4.805534 4.834856 5.622157 3.883531 3.593082
#> sample9 sample10 sample11 sample12
#> gene1 6.475641 7.648858 8.637526 7.345038
#> gene2 8.110285 7.906104 7.424728 7.927039
#> gene3 2.739211 3.111668 3.161077 4.306611
#> gene4 5.170265 4.259578 5.872855 6.159023
#> gene5 1.532242 3.399823 3.691250 1.932937
#> gene6 4.246205 4.637316 3.575837 2.730452
classes <- synthetic_expression$classes
classes
#> [1] A A A A A A B B B B B B
#> Levels: A B
# data.frame/tibble/matrix interface:
clanc(expression_matrix, classes = classes, active = 5, priors = "equal")
#> <clanc>
#> $centroids
#> class gene expression pooled_sd active prior
#> 1 A gene12 7.514718 0.4779155 5 0.5
#> 2 A gene2 8.744821 0.3147537 5 0.5
#> 3 A gene13 8.936462 0.3418472 5 0.5
#> 4 A gene21 6.584681 0.5279636 5 0.5
#> 5 A gene24 4.307301 0.7214700 5 0.5
#> 6 A gene74 4.028507 0.4940783 5 0.5
#> 7 A gene41 4.328516 0.6317005 5 0.5
#> 8 A gene95 6.873184 0.4462475 5 0.5
#> 9 A gene52 3.743798 0.5173769 5 0.5
#> 10 A gene66 7.008174 0.5883218 5 0.5
#> 11 B gene12 8.072284 0.4779155 5 0.5
#> 12 B gene13 9.938137 0.3418472 5 0.5
#> 13 B gene2 8.273987 0.3147537 5 0.5
#> 14 B gene24 3.370467 0.7214700 5 0.5
#> 15 B gene21 5.789423 0.5279636 5 0.5
#> 16 B gene41 5.518354 0.6317005 5 0.5
#> 17 B gene74 3.226598 0.4940783 5 0.5
#> 18 B gene52 2.438579 0.5173769 5 0.5
#> 19 B gene95 6.288173 0.4462475 5 0.5
#> 20 B gene66 7.891588 0.5883218 5 0.5
#>
# Formula interface:
# Data must have class included as a column
# Genes must be *columns* and samples must be *rows*
# Hence the data transposition.
for_formula <- data.frame(class = classes, t(expression_matrix))
clanc(class ~ ., for_formula, active = 5, priors = "equal")
#> <clanc>
#> $centroids
#> class gene expression pooled_sd active prior
#> 1 A gene12 7.514718 0.4779155 5 0.5
#> 2 A gene2 8.744821 0.3147537 5 0.5
#> 3 A gene13 8.936462 0.3418472 5 0.5
#> 4 A gene21 6.584681 0.5279636 5 0.5
#> 5 A gene24 4.307301 0.7214700 5 0.5
#> 6 A gene74 4.028507 0.4940783 5 0.5
#> 7 A gene41 4.328516 0.6317005 5 0.5
#> 8 A gene95 6.873184 0.4462475 5 0.5
#> 9 A gene52 3.743798 0.5173769 5 0.5
#> 10 A gene66 7.008174 0.5883218 5 0.5
#> 11 B gene12 8.072284 0.4779155 5 0.5
#> 12 B gene13 9.938137 0.3418472 5 0.5
#> 13 B gene2 8.273987 0.3147537 5 0.5
#> 14 B gene24 3.370467 0.7214700 5 0.5
#> 15 B gene21 5.789423 0.5279636 5 0.5
#> 16 B gene41 5.518354 0.6317005 5 0.5
#> 17 B gene74 3.226598 0.4940783 5 0.5
#> 18 B gene52 2.438579 0.5173769 5 0.5
#> 19 B gene95 6.288173 0.4462475 5 0.5
#> 20 B gene66 7.891588 0.5883218 5 0.5
#>
# Recipes interface:
rec <- recipes::recipe(class ~ ., data = for_formula)
clanc(rec, for_formula, active = 5, priors = "equal")
#> <clanc>
#> $centroids
#> class gene expression pooled_sd active prior
#> 1 A gene12 7.514718 0.4779155 5 0.5
#> 2 A gene2 8.744821 0.3147537 5 0.5
#> 3 A gene13 8.936462 0.3418472 5 0.5
#> 4 A gene21 6.584681 0.5279636 5 0.5
#> 5 A gene24 4.307301 0.7214700 5 0.5
#> 6 A gene74 4.028507 0.4940783 5 0.5
#> 7 A gene41 4.328516 0.6317005 5 0.5
#> 8 A gene95 6.873184 0.4462475 5 0.5
#> 9 A gene52 3.743798 0.5173769 5 0.5
#> 10 A gene66 7.008174 0.5883218 5 0.5
#> 11 B gene12 8.072284 0.4779155 5 0.5
#> 12 B gene13 9.938137 0.3418472 5 0.5
#> 13 B gene2 8.273987 0.3147537 5 0.5
#> 14 B gene24 3.370467 0.7214700 5 0.5
#> 15 B gene21 5.789423 0.5279636 5 0.5
#> 16 B gene41 5.518354 0.6317005 5 0.5
#> 17 B gene74 3.226598 0.4940783 5 0.5
#> 18 B gene52 2.438579 0.5173769 5 0.5
#> 19 B gene95 6.288173 0.4462475 5 0.5
#> 20 B gene66 7.891588 0.5883218 5 0.5
#>
# SummarizedExperiment interface:
se <- SummarizedExperiment::SummarizedExperiment(
  expression_matrix,
  colData = data.frame(
    class = classes,
    active = 5,
    prior = c(0.5, 0.5)
  )
)
clanc(se, classes = "class", active = "active", priors = "equal")
#> <clanc>
#> $centroids
#> class gene expression pooled_sd active prior
#> 1 A gene12 7.514718 0.4779155 5 0.5
#> 2 A gene2 8.744821 0.3147537 5 0.5
#> 3 A gene13 8.936462 0.3418472 5 0.5
#> 4 A gene21 6.584681 0.5279636 5 0.5
#> 5 A gene24 4.307301 0.7214700 5 0.5
#> 6 A gene74 4.028507 0.4940783 5 0.5
#> 7 A gene41 4.328516 0.6317005 5 0.5
#> 8 A gene95 6.873184 0.4462475 5 0.5
#> 9 A gene52 3.743798 0.5173769 5 0.5
#> 10 A gene66 7.008174 0.5883218 5 0.5
#> 11 B gene12 8.072284 0.4779155 5 0.5
#> 12 B gene13 9.938137 0.3418472 5 0.5
#> 13 B gene2 8.273987 0.3147537 5 0.5
#> 14 B gene24 3.370467 0.7214700 5 0.5
#> 15 B gene21 5.789423 0.5279636 5 0.5
#> 16 B gene41 5.518354 0.6317005 5 0.5
#> 17 B gene74 3.226598 0.4940783 5 0.5
#> 18 B gene52 2.438579 0.5173769 5 0.5
#> 19 B gene95 6.288173 0.4462475 5 0.5
#> 20 B gene66 7.891588 0.5883218 5 0.5
#>
# ExpressionSet interface:
adf <- data.frame(
  row.names = colnames(expression_matrix),
  class = classes
) |>
  Biobase::AnnotatedDataFrame()
es <- Biobase::ExpressionSet(expression_matrix, adf)
clanc(es, classes = "class", active = 5, priors = 0.5)
#> <clanc>
#> $centroids
#> class gene expression pooled_sd active prior
#> 1 A gene12 7.514718 0.4779155 5 0.5
#> 2 A gene2 8.744821 0.3147537 5 0.5
#> 3 A gene13 8.936462 0.3418472 5 0.5
#> 4 A gene21 6.584681 0.5279636 5 0.5
#> 5 A gene24 4.307301 0.7214700 5 0.5
#> 6 A gene74 4.028507 0.4940783 5 0.5
#> 7 A gene41 4.328516 0.6317005 5 0.5
#> 8 A gene95 6.873184 0.4462475 5 0.5
#> 9 A gene52 3.743798 0.5173769 5 0.5
#> 10 A gene66 7.008174 0.5883218 5 0.5
#> 11 B gene12 8.072284 0.4779155 5 0.5
#> 12 B gene13 9.938137 0.3418472 5 0.5
#> 13 B gene2 8.273987 0.3147537 5 0.5
#> 14 B gene24 3.370467 0.7214700 5 0.5
#> 15 B gene21 5.789423 0.5279636 5 0.5
#> 16 B gene41 5.518354 0.6317005 5 0.5
#> 17 B gene74 3.226598 0.4940783 5 0.5
#> 18 B gene52 2.438579 0.5173769 5 0.5
#> 19 B gene95 6.288173 0.4462475 5 0.5
#> 20 B gene66 7.891588 0.5883218 5 0.5
#>