Calculate centroids from expression data with ClaNC

Usage

clanc(x, ...)

# Default S3 method
clanc(x, ...)

# S3 method for class 'data.frame'
clanc(x, classes, active, priors = "equal", ...)

# S3 method for class 'matrix'
clanc(x, classes, active, priors = "equal", ...)

# S3 method for class 'SummarizedExperiment'
clanc(x, classes, active, priors = "equal", assay = 1, ...)

# S3 method for class 'ExpressionSet'
clanc(x, classes, active, priors = "equal", ...)

# S3 method for class 'formula'
clanc(formula, data, active, priors = "equal", ...)

# S3 method for class 'recipe'
clanc(x, data, active, priors = "equal", ...)

Arguments

x

Depending on the context:

  • A data frame of expression.

  • A matrix of expression.

  • A recipe specifying a set of preprocessing steps created from recipes::recipe().

  • An ExpressionSet.

  • A SummarizedExperiment with assay containing expression.

Expression should be library-size corrected, but not scaled.

If supplying a data frame, matrix, ExpressionSet, or SummarizedExperiment, the rows should represent genes and the columns should represent samples (as is standard for expression data). The column names should be sample IDs, and the row names should be gene IDs.

If a recipe is provided, the data should have genes as columns (to match the formula provided to the recipe).
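
To make the two layouts concrete, here is a minimal base R sketch. The `toy` matrix and its gene and sample names are made up for illustration and are not part of the package.

```r
# Hypothetical toy data: 3 genes (rows) by 4 samples (columns)
toy <- matrix(
  c(8.1, 7.1, 7.3, 7.6,
    8.6, 9.4, 8.5, 8.9,
    3.4, 4.3, 3.4, 3.5),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("gene", 1:3), paste0("sample", 1:4))
)

# Layout for the data frame, matrix, ExpressionSet, and
# SummarizedExperiment methods: genes in rows, samples in columns
dim(toy)

# Layout for a recipe: samples in rows, genes in columns
toy_wide <- as.data.frame(t(toy))
dim(toy_wide)
colnames(toy_wide)
```

Transposing with `t()` is enough here because the toy data is a plain numeric matrix; a data frame with mixed column types would need more care.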

...

Not currently used, but required for extensibility.

classes

When x is a data frame or matrix, classes contains class labels in the form of either:

  • A data frame with 1 factor column.

  • A factor vector.

When x is an ExpressionSet or SummarizedExperiment, classes is the name of the column in pData(x) or colData(x) that contains the classes as a factor.
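
The two accepted forms for a data frame or matrix input might be built as in the sketch below. The labels are hypothetical, and the object names (`classes_factor`, `classes_df`) are chosen only for this illustration.

```r
# Hypothetical labels for six samples, three per class
labels <- c("A", "A", "A", "B", "B", "B")

# Form 1: a factor vector
classes_factor <- factor(labels)

# Form 2: a data frame with a single factor column
classes_df <- data.frame(class = factor(labels))

# Both forms carry one label per sample and the same levels
levels(classes_factor)
levels(classes_df$class)
```

In either form the labels are assumed to be in the same order as the samples (columns) of x.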

active

Either a single number or a numeric vector whose length equals the number of unique class labels. Represents the number of class-specific genes that should be selected for a centroid. Note that different numbers of genes can be selected for each class. See Details.

When x is an ExpressionSet or SummarizedExperiment, active can additionally be the name of the column in pData(x) or colData(x) that contains the numeric vector.

priors

Can take a variety of values:

  • "equal" - each class has an equal prior

  • "class" - each class has a prior equal to its frequency in the training set

  • A numeric vector with length equal to number of classes

When x is an ExpressionSet or SummarizedExperiment, priors can additionally be the name of the column in pData(x) or colData(x) that contains the numeric vector.
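
As a rough illustration of what the "equal" and "class" options describe, the sketch below computes both kinds of prior by hand in base R. The labels are hypothetical, and this is not necessarily how the package computes priors internally.

```r
# Hypothetical training labels: four samples of class A, two of class B
classes <- factor(c("A", "A", "A", "A", "B", "B"))
n_classes <- nlevels(classes)

# "equal": every class receives the same prior
equal_priors <- setNames(rep(1 / n_classes, n_classes), levels(classes))

# "class": each prior is the class's relative frequency in the training set
class_priors <- setNames(
  as.numeric(table(classes)) / length(classes),
  levels(classes)
)

equal_priors
class_priors
```

A hand-written numeric vector such as `c(0.7, 0.3)` would be the third option; it is assumed here that its entries follow the order of the class levels.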

assay

When a SummarizedExperiment is used, the index or name of the assay to use.

formula

A formula specifying the classes on the left-hand side, and the predictor terms on the right-hand side.
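
The sketch below shows two ways such a formula might be written in base R. The column names (`class`, `gene1`, `gene2`) are hypothetical, and which formula features the clanc() formula method supports is not stated here.

```r
# Hypothetical column names: one class column plus two genes
genes <- c("gene1", "gene2")

# Classes on the left-hand side, selected genes on the right-hand side
f_subset <- reformulate(genes, response = "class")
f_subset

# The dot shorthand stands for every column other than the response
f_all <- class ~ .

# Variables named by the explicit formula
all.vars(f_subset)
```

Building the formula with `reformulate()` avoids typing out long gene lists by hand.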

data

When a recipe or formula is used, data is specified as:

  • A data frame containing both expression and classes, where columns are the genes or class, and rows are the samples.

Value

A clanc object.

Details

The original description of ClaNC can be found here

While active sets the number of class-specific genes, each centroid will have more than that number of genes. To explain by way of example, if active = 5 and there are 3 classes, each centroid will have 15 genes, with 5 of those genes being particular to a given class. If these genes are 'active' in that class, their values will be the mean of the class. If the genes are not active in that given class, their values will be the overall expression of the given gene across all classes.
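
The centroid layout described above can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation: the data are made up, the active genes are picked by hand (ClaNC selects them itself), and "overall expression" is taken to mean the mean across all samples.

```r
# Hypothetical toy data: 4 genes by 6 samples, two classes
expr <- matrix(
  c(1, 2, 3, 7, 8, 9,
    5, 5, 5, 5, 5, 5,
    9, 8, 7, 1, 2, 3,
    4, 4, 4, 6, 6, 6),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), paste0("sample", 1:6))
)
classes <- factor(c("A", "A", "A", "B", "B", "B"))

# Assumed gene selection with active = 1 (one class-specific gene per class)
active_genes <- list(A = "gene1", B = "gene3")
selected <- unlist(active_genes, use.names = FALSE)

# Per-class means and overall means for the selected genes
class_means <- sapply(levels(classes), function(k) {
  rowMeans(expr[selected, classes == k, drop = FALSE])
})
overall_means <- rowMeans(expr[selected, , drop = FALSE])

# Start every centroid at the overall mean, then overwrite the active genes
centroids <- matrix(
  overall_means,
  nrow = length(selected), ncol = nlevels(classes),
  dimnames = list(selected, levels(classes))
)
for (k in levels(classes)) {
  centroids[active_genes[[k]], k] <- class_means[active_genes[[k]], k]
}
centroids
```

Each centroid here has 2 genes (1 active gene times 2 classes), mirroring the 5 times 3 = 15 arithmetic in the paragraph above.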

Examples


expression_matrix <- synthetic_expression$expression
head(expression_matrix)
#>        sample1  sample2  sample3  sample4  sample5  sample6  sample7  sample8
#> gene1 8.097529 7.119188 7.304400 7.554689 7.953206 7.714925 7.512700 8.597547
#> gene2 8.641837 9.400416 8.500865 8.878687 8.318438 8.728683 7.812591 7.638167
#> gene3 3.436236 4.317915 3.435193 3.515755 3.024976 4.762209 5.048956 2.006646
#> gene4 4.368008 5.212750 4.618249 4.201365 3.195294 4.707750 5.126769 6.178658
#> gene5 2.423974 3.563816 4.062362 2.163278 2.021435 2.813873 0.000000 4.652358
#> gene6 5.371205 5.919809 4.366915 4.805534 4.834856 5.622157 3.883531 3.593082
#>        sample9 sample10 sample11 sample12
#> gene1 6.475641 7.648858 8.637526 7.345038
#> gene2 8.110285 7.906104 7.424728 7.927039
#> gene3 2.739211 3.111668 3.161077 4.306611
#> gene4 5.170265 4.259578 5.872855 6.159023
#> gene5 1.532242 3.399823 3.691250 1.932937
#> gene6 4.246205 4.637316 3.575837 2.730452
classes <- synthetic_expression$classes
classes
#>  [1] A A A A A A B B B B B B
#> Levels: A B

# data.frame/tibble/matrix interface:

clanc(expression_matrix, classes = classes, active = 5, priors = "equal")
#> <clanc> 
#> $centroids
#>    class   gene expression pooled_sd active prior
#> 1      A gene12   7.514718 0.4779155      5   0.5
#> 2      A  gene2   8.744821 0.3147537      5   0.5
#> 3      A gene13   8.936462 0.3418472      5   0.5
#> 4      A gene21   6.584681 0.5279636      5   0.5
#> 5      A gene24   4.307301 0.7214700      5   0.5
#> 6      A gene74   4.028507 0.4940783      5   0.5
#> 7      A gene41   4.328516 0.6317005      5   0.5
#> 8      A gene95   6.873184 0.4462475      5   0.5
#> 9      A gene52   3.743798 0.5173769      5   0.5
#> 10     A gene66   7.008174 0.5883218      5   0.5
#> 11     B gene12   8.072284 0.4779155      5   0.5
#> 12     B gene13   9.938137 0.3418472      5   0.5
#> 13     B  gene2   8.273987 0.3147537      5   0.5
#> 14     B gene24   3.370467 0.7214700      5   0.5
#> 15     B gene21   5.789423 0.5279636      5   0.5
#> 16     B gene41   5.518354 0.6317005      5   0.5
#> 17     B gene74   3.226598 0.4940783      5   0.5
#> 18     B gene52   2.438579 0.5173769      5   0.5
#> 19     B gene95   6.288173 0.4462475      5   0.5
#> 20     B gene66   7.891588 0.5883218      5   0.5
#> 

# Formula interface:

# Data must have class included as a column
# Genes must be *columns* and samples must be *rows*
# Hence the data transposition.
for_formula <- data.frame(class = classes, t(expression_matrix))

clanc(class ~ ., for_formula, active = 5, priors = "equal")
#> <clanc> 
#> $centroids
#>    class   gene expression pooled_sd active prior
#> 1      A gene12   7.514718 0.4779155      5   0.5
#> 2      A  gene2   8.744821 0.3147537      5   0.5
#> 3      A gene13   8.936462 0.3418472      5   0.5
#> 4      A gene21   6.584681 0.5279636      5   0.5
#> 5      A gene24   4.307301 0.7214700      5   0.5
#> 6      A gene74   4.028507 0.4940783      5   0.5
#> 7      A gene41   4.328516 0.6317005      5   0.5
#> 8      A gene95   6.873184 0.4462475      5   0.5
#> 9      A gene52   3.743798 0.5173769      5   0.5
#> 10     A gene66   7.008174 0.5883218      5   0.5
#> 11     B gene12   8.072284 0.4779155      5   0.5
#> 12     B gene13   9.938137 0.3418472      5   0.5
#> 13     B  gene2   8.273987 0.3147537      5   0.5
#> 14     B gene24   3.370467 0.7214700      5   0.5
#> 15     B gene21   5.789423 0.5279636      5   0.5
#> 16     B gene41   5.518354 0.6317005      5   0.5
#> 17     B gene74   3.226598 0.4940783      5   0.5
#> 18     B gene52   2.438579 0.5173769      5   0.5
#> 19     B gene95   6.288173 0.4462475      5   0.5
#> 20     B gene66   7.891588 0.5883218      5   0.5
#> 


# Recipes interface:

rec <- recipes::recipe(class ~ ., data = for_formula)

clanc(rec, for_formula, active = 5, priors = "equal")
#> <clanc> 
#> $centroids
#>    class   gene expression pooled_sd active prior
#> 1      A gene12   7.514718 0.4779155      5   0.5
#> 2      A  gene2   8.744821 0.3147537      5   0.5
#> 3      A gene13   8.936462 0.3418472      5   0.5
#> 4      A gene21   6.584681 0.5279636      5   0.5
#> 5      A gene24   4.307301 0.7214700      5   0.5
#> 6      A gene74   4.028507 0.4940783      5   0.5
#> 7      A gene41   4.328516 0.6317005      5   0.5
#> 8      A gene95   6.873184 0.4462475      5   0.5
#> 9      A gene52   3.743798 0.5173769      5   0.5
#> 10     A gene66   7.008174 0.5883218      5   0.5
#> 11     B gene12   8.072284 0.4779155      5   0.5
#> 12     B gene13   9.938137 0.3418472      5   0.5
#> 13     B  gene2   8.273987 0.3147537      5   0.5
#> 14     B gene24   3.370467 0.7214700      5   0.5
#> 15     B gene21   5.789423 0.5279636      5   0.5
#> 16     B gene41   5.518354 0.6317005      5   0.5
#> 17     B gene74   3.226598 0.4940783      5   0.5
#> 18     B gene52   2.438579 0.5173769      5   0.5
#> 19     B gene95   6.288173 0.4462475      5   0.5
#> 20     B gene66   7.891588 0.5883218      5   0.5
#> 

# SummarizedExperiment interface:
se <- SummarizedExperiment::SummarizedExperiment(
  expression_matrix,
  colData = data.frame(
    class = classes,
    active = 5,
    prior = c(0.5, 0.5)
  )
)

clanc(se, classes = "class", active = "active", priors = "equal")
#> <clanc> 
#> $centroids
#>    class   gene expression pooled_sd active prior
#> 1      A gene12   7.514718 0.4779155      5   0.5
#> 2      A  gene2   8.744821 0.3147537      5   0.5
#> 3      A gene13   8.936462 0.3418472      5   0.5
#> 4      A gene21   6.584681 0.5279636      5   0.5
#> 5      A gene24   4.307301 0.7214700      5   0.5
#> 6      A gene74   4.028507 0.4940783      5   0.5
#> 7      A gene41   4.328516 0.6317005      5   0.5
#> 8      A gene95   6.873184 0.4462475      5   0.5
#> 9      A gene52   3.743798 0.5173769      5   0.5
#> 10     A gene66   7.008174 0.5883218      5   0.5
#> 11     B gene12   8.072284 0.4779155      5   0.5
#> 12     B gene13   9.938137 0.3418472      5   0.5
#> 13     B  gene2   8.273987 0.3147537      5   0.5
#> 14     B gene24   3.370467 0.7214700      5   0.5
#> 15     B gene21   5.789423 0.5279636      5   0.5
#> 16     B gene41   5.518354 0.6317005      5   0.5
#> 17     B gene74   3.226598 0.4940783      5   0.5
#> 18     B gene52   2.438579 0.5173769      5   0.5
#> 19     B gene95   6.288173 0.4462475      5   0.5
#> 20     B gene66   7.891588 0.5883218      5   0.5
#> 

# ExpressionSet interface:
adf <- data.frame(
  row.names = colnames(expression_matrix),
  class = classes
) |>
  Biobase::AnnotatedDataFrame()

es <- Biobase::ExpressionSet(expression_matrix, adf)
clanc(es, classes = "class", active = 5, priors = 0.5)
#> <clanc> 
#> $centroids
#>    class   gene expression pooled_sd active prior
#> 1      A gene12   7.514718 0.4779155      5   0.5
#> 2      A  gene2   8.744821 0.3147537      5   0.5
#> 3      A gene13   8.936462 0.3418472      5   0.5
#> 4      A gene21   6.584681 0.5279636      5   0.5
#> 5      A gene24   4.307301 0.7214700      5   0.5
#> 6      A gene74   4.028507 0.4940783      5   0.5
#> 7      A gene41   4.328516 0.6317005      5   0.5
#> 8      A gene95   6.873184 0.4462475      5   0.5
#> 9      A gene52   3.743798 0.5173769      5   0.5
#> 10     A gene66   7.008174 0.5883218      5   0.5
#> 11     B gene12   8.072284 0.4779155      5   0.5
#> 12     B gene13   9.938137 0.3418472      5   0.5
#> 13     B  gene2   8.273987 0.3147537      5   0.5
#> 14     B gene24   3.370467 0.7214700      5   0.5
#> 15     B gene21   5.789423 0.5279636      5   0.5
#> 16     B gene41   5.518354 0.6317005      5   0.5
#> 17     B gene74   3.226598 0.4940783      5   0.5
#> 18     B gene52   2.438579 0.5173769      5   0.5
#> 19     B gene95   6.288173 0.4462475      5   0.5
#> 20     B gene66   7.891588 0.5883218      5   0.5
#>