Apply a previously generated SuperSig to a new dataset and return a predicted probability for each observation.

predict_signature(object, newdata, factor)

Arguments

object

an object of class SuperSig

newdata

a data frame of mutations containing columns for sample_id, age, IndVar, and the 96 trinucleotide mutations (see vignette for details)

factor

the factor/exposure (e.g. "age", "smoking")

Value

predict_signature returns the original data frame with additional columns for the feature counts and classification score

Examples

head(example_dt) # use example data from package
#>   sample_id age chromosome  position ref alt
#> 1         1  50       chr1  94447621   G   C
#> 2         1  50       chr2 202005395   A   C
#> 3         1  50       chr7  20784978   T   A
#> 4         1  50       chr7  87179255   C   G
#> 5         1  50      chr19   1059712   G   T
#> 6         2  55       chr1  76226977   T   C
input_dt <- make_matrix(example_dt) # convert to correct format
input_dt$IndVar <- c(1, 1, 1, 0, 0) # add IndVar column
out <- get_signature(data = input_dt, factor = "Age") # get SuperSig
#> Begin feature engineering...
#> Begin cross-validated selection over 3 features and 15 inner folds...
#> ...testing inner fold 1
#> ...testing inner fold 2
#> ...testing inner fold 3
#> ...testing inner fold 4
#> ...testing inner fold 5
#> ...testing inner fold 6
#> ...testing inner fold 7
#> ...testing inner fold 8
#> ...testing inner fold 9
#> ...testing inner fold 10
#> ...testing inner fold 11
#> ...testing inner fold 12
#> ...testing inner fold 13
#> ...testing inner fold 14
#> ...testing inner fold 15
newdata <- predict_signature(out, newdata = input_dt, factor = "age")
suppressPackageStartupMessages({library(dplyr)})
head(newdata %>% select(score))
#> # A tibble: 5 x 1
#>   score
#>   <dbl>
#> 1   0.6
#> 2   0.6
#> 3   0.6
#> 4   0.6
#> 5   0.6
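
As a possible follow-up to the example above, the score column can be converted into class calls and compared with IndVar. This is only a sketch: the data frame `pred` is a hand-built stand-in that copies the score and IndVar values shown above (it is not produced by calling the package), and the 0.5 cutoff is an assumption rather than a package default.

```r
# Stand-in for the data frame returned by predict_signature();
# values mirror the IndVar and score columns from the example above.
pred <- data.frame(
  sample_id = 1:5,
  IndVar    = c(1, 1, 1, 0, 0),
  score     = rep(0.6, 5)
)

# Assumed cutoff (not a package default): call a sample positive
# when its score is at least 0.5.
cutoff <- 0.5
pred$predicted <- as.integer(pred$score >= cutoff)

# Fraction of samples whose call matches the known IndVar label.
accuracy <- mean(pred$predicted == pred$IndVar)
accuracy
```

With every score equal to 0.6, all five samples are called positive under this cutoff, so only the three samples with IndVar equal to 1 match. A different cutoff, or a ranking-based measure, may be more appropriate for real data.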