Generate a tissue-specific SuperSig for a given dataset of mutations and exposure factor. Returns the SuperSig and a classification model trained with the SuperSig.

get_signature(data, factor, wgs = FALSE)

Arguments

data

a data frame of mutations containing columns for sample_id, age, IndVar, and the 96 trinucleotide mutations (see vignette for details)

factor

the factor/exposure (e.g. "age", "smoking"). If factor = "age", the SuperSig is computed using mutation counts. Otherwise, rates (counts/age) are used.

wgs

logical value indicating whether sequencing data is whole-genome (wgs = TRUE) or whole-exome (wgs = FALSE)
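
The count-versus-rate distinction described for factor can be illustrated with a small sketch. This is not the package's internal code: the helper exposure_input() and the toy count column are hypothetical, introduced only to show what "counts" versus "rates (counts/age)" means for a single mutation column.

```r
# Toy data: two samples with one hypothetical mutation-count column
# (stands in for any one of the 96 trinucleotide mutation columns).
toy <- data.frame(
  sample_id = c(1, 2),
  age       = c(50, 55),
  count     = c(10, 22)
)

# Hypothetical helper (not part of the package): choose the quantity
# a signature would be computed from, given the factor.
exposure_input <- function(counts, age, factor) {
  if (factor == "age") {
    counts        # age as the factor: use raw counts
  } else {
    counts / age  # any other factor: use rates (counts per year of age)
  }
}

counts_used <- exposure_input(toy$count, toy$age, factor = "age")
rates_used  <- exposure_input(toy$count, toy$age, factor = "smoking")
```

Dividing by age is a way to keep a non-age exposure from being confounded with the mutations that accumulate simply because a sample is older.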

Value

get_signature returns an object of class SuperSig

Examples

head(example_dt) # use example data from package
#>   sample_id age chromosome  position ref alt
#> 1         1  50       chr1  94447621   G   C
#> 2         1  50       chr2 202005395   A   C
#> 3         1  50       chr7  20784978   T   A
#> 4         1  50       chr7  87179255   C   G
#> 5         1  50      chr19   1059712   G   T
#> 6         2  55       chr1  76226977   T   C
input_dt <- make_matrix(example_dt) # convert to correct format
input_dt$IndVar <- c(1, 1, 1, 0, 0) # add IndVar column
get_signature(data = input_dt, factor = "Age") # get SuperSig
#> Begin feature engineering...
#> Begin cross-validated selection over 4 features and 15 inner folds...
#> ...testing inner fold 1
#> ...testing inner fold 2
#> ...testing inner fold 3
#> ...testing inner fold 4
#> ...testing inner fold 5
#> ...testing inner fold 6
#> ...testing inner fold 7
#> ...testing inner fold 8
#> ...testing inner fold 9
#> ...testing inner fold 10
#> ...testing inner fold 11
#> ...testing inner fold 12
#> ...testing inner fold 13
#> ...testing inner fold 14
#> ...testing inner fold 15
#> Signature:
#> # A tibble: 1 x 1
#>       X1
#>    <dbl>
#> 1 0.0396
#> Features:
#> $X1
#>       F21       F22       F23      F216      F217      F218      F219      F232
#> "A[C>G]A" "A[C>G]C" "A[C>G]G" "A[C>T]A" "A[C>T]C" "A[C>T]G" "A[C>T]T" "A[T>A]A"
#>      F233      F234      F235      F247      F248      F249      F250      F263
#> "A[T>A]C" "A[T>A]G" "A[T>A]T" "A[T>C]A" "A[T>C]C" "A[T>C]G" "A[T>C]T" "A[T>G]A"
#>      F264      F265      F266       F24       F25       F26       F27      F220
#> "A[T>G]C" "A[T>G]G" "A[T>G]T" "C[C>G]A" "C[C>G]C" "C[C>G]G" "C[C>G]T" "C[C>T]A"
#>      F221      F222      F223      F236      F237      F238      F251      F252
#> "C[C>T]C" "C[C>T]G" "C[C>T]T" "C[T>A]C" "C[T>A]G" "C[T>A]T" "C[T>C]A" "C[T>C]C"
#>      F253      F254      F267      F268      F269      F270       F28       F29
#> "C[T>C]G" "C[T>C]T" "C[T>G]A" "C[T>G]C" "C[T>G]G" "C[T>G]T" "G[C>G]A" "G[C>G]C"
#>      F210      F211      F224      F225      F226      F227      F239      F240
#> "G[C>G]G" "G[C>G]T" "G[C>T]A" "G[C>T]C" "G[C>T]G" "G[C>T]T" "G[T>A]A" "G[T>A]C"
#>      F241      F242      F255      F256      F257      F258      F271      F272
#> "G[T>A]G" "G[T>A]T" "G[T>C]A" "G[T>C]C" "G[T>C]G" "G[T>C]T" "G[T>G]A" "G[T>G]C"
#>      F273      F274      F212      F213      F214      F215      F228      F229
#> "G[T>G]G" "G[T>G]T" "T[C>G]A" "T[C>G]C" "T[C>G]G" "T[C>G]T" "T[C>T]A" "T[C>T]C"
#>      F230      F231      F243      F244      F245      F246      F259      F260
#> "T[C>T]G" "T[C>T]T" "T[T>A]A" "T[T>A]C" "T[T>A]G" "T[T>A]T" "T[T>C]A" "T[T>C]C"
#>      F261      F262      F275      F276      F277      F278
#> "T[T>C]G" "T[T>C]T" "T[T>G]A" "T[T>G]C" "T[T>G]G" "T[T>G]T"
#>
#> Model:
#> $Logit
#>
#> Call:  glm(formula = IndVar ~ ., family = binomial(), data = x)
#>
#> Coefficients:
#> (Intercept)           X1
#>      -1.773        1.079
#>
#> Degrees of Freedom: 4 Total (i.e. Null);  3 Residual
#> Null Deviance:      6.73
#> Residual Deviance: 5.384     AIC: 9.384
#>
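
The Model component printed above is a binomial glm, which uses the logit link by default, so its coefficients map a value of the signature feature X1 to a predicted probability that IndVar = 1. The sketch below reuses the rounded coefficients from the printed output. The helper predict_prob() and the input value x1 = 1 are hypothetical and chosen only for illustration; with a fitted model object, stats::predict(..., type = "response") would normally be used instead.

```r
# Rounded coefficients copied from the printed glm output above.
intercept <- -1.773
slope_x1  <- 1.079

# Hypothetical helper: inverse-logit of the linear predictor.
predict_prob <- function(x1) {
  stats::plogis(intercept + slope_x1 * x1)
}

# Illustrative input only; x1 = 1 is not taken from the example data.
p <- predict_prob(1)
```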