Generate a tissue-specific SuperSig for a given mutation dataset and exposure factor. Returns the SuperSig together with a classification model trained on it.
```r
get_signature(data, factor, wgs = FALSE)
```
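| Argument | Description |
|---|---|
| `data` | a data frame of mutations containing columns for … (in the example below, this is the output of `make_matrix()` with an added `IndVar` column) |
| `factor` | the factor/exposure (e.g. `"age"`, `"smoking"`). If `factor = "age"`, the SuperSig is computed using counts; otherwise, rates (counts/age) are used. |
| `wgs` | logical value indicating whether the sequencing data is whole-genome (`wgs = TRUE`) or not (`wgs = FALSE`, the default) |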
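The count-versus-rate rule for `factor` can be illustrated with a small base-R sketch. `scale_counts()` is a hypothetical helper, not a SuperSig function, and the toy matrix (samples in rows, mutation types in columns) and ages are invented; it only shows the arithmetic that the `factor` description states.

```r
# Hypothetical helper, not part of SuperSig: mirrors the documented rule that
# factor = "age" works on raw mutation counts, while any other factor works on
# rates (counts divided by age).
scale_counts <- function(counts, age, factor) {
  if (identical(factor, "age")) {
    counts          # age is itself the exposure, so keep raw counts
  } else {
    counts / age    # per-sample rate: row i is divided by age[i]
  }
}

# Toy count matrix: 2 samples x 2 mutation types (values are made up)
counts <- matrix(c(10, 20, 4, 8), nrow = 2,
                 dimnames = list(NULL, c("A[C>T]G", "C[C>T]G")))
age <- c(50, 40)

scale_counts(counts, age, "age")      # unchanged counts
scale_counts(counts, age, "smoking")  # counts divided by each sample's age
```

Because `counts / age` recycles `age` down each column, every row is divided by its own sample's age, which gives the per-sample rate the description refers to.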
`get_signature` returns an object of class `SuperSig`.
```r
# example mutation data (first rows)
#>   sample_id age chromosome  position ref alt
#> 1         1  50       chr1  94447621   G   C
#> 2         1  50       chr2 202005395   A   C
#> 3         1  50       chr7  20784978   T   A
#> 4         1  50       chr7  87179255   C   G
#> 5         1  50      chr19   1059712   G   T
#> 6         2  55       chr1  76226977   T   C

input_dt <- make_matrix(example_dt)             # convert to correct format
input_dt$IndVar <- c(1, 1, 1, 0, 0)             # add IndVar column
get_signature(data = input_dt, factor = "Age")  # get SuperSig
#> Signature:
#> # A tibble: 1 x 1
#>       X1
#>    <dbl>
#> 1 0.0396
#> Features:
#> $X1
#>       F21       F22       F23      F216      F217      F218      F219      F232
#> "A[C>G]A" "A[C>G]C" "A[C>G]G" "A[C>T]A" "A[C>T]C" "A[C>T]G" "A[C>T]T" "A[T>A]A"
#>      F233      F234      F235      F247      F248      F249      F250      F263
#> "A[T>A]C" "A[T>A]G" "A[T>A]T" "A[T>C]A" "A[T>C]C" "A[T>C]G" "A[T>C]T" "A[T>G]A"
#>      F264      F265      F266       F24       F25       F26       F27      F220
#> "A[T>G]C" "A[T>G]G" "A[T>G]T" "C[C>G]A" "C[C>G]C" "C[C>G]G" "C[C>G]T" "C[C>T]A"
#>      F221      F222      F223      F236      F237      F238      F251      F252
#> "C[C>T]C" "C[C>T]G" "C[C>T]T" "C[T>A]C" "C[T>A]G" "C[T>A]T" "C[T>C]A" "C[T>C]C"
#>      F253      F254      F267      F268      F269      F270       F28       F29
#> "C[T>C]G" "C[T>C]T" "C[T>G]A" "C[T>G]C" "C[T>G]G" "C[T>G]T" "G[C>G]A" "G[C>G]C"
#>      F210      F211      F224      F225      F226      F227      F239      F240
#> "G[C>G]G" "G[C>G]T" "G[C>T]A" "G[C>T]C" "G[C>T]G" "G[C>T]T" "G[T>A]A" "G[T>A]C"
#>      F241      F242      F255      F256      F257      F258      F271      F272
#> "G[T>A]G" "G[T>A]T" "G[T>C]A" "G[T>C]C" "G[T>C]G" "G[T>C]T" "G[T>G]A" "G[T>G]C"
#>      F273      F274      F212      F213      F214      F215      F228      F229
#> "G[T>G]G" "G[T>G]T" "T[C>G]A" "T[C>G]C" "T[C>G]G" "T[C>G]T" "T[C>T]A" "T[C>T]C"
#>      F230      F231      F243      F244      F245      F246      F259      F260
#> "T[C>T]G" "T[C>T]T" "T[T>A]A" "T[T>A]C" "T[T>A]G" "T[T>A]T" "T[T>C]A" "T[T>C]C"
#>      F261      F262      F275      F276      F277      F278
#> "T[T>C]G" "T[T>C]T" "T[T>G]A" "T[T>G]C" "T[T>G]G" "T[T>G]T"
#> 
#> Model:
#> $Logit
#> 
#> Call:  glm(formula = IndVar ~ ., family = binomial(), data = x)
#> 
#> Coefficients:
#> (Intercept)           X1
#>      -1.773        1.079
#> 
#> Degrees of Freedom: 4 Total (i.e. Null);  3 Residual
#> Null Deviance:      6.73
#> Residual Deviance: 5.384    AIC: 9.384
```
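The `Model` component in the example output is a logistic regression fit with `glm(formula = IndVar ~ ., family = binomial(), data = x)`. The base-R sketch below shows that kind of fit, assuming `x` holds a single signature-score column `X1` plus the 0/1 `IndVar` label. The scores are invented, so the coefficients will not match the printed ones, and the 0.5 cut-off used to turn probabilities into classes is an illustrative choice that the source does not specify.

```r
# Invented per-sample signature scores (column X1, as in the printed output)
# and the exposure indicator used in the example.
x <- data.frame(
  X1     = c(0.9, 1.4, 0.6, 0.5, 0.8),
  IndVar = c(1, 1, 1, 0, 0)
)

# Logistic regression of exposure status on the signature score,
# using the same formula and family as the printed model.
fit <- glm(IndVar ~ ., family = binomial(), data = x)
coef(fit)

# Fitted probability of exposure for each sample, then a 0.5 cut-off
# (an illustrative threshold) to obtain predicted classes.
p <- predict(fit, type = "response")
pred_class <- as.integer(p > 0.5)
pred_class
```

The scores were chosen so the two classes overlap (an exposed sample scores 0.6, an unexposed one 0.8); with perfectly separated scores, `glm` would warn that fitted probabilities of 0 or 1 occurred.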