Take a signature representation from SuperSig and group the trinucleotides within each feature into interpretable labels, optionally using IUPAC labels taken from IUPAC_CODE_MAP in the Biostrings package.

simplify_signature(object, iupac)

Arguments

object

an object of class SuperSig

iupac

logical value indicating whether to use IUPAC labels (iupac = TRUE) or not (iupac = FALSE)

Value

simplify_signature returns a named vector: the names are the simplified features and the values are the difference in mean rates between exposed and unexposed (or the average rate if the factor is "age")

Examples

head(example_dt) # use example data from package
#>   sample_id age chromosome  position ref alt
#> 1         1  50       chr1  94447621   G   C
#> 2         1  50       chr2 202005395   A   C
#> 3         1  50       chr7  20784978   T   A
#> 4         1  50       chr7  87179255   C   G
#> 5         1  50      chr19   1059712   G   T
#> 6         2  55       chr1  76226977   T   C
input_dt <- make_matrix(example_dt) # convert to correct format
input_dt$IndVar <- c(1, 1, 1, 0, 0) # add IndVar column
supersig <- get_signature(data = input_dt, factor = "Smoking")
#> Begin feature engineering...
#> Begin cross-validated selection over 4 features and 15 inner folds...
#> ...testing inner fold 1
#> ...testing inner fold 2
#> ...testing inner fold 3
#> ...testing inner fold 4
#> ...testing inner fold 5
#> ...testing inner fold 6
#> ...testing inner fold 7
#> ...testing inner fold 8
#> ...testing inner fold 9
#> ...testing inner fold 10
#> ...testing inner fold 11
#> ...testing inner fold 12
#> ...testing inner fold 13
#> ...testing inner fold 14
#> ...testing inner fold 15
simplify_signature(object = supersig, iupac = FALSE)
#>           C>A           C>T           T>C    [C>G](ACT)    [T>A](CTG)
#> -1.286204e-04 -1.286204e-04 -1.256126e-04 -1.148120e-04 -1.058889e-04
#>    [T>G](ACG)   (ACT)[C>G]G   (ATG)[T>A]A   (CTG)[T>G]T
#> -9.105143e-05 -9.967173e-06 -1.551465e-05 -2.743243e-05
simplify_signature(object = supersig, iupac = TRUE)
#>           C>A           C>T           T>C        [C>G]H        [T>A]B
#> -1.286204e-04 -1.286204e-04 -1.256126e-04 -1.148120e-04 -1.058889e-04
#>        [T>G]V       H[C>G]G       D[T>A]A       B[T>G]T
#> -9.105143e-05 -9.967173e-06 -1.551465e-05 -2.743243e-05
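
With iupac = TRUE, each parenthesized set of bases in a label is replaced by a single IUPAC ambiguity code, for example (ACT) becomes H and (CTG) becomes B. The sketch below illustrates that lookup only; it is not the package's implementation. It assumes a hard-coded copy of the standard IUPAC nucleotide table (rather than loading IUPAC_CODE_MAP from Biostrings), and to_iupac is a hypothetical helper that is not part of SuperSig.

```r
# Standard IUPAC nucleotide codes, hard-coded here so the sketch runs
# without Biostrings installed (assumption: same content as IUPAC_CODE_MAP).
iupac_map <- c(A = "A", C = "C", G = "G", T = "T",
               M = "AC", R = "AG", W = "AT", S = "CG", Y = "CT", K = "GT",
               V = "ACG", H = "ACT", D = "AGT", B = "CGT", N = "ACGT")

# Hypothetical helper: collapse a string of bases such as "CTG" to one code.
to_iupac <- function(bases) {
  # Sort and de-duplicate so that "ATG" and "AGT" hit the same table entry
  key <- paste(sort(unique(strsplit(bases, "")[[1]])), collapse = "")
  code <- names(iupac_map)[match(key, iupac_map)]
  if (is.na(code)) stop("not a valid set of nucleotides: ", bases)
  code
}

# The four base sets that appear in the example output above
sapply(c("ACT", "CTG", "ACG", "ATG"), to_iupac)
#> ACT CTG ACG ATG 
#> "H" "B" "V" "D" 
```

These codes match the labels in the two calls above: [C>G](ACT) corresponds to [C>G]H, and (ATG)[T>A]A corresponds to D[T>A]A.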