Transform a long-format data frame of mutations (one row per mutation) into a wide data frame with one row per sample and one column per trinucleotide mutation type, that is, the substituted base together with its 5' and 3' flanking bases.
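In the output, every column name (for example `A[T>G]T`) has C or T as the mutated reference base. This suggests the common pyrimidine-centred 96-channel (SBS96) convention. Under that convention, a mutation whose reference base is a purine (A or G) is reported on the opposite strand, so the whole trinucleotide is reverse-complemented. The base-R sketch below shows that labelling step. It is an illustration of the convention, not the package's implementation. The helpers `complement()` and `sbs96_label()` are hypothetical and do not exist in this package. The sketch also assumes the flanking bases have already been read from the reference genome, which is presumably what the `genome` argument is used for.

```r
# Hypothetical helpers (not part of this package): label single-base
# substitutions with the pyrimidine-centred SBS96 convention, e.g. "A[C>T]G".
complement <- function(x) chartr("ACGT", "TGCA", x)

sbs96_label <- function(ref, alt, five_prime, three_prime) {
  # Purine reference bases (A/G) are reported on the opposite strand:
  # complement every base and swap the 5' and 3' flanks (reverse complement).
  purine <- ref %in% c("A", "G")
  ref5 <- ifelse(purine, complement(three_prime), five_prime)
  ref3 <- ifelse(purine, complement(five_prime), three_prime)
  ref  <- ifelse(purine, complement(ref), ref)
  alt  <- ifelse(purine, complement(alt), alt)
  paste0(ref5, "[", ref, ">", alt, "]", ref3)
}

# A G>T change in the context AGC becomes C>A in the context GCT.
sbs96_label("G", "T", "A", "C")
```

The function is vectorised, so it can label a whole `ref`/`alt` column pair at once after the flanking bases have been added.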

make_matrix(data, genome = "hg19")

Arguments

data

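a data frame of mutations in a VCF-like format, one row per mutation, with columns such as sample_id, chromosome, position, ref and alt (see the vignette for details)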

genome

the reference genome build of the mutation coordinates, either "hg19" (the default) or "hg38"

Value

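make_matrix returns a data frame (printed as a tibble in the example) with one row per sample: the sample-level columns (sample_id and age in the example) followed by one column per trinucleotide mutation type.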
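Going from one row per mutation to one row per sample amounts to a cross-tabulation of sample against mutation type. The tabulation runs over all 96 possible contexts, so a type that does not occur in a sample still appears as a column of zeros. The base-R sketch below shows that reshaping step. It assumes each mutation already carries a pyrimidine-centred label. The `muts` data frame and its values are made up for illustration, and the code is not taken from make_matrix itself.

```r
# Hypothetical long-format input: one row per mutation, already labelled.
muts <- data.frame(
  sample_id = c(1, 1, 2, 2, 2),
  label = c("A[C>T]G", "G[C>A]T", "A[C>T]G", "A[C>T]G", "C[T>G]A"),
  stringsAsFactors = FALSE
)

# All 96 contexts: 4 (5' base) x 6 pyrimidine substitutions x 4 (3' base).
bases <- c("A", "C", "G", "T")
subs <- c("C>A", "C>G", "C>T", "T>A", "T>C", "T>G")
grid <- expand.grid(five = bases, sub = subs, three = bases,
                    stringsAsFactors = FALSE)
contexts <- paste0(grid$five, "[", grid$sub, "]", grid$three)

# Fixing the factor levels keeps contexts with zero counts as columns.
counts <- table(muts$sample_id, factor(muts$label, levels = contexts))

# One row per sample, one column per context (names kept as-is).
wide <- data.frame(sample_id = as.numeric(rownames(counts)),
                   as.data.frame.matrix(counts),
                   check.names = FALSE)
dim(wide)
```

Sample-level covariates such as age could then be joined back by sample_id, for example with `merge()`.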

Examples

head(example_dt) # use example data from package
#>   sample_id age chromosome  position ref alt
#> 1         1  50       chr1  94447621   G   C
#> 2         1  50       chr2 202005395   A   C
#> 3         1  50       chr7  20784978   T   A
#> 4         1  50       chr7  87179255   C   G
#> 5         1  50      chr19   1059712   G   T
#> 6         2  55       chr1  76226977   T   C
input_dt <- make_matrix(example_dt) # convert to correct format
head(input_dt)
#> # A tibble: 5 x 98
#>   sample_id   age `A[T>G]T` `C[T>A]A` `G[C>A]A` `G[C>G]G` `G[C>G]T` `A[C>G]T`
#>       <dbl> <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>
#> 1         1    50         1         1         1         1         1         0
#> 2         2    55         0         0         0         0         0         1
#> 3         3    72         1         1         0         0         0         0
#> 4         4    53         0         0         0         0         0         1
#> 5         5    48         0         1         1         0         0         0
#> # … with 90 more variables: C[C>A]T <dbl>, C[T>C]G <dbl>, T[C>A]C <dbl>,
#> #   T[C>A]T <dbl>, A[T>C]C <dbl>, C[T>C]C <dbl>, T[T>A]C <dbl>, C[C>G]G <dbl>,
#> #   G[C>T]A <dbl>, A[C>A]T <dbl>, C[C>A]C <dbl>, G[T>G]T <dbl>, C[C>T]C <dbl>,
#> #   T[C>T]C <dbl>, A[C>T]C <dbl>, G[C>T]C <dbl>, C[C>T]T <dbl>, T[C>T]T <dbl>,
#> #   A[C>T]T <dbl>, G[C>T]T <dbl>, C[C>T]A <dbl>, T[C>T]A <dbl>, A[C>T]A <dbl>,
#> #   C[C>T]G <dbl>, T[C>T]G <dbl>, A[C>T]G <dbl>, G[C>T]G <dbl>, A[C>A]C <dbl>,
#> #   G[C>A]C <dbl>, G[C>A]T <dbl>, C[C>A]A <dbl>, T[C>A]A <dbl>, A[C>A]A <dbl>,
#> #   C[C>A]G <dbl>, T[C>A]G <dbl>, A[C>A]G <dbl>, G[C>A]G <dbl>, C[C>G]C <dbl>,
#> #   T[C>G]C <dbl>, A[C>G]C <dbl>, G[C>G]C <dbl>, C[C>G]T <dbl>, T[C>G]T <dbl>,
#> #   C[C>G]A <dbl>, T[C>G]A <dbl>, A[C>G]A <dbl>, G[C>G]A <dbl>, T[C>G]G <dbl>,
#> #   A[C>G]G <dbl>, T[T>C]C <dbl>, G[T>C]C <dbl>, C[T>C]T <dbl>, T[T>C]T <dbl>,
#> #   A[T>C]T <dbl>, G[T>C]T <dbl>, C[T>C]A <dbl>, T[T>C]A <dbl>, A[T>C]A <dbl>,
#> #   G[T>C]A <dbl>, T[T>C]G <dbl>, A[T>C]G <dbl>, G[T>C]G <dbl>, C[T>A]C <dbl>,
#> #   A[T>A]C <dbl>, G[T>A]C <dbl>, C[T>A]T <dbl>, T[T>A]T <dbl>, A[T>A]T <dbl>,
#> #   G[T>A]T <dbl>, T[T>A]A <dbl>, A[T>A]A <dbl>, G[T>A]A <dbl>, C[T>A]G <dbl>,
#> #   T[T>A]G <dbl>, A[T>A]G <dbl>, G[T>A]G <dbl>, C[T>G]C <dbl>, T[T>G]C <dbl>,
#> #   A[T>G]C <dbl>, G[T>G]C <dbl>, C[T>G]T <dbl>, T[T>G]T <dbl>, C[T>G]A <dbl>,
#> #   T[T>G]A <dbl>, A[T>G]A <dbl>, G[T>G]A <dbl>, C[T>G]G <dbl>, T[T>G]G <dbl>,
#> #   A[T>G]G <dbl>, G[T>G]G <dbl>