Transform a VCF object into a data frame of trinucleotide mutations with flanking bases in a wide matrix format. The function assumes that the VCF object contains only one sample and that each row in rowRanges represents an observed mutation in the sample.
process_vcf(vcf)
vcf | a VCF object (from the VariantAnnotation package)
---|---
process_vcf returns a data frame of mutations, one row per mutation.
# Use example vcf from VariantAnnotation
suppressPackageStartupMessages({library(VariantAnnotation)})
fl <- system.file("extdata", "chr22.vcf.gz", package="VariantAnnotation")
vcf <- VariantAnnotation::readVcf(fl, "hg19")

# Subset to first sample
vcf <- vcf[, 1]

# Subset to row positions with homozygous or heterozygous alt
positions <- geno(vcf)$GT != "0|0"
vcf <- vcf[positions[, 1],]
colData(vcf)$age <- 50 # Add patient age to colData (optional)

# Run function
dt <- process_vcf(vcf)
head(dt)
#>   sample_id age chromosome position ref alt
#> 1   HG00096  50      chr22 50326116   C   T
#> 2   HG00096  50      chr22 50336761   G   A
#> 3   HG00096  50      chr22 50346072   C   T
#> 4   HG00096  50      chr22 50350418   T   C
#> 5   HG00096  50      chr22 50351413   C   T
#> 6   HG00096  50      chr22 50351977   G   A