Oly methylation analysis, Jan 31st, 2020

Revisiting my task list:

New methylation distance matrix to include just loci with new 75% threshold <- not needed

I reviewed Steven’s MethylKit workflow, and he had already taken the necessary steps to select only loci with high coverage across samples.

  • Upon calling methylation status for each locus (using sorted BAM files, function processBismarkAln) he set the “minimum read coverage to call a methylation status for a base” to 2.
  • Then, he used the filterByCoverage function to filter out bases with coverage <10x or >100x. This was done on each locus within each sample, and did not discard loci that had <10x across all samples.
  • However, he then used the unite function to “merge all samples to one object for base-pair locations that are covered in all samples … The [resulting] methylBase object contains methylation information for regions/bases that are covered in all samples.” This step in particular should control for genetic differences that could influence methylation status.
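To make the logic of those two steps concrete, here is a minimal Python sketch of what I understand the coverage filter and the unite step to be doing. This is not methylKit code: the function names, the dictionary layout (locus mapped to a coverage/methylated-count pair), and the toy numbers are all made up for illustration.

```python
# Toy stand-in for the methylKit steps described above (hypothetical layout:
# each sample maps a locus to (coverage, number of methylated reads)).

def filter_by_coverage(sample, lo_count=10, hi_count=100):
    """Drop loci with coverage below lo_count or above hi_count, per sample."""
    return {locus: counts for locus, counts in sample.items()
            if lo_count <= counts[0] <= hi_count}

def unite(samples):
    """Keep only loci that are still present in every sample."""
    shared = set.intersection(*(set(s) for s in samples))
    return {locus: [s[locus] for s in samples] for locus in sorted(shared)}

sample_a = {("chr1", 100): (12, 6), ("chr1", 200): (8, 2), ("chr1", 300): (50, 40)}
sample_b = {("chr1", 100): (30, 15), ("chr1", 200): (20, 5), ("chr1", 300): (150, 100)}

filtered = [filter_by_coverage(s) for s in (sample_a, sample_b)]
united = unite(filtered)
print(united)
```

In this toy case only the locus at position 100 survives: position 200 is under 10x in the first sample and position 300 is over 100x in the second, so neither is covered in all samples after filtering.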


  • With pre-filtered count files (5x and 10x across 75% of samples). NOTE: to generate my count files I had used the same filter settings as Steven (lo.count=10, hi.count=100).
  • Reannotated new MACAU results
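For reference, the "Nx across 75% of samples" rule can be sketched as below. This is only an illustration of the threshold, assuming a hypothetical table of per-sample coverage for each locus; it is not the script I used to build the count files.

```python
# Hypothetical coverage table: locus -> coverage in each sample
# (None would mean the locus was not covered in that sample).

def loci_passing(coverage, min_cov=10, min_frac=0.75):
    """Return loci with coverage >= min_cov in at least min_frac of samples."""
    keep = []
    for locus, covs in coverage.items():
        n_ok = sum(1 for c in covs if c is not None and c >= min_cov)
        if n_ok / len(covs) >= min_frac:
            keep.append(locus)
    return keep

coverage = {
    "locus_A": [12, 15, 9, 30],   # 3 of 4 samples at >= 10x
    "locus_B": [12, 4, 9, 30],    # 2 of 4 samples at >= 10x
}

pass_10x = loci_passing(coverage, min_cov=10)
pass_5x = loci_passing(coverage, min_cov=5)
print(pass_10x, pass_5x)
```

With these toy numbers locus_A passes at both thresholds, while locus_B only passes at 5x.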

Reannotate DMLs <- not needed

Wasn’t necessary, since the DMLs were already filtered.

DMG analysis <- DONE

  • Figure out something comparable to Fst on DMLs and MACAU to do a correlation analysis
  • GO_MWU analysis redo with genes: https://github.com/sr320/paper-oly-mbdbs-gen/blob/master/code/GO_MWU.ipynb

Get manhattan distance for just DMLs <- DONE

  • This is saved as a .csv file on github here: paper-oly-mbdbs-gen/analyses/dist.manhat.DMLs.csv
  • Note: 51 loci were identified as differentially methylated (DMLs were defined as loci with a methylation difference of at least 25% and q-value < 0.01)
  • Out of curiosity, I ran a PCA on the DML percent methylation distance matrix using FactoMineR::PCA()
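As a reminder of what the Manhattan distance is measuring here, a small Python sketch: the distance between two samples is the sum of absolute differences in percent methylation across loci. The sample names and values are invented, and this is not the code that produced dist.manhat.DMLs.csv.

```python
# Hypothetical percent-methylation profiles: sample -> value at each DML.

def manhattan(u, v):
    """Sum of absolute differences between two equal-length profiles."""
    return sum(abs(a - b) for a, b in zip(u, v))

def distance_matrix(profiles):
    """Pairwise Manhattan distances, samples in sorted order."""
    names = sorted(profiles)
    matrix = [[manhattan(profiles[a], profiles[b]) for b in names] for a in names]
    return names, matrix

profiles = {
    "s1": [10.0, 80.0, 50.0],
    "s2": [20.0, 60.0, 50.0],
    "s3": [10.0, 80.0, 0.0],
}

names, dist = distance_matrix(profiles)
for name, row in zip(names, dist):
    print(name, row)
```

The matrix is symmetric with zeros on the diagonal, which is the form a distance-based ordination would expect as input.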

To do still

GO_MWU enrichment analysis redo with genes:

Follow Katherine’s analysis - https://github.com/sr320/paper-oly-mbdbs-gen/blob/master/code/GO_MWU.ipynb. I may try to do this in RMarkdown, though.

Figure out something comparable to Fst on DMLs and MACAU to do a correlation analysis

Calculate Pst values; description of Pst from the Carja et al. 2017 paper -

“$P_{ST}$ is a measure of the proportion of variance explained by between-population divergence. It is the phenotypic analog of the population genetics parameter $F_{ST}$ [27,29]. For a single probe, $P_{ST}$ was calculated as $\sigma_b^2 / (\sigma_b^2 + 2\sigma_w^2)$, where $\sigma_b^2$ is the between population variance and $\sigma_w^2$ is the average within population variance. $P_{ST}$ values range from 0 to 1, with values near 1 signifying that the majority of epigenetic variance for a probe is between populations rather than within populations.”
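A rough sketch of how Pst could be computed for one locus in Python, following the formula quoted above. Two things here are my assumptions, not something the quote specifies: the between-population variance is taken as the sample variance of the population means, and the within-population variance as the mean of the per-population sample variances. The population names and values are invented.

```python
from statistics import mean, variance
import math

def pst(groups):
    """Pst = var_b / (var_b + 2 * var_w) for one locus.

    groups: population -> list of percent-methylation values (>= 2 each).
    Assumption: var_b is the sample variance of the population means,
    var_w is the average of the within-population sample variances.
    """
    pop_means = [mean(values) for values in groups.values()]
    var_between = variance(pop_means)
    var_within = mean([variance(values) for values in groups.values()])
    denom = var_between + 2 * var_within
    # Undefined when there is no variance at all at this locus.
    return var_between / denom if denom > 0 else math.nan

diverged = {"pop_A": [10, 20, 30], "pop_B": [70, 80, 90]}
identical = {"pop_A": [10, 20, 30], "pop_B": [10, 20, 30]}

print(pst(diverged))   # most of the variance is between populations
print(pst(identical))  # no between-population variance
```

Running this per locus would give one Pst value per DML or MACAU locus, which is the kind of per-locus statistic that could then be compared against Fst.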

Get methods down on paper

Re-do MACAU using a genotype value as the predictor variable! (instead of length)

I was re-reading the MACAU paper Lea et al. 2015 and noticed that one can, in fact, use a genotype value as the predictor variable …


Written on January 31, 2020