Data Compression Algorithms and Evolutionary Pressure in Single Nucleotide Polymorphisms (244)
In-silico classification of novel SNPs as deleterious or neutral employs various computational methods; one such approach is to quantify phylogenetic diversity among homologous genes [1, 2]. Early work focused on differences in biochemical properties between two amino acids (AAs) [3]; this was later extended to cover larger sets of AAs [1], a measure known as the Grantham Variance (GV).
When considering a non-synonymous SNP, the set of AAs considered consists of those occupying the same position in a multi-species sequence alignment (MSA). Greater diversity (inferred from greater biochemical differences) implies weaker evolutionary constraint and hence a reduced likelihood of a deleterious outcome.
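As a rough illustration of the measure described above, the sketch below computes GV for a single alignment column, following the range-based formulation of Tavtigian et al.: the spread (maximum minus minimum) of composition, polarity and volume across the residues observed at that position, combined with Grantham's weights. The function name `grantham_gv` and the six-residue property table are illustrative assumptions; the property values are a partial transcription of Grantham's 1974 table and should be checked against the original before any real use.

```python
import math

# Grantham (1974) weights for composition, polarity and volume, plus the
# scaling factor that sets the mean pairwise distance to 100.
ALPHA, BETA, GAMMA = 1.833, 0.1018, 0.000399
RHO = 50.723

# Partial property table: residue -> (composition, polarity, volume).
# Assumption: only six residues are listed here to keep the sketch short.
PROPERTIES = {
    "L": (0.00, 4.9, 111.0),
    "I": (0.00, 5.2, 111.0),
    "S": (1.42, 9.2, 32.0),
    "R": (0.65, 10.5, 124.0),
    "C": (2.75, 5.5, 55.0),
    "W": (0.13, 5.4, 170.0),
}


def grantham_gv(column):
    """Return GV for one MSA column given as a non-empty iterable of residue letters."""
    observed = [PROPERTIES[aa] for aa in set(column)]
    # Range (max - min) of each property across the residues observed.
    c, p, v = (max(vals) - min(vals) for vals in zip(*observed))
    return RHO * math.sqrt(ALPHA * c * c + BETA * p * p + GAMMA * v * v)


if __name__ == "__main__":
    for column in ("LLLL", "LLIL", "LISR"):
        print(column, round(grantham_gv(column), 1))
```

A fully conserved column gives a GV of zero, and the value grows as biochemically dissimilar residues appear at the position, which is the sense in which greater diversity is read as weaker constraint.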
A problem identified with this method [4] is the underestimation of evolutionary pressure when distantly related species are included in the alignment: under this measure, greater phylogenetic distance is quantitatively indistinguishable from reduced selection pressure.
Our research demonstrated a novel method of controlling for this phenomenon using commonplace data compression algorithms. Incorporating this approach improved a classification algorithm from our previous work as follows:
ROC area under the curve: 0.749 to 0.831
Sensitivity: 56.32% to 63.78%
Specificity: 84.64% to 85.83%
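The abstract does not state how the compression algorithms are applied. Purely as an illustration of how a general-purpose compressor can estimate divergence between sequences, the sketch below computes the normalized compression distance (NCD) with Python's standard `zlib` module. The choice of NCD, the choice of zlib, and the function names are all assumptions made for this example and are not taken from the study.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed data at maximum compression."""
    return len(zlib.compress(data, 9))


def ncd(seq_a: str, seq_b: str) -> float:
    """Normalized compression distance between two sequences.

    Values near 0 suggest the sequences share most of their content;
    values near 1 suggest they share little. Very short inputs give
    noisy estimates because of fixed compressor overhead.
    """
    a, b = seq_a.encode("ascii"), seq_b.encode("ascii")
    size_a, size_b = compressed_size(a), compressed_size(b)
    size_ab = compressed_size(a + b)
    return (size_ab - min(size_a, size_b)) / max(size_a, size_b)
```

In principle, a distance of this kind computed between the human sequence and each other species in the alignment could be used to down-weight distant homologues, so that their contribution to measured diversity is not mistaken for relaxed selection. Whether the study does this is not stated in the abstract.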
1. Tavtigian, Sean V., Amie M. Deffenbaugh, Luo Yin, Thaddeus Judkins, Thomas Scholl, Paul B. Samollow, Deepika de Silva, Andrey Zharkikh, and Alun Thomas. "Comprehensive statistical study of 452 BRCA1 missense substitutions with classification of eight recurrent substitutions as neutral." Journal of medical genetics 43, no. 4 (2006): 295-305.
2. Kumar, Prateek, Steven Henikoff, and Pauline C. Ng. "Predicting the effects of coding non-synonymous variants on protein function using the SIFT algorithm." Nature protocols 4, no. 7 (2009): 1073-1081.
3. Grantham, R. "Amino acid difference formula to help explain protein evolution." Science 185, no. 4154 (1974): 862-864.
4. Hicks, Stephanie, David A. Wheeler, Sharon E. Plon, and Marek Kimmel. "Prediction of missense mutation functionality depends on both the algorithm and sequence alignment employed." Human mutation 32, no. 6 (2011): 661-668.