Q-matrix mixture RAxML

Widely used amino acid substitution models, such as Dayhoff, JTT or WAG, are based on empirical amino acid interchange matrices estimated from databases of protein alignments. Variation in the evolutionary process between sites is typically modeled using a rates-across-sites distribution. However, sites in proteins also vary in the kinds of amino acid interchanges that are favoured, a feature that standard empirical substitution matrices ignore and that most current phylogenetic software does not model. This omission has been shown to cause serious long-branch-attraction bias in maximum likelihood estimation of protein phylogeny by conventional software. A principal components analysis of amino acid frequency vectors of 6555 sites from 21 protein data sets revealed four major classes of sequence sites. These four classes, together with the frequency vector of the whole data set, are used in a class-frequency (cF) mixture model to model site-specific amino acid frequencies for phylogenetic inference. The cF model has been shown to improve model fit and to ameliorate long-branch-attraction problems.
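To make the idea of a frequency-class mixture concrete, here is a minimal Python sketch. It is not the QmmRAxML implementation. It assumes that each mixture component pairs a shared exchangeability matrix (such as the one underlying JTT or WAG) with its own frequency vector, and that the site likelihood is a weighted sum over components. The function names, the three-state toy alphabet, and all numbers are hypothetical and chosen only for illustration; the per-component site likelihoods are assumed to come from a tree-likelihood routine that is not shown.

```python
from typing import List, Sequence


def build_rate_matrix(exch: Sequence[Sequence[float]],
                      freqs: Sequence[float]) -> List[List[float]]:
    """Combine symmetric exchangeabilities with a frequency vector.

    Off-diagonal entries are Q[i][j] = exch[i][j] * freqs[j]; each diagonal
    entry makes its row sum to zero. The matrix is then rescaled so that the
    expected substitution rate at stationarity equals one.
    """
    n = len(freqs)
    q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                q[i][j] = exch[i][j] * freqs[j]
        q[i][i] = -sum(q[i])  # diagonal is still 0.0 when the row is summed
    scale = -sum(freqs[i] * q[i][i] for i in range(n))
    return [[x / scale for x in row] for row in q]


def mixture_site_likelihood(component_likelihoods: Sequence[float],
                            weights: Sequence[float]) -> float:
    """Weighted sum of per-component likelihoods for a single site."""
    if len(component_likelihoods) != len(weights):
        raise ValueError("need one weight per mixture component")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixture weights must sum to 1")
    return sum(w * l for w, l in zip(weights, component_likelihoods))


# Toy three-state example (hypothetical values, not real amino acid data).
toy_exch = [[0.0, 1.0, 2.0],
            [1.0, 0.0, 3.0],
            [2.0, 3.0, 0.0]]
toy_freqs = [0.5, 0.3, 0.2]
toy_q = build_rate_matrix(toy_exch, toy_freqs)

# Hypothetical likelihoods of one site under two components, equally weighted.
toy_site_likelihood = mixture_site_likelihood([0.1, 0.2], [0.5, 0.5])
```

In a real analysis the alphabet would have 20 states and, following the description above, five components: the four site classes plus the frequency vector of the whole data set.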

The cF mixture model is implemented in QmmRAxML, which is based on RAxML-VI-HPC (version 2.2). At present, QmmRAxML accepts only protein sequences.

July 12 2013: QmmRAxML 2.0 released.

New features in version 2.0:

Source code


Huai-Chun Wang, Edward Susko and Andrew J. Roger (2014): An amino acid substitution-selection model adjusts residue fitness to improve phylogenetic estimation. Molecular Biology and Evolution 31: 779-792

Huai-Chun Wang, Karen Li, Edward Susko and Andrew J. Roger (2008): A class frequency mixture model that adjusts for site-specific amino acid frequencies and improves inference of protein phylogeny. BMC Evolutionary Biology 8: 331
Featured in BMC Evolutionary Biology in December 2008:

Amino acid frequency models flawed

BMC Evolutionary Biology 2008, 8:331

"Some amino acid substitutions are found less frequently in real data than predicted by empirical models such as JTT, which may lead to biases in phylogenies, but a new mixture model can more accurately model substitutions and improve phylogenetic inferences."