PROtein COVarion analysis

Maximum likelihood estimation of phylogeny under protein covarion models

The covarion hypothesis of protein evolution proposes that the selective pressures on an amino acid or nucleotide site change through time, so that the evolutionary rates of sites change along the branches of a phylogenetic tree (W. M. Fitch & E. Markowitz, Biochem. Genet. 4: 579-593, 1970). Covarion-like evolution is now recognized as an important mode of molecular evolution in proteins, structural RNA genes and protein-coding genes. Empirical studies have shown that phylogenetic estimation under a covarion model may recover optimal topologies that differ from those obtained when covarion effects are ignored. Simulation studies have demonstrated that, under some edge-length conditions, rates-across-sites models that ignore covarion effects may cause long-branch repulsion biases in the resulting phylogenetic estimates (Wang, Susko, Spencer & Roger, 2008).
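
To make the covarion idea concrete, the Python sketch below builds an on/off covarion generator in the spirit of Tuffley and Steel (1998): each site carries a hidden switch, amino acids substitute under a base rate matrix only while the site is "on", and the switch flips off-to-on at rate `s01` and on-to-off at rate `s10` without changing the residue. The equal-input base matrix, its normalization, and the names `equal_input_Q` and `covarion_generator` are illustrative assumptions (it uses NumPy), not code taken from PROCOV.

```python
import numpy as np


def equal_input_Q(pi):
    """Equal-input (F81-style) amino acid rate matrix: q_ij = pi_j for i != j."""
    pi = np.asarray(pi, dtype=float)
    Q = np.tile(pi, (len(pi), 1))
    np.fill_diagonal(Q, 0.0)
    np.fill_diagonal(Q, -Q.sum(axis=1))  # rows of a generator sum to zero
    # Assumed normalization: one expected substitution per unit time at equilibrium
    return Q / -(pi @ np.diag(Q))


def covarion_generator(Q, s01, s10):
    """On/off covarion generator on 2n states.

    States 0..n-1 are (amino acid, on); states n..2n-1 are (amino acid, off).
    Substitutions happen only in the 'on' block; the switch flips off->on at
    rate s01 and on->off at rate s10 without changing the amino acid.
    """
    I = np.eye(Q.shape[0])
    return np.block([[Q - s10 * I, s10 * I],
                     [s01 * I, -s01 * I]])


pi = np.full(20, 1 / 20)
s01, s10 = 0.3, 0.7
Qc = covarion_generator(equal_input_Q(pi), s01, s10)

# At equilibrium a fraction s01 / (s01 + s10) of sites is 'on'
p_on = s01 / (s01 + s10)
pi_cov = np.concatenate([p_on * pi, (1 - p_on) * pi])
print(np.allclose(Qc.sum(axis=1), 0), np.allclose(pi_cov @ Qc, 0))  # prints: True True
```

A site that is switched off behaves as invariable until it switches back on, which is how a single model lets the same site be fast in one part of the tree and slow in another.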

PROCOV implements a number of covarion models of protein evolution (Tuffley and Steel, 1998; Galtier, 2001; Huelsenbeck, 2002; Wang et al., 2007). It evaluates the maximum likelihood of a given tree under these covarion models and optimizes the tree topology using the subtree pruning and regrafting (SPR) tree-search algorithm. Covarion models may be especially useful for phylogenetic estimation when sequences diverged long ago and the rates of evolution at sites are likely to have changed over the tree. PROCOV can also be used to study functional shifts in protein families that cause changes in site rates within subtrees.
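
The likelihood of a fixed tree under such a model can be computed with Felsenstein's pruning algorithm on the expanded (amino acid, switch) state space; because the switch state is hidden, an observed amino acid is compatible with both its "on" and "off" copies. The sketch below assumes an equal-input base model, a nested-list tree format, and a scaling-and-squaring Taylor series for the matrix exponential. It only evaluates the likelihood for given parameters and branch lengths (no parameter optimization and no SPR search), and none of its names come from PROCOV.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def equal_input_Q(pi):
    Q = np.tile(pi, (len(pi), 1))
    np.fill_diagonal(Q, 0.0)
    np.fill_diagonal(Q, -Q.sum(axis=1))
    return Q / -(pi @ np.diag(Q))  # assumed normalization


def covarion_generator(Q, s01, s10):
    I = np.eye(Q.shape[0])
    return np.block([[Q - s10 * I, s10 * I], [s01 * I, -s01 * I]])


def expm(A, terms=20):
    """Matrix exponential by scaling and squaring a truncated Taylor series."""
    norm = np.abs(A).sum(axis=1).max()
    s = max(0, int(np.ceil(np.log2(norm))) + 1) if norm > 0 else 0
    B = A / 2.0 ** s
    E = term = np.eye(A.shape[0])
    for k in range(1, terms):
        term = term @ B / k
        E = E + term
    for _ in range(s):
        E = E @ E
    return E


def log_likelihood(tree, alignment, pi, s01, s10):
    """Log-likelihood of an alignment on a fixed tree (Felsenstein pruning).

    tree: a leaf name (str) or a list of (child, branch_length) pairs.
    alignment: dict mapping leaf name -> amino acid string.
    """
    pi = np.asarray(pi, dtype=float)
    n = len(AMINO_ACIDS)
    Qc = covarion_generator(equal_input_Q(pi), s01, s10)
    p_on = s01 / (s01 + s10)
    pi_cov = np.concatenate([p_on * pi, (1 - p_on) * pi])
    n_sites = len(next(iter(alignment.values())))

    def partials(node):
        """(2n x n_sites) conditional likelihoods of the data below node."""
        if isinstance(node, str):
            L = np.zeros((2 * n, n_sites))
            for site, aa in enumerate(alignment[node]):
                i = AMINO_ACIDS.index(aa)
                L[i, site] = L[n + i, site] = 1.0  # switch state is unobserved
            return L
        L = np.ones((2 * n, n_sites))
        for child, length in node:
            L = L * (expm(Qc * length) @ partials(child))
        return L

    site_likelihoods = pi_cov @ partials(tree)
    return float(np.sum(np.log(site_likelihoods)))


pi = np.full(20, 1 / 20)
tree = [("A", 0.1), ("B", 0.2), ([("C", 0.1), ("D", 0.3)], 0.05)]
aln = {"A": "ACDK", "B": "ACDR", "C": "GCDK", "D": "GCEK"}
print(log_likelihood(tree, aln, pi, s01=0.5, s10=0.5))
```

Maximizing this function over `s01`, `s10` and the branch lengths would give the maximum likelihood of one topology; a topology search such as SPR then compares those maxima across candidate trees.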

Three statistical tests are implemented for detecting whether a protein sequence alignment exhibits heterotachy. The test statistics are

Simulating sequence evolution: various versions of Seq-gen

In developing profile mixture models and PMSF models, as well as covarion and heterotachy models, we frequently need to simulate sequences under these models to evaluate their performance. The sequence simulation programs we often use are listed below.

under profile mixture models (C20 and C60) and the site-specific frequency (SSF) model;
Covarion models and other heterotachy models:
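
As a hedged illustration of what simulating under a covarion model involves, the sketch below evolves an alignment down a tree with the Gillespie algorithm. While a site is "on" it undergoes substitution events at rate `mu`, each drawing a new amino acid from `freqs` (possibly the same one, i.e. an equal-input model written with self-transitions), and the hidden switch flips at rates `s01` and `s10`. This is not Seq-gen or any of its modified versions; the tree format, parameter names and defaults are assumptions, and `s01` and `s10` must be positive.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def evolve_site(aa, on, t, mu, s01, s10, freqs, rng):
    """Evolve one (amino acid, switch) state for time t with the Gillespie algorithm."""
    while True:
        rate = mu + s10 if on else s01  # total event rate out of the current state
        wait = rng.expovariate(rate)
        if wait >= t:
            return aa, on
        t -= wait
        if not on:
            on = True  # off -> on switch
        elif rng.random() < s10 / (mu + s10):
            on = False  # on -> off switch
        else:
            # substitution event; mu counts events, including self-replacements
            aa = rng.choices(AMINO_ACIDS, weights=freqs)[0]


def simulate(tree, n_sites, mu=1.0, s01=0.5, s10=0.5, freqs=None, seed=1):
    """Simulate an alignment down `tree` (leaf name or list of (child, length))."""
    rng = random.Random(seed)
    freqs = freqs or [1.0] * len(AMINO_ACIDS)
    p_on = s01 / (s01 + s10)
    # Root states drawn from the stationary distribution of the covarion process
    root = [(rng.choices(AMINO_ACIDS, weights=freqs)[0], rng.random() < p_on)
            for _ in range(n_sites)]
    alignment = {}

    def descend(node, states):
        if isinstance(node, str):
            alignment[node] = "".join(aa for aa, _ in states)
            return
        for child, length in node:
            descend(child, [evolve_site(aa, on, length, mu, s01, s10, freqs, rng)
                            for aa, on in states])

    descend(tree, root)
    return alignment


tree = [("A", 0.1), ("B", 0.2), ([("C", 0.1), ("D", 0.3)], 0.05)]
for name, seq in simulate(tree, n_sites=30, seed=42).items():
    print(name, seq)
```

Giving each site its own `freqs` vector would be one simple way to mimic a site-specific frequency model, but that extension is not shown here.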
Last updated: 10/24/2016