Help - Phylogenetic Analysis


Phylogenetic Analysis

Orthology prediction

PhylomeDB uses a phylogeny-based algorithm to detect duplication and speciation events on the trees. This algorithm is described in more detail in Huerta-Cepas et al. (2007).

In contrast to standard phylogeny-based methods, which reconcile the gene tree with a given species tree to infer duplication events, our approach does not require a previously established, fully resolved species topology, as long as the trees are rooted. The orthology prediction algorithm is run independently for each seed protein using the corresponding phylogenetic tree.

To establish orthology and paralogy relationships, the algorithm visits every node in the tree and decides whether it is a duplication or a speciation node. For each node, two tree partitions are defined, each containing the sequences connected to one of the node's two children. A species-overlap score between the two partitions is then computed as the number of species common to both partitions divided by the number of species present in either partition. Finally, if the score is higher than a given threshold, the node is mapped as a duplication event; otherwise it is considered a speciation event.
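The scoring rule above can be sketched in a few lines of Python. This is only an illustration of the description, not the PhylomeDB code: the function names `species_overlap` and `classify_node`, the species codes and the use of plain sets to represent partitions are assumptions made for this example.

```python
def species_overlap(left_species, right_species):
    """Species shared by both partitions divided by species in either one."""
    left, right = set(left_species), set(right_species)
    union = left | right
    if not union:
        raise ValueError("both partitions are empty")
    return len(left & right) / len(union)


def classify_node(left_species, right_species, threshold=0.0):
    """Label a node from the species found under each of its two children.

    A score strictly above the threshold marks a duplication; anything
    else is treated as a speciation.
    """
    score = species_overlap(left_species, right_species)
    return "duplication" if score > threshold else "speciation"


# Hypothetical species codes, used only for illustration.
print(species_overlap({"Hsa", "Mmu"}, {"Hsa"}))  # 0.5
print(classify_node({"Hsa"}, {"Mmu"}))           # speciation
print(classify_node({"Hsa", "Mmu"}, {"Hsa"}))    # duplication
```

With a threshold of 0.0, a single shared species is enough to label the node as a duplication.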

So far, the species-overlap threshold has been set to 0.0 in all phylomes, that is, a node is considered a speciation only if the two partitions share no species. This setting produced the best results in the benchmark, as explained in the human phylome paper.

Once all the nodes in the tree are marked as duplication or speciation events, the algorithm establishes orthology relationships between the seed protein and the other proteins in the tree. For each protein, the algorithm tracks the nodes that connect it to the seed protein and establishes an orthology relationship only if this connection proceeds exclusively through speciation nodes, disregarding intra-specific duplications. As a result of this mapping, orthology relationships are not always one-to-one; they can also be one-to-many or many-to-many.


Duplication and speciation events are shown on each phylogenetic tree by node color: blue nodes are speciation nodes and red nodes are duplication nodes.

metaPhOrs orthology and paralogy predictions

PhylomeDB trees offer a link to the metaPhOrs orthology and paralogy predictions for the seed species. MetaPhOrs is a public repository of phylogeny-based orthology and paralogy predictions computed using resources available in seven popular homology prediction services (PhylomeDB, EnsemblCompara, EggNOG, OrthoMCL, COG, Fungal Orthogroups, and TreeFam). For each prediction, it reports a Consistency Score and an Evidence Level that describe its reliability, together with the number of trees and links to their source databases. To access this information, click the metaPhOrs link at the bottom of the page. A table will be loaded with a list of orthologs and paralogs sorted by species.

a) Click here to load the orthology / paralogy tables provided by metaPhOrs.
b) Table name. It also indicates the code of the seed species considered. The paralogs table can usually be found below the orthologs table.
c) Information about orthologs / paralogs, divided by species. For more information, check the metaPhOrs website.