Frequently Asked Questions

How do I cite PhylomeDB?

Please cite PhylomeDB whenever you use it in published research, whether you downloaded a whole dataset, used it as a source of relevant evolutionary information, or took sequences from it to seed a phylogenetic analysis. Cite the most recent publication of the database, which is currently v4:

PhylomeDB v4: zooming into the plurality of evolutionary histories of a genome. Huerta-Cepas J, Capella-Gutiérrez S, Pryszcz LP, Marcet-Houben M, Gabaldón T. Nucleic Acids Res. 2014 Jan;42(Database issue):D897-902. doi: 10.1093/nar/gkt1177.

If you want to refer to the phylogenetic pipeline used in PhylomeDB, which was described in the v3 paper, please cite:

PhylomeDB v3.0: an expanding repository of genome-wide collections of trees, alignments and phylogeny-based orthology and paralogy predictions. Huerta-Cepas J, Capella-Gutiérrez S, Pryszcz LP, Denisov I, Kormes D, Marcet-Houben M, Gabaldón T. Nucleic Acids Res. 2011 Jan;39(Database issue):D556-60.

I have a phyID, what can I do with it?

phyIDs are numbers associated with phylomes. They are usually provided in the phylome publication and are useful for quickly picking out the phylome of interest from the complete list of phylomes. phyIDs can be found in the All phylomes tab, as the number located next to the phylome name. They can also be used to locate the downloadable data within the FTP repository, where phylomes are identified only by their phyID (Downloads --> phylomes --> phylome_+phyID).

Alignments are not displayed properly in my browser

PhylomeDB uses Jalview, an external application that requires Java. Please check that you have the latest version of Java installed and that Java is enabled in your browser. When Java is correctly installed, your browser should display the following examples properly.

You can download the latest version of Java from the Java download site.

What type of branch supports are used in the phylome?

The specific methods used for each phylome are explained in the corresponding “phylome information page”. Due to computational and time constraints, we did not use standard bootstrap analyses for most phylomes. Instead, we usually compute approximate Likelihood Ratio Tests (aLRT), as implemented in PhyML-aLRT or in PhyML v3.0.

When I press the "show branch support" icon, not all branches display their support

When you press the “branch support” icon, red numbers with the support values appear beneath the branches. If some values do not appear, it is because there is not enough space to show them. In that case, press “force topology” and all branch lengths will be resized so that every support value is displayed.

What do the different branch colors indicate?

Red indicates branches in which our species-overlap algorithm detects a duplication event, whereas speciation events are marked in blue. This color coding matches the one used in EnsemblCompara trees.
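The idea behind a species-overlap test can be sketched in a few lines of Python: an internal node is labeled a duplication if the sets of species found under its two child branches share at least one species, and a speciation otherwise. This is only an illustrative sketch, not PhylomeDB's actual implementation. The nested-tuple tree representation, the leaf naming convention (species code after the last underscore), the gene names and the function names are all assumptions made for the example.

```python
# Colors as described in this FAQ: duplications in red, speciations in blue.
COLORS = {"duplication": "red", "speciation": "blue"}


def species_of(leaf_name):
    """Return the species code of a leaf (assumed to follow the last underscore)."""
    return leaf_name.rsplit("_", 1)[-1]


def label_events(tree, events=None):
    """Label each internal node of a binary tree as duplication or speciation.

    `tree` is either a leaf name (str) or a (left, right) tuple of subtrees.
    Returns (set of species under this node, list of (node, event) pairs).
    """
    if events is None:
        events = []
    if isinstance(tree, str):
        return {species_of(tree)}, events
    left, right = tree
    left_species, _ = label_events(left, events)
    right_species, _ = label_events(right, events)
    # Any species shared between the two child clades implies a duplication.
    event = "duplication" if left_species & right_species else "speciation"
    events.append((tree, event))
    return left_species | right_species, events


# Hypothetical gene family: two paralogs, each present in human and mouse.
tree = (("geneA_HUMAN", "geneA_MOUSE"), ("geneB_HUMAN", "geneB_MOUSE"))
_, events = label_events(tree)
for node, event in events:
    print(COLORS[event], event, node)
```

In this toy tree the two inner nodes are speciations (human versus mouse), and the root is a duplication because both child clades contain human and mouse sequences.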

What are "collateral trees" and what are they useful for?

Collateral trees of a given protein are those trees in which that protein is present but was not used as the seed for the tree reconstruction. In other words, they are trees for which a paralog of the given protein was used as the seed. Collateral trees may provide additional information on the topological position of a given protein. If several collateral trees support a specific relationship (e.g. an orthology relationship or a duplication), we can regard this as additional evidence for that relationship. Collateral trees also provide information about proteins that belong to an organism that has not been used as a seed in any phylome but is present in some of them.
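As a small illustration of the definition, the sketch below selects the collateral trees of a protein from a collection of trees indexed by their seed. The data structure (a dictionary mapping each seed ID to the set of protein IDs in its tree) and the identifiers are hypothetical and chosen only for the example; they do not reflect how PhylomeDB stores its trees.

```python
def collateral_trees(protein_id, trees):
    """Return the seeds of trees that contain `protein_id` but were seeded by another protein.

    `trees` maps each seed protein ID to the set of protein IDs present in its tree.
    """
    return sorted(
        seed
        for seed, members in trees.items()
        if protein_id in members and seed != protein_id
    )


# Hypothetical phylome with three seed trees.
trees = {
    "P1": {"P1", "P2", "P3"},
    "P2": {"P1", "P2", "P4"},
    "P4": {"P2", "P4", "P5"},
}

print(collateral_trees("P1", trees))  # tree seeded by P2 is collateral for P1
print(collateral_trees("P5", trees))  # P5 was never a seed, so it only has collateral trees
```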

I cannot find a tree for a given protein, even though the protein is present in the proteome you used to build the phylome

We cannot reconstruct a tree for sequences that did not produce at least three significant hits in the genomes considered (see the specific parameters and cut-offs in the phylome information page). This is probably the reason why you cannot find a tree for these sequences.

I want to add some additional sequences (from a species not included in the phylome) and re-do the tree using the same parameters. How should I proceed?

Click on the "download data.tar.gz" button at the top of the tree visualization panel. This downloads a compressed folder containing several files, including a multi-FASTA file with all the sequences (file name ending in .msf.fasta) and the raw alignment file (file name ending in alg.raw.AA.fasta). You can then align your sequences to the existing alignment: MUSCLE and MAFFT both have options to align sequences directly to a pre-existing alignment. Alternatively, you can add your sequences to the multi-FASTA file, re-align with MUSCLE, trim with trimAl, and build the tree with PhyML, using the parameters indicated in the phylome information page. All these programs can be downloaded or are available through the Phylemon web server.
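The step of adding your own sequences to the multi-FASTA file can be done by hand or with a short script. The following is a minimal sketch in plain Python that appends new sequences to an existing set and refuses duplicate identifiers. The function names and the example sequences are hypothetical; the alignment, trimming and tree-building steps are not shown, since their parameters depend on the phylome information page.

```python
def read_fasta(text):
    """Parse FASTA text into a dict of {header: sequence}, keeping input order."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            if header in records:
                raise ValueError(f"duplicate sequence ID: {header}")
            records[header] = ""
        elif header is None:
            raise ValueError("sequence data found before the first header")
        else:
            records[header] += line
    return records


def merge_fasta(original_text, extra_text, width=60):
    """Append the extra sequences to the original set and return FASTA text."""
    merged = read_fasta(original_text)
    for header, seq in read_fasta(extra_text).items():
        if header in merged:
            raise ValueError(f"ID already present in the phylome file: {header}")
        merged[header] = seq
    lines = []
    for header, seq in merged.items():
        lines.append(f">{header}")
        # Wrap each sequence at `width` residues per line.
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


# Hypothetical content of the downloaded file and of the sequences to add.
original = ">seq1_HUMAN\nMKT\nAAL\n>seq2_MOUSE\nMKTAAV\n"
extra = ">new1_XENTR\nMKSAAL\n"
merged_text = merge_fasta(original, extra)
print(merged_text)
```

The merged text can be written to a new file and used as input for the re-alignment step.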

I want to create a direct link from a web page to PhylomeDB resources/results. How should I proceed?

You can find this information in the User's manual section "How to link to PhylomeDB". There are many ways to link to resources, such as the information about a specific phylome or the results of a given query (for example, the phylogenetic tree for a protein in a specific phylome).


I want to use the PhylomeDB tree image for a publication but the one in the website does not have enough resolution. How should I proceed?

You can download a high-resolution version of the image in .svg format. This file is provided together with the other data associated with the tree, which you can download by clicking on the "download tar.gz" button of the tree of interest. You should be able to open the image with several image editors and modify it as you wish.
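If you prefer to pull the image out of the archive with a script, the following sketch uses Python's standard tarfile module to extract only the .svg files from a downloaded .tar.gz archive. The function name is hypothetical, and the internal layout of the archive is not assumed: the code simply looks for any member whose name ends in .svg.

```python
import tarfile
from pathlib import Path


def extract_svgs(archive_path, out_dir):
    """Extract every .svg file of a .tar.gz archive into out_dir (flattened)."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with tarfile.open(archive_path, "r:gz") as archive:
        for member in archive.getmembers():
            if member.isfile() and member.name.lower().endswith(".svg"):
                # Keep only the base name so nothing is written outside out_dir.
                target = out_dir / Path(member.name).name
                with archive.extractfile(member) as src:
                    target.write_bytes(src.read())
                written.append(target)
    return written
```

For example, `extract_svgs("data.tar.gz", "figures")` would copy the SVG images into a `figures` folder, where `data.tar.gz` stands for whatever name your downloaded archive has.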

I find several proteins with identical IDs in the same tree. What are these?

These sequences correspond to different genes in the same genome that encode identical proteins. As UniProt does, we assign the same protein identifier to all proteins within a single species that are identical in sequence. These sequences are usually very recent duplicates (perhaps CNVs) that have not yet diverged at the protein level, or genes that are undergoing gene conversion.