FAQ | PhylomeDB

How do I cite PhylomeDB?

Please cite PhylomeDB whenever you use it in published research, whether you downloaded a whole dataset, used it as a source of relevant evolutionary information, or obtained sequences from it to seed a phylogenetic analysis. Cite the most recent publication of the database, which is currently v5:

PhylomeDB V5: an expanding repository for genome-wide catalogues of annotated gene phylogenies. Diego Fuentes, Manuel Molina, Uciel Chorostecki, Salvador Capella-Gutiérrez, Marina Marcet-Houben, Toni Gabaldón. Nucleic Acids Research. Volume 50, Issue D1, 7 January 2022:D1062–D1068. doi: 10.1093/nar/gkab966.

If you want to refer to the phylogenetic pipeline used in PhylomeDB, which was described in the v3 paper, please cite:

PhylomeDB v3.0: an expanding repository of genome-wide collections of trees, alignments and phylogeny-based orthology and paralogy predictions. Huerta-Cepas J, Capella-Gutierrez S, Pryszcz LP, Denisov I, Kormes D, Marcet-Houben M, Gabaldón T. Nucleic Acids Res. 2011 Jan;39(Database issue):D556-60.

I have a phyID, what can I do with it?

phyIDs are numbers associated with phylomes. They are usually provided in the phylome publication and are useful for quickly picking out the phylome of interest from the complete list of phylomes. phyIDs can be found in the All phylomes tab, as the number located next to the phylome name. They can also be used to locate the downloadable data within the FTP repository, where phylomes are identified only by their phyID (Downloads --> phylomes --> phylome_+phyID).

Alignments are not displayed properly in my browser

PhylomeDB uses Jalview, an external application that requires Java. Please check that you have the latest version of Java installed and that Java is enabled in your browser. When Java is correctly installed, your browser should display the following examples properly.

You can download the latest version of Java from the Java Download Site.

What types of branch support are used in the phylomes?

The specific methods used for each phylome are explained in the corresponding “phylome information page”. Due to computational and time constraints, we did not use standard bootstrap analyses for most phylomes. Instead, we usually compute approximate Likelihood Ratio Tests (aLRT), as implemented in PhyML-aLRT or in PhyML v3.0.

When I press the "show branch support" icon, not all branches display their support

When you press the “branch support” icon, red numbers with the support values will appear beneath the branches. If some do not appear, it is because there is no space to show the number. In that case, you can press “force topology” and all branch lengths will be re-sized so that all supports are displayed.

What do the different branch colors indicate?

Red indicates branches in which our species-overlap algorithm detects a duplication event, whereas speciation events are marked in blue. This color-coding system matches the one used in EnsemblCompara trees.
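The general idea behind a species-overlap test can be sketched in Python as follows. This is only an illustrative sketch, not the code used by PhylomeDB: the Node class and the function names are hypothetical, and it assumes the simplest form of the rule, in which a node is called a duplication as soon as the sets of species found under two of its child branches share at least one species.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Node:
    """Hypothetical tree node: leaves carry a species name, internal nodes carry children."""
    species: Optional[str] = None
    children: List["Node"] = field(default_factory=list)


def species_under(node: Node) -> Set[str]:
    """Collect the species of all leaves below (or at) this node."""
    if not node.children:
        return {node.species}
    found: Set[str] = set()
    for child in node.children:
        found |= species_under(child)
    return found


def classify(node: Node) -> str:
    """Assumed rule: any species shared between child branches means duplication."""
    seen: Set[str] = set()
    for child in node.children:
        child_species = species_under(child)
        if seen & child_species:
            return "duplication"
        seen |= child_species
    return "speciation"


def branch_color(node: Node) -> str:
    """Map the event type to the colors described above."""
    return "red" if classify(node) == "duplication" else "blue"


# Toy example: two copies of a gene, each present in human and mouse.
copy_a = Node(children=[Node("human"), Node("mouse")])
copy_b = Node(children=[Node("human"), Node("mouse")])
root = Node(children=[copy_a, copy_b])

print(branch_color(root))    # red: human and mouse appear on both sides
print(branch_color(copy_a))  # blue: no species is shared between the two leaves
```

In this toy tree the root is colored red because both child branches contain human and mouse, while each inner node is colored blue.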

What are "collateral trees" and what are they useful for?

Collateral trees of a given protein are those trees in which that protein is present but was not used as the seed for the tree reconstruction. In other words, they are trees for which a paralog of the given protein was used as the seed. Collateral trees may provide additional information on the topological position of a given protein. If several collateral trees support a specific relationship (e.g. an orthology relationship or a duplication), we can regard this as additional evidence for that relationship. Collateral trees also provide information about proteins that belong to an organism that has not been used as a seed in any phylome but is present in them.
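The definition can be illustrated with a short Python sketch. It assumes a hypothetical in-memory representation in which each tree is stored as the set of protein IDs it contains, keyed by the ID of its seed protein; the function name and the IDs are made up for illustration and do not correspond to the PhylomeDB API.

```python
from typing import Dict, List, Set


def collateral_trees(protein: str, trees: Dict[str, Set[str]]) -> List[str]:
    """Return the seeds of all trees that contain `protein` without being seeded by it.

    `trees` maps a seed protein ID to the set of protein IDs present in its tree
    (a hypothetical structure used only for this example).
    """
    return sorted(
        seed
        for seed, members in trees.items()
        if protein in members and seed != protein
    )


# Made-up identifiers: SEED_A and SEED_B are seeds, OTHER_1 is from a non-seed organism.
trees = {
    "SEED_A": {"SEED_A", "SEED_B", "OTHER_1"},
    "SEED_B": {"SEED_B", "SEED_A"},
}

print(collateral_trees("SEED_A", trees))   # ['SEED_B']
print(collateral_trees("OTHER_1", trees))  # ['SEED_A']
```

The second call shows the last case described above: a protein that was never used as a seed can still be located through the trees in which it appears.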

I cannot find a tree for a given protein, even though the protein is present in the proteome you used to build the phylome

For sequences that did not produce at least three significant hits in the genomes considered (see the specific parameters and cut-offs in the phylome information page) we cannot reconstruct a tree. This is probably the reason why you cannot find a tree for these sequences.

I want to add some additional sequences (from a species not included in the phylome) and re-do the tree using the same parameters. How should I proceed?

Click on the "download data.tar.gz" button at the top of the tree visualization panel. This will download a compressed folder containing several files. There you will find a multi-FASTA file with all the sequences (the file ending in .msf.fasta) and the raw alignment file (the file ending in alg.raw.AA.fasta). You can then align your sequences to the alignment: MUSCLE and MAFFT have options to align sequences directly to a pre-existing alignment. Alternatively, you can simply add your sequences to the multi-FASTA file, re-align with MUSCLE, trim with trimAl and build the tree with PhyML, using the parameters indicated in the phylome information page. All these programs can be downloaded or are available through the Phylemon webserver.
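The step of adding your own sequences to the multi-FASTA file can be sketched in Python as below. This is a minimal sketch and not part of PhylomeDB: the function names and example identifiers are hypothetical, and it only merges the sequences. Aligning, trimming and tree building still have to be run with the programs and parameters given in the phylome information page.

```python
from typing import Dict


def read_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {sequence ID: sequence}, keeping input order."""
    parts: Dict[str, list] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]  # ID is the first word of the header
            if name in parts:
                raise ValueError(f"duplicate sequence ID: {name}")
            parts[name] = []
        elif name is None:
            raise ValueError("sequence data found before the first header")
        else:
            parts[name].append(line)
    return {seq_id: "".join(chunks) for seq_id, chunks in parts.items()}


def merge_fasta(original_text: str, extra_text: str) -> Dict[str, str]:
    """Add the extra sequences to the original set, refusing clashing IDs."""
    merged = read_fasta(original_text)
    for seq_id, seq in read_fasta(extra_text).items():
        if seq_id in merged:
            raise ValueError(f"ID already present in the phylome file: {seq_id}")
        merged[seq_id] = seq
    return merged


def write_fasta(records: Dict[str, str], width: int = 60) -> str:
    """Serialize records back to FASTA text, wrapping sequences at `width`."""
    lines = []
    for seq_id, seq in records.items():
        lines.append(">" + seq_id)
        for start in range(0, len(seq), width):
            lines.append(seq[start:start + width])
    return "\n".join(lines) + "\n"


# Made-up content standing in for the downloaded .msf.fasta file and your own file.
downloaded = ">Phy_EXAMPLE_1\nMKTAYIAK\n>Phy_EXAMPLE_2\nMKTAYLAK\n"
my_sequences = ">my_new_species_1\nMKSAYIAK\n"

print(write_fasta(merge_fasta(downloaded, my_sequences)))
```

Refusing duplicate IDs is a deliberate choice in this sketch, since a clash would make the new sequence indistinguishable from an existing one in the resulting tree.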

I want to create a direct link from a web page to PhylomeDB resources/results. How should I proceed?

You can find this information in the User's manual section "How to link to phylomeDB". There are many ways to link to resources, such as the information about a specific phylome, or to the results of a given query, such as the phylogenetic tree for a protein in a specific phylome.

I want to use the PhylomeDB tree image for a publication but the one on the website does not have enough resolution. How should I proceed?

You can download a high-resolution version of the image in .svg format. This file is provided together with the other data when you download the data associated with the tree. You can do this by clicking on the "download tar.gz" button of your chosen tree. You should be able to open that image with several image editors and modify it as you wish.

I find several proteins with identical IDs in the same tree. What are these?

These sequences correspond to different genes in the same genome that encode identical proteins. As UniProt does, we assign the same protein identifier to all proteins within a single species that are identical in sequence. These sequences are usually very recent duplicates (perhaps CNVs) that have not yet diverged at the protein level, or genes that are undergoing gene conversion.
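As an illustration of how such cases arise, the following Python sketch groups genes by species and exact protein sequence. It is a hypothetical example with made-up gene names, not the procedure used to assign PhylomeDB identifiers: any group with more than one gene would end up sharing a single protein identifier under the rule described above.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_identical(
    genes: Iterable[Tuple[str, str, str]]
) -> Dict[Tuple[str, str], List[str]]:
    """Group gene IDs that encode the same protein sequence within one species.

    `genes` yields (gene_id, species, protein_sequence) tuples; only groups
    with at least two genes are returned.
    """
    groups: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for gene_id, species, sequence in genes:
        groups[(species, sequence)].append(gene_id)
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


# Made-up data: geneA and geneB are identical copies in the same species.
genes = [
    ("geneA", "species_1", "MKTAYIAK"),
    ("geneB", "species_1", "MKTAYIAK"),
    ("geneC", "species_1", "MKTAYLAK"),
    ("geneD", "species_2", "MKTAYIAK"),
]

for (species, _sequence), gene_ids in group_identical(genes).items():
    print(species, gene_ids)
```

Note that geneD is not grouped with geneA and geneB: it has the same sequence but belongs to a different species.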

Phylo Explorer does not respond to any input. What happened?

Phylo Explorer is mainly built in JavaScript, and its behavior depends on the browser version.

Updating your browser to the latest release should solve the problem. If it does not, please contact our maintenance team.

Phylo Explorer lags when the mouse cursor moves over the heatmap.

To serve useful information, Phylo Explorer must import a large amount of data from the database and manipulate it locally, so performance depends on the user's system. Try adding more species to the combination on the main page of Phylo Explorer, or use the filter search bar above the heatmap to narrow down the displayed results.