Candida subhashii phylome
Phylogenomic pipeline:
Database searches: For each protein, a Smith-Waterman search was performed against the proteome database to retrieve a set of proteins with significant similarity (E-value < 10^-3). Only sequences that aligned with a continuous region longer than 50% of the query sequence were selected, and at most 150 sequences were retained per query.
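The hit-selection rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the pipeline's actual code: the Hit record, its field names, and the select_homologs function are hypothetical, and ranking hits by E-value before applying the 150-sequence cap is an assumption, since the text does not say how the retained sequences are chosen.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Hypothetical record for one Smith-Waterman hit."""
    target_id: str
    evalue: float
    aligned_query_length: int  # length of the continuous aligned region on the query


def select_homologs(hits, query_length, max_evalue=1e-3, min_coverage=0.5, max_hits=150):
    """Keep hits that pass the E-value and query-coverage cutoffs, capped at max_hits."""
    kept = [
        h for h in hits
        if h.evalue < max_evalue
        and h.aligned_query_length > min_coverage * query_length
    ]
    # Assumption: the most significant hits are preferred when more than max_hits pass.
    kept.sort(key=lambda h: h.evalue)
    return kept[:max_hits]
```

With a query of length 100, a hit aligned over 51 residues with an E-value of 1e-20 passes both filters, while a hit aligned over 40 residues is discarded regardless of its E-value.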
Multiple sequence alignment: Sets of homologous protein sequences were aligned using MUSCLE 3.6. Positions in the alignment with gaps in more than 25% of the sequences were eliminated using trimAl before phylogenetic analysis, unless this procedure removed more than one-third of the positions in the alignment. In such cases, the allowed percentage of sequences with gaps was automatically increased until at least two-thirds of the initial positions were retained.
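The gap-trimming rule with its fallback can be illustrated with a minimal Python sketch. It does not reproduce trimAl's implementation: the function names are hypothetical, the alignment is assumed to be a list of equal-length strings with "-" as the gap character, and the 5% increment used to relax the threshold is an assumption made for illustration.

```python
def gap_fractions(alignment):
    """Fraction of sequences that have a gap at each alignment column."""
    n_seqs = len(alignment)
    n_cols = len(alignment[0])
    return [sum(seq[i] == "-" for seq in alignment) / n_seqs for i in range(n_cols)]


def trim_alignment(alignment, gap_threshold=0.25, step=0.05):
    """Drop columns whose gap fraction exceeds the threshold.

    If fewer than two-thirds of the columns survive, the threshold is
    relaxed in increments of `step` (an assumed value) until they do.
    Returns the trimmed alignment and the threshold finally applied.
    """
    fractions = gap_fractions(alignment)
    total = len(fractions)
    threshold = gap_threshold
    while True:
        keep = [i for i, f in enumerate(fractions) if f <= threshold]
        # Integer comparison avoids floating-point issues with 2/3.
        if 3 * len(keep) >= 2 * total or threshold >= 1.0:
            break
        threshold = min(1.0, threshold + step)
    trimmed = ["".join(seq[i] for i in keep) for seq in alignment]
    return trimmed, threshold
```

For the four-sequence alignment AC-T, A--T, AC-T, ACGT, only the third column exceeds the 25% gap limit, so three of four columns are kept and no relaxation is needed.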
Phylogenetic reconstructions: Neighbor-joining trees were derived using scoredist distances as implemented in BioNJ. Maximum likelihood trees were derived from the alignments using PhyML_aLRT. For each protein family, a maximum likelihood tree was reconstructed using the JTT evolutionary model. A discrete gamma-distribution model with four rate categories plus invariant positions was used; the gamma parameter and the fraction of invariant positions were estimated from the data. The evolutionary model best fitting the data was determined by comparing the likelihoods of the models used according to the Akaike Information Criterion (AIC).
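The AIC comparison can be sketched as follows, using the standard definition AIC = 2k - 2 ln L, where k is the number of free parameters and ln L is the maximized log-likelihood. This is a minimal illustration, not the pipeline's code: the function names are hypothetical, and the model labels and numbers in the usage example are placeholders rather than results from this phylome.

```python
def aic(log_likelihood, n_free_params):
    """Akaike Information Criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_free_params - 2 * log_likelihood


def best_model(fits):
    """Return the name of the model with the lowest AIC.

    `fits` maps a model name to a (log_likelihood, n_free_params) tuple.
    """
    return min(fits, key=lambda name: aic(*fits[name]))


# Hypothetical values for illustration only.
example_fits = {
    "model_A": (-1000.0, 10),
    "model_B": (-995.0, 10),
    "model_C": (-994.5, 12),
}
chosen = best_model(example_fits)
```

In this example the extra parameters of model_C do not buy enough likelihood to offset the AIC penalty, so model_B is selected.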