D. erecta phylome (1)
Phylogenomic pipeline:
Database searches: For each protein, a Smith-Waterman search was performed against the proteome database to retrieve proteins with significant similarity (E-value < 1e-05). Only sequences whose alignment covered a continuous region longer than 50% of the query sequence were selected. At most 150 sequences were retained.
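The selection rules above can be sketched in a few lines of Python. This is only an illustration, not the pipeline's actual code: the `Hit` record, its field names, and the `select_homologs` function are hypothetical, and ranking hits by E-value before applying the 150-sequence cap is an assumption, since the text does not say how the retained sequences were chosen.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One search hit (hypothetical record; field names are illustrative)."""
    target_id: str
    evalue: float
    query_start: int  # 1-based, inclusive
    query_end: int    # 1-based, inclusive


def select_homologs(hits, query_length, max_evalue=1e-5,
                    min_coverage=0.5, max_hits=150):
    """Keep hits with E-value < max_evalue whose aligned region covers
    more than min_coverage of the query, then cap the list at max_hits."""
    kept = []
    for hit in hits:
        aligned_length = hit.query_end - hit.query_start + 1
        coverage = aligned_length / query_length
        if hit.evalue < max_evalue and coverage > min_coverage:
            kept.append(hit)
    # Assumption: the best-scoring hits are the ones retained under the cap.
    kept.sort(key=lambda hit: hit.evalue)
    return kept[:max_hits]
```

A hit aligned over exactly half of the query is rejected here, because the text asks for a region longer than 50%.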
Multiple sequence alignment: Sets of homologous protein sequences were aligned using MUSCLE v3.7. The resulting alignment was trimmed with trimAl v1.3, applying a gap score cutoff of 0.1 while guaranteeing that at least 33.3% of the original alignment was kept, even if that meant retaining columns below the cutoff.
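The trimming rule can be illustrated with a simplified Python sketch. It is not trimAl's implementation. It assumes that the gap score of a column is the fraction of sequences without a gap at that position, and that, when too few columns pass the cutoff, the best-scoring rejected columns are restored until the minimum fraction is reached. The function name and the rounding of the minimum column count are also assumptions.

```python
import math


def trim_columns(alignment, gap_threshold=0.1, min_conserved=0.333):
    """Trim an alignment given as a list of equal-length gapped strings.

    A column is kept when its gap score (assumed here to be the fraction
    of sequences without a gap) is at least gap_threshold. If fewer than
    min_conserved of the original columns survive, the highest-scoring
    rejected columns are added back.
    """
    n_seqs = len(alignment)
    n_cols = len(alignment[0])
    scores = [
        sum(seq[col] != "-" for seq in alignment) / n_seqs
        for col in range(n_cols)
    ]
    keep = {col for col, score in enumerate(scores) if score >= gap_threshold}

    min_cols = math.ceil(min_conserved * n_cols)  # rounding is an assumption
    if len(keep) < min_cols:
        rejected = sorted(
            (col for col in range(n_cols) if col not in keep),
            key=lambda col: (-scores[col], col),
        )
        keep.update(rejected[: min_cols - len(keep)])

    columns = sorted(keep)
    return ["".join(seq[col] for col in columns) for seq in alignment]
```

With a cutoff as permissive as 0.1, only columns that are almost entirely gaps are removed, so the minimum-retention rule should rarely come into play.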
Phylogenetic reconstructions: Phylogenetic trees were reconstructed using a Neighbour-Joining approach as implemented in BioNJ. The likelihood of this topology was computed, allowing branch-length optimisation, under four different evolutionary models (JTT, WAG, VT, and Blosum62), as implemented in PhyML 3.0. The two models best fitting the data were determined by comparing the likelihoods of the tested models according to the Akaike Information Criterion (AIC). Maximum likelihood trees were then derived using the two selected models. In all cases a discrete gamma-distribution model with four rate categories plus invariant positions was used; the gamma shape parameter and the fraction of invariant positions were estimated from the data.
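The model-selection step can be sketched as follows, using the standard definition AIC = 2k - 2 ln L, where k is the number of free parameters and ln L the log-likelihood. The function names and the log-likelihood values in the usage example are hypothetical. The sketch assumes the same k for every model, which holds when fixed empirical matrices are evaluated on one topology with identical rate-heterogeneity settings; in that case ranking by AIC is equivalent to ranking by likelihood.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: lower values indicate a better fit."""
    return 2 * n_params - 2 * log_likelihood


def best_models(log_likelihoods, n_params, top=2):
    """Return the names of the `top` models with the lowest AIC.

    log_likelihoods maps a model name to its log-likelihood on the fixed
    topology; n_params is the shared number of free parameters.
    """
    scores = {
        model: aic(lnl, n_params) for model, lnl in log_likelihoods.items()
    }
    return sorted(scores, key=scores.get)[:top]


# Hypothetical log-likelihoods, for illustration only.
example = {"JTT": -10250.4, "WAG": -10198.7, "VT": -10310.2, "Blosum62": -10275.9}
selected = best_models(example, n_params=25)
```

Here `selected` holds the two model names that would be passed on to the maximum likelihood tree search.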
Seed species: Drosophila erecta