Episyrphus balteatus phylome (transcriptome version)
Phylogenomic pipeline:
Database searches: For each protein, a Smith-Waterman search was performed against the proteome database to retrieve proteins with significant similarity (e-value < 1e-05). Only sequences whose alignment covered a continuous region longer than 30% of the query sequence were kept, and at most 200 sequences were retained per query. The database size was artificially set to 1,000,000 sequences so that e-values are comparable with those of other phylomes in the database.
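The hit-selection rules above can be sketched in Python as follows. This is a minimal illustration, not the pipeline's own code: the `Hit` record layout and the function names are hypothetical, the Smith-Waterman search is assumed to have already run, the 200 retained sequences are assumed to be the best-scoring ones, and the database-size correction assumes that the e-value scales linearly with the number of sequences in the database.

```python
from dataclasses import dataclass

# Thresholds taken from the pipeline description.
EVALUE_CUTOFF = 1e-5
MIN_QUERY_COVERAGE = 0.30
MAX_HITS = 200
FIXED_DB_SIZE = 1_000_000  # artificial database size (number of sequences)


@dataclass
class Hit:
    """One Smith-Waterman hit (hypothetical record layout)."""
    subject_id: str
    evalue: float
    query_start: int  # 1-based, inclusive
    query_end: int    # 1-based, inclusive


def rescale_evalue(evalue: float, actual_db_size: int) -> float:
    """Rescale an e-value to the fixed database size.

    Assumption: the e-value grows linearly with the number of database sequences.
    """
    return evalue * FIXED_DB_SIZE / actual_db_size


def select_homologs(hits, query_length, actual_db_size=FIXED_DB_SIZE):
    """Apply the e-value, query-coverage and hit-count filters to one query."""
    kept = []
    for hit in hits:
        evalue = rescale_evalue(hit.evalue, actual_db_size)
        # Length of the continuous aligned region relative to the query length.
        coverage = (hit.query_end - hit.query_start + 1) / query_length
        if evalue < EVALUE_CUTOFF and coverage > MIN_QUERY_COVERAGE:
            kept.append((evalue, hit))
    # Best (lowest e-value) hits first; the key avoids comparing Hit objects.
    kept.sort(key=lambda pair: pair[0])
    return [hit for _, hit in kept[:MAX_HITS]]
```

For example, a hit reported with an e-value of 2e-06 against a 100,000-sequence database would be treated as 2e-05 after rescaling, and would therefore be rejected.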
Multiple sequence alignment: Sets of homologous protein sequences were aligned using three different programs: MUSCLE v3.7, MAFFT v6.712b and DIALIGN-TX. Each program was run in both forward and reverse orientation, and the six resulting alignments (three programs in two orientations each) were combined using M-COFFEE. The combined alignment was trimmed with trimAl v1.3 using a consistency cutoff of 0.1667 and a gap score cutoff of 0.1.
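A simplified sketch of the column filter applied at the trimming step, assuming a column is kept only if it passes both cutoffs. The gap score is taken here as the fraction of sequences with a residue (not a gap) in the column, so a cutoff of 0.1 removes columns in which more than 90% of the sequences have a gap. Computing the consistency score itself (trimAl derives it by comparing the set of alternative alignments) is out of scope, so per-column scores are passed in as a hypothetical precomputed list; the function names and the inclusive (>=) boundary handling are also assumptions, not trimAl's code.

```python
def gap_score(column: str) -> float:
    """Fraction of sequences that have a residue (not '-') in this column."""
    return sum(ch != "-" for ch in column) / len(column)


def trim_alignment(seqs, consistency, gap_threshold=0.1, cons_threshold=0.1667):
    """Keep only columns that pass both the gap-score and consistency cutoffs.

    seqs: equal-length aligned sequences, gaps written as '-'.
    consistency: one precomputed consistency score per column
        (hypothetical input standing in for trimAl's own computation).
    """
    ncols = len(seqs[0])
    if any(len(s) != ncols for s in seqs) or len(consistency) != ncols:
        raise ValueError("sequences and scores must cover the same columns")
    keep = [
        i for i in range(ncols)
        if gap_score("".join(s[i] for s in seqs)) >= gap_threshold
        and consistency[i] >= cons_threshold
    ]
    # Rebuild each sequence from the retained columns only.
    return ["".join(s[i] for i in keep) for s in seqs]
```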
Phylogenetic reconstructions: Phylogenetic trees were first reconstructed using the Neighbour Joining approach implemented in BioNJ. The likelihood of this topology was then computed, allowing branch-length optimisation, under six different models (JTT, WAG, MtREV, LG, Blosum62 and DCMut) as implemented in PhyML 3.0. The model best fitting the data was selected by comparing the likelihoods of these models according to the Akaike Information Criterion (AIC). Maximum likelihood trees were then reconstructed for each protein in the seed species using the selected model. In all cases, a discrete gamma distribution with four rate categories plus invariant positions was used; the gamma shape parameter and the fraction of invariant positions were estimated from the data.
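The AIC-based model choice can be illustrated with a short sketch. AIC = 2k - 2 ln L, where L is the maximised likelihood and k the number of free parameters, and the model with the lowest AIC is preferred. The function names and the input mapping are hypothetical, and the log-likelihoods are assumed to come from PhyML runs of each candidate model on the same BioNJ topology.

```python
def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike Information Criterion: AIC = 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def best_model(fits):
    """Return the name of the model with the lowest AIC.

    fits: mapping of model name -> (log-likelihood, number of free parameters).
    """
    return min(fits, key=lambda name: aic(*fits[name]))
```

When every candidate has the same number of free parameters, as may happen when only the fixed empirical substitution matrix differs between runs, ranking by AIC is equivalent to ranking by log-likelihood.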
Seed species: Episyrphus balteatus