To construct phylogenetic trees, we compare sets of genes belonging to different taxa, working under the assumption that the taxa we are studying are related by evolution. We therefore need to ensure that the genes we use to build our trees are orthologous: that is, they diverged from a single gene in a common ancestor through speciation events.
OMA Groups are groups of sequences that are all orthologous to one another, and can be found in the OMA Browser. See the OMA Group module for more details.
2. Which major clades are these groups from?
3. What is the description / function of each of the groups?
4. How many sequences are there in each orthologous group?
Before we can construct phylogenetic trees, we must align the sequences, so that we can compare them site by site. There are many Multiple Sequence Alignment (MSA) tools available, a number of which can be found on the EBI website.
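To make "site by site" concrete, here is a minimal sketch of one way to compare aligned sequences: the proportion of differing sites (the p-distance). The sequences and species labels are hypothetical toy data, and skipping gapped columns is an assumption made for this illustration. The online tools may treat gaps and substitutions differently.

```python
from itertools import combinations

# Hypothetical toy alignment: equal-length sequences, "-" marks a gap.
alignment = {
    "species_A": "MKT-LLVA",
    "species_B": "MKTALLVA",
    "species_C": "MRT-LIVA",
}

def p_distance(seq1, seq2):
    """Fraction of differing sites, skipping columns where either sequence has a gap."""
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    mismatches = 0
    for a, b in zip(seq1, seq2):
        if a == "-" or b == "-":
            continue  # assumption: gapped columns are ignored
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no ungapped sites to compare")
    return mismatches / compared

# Pairwise distances between all sequences in the alignment.
distances = {
    (x, y): p_distance(alignment[x], alignment[y])
    for x, y in combinations(sorted(alignment), 2)
}

for pair, d in distances.items():
    print(pair, round(d, 3))
```

A table of pairwise distances like this one is the kind of input that clustering methods such as UPGMA and NJ work from.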
1. Use an online sequence aligner to align the sequences. Which output format should you choose?
2. Construct a phylogenetic tree using your aligned sequences. Tools can be found on the Vital-IT website (RAxML BlackBox; this can take anywhere from 10 minutes to over an hour, depending on the group and model!) or on the EBI website (Simple Phylogeny [ClustalW2]; try building two trees, one with each of the UPGMA and NJ clustering methods in the clustering options).
If the queue at Vital-IT is too long, you can instead use the alignment and tree inference tools available online at www.phylogeny.fr (in particular PhyML), or an IQTree webserver, as alternatives to RAxML.
3. Why are RAxML and other likelihood methods much slower than the clustering methods?
4. Which group do you think will take the longest to compute a tree for, using RAxML? Which do you think will be the quickest?
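To give a feel for what a clustering method does, the following is a minimal sketch of UPGMA on a small, hypothetical distance matrix. It is an illustration written for this exercise, not the implementation used by ClustalW2 or any other tool mentioned above. The function name `upgma` and the taxon labels are assumptions.

```python
def upgma(names, dist):
    """Cluster taxa by UPGMA and return the tree as a Newick string.

    names: list of taxon labels
    dist:  dict mapping frozenset({a, b}) -> distance between labels a and b
    """
    # Each cluster is keyed by its Newick text and stores (size, height).
    clusters = {name: (1, 0.0) for name in names}
    d = dict(dist)
    while len(clusters) > 1:
        # Join the two closest clusters.
        pair = min(d, key=d.get)
        a, b = sorted(pair)
        height = d[pair] / 2
        size_a, h_a = clusters[a]
        size_b, h_b = clusters[b]
        new = f"({a}:{height - h_a:.3f},{b}:{height - h_b:.3f})"
        # Distance from the merged cluster to every other cluster is the
        # size-weighted average of the two old distances.
        for c in clusters:
            if c in (a, b):
                continue
            d_ac = d.pop(frozenset((a, c)))
            d_bc = d.pop(frozenset((b, c)))
            d[frozenset((new, c))] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
        del d[pair]
        del clusters[a], clusters[b]
        clusters[new] = (size_a + size_b, height)
    (tree,) = clusters
    return tree + ";"


# Hypothetical toy distances between four taxa.
names = ["A", "B", "C", "D"]
dist = {
    frozenset(("A", "B")): 2.0,
    frozenset(("A", "C")): 6.0,
    frozenset(("B", "C")): 6.0,
    frozenset(("A", "D")): 10.0,
    frozenset(("B", "D")): 10.0,
    frozenset(("C", "D")): 10.0,
}

tree = upgma(names, dist)
print(tree)
```

For this toy matrix the sketch prints `(((A:1.000,B:1.000):2.000,C:3.000):2.000,D:5.000);`. Each round does simple arithmetic on the distance table and then commits to a merge.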
Now that we have our trees, we would like to visualise and compare them. Unfortunately, the Newick output format, which encodes a tree as nested parentheses, is hard to interpret by eye. Thankfully, there are online viewers such as phylo.io to help us.
1. View the trees that you have built using an online tree visualisation tool.
2. Reroot the trees, and swap branches, to make comparisons easier.
3. In group 197280, which species is the closest relative of Trametes versicolor, i.e. shares the most recent common ancestor with it?
4. Which species is most closely related to Mucor circinelloides in group 807295?
5. Can you see any difference between trees estimated using different methods? For instance, which species are grouped differently when comparing the trees found by the UPGMA and NJ methods in group 197280?
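If you would like to inspect a Newick string without a viewer, the sketch below parses a simple Newick tree and prints it as an indented outline. It is a minimal illustration under stated assumptions: it handles nested parentheses, labels and branch lengths only (no quoted labels or bracketed comments), and the example tree and function names are hypothetical.

```python
def parse_newick(text):
    """Parse a simple Newick string into nested dicts.

    Supports nested parentheses, labels and branch lengths; quoted labels
    and bracketed comments are not handled in this sketch.
    """
    text = text.strip().rstrip(";")
    pos = 0

    def node():
        nonlocal pos
        children = []
        if pos < len(text) and text[pos] == "(":
            pos += 1  # consume "("
            children.append(node())
            while pos < len(text) and text[pos] == ",":
                pos += 1  # consume ","
                children.append(node())
            if pos >= len(text) or text[pos] != ")":
                raise ValueError("unbalanced parentheses in Newick string")
            pos += 1  # consume ")"
        # Optional label.
        start = pos
        while pos < len(text) and text[pos] not in ",():":
            pos += 1
        name = text[start:pos].strip()
        # Optional branch length after ":".
        length = None
        if pos < len(text) and text[pos] == ":":
            pos += 1
            start = pos
            while pos < len(text) and text[pos] not in ",()":
                pos += 1
            length = float(text[start:pos])
        return {"name": name, "length": length, "children": children}

    root = node()
    if pos != len(text):
        raise ValueError("unexpected trailing characters in Newick string")
    return root


def leaves(node):
    """Return leaf labels in left-to-right order."""
    if not node["children"]:
        return [node["name"]]
    return [name for child in node["children"] for name in leaves(child)]


def render(node, depth=0):
    """Return the tree as indented lines; unnamed internal nodes are shown as '+'."""
    lines = ["  " * depth + (node["name"] or "+")]
    for child in node["children"]:
        lines.extend(render(child, depth + 1))
    return lines


# Hypothetical example tree, not taken from any OMA Group.
example = "((Homo_sapiens:0.1,Pan_troglodytes:0.1):0.3,Mus_musculus:0.4);"
parsed = parse_newick(example)
print("\n".join(render(parsed)))
print(leaves(parsed))
```

Swapping the order of a node's children changes how the tree is drawn but not the relationships it encodes, which is why swapping branches in a viewer is safe when comparing trees.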