Module 1: Exploring Orthology with the OMA Browser


Browsing the Gene Page

Gene-centric pages in OMA give all the information specific to a single gene. The gene is shown at the top of the page, with its OMA ID and UniProt ID. The scrollable menu on the left-hand side leads to sub-pages with specific information, including the orthologs, paralogs, gene information, isoforms, GO annotations, sequences, and local synteny.

You ran a network analysis and found that the human gene with UniProt ID OR2L5_HUMAN is involved in an interesting pathway. Search for this gene on the OMA homepage.

  • 1. Based on the gene information tab, what is this gene?

    Olfactory receptor family 2 subfamily L member 5, one of many olfactory receptors in humans. More information can be accessed by following the link to Ensembl or UniProt.

  • 2. Where is this gene encoded in the genome?

    The gene is located on chromosome 1, starting at position 248021948, and ending at 248022886.

  • 3. Based on the Gene Ontology (GO) annotations, what molecular function and biological process is this protein probably involved in? How sure is this annotation?

    The protein is probably an olfactory receptor involved in smell detection. The GO evidence code is IEA (Inferred from Electronic Annotation). These annotations should be treated with caution because they have not been confirmed experimentally.

  • 4. Does this gene share any local conserved synteny with other species? If so, which ones?

    Synteny is not strictly conserved, but a few orthologs of neighbouring genes can be found close to its orthologs. A striking feature is the large number of duplications in other species.

  • 5. Go to the orthologs table. How many orthologs overall are inferred by OMA?

    145 orthologs.

  • 6. How many 1:1 pairwise orthologs are there?

    You can sort orthologs by their relation type (click on the arrow in the column header).

    There are only five 1:1 relations, all in primates. A duplication probably occurred in the lineage leading to this clade, which explains the m:1 (many-to-one) relations, and further duplications occurred in other clades, as we saw in the synteny view.

  • 7. How many orthologs are inferred which are supported by HOG, pairwise, and OMA Group evidence?

    18 orthologs are supported by all three sources.

  • 8. How conserved is the domain architecture of these orthologs? What is this domain?

    Most species have only one domain, with identical annotation, so the domain architecture is highly conserved. The domain is a Rhodopsin 7-helix transmembrane protein domain, a transmembrane domain common in olfactory receptors (cf. CATH).

  • 9. How many paralogs are there in Human? When did they duplicate?

    3 paralogs. They appear to have duplicated in Simiiformes or later.
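
The counts asked for in questions 5 to 7 can also be reproduced from an exported orthologs table. The following is a minimal sketch on toy data: the field names (`species`, `rel_type`, `evidence`) and all values are hypothetical and are not taken from the OMA Browser or its export format.

```python
from collections import Counter

# Toy rows mimicking an orthologs table. Field names and values are
# hypothetical placeholders, not real OMA data.
orthologs = [
    {"species": "sp1", "rel_type": "1:1", "evidence": {"pairwise", "HOG", "OMA Group"}},
    {"species": "sp2", "rel_type": "m:1", "evidence": {"pairwise", "HOG"}},
    {"species": "sp3", "rel_type": "m:1", "evidence": {"HOG"}},
    {"species": "sp4", "rel_type": "1:1", "evidence": {"pairwise", "HOG", "OMA Group"}},
    {"species": "sp5", "rel_type": "m:n", "evidence": {"pairwise"}},
]

ALL_SOURCES = {"pairwise", "HOG", "OMA Group"}


def count_by_relation(rows):
    """Count orthologs per relation type (1:1, m:1, ...)."""
    return Counter(row["rel_type"] for row in rows)


def fully_supported(rows):
    """Keep only orthologs supported by every evidence source."""
    return [row for row in rows if row["evidence"] >= ALL_SOURCES]


relation_counts = count_by_relation(orthologs)
supported = fully_supported(orthologs)
print("total orthologs:", len(orthologs))
print("1:1 orthologs:", relation_counts["1:1"])
print("supported by all sources:", len(supported))
```

Sorting the table by relation type in the browser gives the same information as `count_by_relation`; the sketch only makes the counting explicit.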

OMA Groups

OMA Groups are cliques of orthologs based on the orthology graph. In an OMA Group, all the genes are connected to each other by pairwise orthologous relations. For this reason, OMA Groups are typically conservative (they may exclude true orthologs) but have high confidence.
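
To make the clique requirement concrete, here is a minimal sketch that checks whether a set of genes is fully connected in a toy orthology graph. The gene IDs and the graph are invented for illustration, and this is not how OMA itself computes its groups.

```python
from itertools import combinations

# Toy undirected orthology graph: gene -> set of its pairwise orthologs.
# The adjacency is assumed to be symmetric. All IDs are hypothetical.
orthology_graph = {
    "A1": {"B1", "C1", "D1"},
    "B1": {"A1", "C1"},
    "C1": {"A1", "B1"},
    "D1": {"A1"},
}


def is_clique(graph, members):
    """Return True if every pair of members is connected by an orthology edge."""
    return all(b in graph.get(a, set()) for a, b in combinations(members, 2))


print(is_clique(orthology_graph, ["A1", "B1", "C1"]))        # fully connected
print(is_clique(orthology_graph, ["A1", "B1", "C1", "D1"]))  # D1 only links to A1
```

In this toy graph, D1 is an ortholog of A1 but cannot join the group because it is not connected to B1 and C1, which illustrates why a clique-based group can exclude true orthologs.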

Open the OMA Groups page of the gene from before. If you are starting the module from here, search for OMA Group 297702.

  • 1. How many members are there in the group? Why is it different from the number of pairwise 1:1 orthologs from before?

    There are 19. This is more than the number of 1:1 orthologs because group members do not have to be in a 1:1 relation with the query gene; they only need to form a clique together.

  • 2. What is the signature sequence for the group (a sequence present in members of this group but not in other groups)?

    The group's fingerprint is available with the group description on top of the page.

    The group’s fingerprint is MTFAGAE. It can be used to track OMA Groups across releases.

  • 3. Look at the alignment. How conserved are the sequences? Some proteins have different sizes, how can we explain that?

    You can visualize conservation on the alignment (show conservation weight) and the percentage of identity of each sequence (show meta data, show identity score) by using the corresponding option under the Vis.element button (Top of alignment).

    The conservation is quite high (with 80-90% identity). Some sequences start at different positions, but this does not seem right. VULVU37201 could start at the second start codon (M) and HORSE05078 is probably missing the start codon.
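
As a rough illustration of what an identity score measures, the sketch below computes percent identity between two aligned sequences. The sequences are invented, and the choice to skip gapped columns is an assumption: the alignment viewer may use a different denominator.

```python
def percent_identity(seq_a, seq_b):
    """Percent of identical residues over columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue  # ignore columns with a gap in either sequence
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Two made-up aligned fragments; the first column mimics a missing start residue.
aligned_a = "MKTLLVAG-S"
aligned_b = "-KTLIVAGQS"
print(percent_identity(aligned_a, aligned_b))
```

Because gapped columns are skipped here, a sequence with a missing start codon can still score high identity, which is consistent with the observation that length differences at the start of the alignment do not imply low conservation.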

Hierarchical Orthologous Groups

The evolution of a gene family describes the history of all the genes that share a common ancestral gene. These genes, called homologs, are classified as orthologs if they started diverging through speciation and as paralogs if they started diverging through duplication. In comparative genomics, gene families are a fundamental resource because they link several organisms from a gene-centric perspective and allow us to understand how genes and genomes have evolved over time.
A HOG is a set of genes that have descended from a common ancestral gene in a given ancestral species (i.e. at a specific taxonomic level). HOGs are hierarchical because groups defined at more recent clades are encompassed within larger groups that are defined at older clades, thus making them nested subfamilies.
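
The distinction between orthologs and paralogs can be sketched on a small labelled gene tree: two genes are orthologs if their last common ancestor node is a speciation, and paralogs if it is a duplication. The tree, node names, and gene names below are hypothetical and serve only as an illustration.

```python
# Toy gene tree (hypothetical): node -> (parent, event).
# Internal nodes carry the event that split them; leaves carry None.
tree = {
    "root": (None, "speciation"),
    "dup1": ("root", "duplication"),
    "mouse_g1": ("root", None),
    "human_g1": ("dup1", None),
    "human_g2": ("dup1", None),
}


def ancestors(node):
    """Return the path from a node up to the root, starting with the node itself."""
    path = []
    while node is not None:
        path.append(node)
        node = tree[node][0]
    return path


def relation(gene_a, gene_b):
    """Classify two distinct genes by the event at their last common ancestor."""
    if gene_a == gene_b:
        raise ValueError("relation is only defined for two different genes")
    seen = set(ancestors(gene_a))
    for node in ancestors(gene_b):
        if node in seen:
            return "orthologs" if tree[node][1] == "speciation" else "paralogs"
    raise ValueError("genes do not share a common ancestor in this tree")


print(relation("human_g1", "human_g2"))
print(relation("human_g1", "mouse_g1"))
```

In this toy tree, both human genes descend from a single ancestral gene at the root, so they would belong to the same root-level group, while each would define its own subgroup at a level after the duplication.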

The following exercises focus on analysing the evolutionary history of a gene family. For an introduction on how to use the iHam graphical viewer (needed to answer the following questions), see our documentation and YouTube video. Open the HOG page corresponding to the gene from before. It is the largest HOG in which this gene is present (root HOG). If you are starting the module from here, search for HOG:0434208.

  • 1. At what taxonomic level is the last common ancestral gene located? From what common ancestral genome did all these genes descend? At what taxonomic level did this gene originate? When did the root HOG originate?

    Hint: these are all different ways of asking the same question.

    Eutheria

  • 2. How many ancestral genes comprise this HOG at the root level?

    1

  • 3. How many extant genes comprise this HOG at the root (Eutheria) level?

    The root level is the first taxonomic clade from the left, represented under the HOG name. You can click on the clade to access the root HOG entry.

    173 genes

  • 4. Which extant genome has the most copies of this gene?

    Fukomys damarensis with 18 genes

  • 5. Which taxonomic clade has an unusually high GC content in its gene sequence?

    Set colour scheme under Options.

    Hystricomorpha, a rodent subclade.

  • 6. How many genes in this family (i.e., the root HOG) are human genes?

    4. OR2L8_HUMAN, OR2L5_HUMAN, OR2L2_HUMAN and OR2L3_HUMAN.

  • 7. In which lineages did the duplications which led to the multiple human genes likely take place?

    The duplications took place in the lineages leading to Simiiformes, Catarrhini and Human.

  • 8. How many genes in this family have five exons? In what species?

    Set colour scheme under Options.

    1, in Pteropus vampyrus

  • 9. Freeze the tree at the Simiiformes level. Based on the iHam visualisation, which ancestral clade experienced gene loss of ancestral gene “HOG:0434208.10c”?

    A gene loss occurred in the lineage leading to Cercopithecinae.
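
Two of the quantities explored above, the number of gene copies per species and the GC content, are simple to compute once the member sequences are available. The following is a minimal sketch on invented data: the species labels and nucleotide sequences are placeholders, not members of the HOG discussed here.

```python
from collections import Counter


def gc_content(sequence):
    """Fraction of G and C bases in a nucleotide sequence."""
    sequence = sequence.upper()
    if not sequence:
        return 0.0
    return (sequence.count("G") + sequence.count("C")) / len(sequence)


# Hypothetical (species, coding sequence) pairs standing in for HOG members.
members = [
    ("speciesA", "ATGGCGCC"),
    ("speciesA", "ATGAATTT"),
    ("speciesB", "ATGGGGCC"),
]

# Number of gene copies per species, as used to find the genome with most copies.
copies = Counter(species for species, _ in members)
top_species, top_count = copies.most_common(1)[0]

# GC content per member, as used by a GC-based colour scheme.
gc_values = [gc_content(seq) for _, seq in members]

print(top_species, top_count)
print(gc_values)
```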