Module 1: Finding orthology with the OMA Browser

The OMA Browser serves as an access point for the OMA database, which contains precomputed homology data for extant and ancestral genomes of over 2800 species (see the latest list of species).

The OMA Browser focuses on three main data types: genes, groups, and genomes. Gene-centric pages provide detailed information about a specific gene, including its sequence, cross-references, functional annotations, and evolutionary data. Group-centric pages classify genes into OMA Groups (Orthologous Groups; OGs) and Hierarchical Orthologous Groups (HOGs) to define families and subfamilies. Genome-centric pages offer information about extant or ancestral species, associated genes, related genomes, and a synteny viewer.

1.1. Browsing the gene page

Gene-centric pages give all the information specific to a single gene in OMA. The gene is shown at the top of the page, with its OMA ID and UniProt ID. Sub-pages with specific information are available from the scrollable menu on the left, including the orthologs, paralogs, gene information, isoforms, GO annotations, sequences, and local extant and ancestral synteny for this gene.

Consider a scenario where you ran a gene network analysis and found that the human gene with UniProt ID OR2L5_HUMAN is involved in an interesting pathway. Search for this gene on the OMA homepage.

  • 1. Based on the “Gene information” tab, what is this gene?

    Olfactory receptor family 2 subfamily L member 5, one of many human olfactory receptors. More information can be accessed by following the link to Ensembl or UniProt.

  • 2. Where is this gene located in the genome?

    The gene is located on chromosome 1, starting at position 248021948, and ending at 248022886.

  • 3. Based on the Gene Ontology annotations, what function is this protein probably involved in? How sure are these annotations?

    The protein is probably an olfactory receptor involved in smell detection. The “olfactory receptor activity” Molecular Function has the codes IEA (Inferred from Electronic Annotation) and IBA (Inferred from Biological aspect of Ancestor) in the Evidence and reference column. Both are computational inferences, so these annotations should be treated with caution: they were not confirmed experimentally.

  • 4. Does this gene share any localized conserved synteny among any other species in Hominidae? If so, which ones?

    By checking “Local synteny” in the left pane, we can see that the synteny around this gene is not strictly conserved, but a few of its neighboring genes are orthologous to neighboring genes in other species. Another striking point is that the genes surrounding the query gene are also olfactory receptors (shown by hovering over the genes).

  • 5. Go to the orthologs table. How many orthologs are inferred by OMA overall?

    153 orthologs.

  • 6. Pairwise orthologs in OMA refer to pairs of genes from different species that are considered to be orthologous to each other. How many 1:1 pairwise orthologs are there?

    You can sort orthologs by their relation type by clicking the arrow on the column's header. Alternatively, you can type “1:1” in the search bar above the table.

    There are only four 1:1 relations, all in Hominidae. This means that this specific gene was present in the ancestral Hominidae and that this copy did not duplicate further. Ancestral duplications before this clade, or lineage-specific duplications in other clades, explain the m:1 orthologs (many co-orthologs in another genome to one ortholog in human).

  • 7. How many orthologs inferred are supported by HOG inference?

    This human gene has 153 orthologs that are also found by the HOG inference.

  • 8. Why is there a difference between the number of pairwise orthologs and the number of HOG-supported orthologs?

    Pairwise orthologs are used to build the HOGs, but some pairwise orthology relations may be cut from the orthology graph in the process. The two genes in such a relation will then not be found in the same HOG.

  • 9. How conserved is the domain architecture of these orthologs? What is this domain?

    The domain architecture is very conserved: most species have a single domain with identical annotation. It is a Rhodopsin 7-helix transmembrane protein domain, a transmembrane domain common in olfactory receptors.

  • 10. How many paralogs of this gene are there in human? When did they duplicate?

    3 paralogs. They appear to have been duplicated in Catarrhini or later.
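
The sorting and filtering described in questions 5 and 6 can also be reproduced outside the browser on a saved copy of the orthologs table. The following is a minimal sketch, assuming the table has been reduced to (species, gene ID, relation type) records. The records and the function names `count_relation_types` and `species_with_relation` are hypothetical and invented for illustration; they are not real OMA output or part of any OMA tool.

```python
from collections import Counter

# Hypothetical ortholog records: (species, gene ID, relation type).
# All values are invented for illustration; they are not real OMA data.
orthologs = [
    ("Pan troglodytes", "GENE_A", "1:1"),
    ("Gorilla gorilla", "GENE_B", "1:1"),
    ("Mus musculus", "GENE_C", "m:1"),
    ("Mus musculus", "GENE_D", "m:1"),
]


def count_relation_types(records):
    """Tally how many orthologs fall under each relation type."""
    return Counter(relation for _species, _gene, relation in records)


def species_with_relation(records, relation_type):
    """Return the sorted species that have at least one ortholog of the given type."""
    return sorted(
        {species for species, _gene, relation in records if relation == relation_type}
    )


counts = count_relation_types(orthologs)
one_to_one_species = species_with_relation(orthologs, "1:1")

print(counts["1:1"], "one-to-one orthologs in:", one_to_one_species)
print(counts["m:1"], "many-to-one orthologs")
```

In this toy table the two Mus musculus genes are co-orthologs of the single human gene, which is the m:1 situation described in the answer to question 6.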

1.2. Exploring Hierarchical Orthologous Groups

The evolution of a gene family describes the history of all the genes that descended from a common ancestral gene.

A Hierarchical Orthologous Group (HOG) is a set of genes that have descended from a common ancestral gene in a given ancestral species (i.e. at a specific taxonomic level). HOGs are hierarchical because groups defined at more recent clades are encompassed within larger groups defined at older clades, which makes them nested subfamilies.
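
The nesting can be pictured as a tree in which each HOG holds its own genes plus the sub-HOGs defined at more recent clades. The following is a minimal sketch of that idea, not OMA's actual data model: the `Hog` class, its fields, and the toy gene names are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Hog:
    """Toy HOG: a taxonomic level, genes attached directly, and nested sub-HOGs."""

    level: str
    genes: List[str] = field(default_factory=list)
    sub_hogs: List["Hog"] = field(default_factory=list)

    def extant_genes(self) -> List[str]:
        """Collect every extant gene in this HOG, including those in nested sub-HOGs."""
        collected = list(self.genes)
        for sub in self.sub_hogs:
            collected.extend(sub.extant_genes())
        return collected


# Invented toy family: gene names are placeholders, not real OMA entries.
hominidae = Hog("Hominidae", genes=["human_gene_1", "chimp_gene_1"])
primates = Hog("Primates", genes=["macaque_gene_1"], sub_hogs=[hominidae])
root = Hog("Eutheria", genes=["mouse_gene_1"], sub_hogs=[primates])

# Older clades encompass the genes of every younger clade nested inside them.
for hog in (root, primates, hominidae):
    print(hog.level, len(hog.extant_genes()))
```

The gene count can only grow or stay the same when moving from a recent clade to an older one, which is what "nested subfamilies" means in practice.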

The following exercises focus on analyzing the evolutionary history of a gene family. For an introduction to the iHam graphical viewer (needed to answer the following questions), see our documentation and YouTube video.

Open the HOG page corresponding to the gene from before (OR2L5_HUMAN -> Click on the Groups button). The HOG displayed is the largest HOG in which this gene is present (known as a “Root HOG” in OMA).

  • 1. At what taxonomic level is the last common ancestral gene located? From what common ancestral genome did all these genes descend? At what taxonomic level did this gene originate? When did the Root HOG originate?

    These are all different ways of asking the same question.

    Eutheria

  • 2. How many ancestral genes comprise this HOG at the root level?

    1 ancestral gene. This means that there was an ancestral species whose genome contains this ancestral gene. Over the course of evolution, it evolved through speciation and duplication, resulting in all the extant genes present today.

  • 3. How many extant genes comprise this HOG at the root level?

    In the header under the HOG ID, the root level is the first taxonomic clade from the left. You can click on the clade to access the root HOG entry.

    199 genes

  • 4. Which extant genomes have the most copies of this gene?

    Fukomys damarensis with 18 genes

  • 5. How many genes in this family (i.e. root HOG) are human genes?

    4

  • 6. In which lineages did the duplications likely take place that resulted in the multiple human genes?

    The duplications took place in the lineages leading to Primates, Catarrhini and Hominidae.

  • 7. How many genes in this family have 5 exons? In what species?

    Set color scheme under Options.

    1, in Pteropus vampyrus

1.3. Browsing the Genomes page

Genomes on the OMA Browser can be either extant (modern-day species) or ancestral. OMA leverages HOGs to model ancestral genomes; these ancestral genomes each correspond to an internal node of the Tree of Life. Conceptually, HOGs can be thought of as ancestral genes, as they encompass orthologs and paralogs descending from a common ancestral gene at a specific taxonomic level. Thus, the HOGs are proxies for ancestral genes in a common ancestor and the collection of HOGs at a given level are proxies for ancestral genomes.

We will first explore an extant genome: Human. Search for it by typing “HUMAN” in the search bar and choosing species for the field, or navigate from the home page via Explore -> Quick access to Genomes. Go to the extant human genome.

  • 1. How many genes are in this species, not including alternative splice variants?

    There are 20,430 genes in this species.

Next, let’s explore one of the ancestral genomes leading to human: Primates. Click on this genome to get to the Ancestral Genome page.

  • 2. How many genes was this primate common ancestor inferred to have?

    38,457 genes

HOG inference may not always be 100% reliable. OMA provides a “Completeness Score” to measure HOG quality. The Completeness Score is defined as the number of species in the taxonomic clade that are present in the HOG, divided by the total number of species in that clade.
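
To make the definition concrete, here is a minimal sketch of the score and of a threshold filter like the one used in the next question. It assumes each HOG is described simply by the set of species that contribute at least one gene to it. The function names `completeness_score` and `filter_hogs`, the species labels, and the HOG IDs are hypothetical, and this is not OMA's own implementation.

```python
def completeness_score(species_in_hog, species_in_clade):
    """Fraction of the clade's species that have at least one gene in the HOG."""
    clade = set(species_in_clade)
    if not clade:
        raise ValueError("the clade must contain at least one species")
    # Only species that belong to the clade count towards the score.
    present = set(species_in_hog) & clade
    return len(present) / len(clade)


def filter_hogs(hogs, species_in_clade, min_score):
    """Return the IDs of HOGs whose completeness score is at least min_score."""
    return [
        hog_id
        for hog_id, species in hogs.items()
        if completeness_score(species, species_in_clade) >= min_score
    ]


# Invented toy clade of five species and two HOGs (placeholders, not OMA data).
clade_species = {"sp_A", "sp_B", "sp_C", "sp_D", "sp_E"}
toy_hogs = {
    "hog_1": {"sp_A", "sp_B", "sp_C", "sp_D"},  # 4 of 5 species present
    "hog_2": {"sp_A"},                          # 1 of 5 species present
}

kept = filter_hogs(toy_hogs, clade_species, min_score=0.3)
print(kept)
```

With a 30% threshold, `hog_1` (4 of 5 species) is kept and `hog_2` (1 of 5 species) is dropped, which is why filtering lowers the ancestral gene count.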

  • 3. How many genes were in the primate common ancestor if we filter to only HOGs with at least 30% of the species present in the HOG?

    18,804 genes

Click on “Ancestral Gene Order.” Shown are the ancestral chromosome reconstructions, called “ancestral contigs.”

  • 4. How many genes are in the first ancestral contig?

    478