Glossary Web-based database interface for orthology prediction

Term Definition
5-letter species code Organism code based on UniProt species codes
all-against-all The part of the OMA algorithm in which every protein sequence is compared to every other protein sequence using Smith-Waterman alignment
ancestral gene A gene in the common ancestor at the taxonomic level comprising all the species of interest. HOGs at a particular taxonomic level can be considered as one gene in the common ancestral species.
ancestral genome As HOGs are by definition all genes that descended from an ancestral gene, ancestral genomes are all HOGs at a given taxonomic level
cross reference An alternative identifier for a gene, assigned by the annotation source or database
domain of life Eukaryota, Bacteria, or Archaea
domain architecture In OMA, domain architecture refers to the 2D visualization (boxes on a string representation) of the annotated protein sequence domains
entry Synonymous with gene
evolutionary distance The amount of divergence between two protein sequences
extant species Species which are still living today; the leaves of a species tree
gene The unit of evolution used in OMA for orthology inference. However, only protein sequences are used for sequence comparison in the homology inference process.
gene ontology Controlled vocabulary and hierarchy on the biological functions of genes. See http://geneontology.org/
genome The collection of all the genes in an ancestral or extant organism. In OMA, the genomes of species are represented by one protein sequence per locus.
group A cluster or group of orthologous or paralogous genes in OMA.
HOG Hierarchical Orthologous Group. HOGs contain genes that descend from a single common ancestral gene within a given taxonomic range.
HOG-induced ortholog All genes which started diverging at the last common ancestor of the two species in question
homoeolog Genes of an allopolyploid which started diverging by a speciation event, and were brought back to the same genome via a hybridization event.
member genes All the genes which comprise a group
OMA database All the data associated with a given OMA release, including: genes, sequences, locus information, groups and more. Can be accessed through the browser or programmatically
OMA Group OMA Groups contain sets of genes which are all orthologous to one another within the group. This implies that there is at most one entry from each species in a group.
OMA Group-induced ortholog The pairwise combinations of all the genes in a given OMA Group
OMA identifier Consists of the five-letter UniProtKB species code and a unique 5-digit number
OrthoXML Standard output format for orthology data. See http://www.orthoxml.org/xml/Main.html
pairwise ortholog Orthologs in OMA inferred by comparing two genomes. After the all-against-all phase, two sequences are called pairwise orthologs if they are best bidirectional hits within a confidence interval (to allow for more than one hit, i.e. duplications) and they pass the witness of non-orthology test.
PAM unit Point accepted mutation. A measure of evolutionary distance: the number of amino acid substitutions per 100 amino acids of a protein sequence. One PAM unit means that 1% of the amino acids were replaced since the divergence of the two protein sequences.
paralog Genes which started diverging by duplication. In OMA they are inferred as
PhyloXML A standard format for phylogenetic trees and their associated information. See http://www.phyloxml.org/
protein ID The identifier for a gene in OMA
relation type Reflects the level of co-orthology, or the degree of duplications which one or both of the orthologs in a pair has undergone. One-to-one (1:1) pairwise orthology means that both genes in the pair have only one ortholog in the other species. A one-to-many relationship (1:m) means that the gene of interest has more than one ortholog in the other species. This implies that the gene was duplicated in an ancestor of the other species, but after the speciation event. A many-to-many (m:m) relationship means both orthologs underwent lineage-specific duplications.
root HOG The HOG defined at the deepest taxonomic level which relates all the species of that HOG. The root HOG can be thought of as the gene family for the species in OMA. The root HOG comprises subHOGs.
subHOGs Nested HOGs (or sub-families) within the root HOG. The subHOGs arise due to duplication events at a given taxonomic level.
taxon Synonym for species or genome in the OMA database
UniProt ID The stable and unique identifier for a gene from UniProt
Witness of non-orthology The final step in the OMA algorithm for inferring pairwise orthologs. In cases where an ortholog is missing, we seek to avoid erroneous classification of paralogs as orthologs by verifying stable pairs against sequences in a third genome that can act as a witness of non-orthology.
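
The "OMA identifier" entry describes a fixed layout: a five-character species code followed by a five-digit number. A minimal sketch of how such an identifier could be split into its two parts is shown below. The function name `parse_oma_id`, the regular expression, and the example identifier are illustrative assumptions, not part of any OMA library; the pattern also accepts digits in the species code, which is an assumption about UniProt-style codes.

```python
import re

# Assumed layout, taken from the glossary entry: a five-character species
# code followed by a five-digit number. Digits are allowed in the species
# code here as an assumption about UniProt-style codes.
OMA_ID_PATTERN = re.compile(r"^([A-Z0-9]{5})(\d{5})$")


def parse_oma_id(oma_id):
    """Split an OMA identifier into (species_code, entry_number).

    Hypothetical helper for illustration only.
    """
    match = OMA_ID_PATTERN.match(oma_id)
    if match is None:
        raise ValueError(f"not a valid OMA identifier: {oma_id!r}")
    return match.group(1), int(match.group(2))


if __name__ == "__main__":
    # Made-up identifier used only to demonstrate the format.
    print(parse_oma_id("HUMAN00042"))
```

Strings that do not match the assumed layout raise a `ValueError` instead of being silently truncated.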
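
The "OMA Group-induced ortholog" entry defines these orthologs as the pairwise combinations of all the genes in one OMA Group. The sketch below enumerates those pairs for a small, made-up group. The function name `group_induced_orthologs` and the member identifiers are hypothetical and serve only to illustrate the definition.

```python
from itertools import combinations


def group_induced_orthologs(members):
    """Return every unordered pair of genes in one OMA Group.

    Hypothetical helper: `members` is any iterable of gene identifiers.
    Because a group holds at most one entry per species, each pair
    links genes from two different species.
    """
    # Sort first so the output order is deterministic.
    return list(combinations(sorted(members), 2))


if __name__ == "__main__":
    # Made-up group members, for illustration only.
    group = ["HUMAN00001", "MOUSE00002", "RATNO00003"]
    for gene_a, gene_b in group_induced_orthologs(group):
        print(gene_a, gene_b)
```

A group with n members yields n(n-1)/2 induced ortholog pairs.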
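
The "relation type" entry can be made concrete by counting, for each pairwise ortholog, how many orthologs each gene has in the other species. The sketch below assumes the input is a list of (gene in species A, gene in species B) pairs. The function name `relation_types` and the gene labels are hypothetical, and this is an illustration of the definition rather than the OMA implementation.

```python
from collections import defaultdict


def relation_types(pairs):
    """Label each ortholog pair as 1:1, 1:m, m:1 or m:m.

    Hypothetical helper. `pairs` holds (gene_a, gene_b) tuples, where
    gene_a is from species A and gene_b is from species B. The right
    side is "m" when gene_a has several orthologs in species B; the
    left side is "m" when gene_b has several orthologs in species A.
    """
    pairs = list(pairs)
    partners_of_a = defaultdict(set)  # gene in A -> its orthologs in B
    partners_of_b = defaultdict(set)  # gene in B -> its orthologs in A
    for gene_a, gene_b in pairs:
        partners_of_a[gene_a].add(gene_b)
        partners_of_b[gene_b].add(gene_a)

    labels = {}
    for gene_a, gene_b in pairs:
        left = "1" if len(partners_of_b[gene_b]) == 1 else "m"
        right = "1" if len(partners_of_a[gene_a]) == 1 else "m"
        labels[(gene_a, gene_b)] = f"{left}:{right}"
    return labels


if __name__ == "__main__":
    # Made-up pairs: A2 was duplicated in species B; B4 has two co-orthologs in A.
    example = [("A1", "B1"), ("A2", "B2"), ("A2", "B3"), ("A3", "B4"), ("A4", "B4")]
    for pair, label in relation_types(example).items():
        print(pair, label)
```

The label is read from the point of view of the first gene in each pair, so a 1:m pair seen from species A is an m:1 pair when seen from species B.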