Current release

The entire OMA database is available for download in several formats. It is also possible to download each group separately. This option is available in the group view. Please read our terms and conditions before integrating OMA data into your own research or database.


Orthology Relationships

Orthology relationships are available in two forms: as groups of orthologs or as pairs of orthologs. All entries are referred to by their OMA identifiers (of the form HUMAN04376).
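As a minimal sketch of how such identifiers might be handled in a script, the snippet below splits an identifier into a species code and an entry number. The layout (a five-character species code followed by digits) is inferred from the single example HUMAN04376 and is an assumption, as are the names `OmaId` and `parse_oma_id`.

```python
import re
from typing import NamedTuple

# Assumed layout: five-character species code followed by a numeric entry
# index, as in the example HUMAN04376. This pattern is illustrative only.
OMA_ID_PATTERN = re.compile(r"^([A-Z0-9]{5})(\d+)$")


class OmaId(NamedTuple):
    species: str  # e.g. "HUMAN"
    number: int   # e.g. 4376


def parse_oma_id(identifier: str) -> OmaId:
    """Split an OMA identifier into species code and entry number."""
    match = OMA_ID_PATTERN.match(identifier.strip())
    if match is None:
        raise ValueError(f"not a recognised OMA identifier: {identifier!r}")
    return OmaId(match.group(1), int(match.group(2)))


if __name__ == "__main__":
    print(parse_oma_id("HUMAN04376"))
```

Grouping parsed identifiers by `species` is one simple way to separate the members of a group or a pair file by genome.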


OMA groups
Text format
OrthoXML format

Hierarchical orthologous groups (HOGs)
OrthoXML format

Species phylogeny of HOGs
PhyloXML format
Newick format

Pairwise orthologs
Text format

Pairs between two species
Genome Pair View

Sequences

All sequences, labelled with their OMA identifiers, can be downloaded as FASTA files. The proteins are all in one file, while the coding DNA is split into separate files for Eukaryotes, Prokaryotes and Viruses.
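The following is a small, standard-library-only sketch of reading a FASTA file into a dictionary keyed by identifier. It assumes that the OMA identifier is the first whitespace-delimited token of each header line; the exact header layout of the downloadable files is not described here, so check it against the real data. The function name `read_fasta` and the sample records are made up for illustration.

```python
import io
from typing import Dict, Iterable, List, Optional


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Return {identifier: sequence} from FASTA-formatted lines.

    Assumption: the identifier is the first token after '>' on a header line.
    """
    sequences: Dict[str, str] = {}
    current_id: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Store the previous record before starting a new one.
            if current_id is not None:
                sequences[current_id] = "".join(chunks)
            tokens = line[1:].split()
            current_id = tokens[0] if tokens else ""
            chunks = []
        elif current_id is not None:
            chunks.append(line)
    if current_id is not None:
        sequences[current_id] = "".join(chunks)
    return sequences


if __name__ == "__main__":
    # Made-up records, not real OMA data.
    sample = io.StringIO(">HUMAN04376 example header\nMKT\nLLV\n>HUMAN04377\nMSA\n")
    for oma_id, seq in read_fasta(sample).items():
        print(oma_id, len(seq))
```

Because the function accepts any iterable of lines, it can be given an open file handle directly, which avoids loading the raw text of a large download into memory at once (the resulting dictionary still holds every sequence).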


Protein sequences
FASTA format
SeqXML format

cDNA Eukaryotes
FASTA format

cDNA Prokaryotes
FASTA format

cDNA Viruses
FASTA format

Protein Annotations
Text format

Identifier Mapping

Mappings from OMA identifiers to identifiers in various other databases are available. Mappings to UniProt, RefSeq and Entrez Gene IDs are based on exact sequence matches; all other cross-references are taken directly from the source genome files.
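A mapping file could be loaded along the following lines. This is only a sketch: it assumes a tab-separated layout with the OMA identifier in the first column, the external identifier in the second, and comment lines starting with `#`. None of that is stated on this page, so verify it against the downloaded file. The function name `read_mapping` and the external identifiers in the sample are placeholders.

```python
import io
from collections import defaultdict
from typing import Dict, Iterable, List


def read_mapping(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Return {oma_id: [external_id, ...]} from a two-column text file.

    Assumptions: tab-separated columns, OMA identifier first, and '#'
    marking comment lines. One OMA identifier may map to several entries.
    """
    mapping: Dict[str, List[str]] = defaultdict(list)
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # skip lines that do not have both columns
        mapping[fields[0]].append(fields[1])
    return dict(mapping)


if __name__ == "__main__":
    # Placeholder rows, not real cross-references.
    sample = io.StringIO(
        "# comment line\n"
        "HUMAN04376\tEXT_ID_A\n"
        "HUMAN04376\tEXT_ID_B\n"
        "HUMAN04377\tEXT_ID_C\n"
    )
    print(read_mapping(sample))
```

Values are kept as lists because a single sequence can match more than one external entry.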


Mapping to UniProt
Text format

Mapping to Ensembl
Text format

Mapping to RefSeq ACs
Text format

Mapping to Entrez Gene IDs
Text format

Mapping to NCBI GIs
Text format

Mapping to NCBI GenBank IDs
Text format

Mapping to WormBase
Text format

Mapping to JGI
Text format

Mapping to GO
Text format

Plant mappings
Text format

Other files

OMA Groups/Sequences in COG format
COG format

Species information (Taxon IDs, scientific names, genome sources)
Text format

Group descriptions
Text format

Close OMA Groups
Text format

OMA Browser database (as HDF5)
HDF5 PyTables
HDF5 PyTables suffix index

OMAmer database files
LUCA.h5
Metazoa.h5
Viridiplantae.h5
Primates.h5

OMA ID History

Mappings of OMA identifiers for updated genomes from one release to the next are available. Only proteins whose amino acid sequence is unchanged between releases are tracked.