OMArk assesses the quality of protein-coding gene repertoires. It compares an extant species’ proteome to the expected gene repertoire of the lineage’s common ancestor, inferred from Hierarchical Orthologous Groups (HOGs) in the OMA database. It uses OMAmer to quickly place each query protein into its matching HOG, which represents an ancestral protein.
OMArk provides two key measures:
Completeness = the proportion of conserved genes expected for the lineage that are present.
Consistency = whether the proteome looks taxonomically and structurally coherent (few fragments, few contaminants, not too many “unknowns”, i.e. proteins with no detected homology).
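To make the two measures concrete, here is a minimal Python sketch that turns category counts into percentages. The function names, the category names, and all counts are assumptions made up for illustration; they are loosely modelled on the labels shown in the OMArk interface and are not read from any OMArk output file.

```python
def completeness_summary(single, duplicated, missing):
    """Percentages of the conserved ancestral genes falling in each category."""
    total = single + duplicated + missing
    if total == 0:
        raise ValueError("no conserved genes to summarise")
    return {
        "single": 100 * single / total,
        "duplicated": 100 * duplicated / total,
        "missing": 100 * missing / total,
        # A conserved gene counts as present whether found once or several times.
        "complete": 100 * (single + duplicated) / total,
    }


def consistency_summary(consistent, inconsistent, contaminant, unknown):
    """Percentages of the query proteins falling in each placement category."""
    total = consistent + inconsistent + contaminant + unknown
    if total == 0:
        raise ValueError("no proteins to summarise")
    return {
        "consistent": 100 * consistent / total,
        "inconsistent": 100 * inconsistent / total,
        "contaminant": 100 * contaminant / total,
        "unknown": 100 * unknown / total,
    }


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
comp = completeness_summary(single=9000, duplicated=500, missing=500)
cons = consistency_summary(consistent=18000, inconsistent=1000,
                           contaminant=200, unknown=800)
print(f"complete: {comp['complete']:.1f}%  missing: {comp['missing']:.1f}%")
print(f"consistent: {cons['consistent']:.1f}%  unknown: {cons['unknown']:.1f}%")
```

Note that the two measures use different denominators: completeness is computed over the conserved ancestral genes of the lineage, while consistency is computed over the proteins in the query proteome.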

Link to the OMArk paper: Nevers et al. 2025, Nat Biotechnol.
OMArk is available both as a command-line tool (recommended for large projects) and as a web server: https://omark.omabrowser.org/ (recommended for assessing a few proteomes).
Today’s exercises: explore precomputed results in the web interface, compare species within a clade, and interpret outliers.
You are interested in doing a comparative genomics study of whales and dolphins. Go to the OMArk browser and navigate to the Cetacea clade using Select Taxon. Make sure you are viewing the 2024 release by clicking Change datasets.
Which three species have outlier proteome sizes?
Inspect completeness bars for these three outlier species.
What proportion of genes are missing?
Compare the stacked bars in the whole proteome assessment for the outlier proteomes vs. normal cetaceans.
What unusual patterns do you see?
How are fragments defined in OMArk?
Examine the three versions of the narwhal (Monodon monoceros) proteome (UniProt vs. Ensembl vs. NCBI) in the 2024.06 OMArk server release.
Which would you choose for downstream analysis, and why?
Open the Physeter catodon proteome from Ensembl.
How many contaminant proteins are reported, and from which type of organism do they come?
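If you would rather check the proteome-size question from the Cetacea exercise numerically than by eye, one possible approach is to flag sizes that lie far from the clade median. The sketch below uses the median absolute deviation (MAD). The function name, the threshold of 3.5, and the species labels and protein counts are all placeholders invented for illustration; replace them with the values you read off the web interface.

```python
from statistics import median


def flag_size_outliers(sizes, threshold=3.5):
    """Return the names whose proteome size is far from the median.

    `sizes` maps a species name to its number of proteins. The score is a
    MAD-based modified z-score; 3.5 is a commonly used cut-off, not an
    OMArk setting.
    """
    values = list(sizes.values())
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # All sizes (or most of them) are identical: nothing to scale by.
        return []
    # 0.6745 rescales the MAD so the score is comparable to a z-score.
    return sorted(
        name for name, v in sizes.items()
        if 0.6745 * abs(v - med) / mad > threshold
    )


# Placeholder values, NOT real cetacean proteome sizes.
sizes = {
    "species_A": 20000,
    "species_B": 21000,
    "species_C": 19500,
    "species_D": 20500,
    "species_E": 45000,
    "species_F": 8000,
}
print(flag_size_outliers(sizes))
```

A median-based score is used here rather than mean and standard deviation because, in a small clade, the outliers themselves would inflate the standard deviation and could hide each other.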