The COmpendium of GENetic daTabases (COGENT)



Our aim is to create a genomic database knowledge hub for the benefit of the Parkville research community, to improve access to both public and private databases via the sharing of knowledge, experience and resources. The initiative will be led by Bobbie Shaban at Melbourne Integrative Genomics / Melbourne Data Analytics Platform and Gene Melzack of Scholarly Services. They will:

  • Provide information on available datasets, including technical requirements, restrictions of use, data access and relevant software.
  • Provide assistance with applications for access.
  • Help assess computational requirements and advise on availability of local resources such as Spartan.
  • Advise on standard software pipelines.
  • Facilitate local data sharing where feasible.
  • Regularly update a blog with tutorials and the latest information about data releases.

We can’t cover everything, so we seek feedback from community members about priorities. We also welcome offers to share existing expertise and experience for the benefit of others. Below is an initial list of databases for review:

Requesting a Database/Dataset

A couple of datasets are already available for use by University of Melbourne staff and students:

KEGG: Kyoto Encyclopedia of Genes and Genomes

KEGG (Kyoto Encyclopedia of Genes and Genomes) is a database resource that integrates genomic, chemical, and systemic functional information. In particular, gene catalogs in the completely sequenced genomes are linked to higher-level systemic functions of the cell, the organism, and the ecosystem. The KEGG database and web services are freely available, and the Library has a subscription to the KEGG FTP site. Access to data via the KEGG FTP server is restricted to subscribers and requires a password. University of Melbourne staff and students can get the University of Melbourne password here:

TAIR: The Arabidopsis Information Resource

The Arabidopsis Information Resource (TAIR) is a comprehensive curated online genome resource for the model plant species Arabidopsis thaliana and a reference for plant gene function across all plant species. Data available from TAIR includes the complete genome sequence along with gene structure, gene product information, gene expression, DNA and seed stocks, genome maps, genetic and physical markers, publications, and information about the Arabidopsis research community. Gene product function data is updated every week from the latest published research literature and community data submissions. TAIR also provides extensive linkouts to other Arabidopsis resources.

To submit a request to subscribe to a database or purchase a dataset for the University, please send an email to:

More Information

For more information, contact Bobbie Shaban at Melbourne Integrative Genomics (0.2 FTE) / Melbourne Data Analytics Platform (0.6 FTE): or (03) 8344 8731.