Automating the interlinkage between GBIF data and WoRMS taxonomy to enhance the data flow from GBIF to OBIS
Hanieh Saeedi, Senckenberg Research Institute and Natural History Museum, Germany
Anke Penzlin, Senckenberg Research Institute and Natural History Museum, Biodiversity Information, Frankfurt am Main, Germany
Leen Vandepitte, World Register of Marine Species (WoRMS) & the European node of OBIS
BiCIKL Contact persons
Christos Arvanitidis, LifeWatch
Joe Miller, GBIF
BiCIKL Research Infrastructures involved
GBIF, LifeWatch ERIC, ENA
Non-BiCIKL Research Infrastructures accessed
Ocean Biodiversity Information System (OBIS)
Senckenberg Research Institute and Natural History Museum, Biodiversity Information, Frankfurt am Main, Germany (SGN)
Biodiversity data classes and services included
Species names; WoRMS, LifeWatch vLab taxonmatch, GBIF backbone
Sequences; European Nucleotide Archive (ENA)
Data in the Ocean Biodiversity Information System (OBIS) use the World Register of Marine Species (WoRMS), the most reliable source for marine species taxonomy, as their taxonomic backbone. For this reason, the AphiaID, the unique identifier of each taxon in WoRMS (preferably expressed as an LSID), is a mandatory field for data contributions to OBIS.
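To make the identifier format concrete, the following minimal sketch converts between a numeric AphiaID and its LSID form, which OBIS expects in the Darwin Core term `scientificNameID`. The function names are hypothetical and chosen for illustration only; the LSID pattern follows the form WoRMS publishes for taxon names.

```python
import re
from typing import Optional

# LSID pattern used by WoRMS for taxon names.
LSID_PREFIX = "urn:lsid:marinespecies.org:taxname:"
_LSID_RE = re.compile(r"^urn:lsid:marinespecies\.org:taxname:(\d+)$")


def aphia_lsid(aphia_id: int) -> str:
    """Format a numeric AphiaID as a WoRMS LSID (hypothetical helper)."""
    if aphia_id <= 0:
        raise ValueError("AphiaID must be a positive integer")
    return f"{LSID_PREFIX}{aphia_id}"


def parse_aphia_lsid(value: str) -> Optional[int]:
    """Return the numeric AphiaID from an LSID string, or None if malformed."""
    match = _LSID_RE.match(value.strip())
    return int(match.group(1)) if match else None
```

A record whose `scientificNameID` does not parse with `parse_aphia_lsid` would be a candidate for taxon matching before submission to OBIS.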
In GBIF, occurrence data are matched to the GBIF backbone by scientific name string. As a result, many museums worldwide share their data with GBIF through different platforms (e.g. IPT and BioCASe installations) without any taxon ID. When those museums want to pass their data on from GBIF to OBIS, the identifiers are missing, and they have to match all their scientific names to WoRMS to comply with the OBIS data quality requirements, either through the available WoRMS services or through the LifeWatch Species Information Backbone, which includes WoRMS. For medium-sized to large museums with extensive collections, this taxon matching can be complicated and time-consuming.
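The manual matching step described above can be illustrated with a small sketch that prepares batched requests for the `AphiaRecordsByMatchNames` endpoint of the WoRMS REST service. It only builds the request URLs and makes no network calls. The helper names are hypothetical, and the batch size of 50 names per request is an assumption that should be checked against the current WoRMS REST documentation.

```python
from urllib.parse import urlencode

WORMS_MATCH_URL = "https://www.marinespecies.org/rest/AphiaRecordsByMatchNames"
BATCH_SIZE = 50  # assumed per-request limit; verify in the WoRMS REST docs


def unique_names(names):
    """Normalise whitespace, drop empties and duplicates, keep first-seen order."""
    seen = {}
    for name in names:
        cleaned = " ".join(name.split())
        if cleaned:
            seen.setdefault(cleaned, None)
    return list(seen)


def match_request_urls(names, batch_size=BATCH_SIZE, marine_only=True):
    """Build one request URL per batch of unique scientific names."""
    names = unique_names(names)
    urls = []
    for start in range(0, len(names), batch_size):
        batch = names[start:start + batch_size]
        params = [("scientificnames[]", n) for n in batch]
        params.append(("marine_only", "true" if marine_only else "false"))
        urls.append(f"{WORMS_MATCH_URL}?{urlencode(params)}")
    return urls
```

Deduplicating first matters for collection data, where a single name is often repeated across thousands of specimen records. Even so, a large collection can still require many requests followed by manual review of ambiguous matches.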
Although the WoRMS catalogue is available as a checklist in GBIF and almost 95% of it is mapped to the GBIF backbone, we found no way to use this connection between the two checklists. For occurrences already mapped to a taxon in the GBIF backbone, it should be possible to extract the AphiaID of the corresponding taxon from the WoRMS taxonomy.
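One conceivable route for such an extraction is sketched below, under several assumptions: that the GBIF species API's related-usages endpoint (`/species/{key}/related`) is queried with the WoRMS checklist's dataset key, that the key given here is the correct one for WoRMS, and that each returned WoRMS usage carries its LSID in the `taxonID` field. All of these should be verified against the live GBIF API before use. The helper names are hypothetical, and the parsing function works on an already decoded JSON response, so no network access is needed to try it.

```python
import re

GBIF_API = "https://api.gbif.org/v1"
# Assumed dataset key of the WoRMS checklist in GBIF; verify on gbif.org.
WORMS_DATASET_KEY = "2d59e5db-57ad-41ff-97d6-11f5fb264527"
_APHIA_RE = re.compile(r"urn:lsid:marinespecies\.org:taxname:(\d+)")


def related_usages_url(backbone_key, dataset_key=WORMS_DATASET_KEY):
    """URL listing usages in one checklist that map to a backbone taxon."""
    return f"{GBIF_API}/species/{backbone_key}/related?datasetKey={dataset_key}"


def aphia_ids_from_related(response):
    """Collect distinct AphiaIDs from the 'results' of a decoded response."""
    ids = []
    for usage in response.get("results", []):
        match = _APHIA_RE.search(str(usage.get("taxonID", "")))
        if match and int(match.group(1)) not in ids:
            ids.append(int(match.group(1)))
    return ids
```

If a backbone taxon yields more than one AphiaID, or none, the record would still need a curator's decision, so this route would narrow the manual work rather than remove it.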
Our idea is to interlink the data contributed to GBIF with the WoRMS taxonomy, either automatically or via the LifeWatch vLab, based on the taxon matches in GBIF. The AphiaID could then simply be extracted for those GBIF entries. This would make the data flow from natural history collections to OBIS much more straightforward and taxonomically quality-controlled. It would also bring major quality benefits to GBIF for the available (marine) scientific names and foster closer collaboration between the OBIS and GBIF communities.
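At the level of individual occurrence records, the proposed interlinkage could look like the following sketch. It assumes that a lookup from GBIF backbone `taxonKey` to AphiaID has already been built by some means, and it writes the LSID into the Darwin Core term `scientificNameID`. The function name and the sample keys are hypothetical.

```python
def enrich_occurrences(occurrences, aphia_by_taxon_key):
    """Split records into OBIS-ready (with scientificNameID) and unresolved.

    occurrences: iterable of dicts that carry a GBIF 'taxonKey'.
    aphia_by_taxon_key: dict mapping backbone taxonKey to AphiaID
    (assumed to be prepared beforehand).
    """
    ready, unresolved = [], []
    for occ in occurrences:
        aphia_id = aphia_by_taxon_key.get(occ.get("taxonKey"))
        if aphia_id is None:
            # No WoRMS counterpart known: leave for manual taxon matching.
            unresolved.append(occ)
        else:
            lsid = f"urn:lsid:marinespecies.org:taxname:{aphia_id}"
            # Copy the record rather than mutating the caller's data.
            ready.append({**occ, "scientificNameID": lsid})
    return ready, unresolved


# Usage with made-up keys:
records = [
    {"occurrenceID": "a1", "taxonKey": 111, "scientificName": "Solea solea"},
    {"occurrenceID": "a2", "taxonKey": 999, "scientificName": "Unknownus sp."},
]
ready, unresolved = enrich_occurrences(records, {111: 127160})
```

Keeping the unresolved records separate preserves the quality control step: only names without a known WoRMS counterpart would go through conventional taxon matching.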
Implementing an interlink between GBIF data records and the WoRMS taxonomy via the LifeWatch connection, and extracting the AphiaID directly from GBIF, might open up the same pathway for taxon checklists in other resources that overlap strongly with the GBIF taxonomy. One of these resources is the European Nucleotide Archive (ENA), which archives comprehensive information on the world's nucleotide sequencing. This would facilitate the data flow between GBIF, ENA, and OBIS, using the WoRMS and LifeWatch vLab taxon match services to link diverse data types together in new ways. This effort would greatly enhance marine data sharing and quality control worldwide.