Dynamic Data Citation as a Service

19:00 Tuesday 28 May

Chris Schubert (Austria) 1; Katharina Sack (Austria) 1; Georg Seyerl (Austria) 1

1 - Climate Change Centre Austria - Data Centre

Citing datasets appropriately is recognised as good scientific practice. Tracing the original source as part of provenance within the data life cycle is an essential basis for verifying research data and reproducing results. It also safeguards against data manipulation and gives authors due credit for their datasets.

The CCCA Data Centre has developed a structured, web-based service that describes the relations between original datasets and their derivatives (subsets), tracks their versions and generates citation text automatically. The application was built as a pilot of the RDA (Research Data Alliance) Data Citation Working Group, using NetCDF climate scenarios. The CCCA Data Centre is part of the Austrian research infrastructure and acts as a central access point for distributed climate information.
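
The relation between originals, versions and subsets, together with an automated citation text, can be illustrated with a minimal sketch. It assumes each subset simply keeps a link to the resource it was derived from. The class `DatasetVersion`, its fields, the citation layout and the `hdl:00000/...` identifiers are hypothetical placeholders, not the CCCA schema or real persistent identifiers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DatasetVersion:
    """One version of an original dataset or of a derived subset (hypothetical model)."""
    pid: str                                   # persistent identifier (placeholder values below)
    title: str
    creators: list[str]
    publisher: str
    year: int
    version: int
    parent: Optional["DatasetVersion"] = None  # None marks an original dataset
    subset_description: str = ""               # e.g. the query that produced the subset

    def lineage(self) -> list["DatasetVersion"]:
        """Return the chain from this resource back to its original, newest first."""
        chain, node = [], self
        while node is not None:
            chain.append(node)
            node = node.parent
        return chain

    def citation(self) -> str:
        """Build a citation text; a subset additionally cites the original it derives from."""
        text = (f"{', '.join(self.creators)} ({self.year}): {self.title}, "
                f"version {self.version}. {self.publisher}. {self.pid}")
        if self.parent is not None:
            root = self.lineage()[-1]  # last element of the chain is the original
            text += f" (subset of {root.title}, version {root.version}, {root.pid})"
        return text


# Hypothetical example records.
original = DatasetVersion(pid="hdl:00000/example-original", title="Example climate scenario",
                          creators=["Doe, J."], publisher="CCCA Data Centre", year=2019, version=2)
subset = DatasetVersion(pid="hdl:00000/example-subset", title="Example climate scenario (tas subset)",
                        creators=["Doe, J."], publisher="CCCA Data Centre", year=2019, version=1,
                        parent=original, subset_description="variable=tas, 2021-2050")
print(subset.citation())
```

Because the citation is computed from the lineage rather than typed by hand, a subset of a subset still points back to the original dataset and its version.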

To make data processing reproducible and data reusable, researchers need to be able to identify the exact version of the dataset they used. This is particularly difficult for large files such as geospatial data. When users download the data and process it in their own desktop environment, usually all metadata, including the relation to the original and to its different versions, is lost and has to be described again. If only subsets of a dataset have been used, citing the original data and the versions based on it becomes very time-consuming and complex.

The CCCA Data Centre therefore developed a dynamic data citation tool. The main motivation was to contribute components to data life cycle management: maintaining versions, assigning a persistent identifier to every related derivative, and identifying resources by storing queries so that they can be executed again. The tool automatically creates a landing page for each subset, containing a citation as well as metadata inherited dynamically from the original. The objective was to describe the processes of subset formation precisely and comprehensively in a web-based framework.
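
The idea of identifying a subset by its stored query can be sketched as follows, loosely following the query-store approach of the RDA Data Citation recommendations: normalise and store the query, timestamp it, keep a checksum of the result set, assign a PID, and re-execute the query when the PID is resolved. Everything here is a hypothetical stand-in: a toy dictionary replaces the NetCDF files, the query store lives in memory, and `hdl:00000/subset-N` is a placeholder PID scheme. A production system would also version the underlying data and re-execute against the version valid at the stored timestamp; this sketch only detects that the data has changed.

```python
import hashlib
import json
from datetime import datetime, timezone

# Toy stand-in for a NetCDF dataset: (variable, year) -> value.
DATASET = {
    ("tas", 2020): 9.1, ("tas", 2021): 9.4, ("tas", 2022): 9.8,
    ("pr", 2020): 812.0, ("pr", 2021): 790.5,
}
QUERY_STORE: dict[str, dict] = {}  # hypothetical in-memory query store: PID -> record


def normalise(query: dict) -> str:
    """Canonical form, so equivalent queries (e.g. different key order) compare equal."""
    return json.dumps(query, sort_keys=True, separators=(",", ":"))


def execute(query: dict) -> list[tuple]:
    """Run a subset query; sorting the rows makes the result set reproducible."""
    rows = [(var, year, val) for (var, year), val in DATASET.items()
            if var == query["variable"] and query["start"] <= year <= query["end"]]
    return sorted(rows)


def checksum(rows: list[tuple]) -> str:
    """Fingerprint of a result set, used later to verify re-executions."""
    return hashlib.sha256(json.dumps(rows).encode("utf-8")).hexdigest()


def store_query(query: dict) -> str:
    """Store the normalised query with a timestamp and result checksum; return its PID."""
    norm = normalise(query)
    result_hash = checksum(execute(query))
    # Identical query with identical result: reuse the existing PID.
    for pid, rec in QUERY_STORE.items():
        if rec["query"] == norm and rec["result_hash"] == result_hash:
            return pid
    pid = f"hdl:00000/subset-{len(QUERY_STORE) + 1}"  # placeholder PID scheme
    QUERY_STORE[pid] = {
        "query": norm,
        "executed_at": datetime.now(timezone.utc).isoformat(),
        "result_hash": result_hash,
    }
    return pid


def resolve(pid: str) -> list[tuple]:
    """Re-execute a stored query and check the result still matches the cited subset."""
    rec = QUERY_STORE[pid]
    rows = execute(json.loads(rec["query"]))
    if checksum(rows) != rec["result_hash"]:
        raise ValueError(f"result for {pid} differs from the cited subset")
    return rows


pid = store_query({"variable": "tas", "start": 2021, "end": 2022})
same = store_query({"end": 2022, "start": 2021, "variable": "tas"})  # same query, other key order
print(pid == same)   # True
print(resolve(pid))  # [('tas', 2021, 9.4), ('tas', 2022, 9.8)]
```

Storing the query instead of a copy of the extracted data keeps the identifier small and lets the subset be regenerated on demand, at the cost of requiring the underlying data to stay available and versioned.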

The presentation gives a brief overview of the technical implementation (data.ccca.ac.at/): the features developed for creating subsets, how resources are identified by storing queries and executing them again, and the automatic creation of subset landing pages that provide a citation text as well as dynamically inherited metadata.
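
Inheriting metadata "in a dynamic manner" can be read as merging the original's metadata into the subset's record at request time, so that later corrections to the original propagate to every subset landing page. A minimal sketch, assuming metadata records are plain dictionaries; the field names, the set of fields a subset may override (`SUBSET_FIELDS`) and the plain-text rendering are hypothetical and do not describe the actual data.ccca.ac.at implementation.

```python
from typing import Any

# Hypothetical policy: fields a subset may set itself; everything else is inherited.
SUBSET_FIELDS = {"identifier", "title", "spatial_extent", "temporal_extent", "variables", "query"}


def inherit_metadata(parent: dict[str, Any], subset: dict[str, Any]) -> dict[str, Any]:
    """Merge parent metadata with subset-specific fields; subset values take precedence."""
    unknown = set(subset) - SUBSET_FIELDS
    if unknown:
        raise KeyError(f"subset may not override: {sorted(unknown)}")
    merged = {**parent, **subset}
    merged["is_derived_from"] = parent["identifier"]  # explicit link back to the original
    return merged


def render_landing_page(meta: dict[str, Any]) -> str:
    """Render a plain-text landing page: title first, then all fields in sorted order."""
    lines = [meta["title"], "=" * len(meta["title"])]
    lines += [f"{key}: {meta[key]}" for key in sorted(meta) if key != "title"]
    return "\n".join(lines)


# Hypothetical example records; the merge runs on every request, not once at creation.
original = {"identifier": "hdl:00000/example-original", "title": "Example climate scenario",
            "creator": "Doe, J.", "publisher": "CCCA Data Centre", "variables": ["tas", "pr"]}
subset = {"identifier": "hdl:00000/subset-1", "title": "Example climate scenario (tas, 2021-2022)",
          "variables": ["tas"], "query": '{"end":2022,"start":2021,"variable":"tas"}'}
page = render_landing_page(inherit_metadata(original, subset))
print(page)
```

Restricting which fields a subset may override keeps attribution fields such as creator and publisher tied to the original.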

Beyond describing the processes of subset formation precisely and comprehensively in a web-based framework, we want to discuss our open-source solutions, their scalability and potential extensions to other communities' approaches, such as OpenEO.