Data Citation Awareness
Who should read this?
This guide is intended for eResearch infrastructure support providers and researchers. It is less a guide to how to cite data than a guide to the issues around data citation and to the activities under way to change the culture of data citation in support of improved data management and sharing.
What do we mean by data citation?
Data citation refers to the practice of providing a reference to data in the same way as researchers routinely provide a bibliographic reference to printed resources. Citing data is starting to be recognised as one of the key practices underpinning the recognition of data as a primary research output rather than a by-product of research. Although data has often been shared in the past, it has rarely, if ever, been cited in the way a journal article or other publication is. If datasets were routinely cited, they would gain validity and significance within the cycle of scholarly communication and the recognition of scholarly effort.
How do you cite data?
While at present there is no generally recognised way of citing data, examples are beginning to appear.
A recent OECD Publishing White Paper[i] by Toby Green sets out the need for a recognised standard and proposes a model that the OECD will use for its own datasets and data tables.
Altman and King[ii] proposed a standard for citing quantitative data in 2007. It contains many of the elements common to print citations, plus components specific to quantitative datasets. Like the recommendations of the OECD White Paper and the citations supplied by ICPSR, their standard treats a permanent identifier (a DOI or another scheme) as an essential element. Their minimum citation includes only six elements, one of which is the permanent identifier.
Many data repositories recommend a format for citing the data they hold. For example, ICPSR (the Inter-university Consortium for Political and Social Research) and other social science data centres provide a citation for each of their datasets, such as:
Kessler, Ronald C. National Comorbidity Survey: Baseline (NCS-1), 1990-1992 (Restricted Version) [Computer file]. ICPSR25381-v1. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2009-05-11. doi:10.3886/ICPSR25381
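To make the structure of such a citation explicit, the following Python sketch assembles a string in the same element order and punctuation as the ICPSR example above. The `DatasetRecord` class and `icpsr_style_citation` function are hypothetical names used for illustration only; they are not part of any ICPSR tool, and other repositories order and punctuate their elements differently.

```python
from dataclasses import dataclass


@dataclass
class DatasetRecord:
    """Hypothetical record holding the elements visible in the ICPSR example."""
    creator: str        # "Surname, Given names"
    title: str
    study_id: str       # repository study and version identifier
    place: str          # distributor's location
    distributor: str
    release_date: str   # ISO 8601 date, as in the example
    doi: str            # bare DOI, without the "doi:" prefix


def icpsr_style_citation(rec: DatasetRecord) -> str:
    """Join the elements in the order and punctuation of the ICPSR example."""
    creator = rec.creator.rstrip(".")  # avoid a doubled full stop after initials
    return (
        f"{creator}. {rec.title} [Computer file]. {rec.study_id}. "
        f"{rec.place}: {rec.distributor} [distributor], {rec.release_date}. "
        f"doi:{rec.doi}"
    )


example = DatasetRecord(
    creator="Kessler, Ronald C.",
    title="National Comorbidity Survey: Baseline (NCS-1), 1990-1992 (Restricted Version)",
    study_id="ICPSR25381-v1",
    place="Ann Arbor, MI",
    distributor="Inter-university Consortium for Political and Social Research",
    release_date="2009-05-11",
    doi="10.3886/ICPSR25381",
)
print(icpsr_style_citation(example))
```

Printing the example reproduces the ICPSR citation shown above. Keeping the elements as separate fields, rather than as one string, is what would let a citation be reformatted for another style or checked for a missing identifier.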
The connection between data and publication is increasingly recognised. The following citation from PANGAEA, the Publishing Network for Geoscientific & Environmental Data in Germany, first cites a dataset and then cites the article based on analysis of that data.
Kuhlmann, H et al. (2009): Age models, iron intensity, magnetic susceptibility records and dry bulk density of sediment cores from around the Canary Islands. doi:10.1594/PANGAEA.727522. http://dx.doi.org/10.1594/PANGAEA.727522 Supplement to: Kuhlmann, Holger; Freudenthal, Tim; Helmke, Peer; Meggers, Helge (2004): Reconstruction of paleoceanography off NW Africa during the last 40,000 years: influence of local and regional factors on sediment accumulation. Marine Geology, 207(1–4), 209–224. doi:10.1016/j.margeo.2004.03.017 http://dx.doi.org/10.1016/j.margeo.2004.03.017
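Both parts of this citation give the DOI twice: once in the compact `doi:` form and once as an actionable `http://dx.doi.org/` link. The minimal Python sketch below converts between the two forms. The helper names are hypothetical, the list of recognised wrappers is an assumption rather than an exhaustive one, and the code does not percent-encode unusual characters in a DOI suffix. The DOI Foundation now recommends the `https://doi.org/` resolver, which the `resolver` parameter allows.

```python
import re

# Wrappers that commonly surround a DOI in citations (assumed, not exhaustive).
_WRAPPER = re.compile(r"^(?:doi:\s*|https?://(?:dx\.)?doi\.org/)", re.IGNORECASE)


def bare_doi(text: str) -> str:
    """Strip surrounding whitespace and any 'doi:' or resolver-URL wrapper."""
    doi = _WRAPPER.sub("", text.strip())
    # A DOI name starts with the "10." prefix, then "/" and a suffix.
    if not doi.startswith("10.") or "/" not in doi:
        raise ValueError(f"not a DOI: {text!r}")
    return doi


def resolver_url(text: str, resolver: str = "http://dx.doi.org/") -> str:
    """Build the actionable link; the suffix is not percent-encoded."""
    return resolver + bare_doi(text)


print(resolver_url("doi:10.1594/PANGAEA.727522"))
# http://dx.doi.org/10.1594/PANGAEA.727522
print(bare_doi("http://dx.doi.org/10.1016/j.margeo.2004.03.017"))
# 10.1016/j.margeo.2004.03.017
print(resolver_url("10.1594/PANGAEA.727522", resolver="https://doi.org/"))
# https://doi.org/10.1594/PANGAEA.727522
```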
Data citation is further complicated by the fact that bibliographic management systems such as EndNote and Zotero do not currently provide a template for citing data.
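Until reference managers offer a dataset template, data citations can at least be exchanged in a common import format. The sketch below assumes the RIS tagged format and its `DATA` (data file) reference type, and serialises the ICPSR example as an RIS record. The `to_ris` function is hypothetical, and whether a particular version of EndNote or Zotero maps each tag (in particular `DO` for the DOI) is an assumption to check against its import filter.

```python
from __future__ import annotations

from collections.abc import Sequence


def to_ris(authors: Sequence[str], year: int, title: str,
           publisher: str, doi: str) -> str:
    """Serialise one dataset citation as an RIS record (sketch).

    Uses the RIS 'DATA' (data file) reference type; tag handling on import
    varies between reference managers.
    """
    fields = [("TY", "DATA")]                      # TY must be the first tag
    fields += [("AU", name) for name in authors]   # one AU line per author
    fields += [
        ("PY", str(year)),
        ("TI", title),
        ("PB", publisher),
        ("DO", doi),
        ("UR", "http://dx.doi.org/" + doi),
    ]
    lines = [f"{tag}  - {value}" for tag, value in fields]
    lines.append("ER  - ")                         # ER closes the record
    return "\n".join(lines) + "\n"


print(to_ris(
    authors=["Kessler, Ronald C."],
    year=2009,
    title="National Comorbidity Survey: Baseline (NCS-1), 1990-1992 (Restricted Version)",
    publisher="Inter-university Consortium for Political and Social Research",
    doi="10.3886/ICPSR25381",
))
```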
Wouldn’t it be lovely if …
- The creation of data could be recognised as a primary research output,
- The use and re-use of data were accompanied by a full data citation, including a persistent identifier,
- Data use and re-use could be tracked and recorded in the same way as print publications, and
- Data citation information were used for research evaluation and reward.
The ANDS approach to data citation
An important aim of ANDS (the Australian National Data Service) is to enable more researchers to re-use research data more often. To this end, ANDS is engaged in activities that make it easier to share data, promote recognition of the importance of making data available, and make data citation standard practice.
ANDS has joined DataCite, a group of leading research libraries and technical information providers that aims to make it easier to handle research datasets as independent, citable, unique scientific objects. It does this by assigning Digital Object Identifiers (DOIs) to datasets as permanent identifiers. ANDS is participating in the DataCite metadata standards working group.
Research Data Australia is a resource that will enable Australian researchers and research organisations to publicise their data. From 2011, ANDS will offer a DOI minting service that gives datasets a unique, traceable identifier, which researchers can then use when citing their own data in publications. Data can be registered through the ANDS online service Register My Data and then discovered through Research Data Australia.
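A DOI has two parts separated by a slash: a prefix beginning `10.` that is assigned to the registering organisation, and a suffix that the organisation chooses (as in `10.3886/ICPSR25381` above). The Python sketch below shows only that composition step plus a uniqueness check; registering the DOI and its metadata, which an actual minting service does, is not shown. The prefix `10.12345`, the `mint_doi` function and the random-suffix scheme are all hypothetical.

```python
from __future__ import annotations

import uuid

# Hypothetical prefix for illustration only; a real prefix is assigned to the
# registering organisation by a DOI registration agency.
PREFIX = "10.12345"


def mint_doi(minted: set[str], prefix: str = PREFIX) -> str:
    """Compose a new DOI as prefix/suffix, using an opaque random suffix.

    `minted` holds previously issued DOIs in lower case, because the DOI
    system compares names case-insensitively.
    """
    while True:
        suffix = uuid.uuid4().hex[:12]   # 12 hex characters, no meaning encoded
        doi = f"{prefix}/{suffix}"
        if doi.lower() not in minted:
            minted.add(doi.lower())
            return doi


issued = set()
first = mint_doi(issued)
second = mint_doi(issued)
print(first, second)
```

An opaque suffix is a design choice in this sketch: because it encodes no title or project name, the identifier stays meaningful even if that information later changes.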
ANDS is working with both Thomson Reuters and Elsevier to investigate the feasibility of tracking and recording data use through DOIs, and of making that information available through their respective databases, Web of Science and Scopus. Both databases are used extensively worldwide in research assessment.
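At its simplest, tracking data use through DOIs means finding dataset identifiers in reference lists and counting them. The Python sketch below does this with a regular expression over citation strings. The pattern is a common heuristic, not the method used by Web of Science or Scopus, and the sample reference lists are hypothetical; DOIs are compared case-insensitively, as the DOI system treats them.

```python
import re
from collections import Counter

# Common heuristic pattern for a DOI inside free text; it can over- or
# under-match on unusual suffixes, so treat the counts as approximate.
DOI_PATTERN = re.compile(r'10\.\d{4,9}/[^\s"<>]+')


def count_dataset_citations(reference_lists):
    """Count citing works per DOI across a collection of reference lists.

    Each reference list is a list of citation strings. A DOI is counted at
    most once per list, so a work that mentions a DOI twice counts once.
    """
    counts = Counter()
    for refs in reference_lists:
        seen = set()
        for ref in refs:
            for match in DOI_PATTERN.findall(ref):
                # Drop trailing punctuation and fold case before comparing.
                seen.add(match.rstrip(".,;").lower())
        counts.update(seen)
    return counts


# Two hypothetical citing works, built from the citations quoted earlier.
works = [
    ["Kessler, Ronald C. ... doi:10.3886/ICPSR25381"],
    [
        "Kuhlmann, H et al. (2009) ... doi:10.1594/PANGAEA.727522. "
        "http://dx.doi.org/10.1594/PANGAEA.727522",
        "Kessler, Ronald C. ... doi:10.3886/ICPSR25381",
    ],
]
print(count_dataset_citations(works).most_common())
```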
ANDS is engaging with research funding agencies to promote data publication as a primary research output and the inclusion of data in the research assessment process.
Directions around data publication
‘What is more, funding agencies and researchers alike must ensure that they support not only the hardware needed to store the data, but also the software that will help investigators to do this. One important facet is metadata management software: tools that streamline the tedious process of annotating data with a description of what the bits mean, which instrument collected them, which algorithms have been used to process them and so on — information that is essential if other scientists are to reuse the data effectively. Also necessary, especially in an era when data can be mixed and combined in unanticipated ways, is software that can keep track of which pieces of data came from whom. Such systems are essential if tenure and promotion committees are ever to give credit — as they should — to candidates’ track-record of data contribution.’
Nature editorial. 2009. ‘Data’s shameful neglect.’ Nature 461 (7261): 145. http://www.nature.com/nature/journal/v461/n7261/full/461145a.html
Data-only journals are now starting to appear; Earth System Science Data is one example.
[i] Green, T. (2009). ‘We Need Publishing Standards for Datasets and Data Tables.’ OECD Publishing White Paper. http://dx.doi.org/10.1787/603233448430
[ii] Altman, M. and King, G. (2007). ‘A Proposed Standard for the Scholarly Citation of Quantitative Data.’ D-Lib Magazine, 13(3/4), March/April. http://www.dlib.org/dlib/march07/altman/03altman.html
This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 2.5 Australia License.