Vocabularies and research data

Awareness level

Overview

This guide explains what vocabularies are and how they are useful for supporting research. A brief discussion of vocabulary services is included.

What is a vocabulary?


A vocabulary sets out the common language a discipline has agreed to use to refer to concepts of interest in that discipline. It is a kind of model of the concepts in a discipline, with labels applied to the concepts and some kind of structure relating the concepts to each other.
Vocabularies take many forms. They include authority files, glossaries, dictionaries, gazetteers, code lists, taxonomies, subject headings, thesauri, semantic networks and ontologies.
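
As a rough illustration of "concepts, labels and structure", the sketch below models a tiny vocabulary in Python. The Concept class, the three geology concepts and the ancestors helper are all hypothetical and chosen only for illustration; real vocabularies are far richer.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Concept:
    """One concept in a vocabulary: a preferred label plus simple relations."""
    pref_label: str
    alt_labels: List[str] = field(default_factory=list)
    broader: Optional[str] = None  # preferred label of the parent concept, if any


# Hypothetical three-concept vocabulary, for illustration only.
vocabulary = {
    "rock": Concept("rock"),
    "igneous rock": Concept("igneous rock", broader="rock"),
    "basalt": Concept("basalt", alt_labels=["basaltic rock"], broader="igneous rock"),
}


def ancestors(label: str) -> List[str]:
    """Follow 'broader' links from a concept up to the top of the hierarchy."""
    path = []
    current = vocabulary[label].broader
    while current is not None:
        path.append(current)
        current = vocabulary[current].broader
    return path
```

Calling ancestors("basalt") walks the hierarchy from the narrowest concept to the broadest, which is the kind of structure a taxonomy or thesaurus records.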

More technically,

How do vocabularies support research?


Data specification and description

When sharing data or combining data from different sources, there is a need for an agreed language to make sure the meaning of data is clear and explicit.

Researchers planning observations or surveys need to define their data items clearly. In formal system development environments, this is done using metadata registries, data dictionaries, or data modelling software to define the permissible values or codes for each data item.
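
A minimal sketch of the idea, assuming a hypothetical survey item called land_use with a made-up code list: each record is checked against the permissible codes, and anything outside the list is reported.

```python
# Hypothetical code list for a survey item "land_use"; the codes and labels
# are illustrative and do not come from any published standard.
LAND_USE_CODES = {
    "1": "Cropping",
    "2": "Grazing",
    "3": "Forestry",
    "9": "Not stated",
}


def validate(records, field_name, code_list):
    """Return (row_index, value) for every record whose value is not a permitted code."""
    return [
        (i, rec.get(field_name))
        for i, rec in enumerate(records)
        if rec.get(field_name) not in code_list
    ]


records = [{"land_use": "1"}, {"land_use": "grazing"}, {"land_use": "3"}, {}]
errors = validate(records, "land_use", LAND_USE_CODES)
```

Here the free-text value "grazing" and the record with no value at all are both flagged, because neither matches a permitted code.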

An agreed vocabulary (a standard) makes a good starting point for translating concepts into other vocabularies so that collaboration can occur.

Examples of vocabularies used to specify data values:

Example of data specification

Data analysis

Ontology-mediated data integration

In this process scientists annotate data sets with semantically precise terms from an ontology, enabling reasoning across the data and transformations of the data for further analysis.

Statistical analysis

Statistical analysis involves aggregating data and applying statistical analytical techniques. Use of standard classification schemes (a kind of vocabulary) means that data from different sources can be compared. If standard classifications are not used, it is difficult to aggregate data from different sources with a high degree of confidence.
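
The sketch below illustrates why a shared classification helps. It assumes two hypothetical sources that use different local occupation codes, each with a made-up concordance to a common set of categories; the category names are illustrative and not taken from any specific standard.

```python
from collections import Counter

# Hypothetical concordances from each source's local codes to shared categories.
SOURCE_A_TO_STANDARD = {"prof": "Professionals", "mgr": "Managers"}
SOURCE_B_TO_STANDARD = {"P": "Professionals", "M": "Managers", "L": "Labourers"}


def aggregate(sources):
    """Count records per shared category and collect codes that have no mapping."""
    counts, unmapped = Counter(), []
    for codes, concordance in sources:
        for code in codes:
            category = concordance.get(code)
            if category is None:
                unmapped.append(code)
            else:
                counts[category] += 1
    return counts, unmapped


counts, unmapped = aggregate([
    (["prof", "mgr", "prof", "clerk"], SOURCE_A_TO_STANDARD),
    (["P", "L", "M"], SOURCE_B_TO_STANDARD),
])
```

Records that map to a shared category can be counted together, while the unmapped code "clerk" is set aside rather than guessed at, which is where confidence in the aggregate is lost when no standard classification is used.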

Examples of statistical vocabularies

Data retrieval

Indexing vocabularies are used to tag items in library catalogues and search portals and to provide keywords for academic journal articles. Without indexing vocabularies, search precision is reduced and valuable, relevant research may not be retrieved. Indexing vocabularies are most effective when they mirror the searcher's terminology and conceptual perspective.
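
As a simple sketch of how an indexing vocabulary can bridge the searcher's wording and the indexer's wording, the example below maps entry terms to a preferred term before looking up an index. The terms, the article identifiers and the function names are all hypothetical.

```python
# Hypothetical entry terms mapped to the preferred term used by indexers.
ENTRY_TERMS = {
    "heart attack": "myocardial infarction",
    "mi": "myocardial infarction",
    "myocardial infarction": "myocardial infarction",
}

# Hypothetical index: preferred term -> identifiers of the items tagged with it.
INDEX = {
    "myocardial infarction": ["article-17", "article-42"],
}


def preferred(term):
    """Return the preferred term for a query term, or None if it is not in the vocabulary."""
    return ENTRY_TERMS.get(term.strip().lower())


def search(query):
    """Resolve the query to a preferred term, then return the items tagged with it."""
    term = preferred(query)
    return INDEX.get(term, []) if term else []
```

A search for "Heart Attack" resolves to the preferred term and finds both tagged articles, while a query for a term outside the vocabulary returns nothing.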

Examples of indexing vocabularies:

Example of a journal article with keywords: Example

Keywords from a vocabulary

Vocabulary services


Traditionally, most vocabularies were managed in custom software, and either printed or published as read-only web pages or downloadable documents (for example, see the APAIS Thesaurus).

A vocabulary service is a machine-to-machine service that can support activities such as creating, managing and querying vocabularies.

Examples of vocabulary services:

ANDS is developing a prototype Controlled Vocabulary service.

SKOS

Knowledge organisation systems such as thesauri or any other type of structured controlled vocabulary can be represented using SKOS (Simple Knowledge Organization System). SKOS provides a standard way to represent knowledge organisation systems using the Resource Description Framework (RDF). This means that vocabulary information can be passed between computer applications in an interoperable way.
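
To give a feel for what a SKOS representation looks like, the sketch below writes one concept in Turtle, a common text syntax for RDF, using plain string formatting. The example.org namespace, the concept identifiers and the concept_to_turtle helper are assumptions made for illustration; the skos: terms (Concept, prefLabel, altLabel, broader) are part of SKOS itself. The helper does not escape quotation marks inside labels, so a production system would more likely use an RDF library.

```python
SKOS_NS = "http://www.w3.org/2004/02/skos/core#"
BASE = "http://example.org/vocab/"  # placeholder namespace, not a real vocabulary


def concept_to_turtle(local_id, pref_label, alt_labels=(), broader=None, lang="en"):
    """Serialise one concept as Turtle (labels are assumed to contain no quotes)."""
    lines = [
        f"ex:{local_id} a skos:Concept ;",
        f'    skos:prefLabel "{pref_label}"@{lang}',
    ]
    for alt in alt_labels:
        lines[-1] += " ;"
        lines.append(f'    skos:altLabel "{alt}"@{lang}')
    if broader:
        lines[-1] += " ;"
        lines.append(f"    skos:broader ex:{broader}")
    lines[-1] += " ."
    return "\n".join(lines)


turtle = "\n".join([
    f"@prefix skos: <{SKOS_NS}> .",
    f"@prefix ex: <{BASE}> .",
    "",
    concept_to_turtle("basalt", "basalt",
                      alt_labels=["basaltic rock"], broader="igneousRock"),
])
print(turtle)
```

The printed text declares the two prefixes and then states that ex:basalt is a skos:Concept with a preferred label, an alternative label and a broader concept. Because the structure follows a shared standard, another application can read it without knowing how the vocabulary was originally managed.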

Find out more


Introduction to vocabularies:

Standards:

  • ANSI/NISO Z39.19 - Guidelines for the Construction, Format, and Management of Monolingual Controlled Vocabularies, 2005 (revised 2010)
  • ISO 25964-1:2011 Information and documentation -- Thesauri and interoperability with other vocabularies -- Part 1: Thesauri for information retrieval
  • SKOS - Simple Knowledge Organization System
  • Resource Description Framework (RDF)

ANDS prototype Controlled Vocabulary service