Meaning and purpose
Services in the research domain support the creation or use of research collections and datasets. Services can be web services, provided across the web and following a well-defined machine protocol, such as OAI-PMH Harvest or RSS Syndication; but they may also be provided by offline software (e.g. the functionality of software running a simulation, or creating annotations).
As with parties and activities, the ANDS Collections Registry gathers service descriptions in order to provide context for the collections it registers, and to enable discovery of related collections, rather than to serve as an exhaustive registry of research services. For that reason, the services described in the registry are always related to collections—whether the service exposes the collection, or was involved in creating the collection.
The e-Framework describes a service as a resource characterised by the functionality (performance of tasks) it provides. This is consistent with the definition of services given in ISO 2146: 'a system (analogue or digital) that provides one or more functions of value to an end user'. To be used, a service must be implemented.
Service delivery methods
A service must have a specific delivery method which makes it available to a client.
Delivery methods include:
- Web service: according to the W3C, "a software system designed to support interoperable machine-to-machine interaction over a network. It has an interface described in a machine-processable format". (Unlike the W3C, we do not restrict this delivery method to WSDL.)
- Software: all services provided by software other than as web services; users interact with these through a user interface or on a local system. This includes Unix applications, PC/Mac applications, and software accessed through a browser.
- Offline service: a service not provided through computers or the internet. Instruments such as beamlines and microscopes are normally modelled as offline services.
- Workflow: a service that orchestrates other services. Kepler workflows, which script how various instruments and computational tools interact to deliver an output, are an example of a workflow.
Web services are the most straightforward type of service to model: the definition of their function and scope is specified through statements of behaviour and data representation, and they have a well-defined protocol for interaction with service clients. These protocols can usually be indicated through the service type.
Other types of service are used to model instruments, software, and workflows. These tools often do not have well-defined protocols for interaction, so protocols need not be specified in their service description. These tools also have properties which are not captured by modelling them as services (e.g. asset numbers, operating systems): this partial representation is deliberate, because of the restricted scope of service descriptions.
Service descriptions in the ANDS Collections Registry are meant to convey only high-level, indicative information. More complete detail about data collection provenance should be provided in local metadata stores, and linked to as Related Info from the service description.
Instruments are modelled as offline services—although strictly speaking what services model is the capability of instruments to create data collections. Instruments are often housed in facilities, but facilities should be modelled in the ANDS Collections Registry as parties: they are the organisations which own the instruments. Instruments can be composed of individual sensors; both the large-scale and more fine-grained instrument may be of interest to users. Instruments can be related to each other in a partOf relationship. For example, a specific detector can be part of a Synchrotron beamline instrument, or of a radio telescope.
Whether to model both the instrument and its component sensors in the ANDS Collections Registry depends on whether it will be useful to discover collections through sensors, rather than just through the instrument. This is a policy decision for partners; some partners have already elected not to do so. The details of sensors used to gather the data should in any case be recorded in local metadata stores.
To be used, a service must also be instantiated: there must be a particular instance of the service being described, rather than the class of all matching services, and it should be possible to name the location of the service, and the parties managing the service. For example, the ANDS Collections Registry would describe the Monash University ARROW repository OAI-PMH feed, rather than giving a generic description of the OAI-PMH protocol.
Treating services as instances means that there may be many service records in the ANDS Collections Registry that look quite similar—distinct sensors, for example, or distinct deployments of RSS. As long as each instance is associated with a collection registered with the registry, it is still appropriate to distinguish between the service instances.
Exceptionally, software services may be described as implementations, rather than instances. A record can describe the downloadable software for the service, rather than an instance of the software running on a specific machine. A separate record would still be expected for different versions of the same software, or for different implementations.
Depending on how services relate to collections, services can be classified as Creation services, Metadata services, Discovery services, or Reuse services.
- Creation services add data to collections; e.g. simulators, instruments, visualisation software.
- Metadata services add metadata to items and collections; e.g. annotation software, classification software.
- Discovery services enable read-only access to collections; e.g. search, harvest, syndicate.
- Reuse services enable the reuse of research data. This includes Rights Management, Data Storage, Publishing, Ethics and Governance.
Discovery services are typically web services; creation services typically have other delivery methods. The kind of service (service type) is described by choosing from the following (ANDS is currently considering expanding this list):
Discovery services
- harvest-oaipmh: OAI-PMH harvest (Open Archives Initiative Protocol for Metadata Harvesting). See also http://www.openarchives.org/
- search-http: search service over HTTP (RFC 2616).
- search-opensearch: OpenSearch search, a collection of technologies that allow publishing of search results in a format suitable for syndication and aggregation. See also Wikipedia.
- search-sru: SRU search. SRU is a standard XML-focused search protocol for Internet search queries, based on Z39.50 semantics.
- search-srw: SRW search, i.e. SRU via HTTP SOAP (SRU via HTTP SOAP was formerly known as SRW). SRW/U is being deployed as the search API for the DSpace initiative. It is being considered as the standard search API by a number of communities, including the meta-searching and geospatial searching communities.
- search-z3950: Z39.50 search, based on the international standard ISO 23950, Information Retrieval (Z39.50): Application Service Definition and Protocol Specification (also ANSI/NISO Z39.50). The standard specifies a client/server-based protocol for searching and retrieving information from remote databases.
- syndicate-atom: Atom syndication, an XML-based web content and metadata syndication format. See http://tools.ietf.org/html/rfc4287
- syndicate-rss: RSS feed, a family of web feed formats that are specified using XML.
Creation & Metadata services
- create: produces a new data object representing existing phenomena in the world, including physical reality and user input. An instrument creates data.
- generate: produces a new data object out of mathematical formulae and parameters, rather than capturing and representing existing data in the world. A simulator generates data. (The simulation is the generated data.) A random number generator generates data.
- report: presents existing data in a summary form. A visualisation reports on data.
- annotate: links an annotation to a data object, or part thereof.
- transform: changes a data object into a new data object, with a distinct format. An analysis tool creates a new data object out of data (either raw data, or other analyses).
- assemble: builds a new data object instance composed of existing data objects. A survey generation tool creates a survey form out of user input and templates.
The service names for creation & metadata services are deliberately generic (and are taken from the e-Framework, which is not research-specific). To choose among them, answer the following questions:
What is the input into the service?
- Observations of the world: create
- Mathematical models: generate
- An existing dataset: what is the output of the service?
  - Another dataset: transform
  - A summary or visualisation of the dataset: report
  - Commentary on the dataset, or on parts of it: annotate
- Multiple datasets, and the output is a single dataset: assemble
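The decision procedure above can be sketched as a small Python function. The input and output labels ("observations", "model", "dataset", "datasets", "summary", "commentary") are hypothetical names chosen for this sketch; they are not part of RIF-CS or the e-Framework.

```python
from typing import Optional


def choose_service_type(input_kind: str, output_kind: Optional[str] = None) -> str:
    """Pick a creation/metadata service type by input, then (for one dataset) by output.

    Hypothetical labels used in this sketch:
      input_kind:  "observations", "model", "dataset" or "datasets"
      output_kind: "dataset", "summary" or "commentary"
    """
    if input_kind == "observations":
        return "create"
    if input_kind == "model":
        return "generate"
    if input_kind == "datasets":
        # Several existing datasets combined into a single output dataset.
        if output_kind not in (None, "dataset"):
            raise ValueError("assemble produces a single dataset")
        return "assemble"
    if input_kind == "dataset":
        # A single existing dataset: the output decides the type.
        by_output = {"dataset": "transform", "summary": "report", "commentary": "annotate"}
        if output_kind not in by_output:
            raise ValueError(f"unknown output for a single dataset: {output_kind!r}")
        return by_output[output_kind]
    raise ValueError(f"unknown input kind: {input_kind!r}")


# A visualisation tool: one dataset in, a summary out.
print(choose_service_type("dataset", "summary"))
```

Asking about the input first, and about the output only for a single existing dataset, mirrors the order of the questions in the list.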
No reuse services have been included in the current service type vocabulary. The service type vocabulary can be expanded as the community requires—subject to the constraint that it describes services in the ANDS Collections Registry, which are specific to registered collections.
Access policy for services
Services may also have access policies. These are described in a separate element; see the Access Policy (services only) page for more information.
Research domain examples
Researcher Fred from Notre Dame University uses the Brahe interferometer on the Farnell Radio Telescope to gather observations of pulsar THX-1138. The observations are registered with ANDS as a collection.
- The Brahe interferometer is described in the ANDS Collections Registry as a Create service, since it was used to create the pulsar observations.
- The Farnell telescope itself is also described in the ANDS Collections Registry as a Create service.
- The location of the Brahe interferometer is given as the physical address of the Farnell Telescope, and the Norfolk Island Astronomical Commissariat is listed as the owner.
- The Brahe interferometer is related in the ANDS Collections Registry to the THX-1138 pulsar data collection, and also to a SEN-5421 pulsar data collection. Users who are consulting the ANDS Collections Registry for THX-1138 can also discover SEN-5421, since it was created by the same creation service.
The pulsar data collection represents raw data. The Tempo2 pulsar timing software is used to extract pulsar timing data from a range of observations, including THX-1138, and the resulting analyses are also registered with the ANDS Collections Registry.
- Tempo2 is registered as a transform service with the ANDS Collections Registry, and is related to several data collections which it has produced. Tempo2 is distinguished from the earlier version, Tempo1; but the same service description is used for analyses produced by the Tempo2 software, whether it was running at Farnell, at Palomar, or on a university laptop.
- The location for the service is given as the software's SourceForge page.
The pulsar data collection is exposed for search through the SRU protocol. The web service allowing this search is hosted at the University of Launceston.
- The search service running at UoL is registered as a search-sru service with the ANDS Collections Registry.
- The location of the service is the address at UoL to which SRU queries are sent.
The following diagram illustrates the relations of the objects described in this scenario:
The date on which the metadata describing a service was last changed in the source system can be recorded; see Date modified.
Use in Research Data Australia
Metadata records describing services are grouped together on the Research Data Australia home page. The service category and service type are displayed. The hyperlink to a page or XACML document describing service access policies is displayed. Date modified is not displayed. All information is searchable.
RIF-CS best practice guidelines
When to describe a discovery service
Often a collection is tightly bound with its discovery service, so there can be confusion about whether to model it as a collection or a service. The purpose of the ANDS Collections Registry is to promote the discovery of collections, not of services. So an entity such as a repository or portal must have a relevant collection description contributed to the registry. It can also have a relevant service description contributed, if that service description adds sufficient value. A discovery service that does not provide access to a specific collection is not relevant to the ANDS Collections Registry, and likely needs to be modelled differently.
For example: a podcast is a collection of recordings, combined with a syndication service for accessing that collection. The podcast should be described for ANDS as a collection, since that is the aspect of the podcast most relevant to the Collections Registry. The RSS feed to the podcast can be added to the Collections Registry as an associated discovery service (syndicate-rss). But the podcast should not be described as a service instead of a collection.
HTTP search for a single keyword can be assumed to be the default search functionality for a collection. (This is the single search box on the home page of most collections.) If the ANDS Collections Registry already has a description of such a collection, then a single-keyword search need not be registered as a distinct service description.
Portals provide access to an aggregation of collections. A portal can be modelled either as a service or as a collection; if it is modelled as a service, its constituent collections should also be described in the ANDS Collections Registry.
The service type is a two-part string, with the first part specifying the service genre and the second part specifying the protocol (for example, syndicate-rss, harvest-oaipmh, search-sru). For creation and metadata services, which do not have generically used protocols, only the service genre is specified.
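A minimal Python sketch of how a validation or harvesting script might split a service type into genre and protocol, checked against the vocabulary listed earlier on this page. The vocabulary sets are copied from this page; the function and type names are hypothetical, not part of any ANDS tool.

```python
from typing import NamedTuple, Optional

# Vocabulary copied from the service type lists on this page.
DISCOVERY_TYPES = {
    "harvest-oaipmh", "search-http", "search-opensearch", "search-sru",
    "search-srw", "search-z3950", "syndicate-atom", "syndicate-rss",
}
CREATION_METADATA_TYPES = {"create", "generate", "report", "annotate", "transform", "assemble"}


class ServiceType(NamedTuple):
    genre: str
    protocol: Optional[str]  # None for creation and metadata services


def parse_service_type(value: str) -> ServiceType:
    """Split a service type into genre and protocol, rejecting unknown values."""
    if value in CREATION_METADATA_TYPES:
        # Creation and metadata services carry only the genre.
        return ServiceType(value, None)
    if value in DISCOVERY_TYPES:
        genre, protocol = value.split("-", 1)
        return ServiceType(genre, protocol)
    raise ValueError(f"not in the current service type vocabulary: {value!r}")


print(parse_service_type("syndicate-rss"))
print(parse_service_type("transform"))
```

Because unknown strings are rejected, a typo such as syndication-rss would be flagged rather than silently split.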
If there is a well-defined protocol for an instance of a creation or metadata service, the service description should provide that protocol information in the Related Info element. Additional protocol information should also be provided in the Related Info element for discovery services, if there are local extensions to the service protocol that service users need to know about.
The value for the service genre is taken from the set of service genres registered with the e-Framework. The protocol is taken from known services identified by initial Collections Registry content providers. New genre-protocol combinations may be added on application to the RIF-CS schema manager (contact firstname.lastname@example.org).
Software tools can have multiple applicable types from the service type vocabulary: unlike web services, software tools can perform multiple functions. However, the service description of a software tool shall have a single type, reflecting the primary use of the tool in the research community.
For web services, the electronic address is a URI that provides access to the service: in particular, it is a URI that can be processed by a client following the service protocol (service endpoint).
If the service is syndicate-rss, for example, the location in the service description will be a URI that can be processed by an RSS reader.
Web services alone may use the <arg> element in addition to the <value> element, to differentiate between a base URL and the service arguments. This only applies to HTTP Query services, in which the service call URL contains service arguments. The <arg> element indicates whether each of the URL arguments is required or optional, whether they are plain text or embedded objects, and whether they are inline (embedded in the base URL) or key-value pairs in a HTTP query. The <arg> element does not describe the semantics of the arguments, and should not be treated as a substitute for linking to protocol documentation for the service.
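To illustrate how a client might combine the <value> base URL with arguments described by <arg> elements, here is a hedged Python sketch. This page does not say how inline arguments are positioned within the base URL, so the sketch assumes a hypothetical "{name}" placeholder convention; the class, function and example URL are also hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List
from urllib.parse import quote, urlencode


@dataclass
class Arg:
    """Attributes of one <arg> element; the element's text is the argument name."""
    name: str
    required: bool = False
    arg_type: str = "string"  # "string" or "object"; not needed to build the URL
    use: str = "keyValue"     # "keyValue" (query parameter) or "inline" (in the base URL)


def build_call_url(base_url: str, args: List[Arg], values: Dict[str, str]) -> str:
    """Combine a base URL with argument values described by <arg> elements.

    Assumption (hypothetical convention): an inline argument appears in the
    base URL as a "{name}" placeholder.
    """
    url = base_url
    query = []
    for arg in args:
        if arg.name not in values:
            if arg.required:
                raise ValueError(f"missing required argument: {arg.name}")
            continue
        if arg.use == "inline":
            # Percent-encode every reserved character, including "/".
            url = url.replace("{" + arg.name + "}", quote(values[arg.name], safe=""))
        else:
            query.append((arg.name, values[arg.name]))
    if query:
        url += ("&" if "?" in url else "?") + urlencode(query)
    return url


# Mirrors the RSS example's required keyValue "identifier" argument.
rss_args = [Arg("identifier", required=True)]
print(build_call_url("http://repository.example.org/rss", rss_args, {"identifier": "oai:123"}))
```

As the paragraph above notes, nothing here captures what the arguments mean; that still belongs in the protocol documentation linked from Related Info.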
If the electronic address type is "wsdl", the <value> element must be a URL pointing to the WSDL file. Human-readable descriptions of the service online should be recorded in the Related Info element instead. A physical address or electronic address (email) can be provided as a contact for arranging access to the service. Typically this will be the same address as for the party managing the service.
Software and workflows
For software and workflows, the electronic address is likewise a URI that provides access to the service: in particular, it is a URI that the software or workflow can be downloaded from. In this case too, human-readable descriptions of the software should be recorded in Related Info instead. A physical address or electronic address (email) can be provided as a contact for arranging access to the service.
For offline services, a web address is not acceptable as a location. That is because an instrument home page does not provide direct access to the service, the way an RSS feed address or a search query does. Web pages about the service should be recorded in the Related Info element, just as they are for online services. A physical address or electronic address (email) should be provided instead; as above, the physical address is intended to allow users to gain access to the offline service (contact address).
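The location rules for the different delivery methods can be expressed as a simple check, sketched here in Python. The function name and the problem messages are hypothetical, and treating "has a URI scheme" as "is a URI" is a simplifying assumption.

```python
from typing import List, Optional
from urllib.parse import urlparse

ONLINE_METHODS = {"webservice", "software", "workflow"}


def location_problems(delivery_method: str,
                      electronic_uri: Optional[str] = None,
                      physical_address: Optional[str] = None,
                      email: Optional[str] = None) -> List[str]:
    """Return guideline problems with a service's location (empty list if none)."""
    problems = []
    if delivery_method == "offline":
        # An instrument home page does not give direct access to the service.
        if electronic_uri:
            problems.append("offline service: a web address is not an acceptable location; "
                            "record web pages in Related Info")
        if not (physical_address or email):
            problems.append("offline service: give a physical address or email contact")
    elif delivery_method in ONLINE_METHODS:
        # Endpoint for web services, download location for software and workflows.
        if not electronic_uri or not urlparse(electronic_uri).scheme:
            problems.append(f"{delivery_method}: location should be a URI giving access "
                            "to the service")
    else:
        problems.append(f"unknown delivery method: {delivery_method!r}")
    return problems


print(location_problems("offline", electronic_uri="http://www.niac.org.nf"))
```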
Delivery Method will be suggested for inclusion in future versions of RIF-CS. As an interim measure, include the delivery method as a string without spaces (webservice, software, offline, workflow) in a description element of type "deliveryMethod".
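The interim deliveryMethod description can be generated with Python's standard-library xml.etree.ElementTree module, as in this short sketch. The RIF-CS namespace and the rest of the service record are omitted for brevity, and the helper name is hypothetical.

```python
import xml.etree.ElementTree as ET

DELIVERY_METHODS = ("webservice", "software", "offline", "workflow")


def add_delivery_method(service: ET.Element, method: str) -> ET.Element:
    """Append <description type="deliveryMethod"> to a <service> element."""
    if method not in DELIVERY_METHODS:
        raise ValueError(f"delivery method must be one of {DELIVERY_METHODS}")
    description = ET.SubElement(service, "description", type="deliveryMethod")
    description.text = method  # a single string without spaces
    return description


service = ET.Element("service", type="create")
add_delivery_method(service, "offline")
print(ET.tostring(service, encoding="unicode"))
```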
Most of the relations described below are bidirectional; for discovery to be most effective, they should be represented in RIF-CS in both directions. In particular, if a collection links to the creation service that produced it, the creation service should also link out to all the collections it has produced. This allows discovery of more collections.
Often information on relations is only available in one direction: the description of a collection will link to the service that produced it, but the description of the service does not have access to the collections that the service has produced. In such cases, it is desirable for ANDS to automatically generate bidirectional links between the objects. This functionality is forthcoming.
Currently the only relation modelled between services is hasPart/isPartOf. Creation services can often be modelled as part of another creation service, as with sensors and instruments, or individual services and service workflows. Metadata and Discovery services, on the other hand, are not normally modelled as forming part of other services.
Service descriptions must have a relationship to at least one collection. Depending on the service type, services and collections can have the following relations:
- All services: supports/isSupportedBy
- Discovery Services (Harvest, Search, Syndicate): isAvailableThrough/makesAvailable
- Creation Services (Create, Generate, Assemble, Transform output): isProducedBy/produces
- Creation Services (Report): isPresentedBy/presents
- Creation Services (Transform input): isOperatedOnBy/operatesOn
- Metadata Services (Annotate, Classify): addsValueTo/hasValueAddedBy
The supports/isSupportedBy relation is generic; the other relations are specialisations of this relation.
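Generating the missing direction of a one-way relation, as described earlier for bidirectional links, amounts to a lookup in an inverse table built from the relation pairs above. A minimal Python sketch follows; the record keys are hypothetical, and this is not the ANDS implementation.

```python
from typing import Dict, Set, Tuple

# (from_key, relation_type, to_key); the keys used below are hypothetical.
Relation = Tuple[str, str, str]

# Service/collection pairs from the list above, plus hasPart/isPartOf between services.
RELATION_PAIRS = [
    ("supports", "isSupportedBy"),
    ("makesAvailable", "isAvailableThrough"),
    ("produces", "isProducedBy"),
    ("presents", "isPresentedBy"),
    ("operatesOn", "isOperatedOnBy"),
    ("addsValueTo", "hasValueAddedBy"),
    ("hasPart", "isPartOf"),
]

INVERSE: Dict[str, str] = {}
for forward, backward in RELATION_PAIRS:
    INVERSE[forward] = backward
    INVERSE[backward] = forward


def with_reciprocal_links(relations: Set[Relation]) -> Set[Relation]:
    """Add the inverse of every relation whose type has a known inverse."""
    result = set(relations)
    for source, relation, target in relations:
        inverse = INVERSE.get(relation)
        if inverse is not None:
            result.add((target, inverse, source))
    return result


one_way = {("collection:THX-1138", "isProducedBy", "service:brahe")}
print(sorted(with_reciprocal_links(one_way)))
```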
If a transform or assemble service is used to change collection A into collection B, the service operates on input collection A, and produces output collection B. (For collection discovery, the produces relation is more important than the operates on relation.) Collection A and collection B are related through the relation isDerivedFrom/hasDerivedCollection. This relation is distinct from partOf: if a collection is derived from another collection, the output is a new collection, and is not considered part of the old.
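For the transform case just described, the full set of relations, written in both directions, can be sketched in Python as follows. The keys in the usage line are hypothetical.

```python
from typing import List, Tuple

Relation = Tuple[str, str, str]  # (from_key, relation_type, to_key)


def transform_relations(service: str, input_collection: str,
                        output_collection: str) -> List[Relation]:
    """Relations for a transform or assemble service turning collection A into B."""
    return [
        (service, "operatesOn", input_collection),
        (input_collection, "isOperatedOnBy", service),
        (service, "produces", output_collection),
        (output_collection, "isProducedBy", service),
        # The output is a new, derived collection, not a part of the input.
        (output_collection, "isDerivedFrom", input_collection),
        (input_collection, "hasDerivedCollection", output_collection),
    ]


for relation in transform_relations("service:tempo2", "collection:THX-1138",
                                    "collection:THX-1138-timing"):
    print(relation)
```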
If service A is part of service B, and service A is related to a collection, then service B should not also be modelled as having the same relation to the collection. It is best practice in information science to link only to the most detailed level. For example, a collection would be linked only to the Brahe interferometer, and not to both the Brahe interferometer and the Farnell telescope. Users should navigate down from the Farnell telescope to discover collections associated with individual receivers.
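The "most detailed level" rule can be applied mechanically, as in this Python sketch: a link from a containing service is dropped when one of its parts already has the same relation to the same collection. The part-of mapping and keys are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Relation = Tuple[str, str, str]  # (service_key, relation_type, collection_key)


def ancestors(service: str, part_of: Dict[str, str]) -> List[str]:
    """Services containing `service`, nearest first (part_of maps child -> parent)."""
    found: List[str] = []
    parent = part_of.get(service)
    # Stop on a missing parent or on a cycle back to a service already seen.
    while parent is not None and parent != service and parent not in found:
        found.append(parent)
        parent = part_of.get(parent)
    return found


def most_detailed_links(links: Set[Relation], part_of: Dict[str, str]) -> Set[Relation]:
    """Drop a link from a containing service when a part of it has the same link."""
    redundant = set()
    for service, relation, collection in links:
        for container in ancestors(service, part_of):
            redundant.add((container, relation, collection))
    return links - redundant


part_of = {"service:brahe": "service:farnell"}
links = {
    ("service:brahe", "produces", "collection:THX-1138"),
    ("service:farnell", "produces", "collection:THX-1138"),
}
print(most_detailed_links(links, part_of))
```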
The following relations can be modelled between parties and services:
- isManagerOf/isManagedBy: the individual or group oversees the service
- isOwnerOf/isOwnedBy: the individual or group legally possesses the service
The relationship between a facility and its instruments is modelled through the isOwnerOf relation.
Note that the owner of a service is distinct from the owner of the associated collection. In the example above, the Norfolk Island Astronomical Commissariat owns the telescope that captured the pulsar data, but the pulsar data itself is owned by Notre Dame University.
No relations are currently modelled between services and activities. The existing relations isOutputOf and isFundedBy between activities and collections could be extended to services. However, this level of detail is beyond the requirements of the ANDS Collections Registry, and is more appropriate for a services registry.
The relation hasAssociationWith, as with other registry object classes, allows an unspecified relationship to be signalled between the service and the target object.
...[remainder of service record]
Create service example
<service type="create">
<name type="primary"><namePart>Farnell Telescope, Brahe Interferometer</namePart></name>
<location><address><physical type="postalAddress"><addressPart type="text">Norfolk Island Astronomical Commissariat, PO Box 276, Norfolk Island 2899, Australia.</addressPart></physical></address></location>
<description type="brief">A low-noise S-band interferometric receiver.</description>
<relatedInfo><title>Home Page</title><identifier type="uri">http://www.niac.org.nf</identifier></relatedInfo> </service>
Creating a relation from a collection to a service example
<namePart>Pulsar THX-1138 raw observation data</namePart>
<description type="brief">Pulsar THX-1138 raw observation data gathered using the Brahe interferometer.</description>
RSS syndication example
<namePart>RSS 2.0 Feed from MY University institutional repository</namePart>
<arg required="true" type="string" use="keyValue">identifier</arg>
|Date|Change|
|April 2010|Consultation draft|
|26 October 2010|First web publication|
|25 January 2011|Complete revision to add creation and metadata services|
|14 April 2011|Added link to Access Policy (services only) page|
Please send any feedback on this page to email@example.com