LibGuides: Open Research Toolkit: Infrastructure

Infrastructure

Infrastructure for open access storing and sharing includes systems that house and facilitate access to publications, data (including both processed and raw data as well as associated metadata), software and code. These are commonly referred to as repositories.

Across Australian institutions, open research infrastructure and capabilities vary greatly in terms of maturity. Most research institutions have well-developed systems to support open access to publications. Some institutions have well-developed research data systems that link to research data management plans, while few institutions have systems to support the capture and re-use of research code.

This section of the Toolkit focuses on the principles, practices, procedures and guidelines that are common to mature infrastructure.
Wherever possible, infrastructure development should be informed by the principles of FAIR (Findable, Accessible, Interoperable, Reusable) research outputs. The Council of Australian University Librarians’ (CAUL) Review of Australian Repository Infrastructure (2019) highlighted discoverability and interoperability as the most important considerations for future repository infrastructure in Australia. Interoperability is dependent on the adoption of standards and best practices.

“Different repositories are adapted to the specificity of the objects they contain (publications, data or code), to local circumstances, user needs and the requirements of research communities, yet should adopt interoperable standards and best practices to ensure the content in repositories is appropriately vetted, discoverable and reusable by humans and machines.” (Draft text of the UNESCO Recommendation on Open Science).

Infrastructure for Research Outputs

A research output is an outcome of research. It can take many forms, including journal articles, books, reports or creative works. Repositories are important infrastructure for open research because they facilitate open sharing of research outputs.

In Australia, all universities have an institutional repository to store and provide access to the institution’s research outputs. All Australian institutional repositories accept peer-reviewed Author Accepted Manuscript (AAM) versions of journal articles, also known as post-prints, as well as other types of outputs. Other types of work housed in institutional repositories can include conference papers, chapters, working papers, reports and other non-traditional research outputs such as creative works. Pre-prints (manuscripts prior to peer review) are not commonly accepted into institutional repositories and are more likely to be deposited in dedicated preprint repositories such as arXiv, Europe PMC or OSF Preprints.

While institutions may also provide infrastructure for storing data, in Australia, data is not commonly stored in the institutional repository as these tend to focus on the outcomes of research - that is, research outputs.

Plan S

CAUL recommends that Australian universities support institutional repositories to fulfil Plan S technical and service requirements. Plan S is an initiative for open access publishing, supported by cOAlition S, an international consortium of research funders. Plan S requires that, from 2021, scientific publications that result from research funded by public grants must be published in compliant Open Access journals or platforms.

Plan S includes five mandatory requirements for open access repositories:

use of permanent identifiers (PIDs)
use of standard, interoperable, non-proprietary metadata (including information on funding provided by cOAlition S funders)
use of machine-readable information on open access status and licensing
provision of standards for uptime
provision of assistance for users.
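
The metadata-related requirements in the list above (identifiers, funding information, open access status and licensing) can be checked automatically before a record is published. The following is a minimal sketch only: the field names and the example DOI are hypothetical placeholders, not part of Plan S or of any particular repository platform, and the uptime and user-assistance requirements are service matters that a record-level check cannot cover.

```python
from typing import Dict, List

# Hypothetical field names chosen for illustration; a real repository
# would map these to the fields of its own metadata schema.
REQUIRED_FIELDS = {
    "persistent_identifier": "permanent identifier (PID)",
    "funder": "funding information",
    "access_status": "machine-readable open access status",
    "licence": "machine-readable licence",
}


def missing_plan_s_fields(record: Dict[str, str]) -> List[str]:
    """Return descriptions of the required fields that are absent or empty."""
    return [
        description
        for field, description in REQUIRED_FIELDS.items()
        if not record.get(field, "").strip()
    ]


# Placeholder record: the DOI below is not a real identifier.
record = {
    "persistent_identifier": "https://doi.org/10.1234/example",
    "funder": "",
    "access_status": "open",
    "licence": "https://creativecommons.org/licenses/by/4.0/",
}

print(missing_plan_s_fields(record))
```

A check like this could run at deposit time so that records with gaps are routed to the repository team rather than published incomplete.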

Plan S also includes seven strongly recommended additional criteria. Practical guidance on implementation is available on the cOAlition S website.

Resources

OpenDOAR
Directory of Open Access Repositories (OpenDOAR)
One of the mandatory Plan S requirements for repositories is registration with the Directory of Open Access Repositories (OpenDOAR). OpenDOAR is an authoritative directory of open access repositories across the world, including Australia. It contains information on the platform used and types of content included in each repository.
Institutional Repository Software Comparison
United Nations Educational, Scientific and Cultural Organization (UNESCO)
For decisions relating to the implementation of specific repository platforms, the comparison of repository infrastructure undertaken by UNESCO in 2014 may also be useful. In this comparison, four open source systems and one proprietary system were compared against key functionality and usability.
CAUL review of Australian repository infrastructure
Council of Australian University Librarians
The CAUL Review of Australian Repository Infrastructure provides an overview of repository software currently in use in Australia and New Zealand.

Other Considerations

Additionally, repository managers may consider:

Use of identifiers for people associated with the item (ORCID), research activity or projects (RAiD), and related publications.
How much involvement the repository team has in the submission process. Models include a mediated process for metadata control and copyright checking or a model where the researcher submits directly without mediation.
Integrations with other research infrastructure to increase linked open scholarship and aid in discovery and interoperability.
Compliance with Open Archives Initiative (OAI) standards to ensure interoperability with other repositories. This may include supporting a harvesting protocol such as OAI-PMH or providing an API feed.
Open source systems that can incorporate several university systems or processes. Some proprietary systems integrate the research publications system, grants systems and repository in one tool.
Licensing information for use and reuse of metadata and data using Creative Commons licensing or appropriate software licensing.
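
To illustrate the OAI-PMH consideration above: a harvester retrieves records from a repository by sending an HTTP request that names a protocol verb and a metadata format. The sketch below builds such a request URL. `ListRecords`, `metadataPrefix`, `set` and the `oai_dc` prefix come from the OAI-PMH specification; the base URL and the set name are hypothetical, and no network request is made.

```python
from urllib.parse import urlencode


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc", set_spec: str = "") -> str:
    """Build an OAI-PMH ListRecords request URL for a repository endpoint."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        # Optional: restrict the harvest to one set (collection) of records.
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint; real base URLs are published by each repository.
print(list_records_url("https://repository.example.edu/oai"))
# prints https://repository.example.edu/oai?verb=ListRecords&metadataPrefix=oai_dc
```

Every OAI-PMH repository must support the `oai_dc` (unqualified Dublin Core) format, which is why it is used as the default here.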

While these considerations, as well as the Plan S principles, are specific to repositories for research outputs, they are also helpful in the context of other types of repositories, such as open data platforms. Effective repository infrastructure needs clear and robust procedures to guide researchers through the deposit process and their obligations.

Example Procedures

Open Access for QUT Research Outputs (including theses)
Queensland University of Technology
QUT’s policy is aligned with current ARC and NHMRC open access policies. It requires researchers to retain the right to disseminate a full text version of peer-reviewed articles no later than 12 months from the date of first publication. The Author Accepted Manuscript (AAM) version (except where the published version is open access) must be deposited as soon as a DOI has been issued, and the maximum embargo period is 12 months. Information concerning copyright and Creative Commons licensing is included.
Procedure: Open Access Research
Australian National University
This procedure supports the University’s open access policy and explains who can deposit, what outputs should be deposited, and how to deposit in the institutional repository. It also links to a related Open Data Procedure.

Infrastructure for Data

Institutional, discipline-based or national research data infrastructure is becoming the norm, replacing data systems at the local school and centre levels. Common practice is to link institutional storage to institutional data management plans or grant management metadata. Institutional repositories may include metadata records and the data itself, or the metadata only. Larger datasets or some data types may not be suitable for an institutional repository and may be linked to via commercial platforms such as figshare, community-built platforms such as Dryad, national services such as the NCI Data Catalogue or discipline-based repositories.

The Generalist Repository Comparison Chart is a tool researchers can use when deciding where to store and share their FAIR data outside of their institutional repository. Dataverse has also published a comparative review of eight data repositories. Considerations include size limits, persistent identifiers, file handling, metadata, harvesting, metrics, customisation and linking data to related digital information such as publications and grants.

Research Data Australia (RDA) is a searchable portal which brings together research data collections from projects and institutions across Australia. Data owners or managers can create metadata records in their own repositories which are then published in RDA. The data itself is stored and published with the source institution. The Australian Research Data Commons (ARDC) also runs a program of investment in research data platforms to support analysis, reproducibility, storage and sharing.

Infrastructure for Code

Research code platforms and associated practice are emerging areas of interest in the open research space. For example, in Australia, among the Group of Eight universities and CSIRO, almost all (seven) have a ‘software’ resource/item type in their research outputs or research data repository. Across these institutions, there is a mix of metadata-only records linking to a GitHub repository or similar containing code for the described software, repository records hosting the files (or a zip/archive) of a version of the software, and records both hosting and linking to code.

Of the same research organisations, in June 2021:

Six had an organisation/grouping of code repositories in GitHub, Bitbucket or GitLab for the research organisation
Seven had one or more organisations in GitHub, Bitbucket or GitLab for an individual school, research centre or organisational subunit
Five had an enterprise/organisational instance of GitHub Enterprise, Bitbucket or GitLab.

The UNESCO Recommendation on Open Science (2021) includes (open-) source code or software as a component of the research process and stipulates that code and software should be retained and shared. Enabling the reuse and replication of code generally requires that it be accompanied by the open data and open specifications of the environment required to compile and run it.
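
One way to provide an open specification of the environment is to record it alongside the code at the time of deposit. The sketch below is an illustration for Python-based research code only and uses just the standard library; the dictionary keys are hypothetical names chosen for readability, not a recognised metadata standard.

```python
import json
import platform
from importlib import metadata


def describe_environment() -> dict:
    """Collect the interpreter, operating system and installed package versions."""
    packages = sorted(
        f"{dist.metadata.get('Name')}=={dist.metadata.get('Version', 'unknown')}"
        for dist in metadata.distributions()
        if dist.metadata.get("Name")  # skip distributions with broken metadata
    )
    return {
        "python_version": platform.python_version(),
        "implementation": platform.python_implementation(),
        "operating_system": platform.system(),
        "machine": platform.machine(),
        "packages": packages,
    }


if __name__ == "__main__":
    # The JSON output can be saved next to the code and deposited with it.
    print(json.dumps(describe_environment(), indent=2))
```

Depositing this description with the code and data gives later users a starting point for rebuilding a comparable environment, although it does not by itself guarantee identical results.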

The FORCE11 Software Citation Working Group recommends that a persistent identifier for research software resolve to a landing/metadata page rather than directly to a source code repository in a service such as GitHub. A link to a code repository does not guarantee persistence (repositories can be deleted or removed from public accessibility), and a specific point in the commit history may not be preferred or representative of the software as a research output.
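
A repository team could apply this recommendation when reviewing the links supplied in software records. The sketch below is a rough heuristic under stated assumptions: the two host lists are illustrative rather than exhaustive, the example URLs are placeholders, and the check only inspects the host name, so it cannot confirm that an identifier actually resolves to a landing page.

```python
from urllib.parse import urlparse

# Illustrative host lists only; extend them to suit local practice.
PID_RESOLVERS = {"doi.org", "hdl.handle.net"}
CODE_HOSTS = {"github.com", "gitlab.com", "bitbucket.org"}


def classify_software_link(url: str) -> str:
    """Label a link as a persistent identifier, a code repository, or other."""
    host = (urlparse(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    if host in PID_RESOLVERS:
        return "persistent identifier"
    if host in CODE_HOSTS:
        return "source code repository"
    return "other"


# Placeholder URLs for illustration.
for link in ("https://doi.org/10.1234/example", "https://github.com/example-org/example-tool"):
    print(link, "->", classify_software_link(link))
```

Records whose only link is classified as a source code repository are candidates for follow-up, for example by archiving a release and citing the resulting identifier instead.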

Useful Resources

Software Deposit: Guidance for Researchers
Jisc and Software Sustainability Institute
This 'How/What/Where' guide offers advice on depositing/publishing software as a research output.

Policy

Mature infrastructure is underpinned by robust policy. The policy section of this Toolkit provides information about effective policies for open access storage and sharing of publications, data and code.

Example Policies

It should be noted that policy concerning the sharing of software and code is an emerging space in the Australian context. Statements concerning software and code are starting to appear in overarching open research policies rather than within a specific policy of their own (see the below example from the University of Melbourne).

Overseas examples include some strong statements about leveraging and contributing to open source software where possible in order to support open scholarship.

Principles for Open Access to Research Outputs
University of Melbourne
Statements regarding code can be found within the responsibilities outlined in this policy document.
Open Source Software Policy
University of North Texas
This policy document includes strong statements on the use and creation of Open Source Software.
Data, software and materials management and sharing policy
The Wellcome Trust
This policy requires as a minimum that any original software that is required to view datasets or to replicate analyses is open access at time of publication.

Metadata

All repository metadata - whether for outputs, data or code - should be open and accessible. There is no definitive metadata schema recommended for use within institutional research output repositories, although Dublin Core is widely used. There are, however, metadata standards for subject- or discipline-specific publication and data repositories.
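
As an illustration of what a Dublin Core description looks like in practice, the sketch below serialises a handful of the core elements as XML. The namespace URI is the one defined for the Dublin Core element set; the `record` wrapper element and all field values are hypothetical, and real repositories usually wrap the elements in a container defined by their exchange format.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields: dict) -> str:
    """Serialise a mapping of Dublin Core element names to lists of values."""
    root = ET.Element("record")  # hypothetical wrapper element
    for element, values in fields.items():
        for value in values:  # every Dublin Core element is repeatable
            child = ET.SubElement(root, f"{{{DC_NS}}}{element}")
            child.text = value
    return ET.tostring(root, encoding="unicode")


# Placeholder values for illustration.
xml_text = dublin_core_record({
    "title": ["An example working paper"],
    "creator": ["Citizen, Jane", "Citizen, John"],
    "date": ["2021"],
    "type": ["Text"],
    "rights": ["https://creativecommons.org/licenses/by/4.0/"],
})
print(xml_text)
```

Because all elements are optional and repeatable, the same function handles records with one author or many, which is part of what makes the schema easy to adopt across different repository platforms.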

Key metadata standards include:

Dublin Core Metadata Initiative
Dublin Core
Schema consisting of 15 core elements to describe resources and designed for interoperability.
RDA Metadata Standards Directory
The RDA Metadata Standards Directory Working Group
Lists community-identified metadata standards for discipline and subject repositories. Adopting these standards helps ensure the collection of consistent, reproducible metadata.
Open Archives Initiative Protocol for Metadata Harvesting
OpenArchives.org
OAI-PMH is a protocol for repository interoperability. It supports the dissemination of records in multiple metadata formats from a repository.
FAIRsFAIR Data Object Assessment Metrics
FAIRsFAIR
For research data, the FAIRsFAIR Data Object Assessment Metrics lists seventeen metrics to measure the extent to which data objects meet the FAIR principles.
The CodeMeta Project
The CodeMeta Project
For research software or code, the CodeMeta Project provides a schema or set of terms for software metadata that integrates with various software package management services, repositories and archives.
Citation File Format
Citation File Format
A lightweight human- and machine-readable format for citation information for software, intended to be bundled with the software such as by including it in a code repository.
Registry Interchange Format: Collections and Services (RIF-CS)
Australian Research Data Commons
A data interchange format that supports the electronic exchange of collection and service descriptions. It is the metadata format required for data to be harvested automatically for display and discovery in Research Data Australia.
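
Of the standards listed above, the Citation File Format is among the simplest to adopt, because it is a single plain-text file placed in the code repository. The sketch below generates a minimal `CITATION.cff` using only the keys the format requires (`cff-version`, `message`, `title` and `authors`) plus an optional `version`. The function name, the software title and the author are hypothetical, and the file is assembled by hand on the assumption that a YAML library may not be available.

```python
import json


def minimal_cff(title: str, authors: list, version: str = "") -> str:
    """Return the text of a minimal CITATION.cff file.

    `authors` is a list of (given_names, family_names) tuples.
    json.dumps is used for quoting because JSON strings are valid
    double-quoted YAML strings.
    """
    quote = json.dumps
    lines = [
        "cff-version: 1.2.0",
        f"message: {quote('If you use this software, please cite it using these metadata.')}",
        f"title: {quote(title)}",
        "authors:",
    ]
    for given, family in authors:
        lines.append(f"  - given-names: {quote(given)}")
        lines.append(f"    family-names: {quote(family)}")
    if version:
        lines.append(f"version: {quote(version)}")
    return "\n".join(lines) + "\n"


# Placeholder software and author for illustration.
print(minimal_cff("Example Analysis Tool", [("Jane", "Citizen")], version="1.0.0"))
```

Saving the output as `CITATION.cff` in the root of a code repository makes the citation information available to both people and tools that read the format.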

Featured Examples

Infrastructure for Research Outputs

These exemplars were selected for their compliance with the FAIR principles for open research.

Open Research
Australian National University
This open source repository includes ORCID, bibliometric and altmetric data, and licensing information for outputs.
QUT ePrints
Queensland University of Technology
An open source and extensive eprints repository. It includes ‘Impact and Interest’ statistics for each output including citations, downloads, usage and demographics of users.
Southern Cross Research Portal
Southern Cross University
This repository is linked to the university’s research publications and profiles system and uses automated harvesting.
UNSWorks
University of New South Wales
Integration with the research publications system streamlines the process for uploading publications to the repository. Links can be created between publications and associated grants to automatically send metadata about the publication to the repository to meet the first part of the ARC and NHMRC open access mandate requirements.
Research UNE
University of New England
An example of in-house system integrations and IT support to make the repository FAIRer. Tools built in-house allow ORCID integration, altmetric display and Google Scholar linkages.

Additional Examples

For other examples of repositories in Australia and New Zealand, see Open Access Australasia’s Directory of Open Repositories.

Infrastructure for Research Data

CSIRO Data Access Portal
CSIRO
The domain search facility supports searching for individual files as well as for collections with multiple files that relate to specific scientific research areas (domains).
NCI Data Catalogue
National Computational Infrastructure (NCI)
NCI’s data services and data catalogue allow users, data portals and external science cloud environments to access, interact with and extract value from its data collections. The linking of large data sets with computational resources is a critical capability.
Research Data Manager
University of Queensland
UQ provides an integrated data management system covering the entire research data lifecycle with provisioning of accessible, secure and sharable data storage and publication of datasets into UQ eSpace.

Research Data Management Practice

UNSW Data Archive
University of New South Wales
UNSW requires Research Data Management Plans (RDMPs) to be created when a grant is awarded and for new HDR students before storage is allocated in the Data Archive, which is only available to UNSW researchers.
Data Ready
La Trobe University
La Trobe University’s Research Data Management Policy requires data plans to be created for all research projects. Data collection, storage, retention, accessibility and disposal are aligned with FAIR data principles as well as ARC and NHMRC open access policies. This guide provides an overview of key topics related to data management.

Research Code Platforms

Collection of research software in CSIRO Data Access Portal
CSIRO
Most collections have the files (or a zip/archive) for a release/version of the software while also linking to the associated code repository on GitHub or similar.
Zenodo + GitHub
GitHub
An integration that publishes a release/package from a GitHub repository to Zenodo, including generating a descriptive metadata record and a DataCite DOI.
Astrophysics Source Code Library
ASCL.net
Example of a discipline-centric repository indexing research software outputs, including the source code repository or other online location for the software itself, code papers/journal articles describing the software, and a unique identifier allowing the software to be consistently cited and represented in other repositories.


Except where otherwise noted, all content on the Open Research Toolkit is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) licence. Under the licence conditions, please attribute Open Research Toolkit.
