Infrastructure for open access storage and sharing includes systems that house and facilitate access to publications, data (both processed and raw data, as well as associated metadata), and software and code. These systems are commonly referred to as repositories.
Across Australian institutions, open research infrastructure and capabilities vary greatly in terms of maturity. Most research institutions have well-developed systems to support open access to publications. Some institutions have well-developed research data systems that link to research data management plans, while few institutions have systems to support the capture and re-use of research code.
This section of the Toolkit focuses on the principles, practices, procedures and guidelines that are common to mature infrastructure.
Wherever possible, infrastructure development should be informed by the principles of FAIR (Findable, Accessible, Interoperable, Reusable) research outputs. The Council of Australian University Librarians’ (CAUL) Review of Australian Repository Infrastructure (2019) highlighted discoverability and interoperability as the most important considerations for future repository infrastructure in Australia. Interoperability is dependent on the adoption of standards and best practices.
“Different repositories are adapted to the specificity of the objects they contain (publications, data or code), to local circumstances, user needs and the requirements of research communities, yet should adopt interoperable standards and best practices to ensure the content in repositories is appropriately vetted, discoverable and reusable by humans and machines.” (Draft text of the UNESCO Recommendation on Open Science).
A research output is an outcome of research. It can take many forms, including journal articles, books, reports or creative works. Repositories are important infrastructure for open research because they facilitate open sharing of research outputs.
In Australia, all universities have an institutional repository to store and provide access to the institution’s research outputs. All Australian institutional repositories accept peer-reviewed Author Accepted Manuscript (AAM) versions of journal articles, also known as post-prints. Other types of work housed in institutional repositories can include conference papers, chapters, working papers, reports and non-traditional research outputs such as creative works. Pre-prints (manuscripts prior to peer review) are not commonly accepted into institutional repositories and are more likely to be deposited in dedicated preprint repositories such as arXiv, Europe PMC or OSF Preprints.
While institutions may also provide infrastructure for storing data, in Australia data is not commonly stored in the institutional repository, which tends to focus on the outcomes of research, that is, research outputs.
CAUL recommends that Australian universities support institutional repositories to fulfil Plan S technical and service requirements. Plan S is an initiative for open access publishing, supported by cOAlition S, an international consortium of research funders. Plan S requires that, from 2021, scientific publications that result from research funded by public grants must be published in compliant Open Access journals or platforms.
Plan S includes five mandatory requirements for open access repositories:
use of permanent identifiers (PIDs)
use of standard, interoperable, non-proprietary metadata (including information on funding provided by cOAlition S funders)
use of machine-readable information on open access status and licensing
provision of standards for uptime
provision of assistance for users
Plan S also includes seven strongly recommended additional criteria. Practical guidance on implementation is available on the cOAlition S website.
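The record-level requirements above (identifier, funding metadata, and machine-readable open access status and licence) can be checked automatically. The following is a minimal sketch only: the field names, the `RepositoryRecord` class and the `missing_plan_s_fields` function are hypothetical and not part of Plan S or any repository platform. The uptime and user-assistance requirements are service-level commitments and are not covered here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RepositoryRecord:
    """Hypothetical repository record; a real schema will differ."""
    title: str
    persistent_id: str = ""       # e.g. a DOI or Handle URL
    licence_uri: str = ""         # machine-readable licence, e.g. a Creative Commons URI
    open_access_status: str = ""  # e.g. "open", "embargoed", "closed"
    funders: List[str] = field(default_factory=list)  # funding info, where applicable


def missing_plan_s_fields(record: RepositoryRecord) -> List[str]:
    """Return the names of record-level fields that are empty or malformed."""
    checks = {
        "persistent_id": bool(record.persistent_id.strip()),
        "licence_uri": record.licence_uri.startswith(("http://", "https://")),
        "open_access_status": bool(record.open_access_status.strip()),
        "funders": len(record.funders) > 0,
    }
    return [name for name, ok in checks.items() if not ok]


record = RepositoryRecord(
    title="Example accepted manuscript",
    persistent_id="https://doi.org/10.1234/example",  # illustrative, not a real DOI
    licence_uri="https://creativecommons.org/licenses/by/4.0/",
)
print(missing_plan_s_fields(record))
```

A check like this could run at deposit time so that incomplete records are flagged to the repository team before publication.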
Additionally, repository managers may consider:
Use of identifiers for people associated with the item (ORCID), research activity or projects (RAiD), and related publications.
How much involvement the repository team has in the submission process. Models include a mediated process for metadata control and copyright checking or a model where the researcher submits directly without mediation.
Integrations with other research infrastructure to increase linked open scholarship and aid in discovery and interoperability.
Compliance with Open Archives Initiative (OAI) standards to ensure interoperability with other repositories. This may include supporting a harvesting protocol such as OAI-PMH, or providing an API feed.
Open source systems that can incorporate several university systems or processes. Some proprietary systems integrate the research publications system, grants systems and repository in one tool.
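To illustrate the OAI-PMH consideration above, the sketch below builds a `ListRecords` request URL and reads identifiers and titles from a response. The base URL, the sample response and the function names are hypothetical. The verb, the `metadataPrefix` and `from` parameters, and the XML namespaces come from the OAI-PMH 2.0 specification. No network request is made; a real harvester would also need to handle resumption tokens and error responses.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple
from urllib.parse import urlencode

OAI_NS = "http://www.openarchives.org/OAI/2.0/"
DC_NS = "http://purl.org/dc/elements/1.1/"


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc",
                     from_date: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date  # selective harvesting by datestamp
    return f"{base_url}?{urlencode(params)}"


def extract_records(response_xml: str) -> List[Tuple[str, str]]:
    """Return (OAI identifier, Dublin Core title) pairs from a response."""
    root = ET.fromstring(response_xml.strip())
    results = []
    for record in root.iter(f"{{{OAI_NS}}}record"):
        identifier = record.findtext(
            f"{{{OAI_NS}}}header/{{{OAI_NS}}}identifier", default="")
        title = record.findtext(f".//{{{DC_NS}}}title", default="")
        results.append((identifier, title))
    return results


# Hand-written sample response, trimmed to the elements used above.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repository.example.edu:123</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example working paper</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""

print(list_records_url("https://repository.example.edu/oai", from_date="2021-06-01"))
print(extract_records(SAMPLE_RESPONSE))
```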
While these considerations and the Plan S requirements are specific to repositories for research outputs, they are also helpful in the context of other types of repositories, such as open data platforms. Effective repository infrastructure needs clear and robust procedures to guide researchers through the deposit process and their obligations.
Institutional, discipline-based or national research data infrastructure is becoming the norm, replacing data systems at the local school and centre levels. Common practice is to link institutional storage to institutional data management plans or grant management metadata. Institutional repositories may include metadata records and the data itself, or the metadata only. Larger datasets or some data types may not be suitable for an institutional repository and may be linked to via commercial platforms such as figshare, community-built platforms such as Dryad, national services such as the NCI Data Catalogue or discipline-based repositories.
The Generalist Repository Comparison Chart is a tool researchers can use when deciding where to store and share their FAIR data outside of their institutional repository. Dataverse has also published a comparative review of eight data repositories. Considerations include size limits, persistent identifiers, file handling, metadata, harvesting, metrics, customisation and linking data to related digital information such as publications and grants.
Research Data Australia (RDA) is a searchable portal which brings together research data collections from projects and institutions across Australia. Data owners or managers can create metadata records in their own repositories which are then published in RDA. The data itself is stored and published with the source institution. The Australian Research Data Commons (ARDC) also runs a program of investment in research data platforms to support analysis, reproducibility, storage and sharing.
Research code platforms and associated practice are emerging areas of interest in the open research space. For example, among the Group of Eight universities and CSIRO, almost all (seven of the nine) have a ‘software’ resource/item type in their research outputs or research data repository. Across these institutions there is a mix of approaches: metadata-only records linking to a GitHub repository or similar containing the code for the described software; repository records hosting the files (or a zip/archive) of a version of the software; and records that both host and link to code.
Of the same research organisations, in June 2021:
Six had an organisation/grouping of code repositories in GitHub, Bitbucket or GitLab for the research organisation.
Seven had one or more organisations in GitHub, Bitbucket or GitLab for an individual school, research centre or organisational subunit.
Five had an enterprise/organisational instance of GitHub Enterprise, Bitbucket or GitLab.
The UNESCO Recommendation on Open Science (2021) includes (open-) source code or software as a component of the research process and stipulates that code and software should be retained and shared. Enabling the reuse and replication of code generally requires that it be accompanied by the open data and open specifications of the environment required to compile and run it.
The FORCE11 Software Citation Working Group recommends that a persistent identifier for research software resolve to a landing/metadata page rather than directly to a source code repository in a service such as GitHub. The latter does not guarantee persistence (repositories can be deleted or removed from public accessibility), and a specific point in the commit history may not be preferred or representative of the software as a research output.
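A software metadata record can keep the citable identifier separate from the location of the source code. The sketch below assumes the record is expressed with CodeMeta/schema.org property names (`identifier`, `codeRepository`, `version`); the `software_record` function, the DOI and the repository URL are hypothetical placeholders. It is one possible shape for such a record, not a prescribed format.

```python
import json


def software_record(name: str, version: str, doi: str, repository_url: str) -> dict:
    """Build a minimal CodeMeta-style description of a software release."""
    return {
        "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
        "@type": "SoftwareSourceCode",
        "name": name,
        "version": version,
        # The identifier to cite: a DOI that resolves to a landing page.
        "identifier": f"https://doi.org/{doi}",
        # Where development happens; may move or disappear, so not the citation target.
        "codeRepository": repository_url,
    }


record = software_record(
    name="example-analysis-tool",
    version="1.2.0",
    doi="10.1234/example.software",  # illustrative, not a real DOI
    repository_url="https://github.com/example-org/example-analysis-tool",
)
print(json.dumps(record, indent=2))
```

Keeping the two fields distinct means the citation remains valid even if the code is later moved to a different hosting service.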
Mature infrastructure is underpinned by robust policy. The policy section of this Toolkit provides information about effective policies for open access storage and sharing of publications, data and code.
Policy concerning the sharing of software and code is an emerging space in the Australian context. Statements concerning software and code are starting to appear in overarching open research policies rather than within a specific policy of their own (see the below example from the University of Melbourne).
Overseas examples include some strong statements about leveraging and contributing to Open Software where possible in order to support open scholarship.
All repository metadata, whether for outputs, data or code, should be open and accessible. There is no definitive metadata schema recommended for use within institutional research output repositories, although Dublin Core is widely used. There are, however, metadata standards for subject- or discipline-specific publication and data repositories.
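As an illustration of exposing open metadata in Dublin Core, the sketch below maps a repository's internal fields to unqualified Dublin Core elements and serialises them as XML. The internal field names, the `FIELD_TO_DC` mapping and the sample record are assumptions made for this example; the element names (`title`, `creator`, `date`, `rights`, `identifier`, `type`) and namespaces are from the Dublin Core element set and the `oai_dc` format.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
ET.register_namespace("dc", DC_NS)
ET.register_namespace("oai_dc", OAI_DC_NS)

# Hypothetical internal field name -> Dublin Core element name.
FIELD_TO_DC = {
    "title": "title",
    "authors": "creator",
    "year": "date",
    "licence": "rights",
    "doi": "identifier",
    "output_type": "type",
}


def to_dublin_core(internal: dict) -> str:
    """Serialise an internal record as an oai_dc XML string."""
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for field_name, dc_name in FIELD_TO_DC.items():
        value = internal.get(field_name)
        if value is None:
            continue  # omit elements with no value
        values = value if isinstance(value, list) else [value]
        for item in values:  # repeatable elements, e.g. one dc:creator per author
            ET.SubElement(root, f"{{{DC_NS}}}{dc_name}").text = str(item)
    return ET.tostring(root, encoding="unicode")


sample = {
    "title": "Example report",
    "authors": ["Nguyen, A.", "Smith, B."],
    "year": 2021,
    "licence": "https://creativecommons.org/licenses/by/4.0/",
    "output_type": "report",
}
print(to_dublin_core(sample))
```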
Key metadata standards include:
These exemplars were selected for their compliance with the FAIR principles for open research.
For other examples of repositories in Australia and New Zealand, see Open Access Australasia’s Directory of Open Repositories.
Except where otherwise noted, all content on the Open Research Toolkit is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) licence. Under the licence conditions, please attribute Open Research Toolkit.