If you work in an area that deals with data or managing data, then you may have heard the term data custodian bandied about. But do you know what it means? And is it really anything you need to concern yourself with?
So what is a data custodian and what do they do?
The term data custodian is relatively new and evolving. Adding to the complexity, different organisations and different professional fields use the term differently. Much of this variability derives from the fact that the data custodian role originates in data management, a discipline which is itself fairly new and evolving.
To make our discussion a little easier we’ll narrow the scope to custodians of research data generated in the course of publicly funded research in Australia. In most cases this will mean research funded by the Australian Research Council (ARC) or the National Health and Medical Research Council (NHMRC).
A definition or understanding of “data” is reasonably straightforward. Most of us have at least a working idea of what data is.
For the purpose of this discussion let’s go with something like: a collection of information, facts, recorded observations and/or statistics collected together for analysis or planning.
According to Merriam-Webster, a “custodian” is “someone who keeps and protects something valuable for another person.”
So a data custodian, in the absence of a formal definition for whatever organisation or field you work in, could be defined as a person who keeps and protects data for another person. This definition is perfectly valid, but does not necessarily provide a great deal of clarity.
To provide some clarity we will look at some of the duties that “keeping and protecting data” might entail. Again, because the role of the data custodian is evolving and variable, be aware that these duties are representative, not definitive.
Keep and protect data
Keep has a number of definitions with different senses, but in this context the best fit is probably “to cause (someone or something) to continue in a specified state, condition, or position”. So what sorts of things would need to be done to keep data? Here are some examples:
- maintain the data in a usable state, and ensure that sufficient copies of the data exist to survive a disaster or mishap
- maintain mandated records pertaining to the data. Specific examples include:
  - maintain records of the provenance of the data: where it came from, how it was collected, how it has been processed and what the original purpose for collecting it was
  - record the data’s location and any changes of location or moves
  - generate records pertaining to the disposal of the data
- publish metadata suitable for discovery of the data by other interested parties
- maintain and periodically test access to the data
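The record-keeping duties in the list above can be pictured as a small data structure. The following is only an illustrative sketch: the class name `CustodyRecord`, its fields and its methods are all hypothetical, and a real institution would follow its own records policy and metadata schema rather than this layout.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional, Tuple


@dataclass
class CustodyRecord:
    """Hypothetical record a custodian might keep for one dataset."""

    title: str
    source: str             # where the data came from
    collection_method: str  # how it was collected
    original_purpose: str   # why it was collected in the first place
    location: str           # where the data is held right now
    processing_steps: List[str] = field(default_factory=list)
    # Each move is stored as (date, old location, new location).
    location_history: List[Tuple[date, str, str]] = field(default_factory=list)
    disposed_on: Optional[date] = None
    disposal_method: Optional[str] = None

    def record_processing(self, step: str) -> None:
        """Note a processing step so the provenance stays complete."""
        self.processing_steps.append(step)

    def record_move(self, new_location: str, when: date) -> None:
        """Log a change of location, then update the current location."""
        self.location_history.append((when, self.location, new_location))
        self.location = new_location

    def record_disposal(self, method: str, when: date) -> None:
        """Record how and when the data was disposed of."""
        self.disposed_on = when
        self.disposal_method = method


# Example usage with made-up values.
record = CustodyRecord(
    title="Example survey responses",
    source="Online questionnaire",
    collection_method="Voluntary web survey",
    original_purpose="Pilot study",
    location="project share",
)
record.record_processing("Removed duplicate responses")
record.record_move("institutional archive", date(2015, 1, 31))
```

The point of the sketch is simply that provenance, location changes and disposal are each recorded explicitly, so that someone other than the original researcher can later reconstruct the history of the data.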
Section 2 of the Australian Code for the Responsible Conduct of Research recommends retention periods for primary materials and research data.
Protect means “to keep (someone or something) from being harmed, lost, etc.”, so in this context the things that the data custodian needs to protect are:
- the veracity of the data – make sure that the data is not modified except by authorised parties following appropriate protocols
- the integrity of the data – ensure that it is kept on an appropriate medium(s), under conditions that will best support the longevity of the data and that it can be restored in the event of a disaster or other event
- the privacy of anybody who has contributed personal information (e.g. patients, survey respondents) to the data, by ensuring it is accessed only by designated users for approved purposes
- the accessibility of the data against digital obsolescence and bit rot – i.e. the possibility that the hardware or software needed to read the data may no longer be available in the future, or that random data errors might be introduced over time
- the legal standing of the data. The data custodian will often be the legal agent or representative for the data, much as a parent is often the custodian of their minor children. In this context the data custodian might:
  - ensure compliance with legislative and ethical requirements, such as privacy law, or ethics committee conditions
  - retain the data for an appropriate period, in line with policy, guidelines and other regulations
  - dispose of the data in an appropriate manner – when and if appropriate, in accordance with relevant policy and regulation
  - approve and deconflict requests to access or use the data
  - license the data for exploitation by other parties.
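One common way to watch for unauthorised modification and bit rot is to record a checksum for every file and re-check it periodically. The sketch below uses SHA-256 from Python’s standard library; the function names (`sha256_of`, `build_manifest`, `verify_manifest`) are hypothetical and chosen only for illustration. Note that a checksum only detects change. It does not prevent or repair it, so a failed check still relies on having good copies to restore from.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> Dict[str, str]:
    """Map each file under root (by relative path) to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root: Path, manifest: Dict[str, str]) -> List[str]:
    """Return the relative paths that are missing or no longer match."""
    problems = []
    for name, expected in manifest.items():
        path = root / name
        if not path.is_file() or sha256_of(path) != expected:
            problems.append(name)
    return problems
```

In practice a custodian might build the manifest when a collection is deposited, store it separately from the data, and run the verification on a schedule. An empty list from `verify_manifest` means every recorded file is present and unchanged.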
So in a nutshell the data custodian acts as the person who is responsible for the data, though the custodian does not normally own the data.
For someone. Who?
There are a number of parties that a data custodian keeps or protects data for. The most obvious one is the data owner. So who is the owner? Again it varies, but since we’ve narrowed the scope of our definition to the context of Australian publicly funded research there is an answer that will work in a majority of cases. The owner of intellectual property and materials (including data) produced by publicly funded research in Australia is usually the organisation administering the public research funds.
There are exceptions, but normally the administering organisation will own the intellectual property and materials that are produced, and that includes data. Apart from the owner, who else might a data custodian be working for? Looking back at the protect portion of our discussion, we mentioned privacy and ethics.
So some of the people we might be protecting the data for are people who have provided information for inclusion in the data, e.g. patients whose data is being used or people who responded in surveys or interviews. They have a right to privacy and an expectation that their data will only be used in a manner consistent with what they allowed for in their release.
Additionally, data may be sourced from private organisations such as companies, who might consider their data to be trade secrets or intellectual property, or otherwise require protection from unauthorised use. The data custodian is the person who determines that any use of the data is appropriate and in compliance with any releases, licences or other conditions of use, including privacy laws and ethics conditions. In doing so the data custodian protects the parties that contributed their information to the data.
What about the researcher? Presumably, the researcher has gathered this data to support a hypothesis, with an eye to eventually publishing a paper. They need to be able to show that their data is correct and above question, that the data has been carefully and responsibly collected, vetted, processed and protected from corruption.
In the event that someone does challenge their findings, the data needs to be available to resolve the challenge. The researcher, then, is very definitely one of the people who benefits from the work of the data custodian (in some cases the researcher may be the data custodian). Finally, we are now entering an era of open access and data re-use. As such the wider community has an interest in the data because they probably funded its collection and analysis, and will probably fund the data’s storage or archiving.
- The ARC Open Access Policy defines government requirements for deposit of publications supported by ARC funding into open access repositories.
- The ARC’s Funding Agreements for projects beginning in 2014 state: “The Final Report must outline how data arising from the Project have been made publicly accessible where appropriate”.
- And to further emphasise the weight the ARC places on open access of publications and data the Funding Agreements go on to say “If the ARC is not satisfied with the outcomes of the Project, this may be noted against any further proposals under any ARC scheme submitted on behalf of any Chief Investigator on the Project and may be taken into account in the assessment of those proposals.”
As such the wider community has a reasonable expectation that the data be used appropriately, well managed and, if possible, used again – which brings us to… other researchers. Data used in one project may well have value outside that project, either as primary or secondary/supporting data for another project. The appropriate reuse of existing data has the potential to save researchers time and funds. However, data can only be re-used if it is discoverable, if its provenance is understood, if it is of a usable quality and if it has been kept in a usable state. So a data custodian works for other or future researchers and the wider community too.
How is this important to me? Why should I care?
You hopefully now have a working idea of what a data custodian is, the kinds of duties they might undertake and some of the parties they “keep and protect” data for. But why should you care?
If you’re reading this article there is a good chance you are a researcher or someone who works with or for researchers.
The various governance instruments regulating Australian publicly funded research mandate and suggest a number of duties and roles around the management of data produced in that research. As such, institutions, organisational leaders and managers, Chief Investigators, Partner Investigators and admin and support staff will, throughout the progression of their careers, all likely have direct or indirect involvement with data custodians or be appointed as data custodians.
I hope this article has been useful in providing an overview of the role of data custodians and why we need them. It seems likely we will see and come to rely on data custodians more and more as we come to understand the value of our data collections.
Some places that you could look for more specific information about data custodianship as it relates to you in your organisation are:
- Your institution’s policies – especially those surrounding the conduct or practice of research, intellectual property and information technology operations.
- Data Librarians
- Research Offices/Branches
- eResearch Organisations – both within your organisation and external
- IT Security Teams
- Compliance offices
- Your manager or head of school
For more general information about data management the Australian National Data Service is an excellent resource tailored to the needs of the Australian research community.
If you are employed by a South Australian University or SA Government Department as a researcher or work in a data custodian role in support of research data then eResearch South Australia may be able to assist you in managing the data you are responsible for in areas such as:
- providing advice about metadata and publishing metadata records
- providing secure storage and controlled access for data collections
- assisting in the movement of data within the state and nationally.