Step 9 - De-identify sensitive data

Image representing sensitive — Sensitive data introduces a risk of harm or unwanted attention

Data is considered sensitive when it can be used to identify an individual, species, object, or location that introduces a risk of discrimination, harm, or unwanted attention.

Categories of sensitive data include:

personal data
health and medical data
ecological data that may place vulnerable species at risk
culturally sensitive data

Separating or de-identifying your data usually occurs to protect an individuals privacy. According to the Commonwealth Privacy Act 1988, “personal information is de-identified if the information is no longer about an identifiable individual or an individual who is reasonably identifiable”.

De-identified information is no longer considered personal information and can be shared. More information on legal definitions and requirements on privacy can be found in the Commonwealth Privacy Act.

De-identifiying aims to allow data to be used by others for publishing, sharing and reuse without the possibility of individuals/location being re-identified. It may also be used to protect the location of archaeological findings, culturally sensitive data (for example archaelogical sites at risk of vandalism or looting) or the location of endangered species.

Any identifiers (name, date of birth, address or geospatial locations etc.) should be removed from main data set and replaced with a code/key. The code/key is then encrypted and stored separately. By storing de-identified data in a secure solution, you are meeting safety, controlled, ethical, privacy and funding agency requirements.

Re-identifing an individual can be possible by recombining the de-identifiable data set and the identifiers. There are processes that can be undertaken to mitigate this.

Australian practical guidance for de-identification

The ARDC’s Identifiable Data resource collates a selection of Australian and international practical guidelines and resources on how to de-identify datasets. In addition, their Publishing sensitive data guide is intended for researchers who own a data set and wish to share safely with fellow researchers or for publishing of data.
The Australian Government’s Office of the Australian Information Commissioner (OAIC) and CSIRO Data61 have released a ‘De-identification Decision Making Framework’ which is a “practical guide to de-identification, focussing on operational advice”. The guide will assist organisations that handle personal information to de-identify their data effectively.
The OAIC also provides high-level guidance on de-identification of data and information, outlining what de-identification is, and how it can be achieved.
The Australian Government’s guide to health privacy, includes techniques for making a data set non-identifiable and example case studies.
Office of the Information Commissioner Queensland has developed excellent guidance on Privacy and De-identified data.

Tips for managing de-identification

Plan de-identification early in the research as part of your data management planning
Retain original unedited versions of data for use within the research team and for preservation
Create a de-identification log of all replacements, aggregations or removals made
Store the log separately from the de-identified data files
Identify replacements in text in a meaningful way, e.g. in transcribed interviews indicate replaced text with [brackets] or use XML markup tags e.g. .....

Management of identifiable data

Data may often need to be identifiable during the process of research. If data is identifiable then ethical and privacy requirements can be met through access control and data security. This may take the form of:

Control of access through physical or digital means (e.g. passwords)
Encryption of data, particularly if it is being moved between locations
Ensuring data is not stored in an identifiable and unencrypted format when on easily lost items such as USB keys, laptops and external hard drives.
Taking reasonable actions to prevent the inadvertent disclosure, release or loss of sensitive personal information. Source: ARDC

Five Safes framework: Working with sensitive data

The Five Safes framework is an approach to assessing and managing risks associated with sensitive data sharing and release. It has been adopted by Australia’s major statistics agencies including the Australian Bureau of Statistics and Australian Institute of Health and Welfare. Applying the framework to the data, its users and their purpose, storage and eventual research outcomes, enables researchers to access their large, linked datasets for valid research purposes.

Five Safe Framework has five dimensions with associated risks and management solutions:

safe people
safe projects
safe settings
safe data
safe outputs

Watch this short video from the UK Data Service on how the framework can be applied.

Home