A range of methods can be applied to personal information to make it safe for reuse.
When using personal information for research, evaluation and other such activities, it’s important to reduce the risk that people, households, and organisations are identified without their permission.
There are different methods to make personal information safe for such reuse. Each method reduces the risk of re-identification to a different degree.
An agency’s data does not exist in a vacuum. Even after 1 of the following methods has been applied, it may be possible to identify an individual by combining the agency’s data with data available elsewhere.
AboutMyInfo, a Harvard University project, shows how easy or hard it can be to identify someone based only on their birthdate and zip code. Using the tool requires a US zip code, but 4 illustrative samples are provided.
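As a rough illustration of why a birthdate and zip code can be enough to single someone out, the sketch below counts how many records in a small dataset have a combination of the two values that nobody else shares. The records and the function name `unique_fraction` are made up for this example and are not taken from the AboutMyInfo tool.

```python
from collections import Counter

# Hypothetical records as (date of birth, zip code) pairs, made up for illustration.
records = [
    ("1980-03-14", "02138"),
    ("1980-03-14", "02138"),
    ("1975-11-02", "02139"),
    ("1992-07-30", "02138"),
]


def unique_fraction(rows):
    """Return the share of rows whose combination of values appears only once."""
    counts = Counter(rows)
    unique = sum(1 for row in rows if counts[row] == 1)
    return unique / len(rows)


share = unique_fraction(records)
print(f"{share:.0%} of records are unique on birthdate and zip code")
```

A record that is unique on these two values could be matched to a named person if another dataset holds the same two values alongside a name.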
How personal information can be made safe for reuse
The following methods can be used to reduce the amount of identifiable personal information contained in the individual client data that’s collected and used as part of providing public services. Other terms used for this kind of data include raw data, microdata, and transactional data.
Methods to use
The methods below are listed from the least likelihood of re-identification to the greatest likelihood.
Confidentialisation
The statistical methods used to protect against confidential information being disclosed to people who are not authorised to have access to it, in a way that could identify an individual, household or organisation.
The statistical methods used provide a level of protection against identification that cannot be obtained from de-identification.
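Confidentialisation covers many statistical methods. As one simple example only, the sketch below suppresses small counts in a table before it is published. The threshold of 6, the function name `suppress_small_cells` and the counts are assumptions made for illustration, not rules set by this guidance.

```python
def suppress_small_cells(counts, threshold=6):
    """Replace counts below `threshold` with None so small groups are not published.

    The default threshold is an assumption for this example only.
    """
    return {
        group: (n if n >= threshold else None)
        for group, n in counts.items()
    }


# Hypothetical counts of clients per region, made up for illustration.
counts = {"Region A": 120, "Region B": 3, "Region C": 45}
published = suppress_small_cells(counts)
print(published)
```

In practice, suppression on its own may not be enough, because a suppressed value can sometimes be worked out from published totals.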
Aggregation
Data combined from several measurements but without the additional use of statistical methods to protect against re-identification.
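A minimal sketch of aggregation, assuming client records held as dictionaries with hypothetical `age` and `region` fields: individual rows are combined into counts by region and 10-year age band. No further statistical protection is applied, so a group containing only 1 person can still appear in the result.

```python
from collections import Counter

# Hypothetical client records, made up for illustration.
clients = [
    {"age": 34, "region": "North"},
    {"age": 37, "region": "North"},
    {"age": 52, "region": "North"},
    {"age": 34, "region": "South"},
]


def age_band(age, width=10):
    """Map an exact age to a band label such as '30-39'."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def aggregate(rows):
    """Count rows for each (region, age band) combination."""
    return Counter((row["region"], age_band(row["age"])) for row in rows)


table = aggregate(clients)
for (region, band), count in sorted(table.items()):
    print(region, band, count)
```

Here the groups for North 50-59 and South 30-39 each contain a single person, which shows why aggregation alone offers less protection than confidentialisation.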
De-identification
The process of removing information from microdata to reduce the risk of spontaneous recognition. It typically includes removing names, exact dates of birth or death, and exact addresses.
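The removal step described above can be sketched as follows. The field names (`name`, `address`, `date_of_birth`) and the choice to keep only the year of birth are assumptions made for this example. A real dataset may hold other identifying fields that also need to be removed.

```python
# Hypothetical field names treated as direct identifiers in this example.
DIRECT_IDENTIFIERS = {"name", "address"}


def de_identify(record):
    """Return a copy of `record` without direct identifiers.

    An exact date of birth, assumed to be an ISO 'YYYY-MM-DD' string,
    is reduced to the year only.
    """
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    dob = cleaned.pop("date_of_birth", None)
    if dob is not None:
        cleaned["birth_year"] = dob[:4]
    return cleaned


# Made-up record for illustration.
record = {
    "name": "Jane Example",
    "address": "1 Sample Street",
    "date_of_birth": "1980-03-14",
    "service": "Housing support",
}
print(de_identify(record))
```

The remaining fields can still identify someone when combined with other data, so de-identification reduces the risk of spontaneous recognition rather than removing the risk of re-identification.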
Anonymisation — A term most commonly used to refer to data from which direct identifiers have been removed (de-identified data) but is sometimes used to refer to confidentialised data. Due to this confusion it’s not used in this diagram.
Pseudonymisation
The process of replacing direct identifiers with different ones in microdata to reduce the risk of spontaneous recognition.
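One possible way to replace a direct identifier with a different one is a keyed hash, sketched below using Python's standard `hmac` and `hashlib` modules. The function name `pseudonymise`, the key and the client number are hypothetical. This is an illustration of the idea under those assumptions, not a method prescribed by this guidance.

```python
import hashlib
import hmac


def pseudonymise(identifier, secret_key):
    """Replace an identifier with a keyed-hash pseudonym.

    The same identifier and key always give the same pseudonym, so records
    for one person can still be linked without showing the original value.
    """
    digest = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]  # shortened for readability in this example


# Hypothetical key and client number, made up for illustration.
key = b"example-key-kept-separately-from-the-data"
pseudonym = pseudonymise("CLIENT-000123", key)
print(pseudonym)
```

Anyone who holds the key can recreate the pseudonyms from known identifiers, so the key would need to be stored separately from the data. Other details in each record are left unchanged, which is why pseudonymisation sits at the greatest-likelihood end of the list.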