Large datasets of personal health information (PHI) contain a wealth of knowledge that can be used to improve our understanding of disease progression, treatment efficacy, and how these relate to an individual patient's genomics and experience of health services. The success of projects like Pambayesian is directly tied to their ability to access and derive this knowledge from PHI. The widespread use of anonymisation, enshrined in legislation in the US, Australia and the UK as a de-identification method, has drawn the attention of numerous researchers, including Professor Latanya Sweeney, who has demonstrated how anonymisation represents a significant threat to patient privacy. This threat results from the ease with which re-identification can be achieved, often through simplistic methods such as Google searches of media reports.

In 2014, two of our external collaborators, Dr Kuda Dube and Professor Thomas Gallagher, theorised that a data generation method might be able to produce realistic synthetic EHR (RS-EHR) datasets that are internally consistent with real PHI, without the susceptibility to re-identification that anonymisation can bring. My own ongoing work built on their approach and realised such a method, CoMSER, which was later expanded and incorporated into the Mitre Corporation's Synthea project: a project which aims to create a safe and privacy-assured synthetic dataset for secondary research uses that includes birth-to-death records consistent with the disease and health interactions of Massachusetts' 7 million citizens. More recently I have looked at how we can be assured that datasets of synthetic PHI are realistic (ATEN).

While debate rages in the literature* as to whether anonymisation can be made 'safe', projects like Pambayesian seek to limit the potential for PHI exposure. We take the ethical and privacy considerations of our work very seriously, and are constantly looking for ways to develop new Personalised Medicine approaches without risking the privacy of the hundreds or thousands of prior patients whose PHI provides the knowledge and lessons that we seek to use to improve future medical treatment approaches.

Last week, IEEE Future Directions published an editorial on Ethical Issues in Secondary Use of Personal Health Information, authored by Drs Dube and Gallagher and myself. We believe that consideration of the wider privacy issues, and of whether a research project's secondary use of PHI could instead make use of a realistic synthetic electronic health record (RS-EHR) dataset, should be a required component in any ethics application.

Gallagher, T., Dube, K., & McLachlan, S. (2018) Ethical issues in secondary use of personal health information. IEEE Future Directions, May 2018. Available here.

*Prof. Latanya Sweeney and her team consider that no anonymisation method is truly safe. While they have delivered several proposals for new methods, they always manage to find flaws in each. Former Sweeney grad student Khaled El Emam has gone on to review many older and newer methods of anonymisation. And while he concluded that some are indeed susceptible, he has, possibly erroneously, concluded that reliance on HIPAA anonymisation methods is 'safe' – something more recent research (for example, that of Cheyenne Solomon of Indiana University and the work of Dr Susan Wallace) appears to disagree with.