DIMACS Working Group on Privacy / Confidentiality of Health Data
December 10 - 12, 2003
DIMACS Center, CoRE Building, Rutgers University, Piscataway, NJ
Organizers:
- Rakesh Agrawal, IBM Almaden, ragrawal@acm.org
- Larry Cox, CDC, lcox@cdc.gov
- Joe Fred Gonzalez, CDC, jfg2@cdc.gov, chair
- Harry Guess, University of North Carolina, harry_guess@unc.edu
- Tomas Sander, HP Labs, tosander@exch.hpl.hp.com
Presented under the auspices of the Special
Focus on Communication Security and Information Privacy and
Special Focus on Computational and Mathematical Epidemiology.
Subgroup meeting: DIMACS Working Group on Data De-Identification, Combinatorial Optimization, Graph Theory, and the Stat/OR Interface.
Privacy concerns are a major stumbling block to public health
surveillance, in particular bioterrorism surveillance and
epidemiological research. Moreover, the Health Insurance Portability
and Accountability Act (HIPAA) of 1996 imposes very strict standards
for rendering health information not individually identifiable. One
approach involves removal of a number of potential identifiers
including all dates of health events. Causes must precede effects, so
removing temporal relationships in a dataset makes it all but
impossible to use the data for etiologic research or for studies of
medical care outcomes. Another approach to de-identification under
HIPAA is for an expert statistical opinion to be provided that the
risk of identifying an individual is very small. Accepted standards
for making such a determination have not been developed. How to use
large health care databases to detect medical or terrorist risks and
improve health care quality while maintaining privacy and
confidentiality of the data is a serious challenge. The problem is of
interest to government agencies at all levels, industrial and academic
researchers, and a growing commercial sector that collects, maintains,
and markets such data sets. Not many computer scientists knowledgeable
about methods of cryptography/security/privacy have gotten involved in
this area (though some have), and the area is ripe for new
partnerships between those in the public health/epidemiology
community, the health data industry, and the computer science
community.

This working group was motivated by work of the DIMACS
Working Group on Adverse Event/Disease Reporting, Surveillance, and
Analysis, which is part of the DIMACS Special Focus on Computational
and Mathematical Epidemiology. It will meet separately before the
first meeting of the Working Group on On-line Privacy: Threats and
Tools or in conjunction with it.

The group will explore computational
techniques for ensuring that an individual whose data appear in a
released data set cannot be identified. The challenge is to
produce anonymous data that is specific enough to be useful for
research and analysis. The group will consider ways to remove direct
identifiers (social security number, name, address, telephone number),
and ways to aggregate, substitute, and remove information from data
sets. Also of interest are questions about using electronic data
matching to link data elements from various sources or data sets in
order to identify individuals while maintaining the privacy of
others. The group will investigate methods for privacy
protection in field-structured data and ways to extend existing
methods to large data sets, as well as systems to render textual data
sufficiently anonymous. Finally, the group will explore formal
frameworks for disclosure control and formal protection models.
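
The removal, aggregation, and substitution steps described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not a method proposed by the working group: the field names (ssn, name, address, phone, age, zip, diagnosis), the 10-year age bands, and the 3-digit ZIP prefix are all hypothetical choices, and the sample records are invented for the example.

```python
from collections import Counter

# Hypothetical field names, chosen only for illustration.
DIRECT_IDENTIFIERS = {"ssn", "name", "address", "phone"}


def deidentify(record):
    """Drop direct identifiers and coarsen two quasi-identifiers."""
    # Remove: discard every field listed as a direct identifier.
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    # Aggregate: replace the exact age with a 10-year band (assumed width).
    low = (out.pop("age") // 10) * 10
    out["age_band"] = f"{low}-{low + 9}"
    # Substitute: keep only the first three digits of the ZIP code.
    out["zip3"] = out.pop("zip")[:3]
    return out


def smallest_group(records, keys=("age_band", "zip3")):
    """Size of the rarest combination of quasi-identifier values."""
    counts = Counter(tuple(r[k] for k in keys) for r in records)
    return min(counts.values())


# Invented sample records; none describe real people.
raw = [
    {"ssn": "000-00-0001", "name": "A", "address": "1 Elm St",
     "phone": "555-0101", "age": 34, "zip": "08854", "diagnosis": "asthma"},
    {"ssn": "000-00-0002", "name": "B", "address": "2 Oak St",
     "phone": "555-0102", "age": 37, "zip": "08855", "diagnosis": "diabetes"},
    {"ssn": "000-00-0003", "name": "C", "address": "3 Ash St",
     "phone": "555-0103", "age": 62, "zip": "08901", "diagnosis": "asthma"},
]

released = [deidentify(r) for r in raw]
print(released[0])
print(smallest_group(released))
```

In this sketch, a smallest group of size 1 means at least one released record is still unique on its age band and ZIP prefix. Coarsening further would enlarge the groups but make the data less specific, which is the trade-off between anonymity and usefulness noted above.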
Document last modified on November 18, 2003.