Journal of Software, 2013, 24(12): 2883-2896

Data Anonymization Approach for Incomplete Microdata
GONG Qi-Yuan (龚奇源), YANG Ming (杨明), LUO Jun-Zhou (罗军舟)
(School of Computer Science and Engineering, Southeast University, Nanjing 211118, China)
Received: February 21, 2012
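> Abstract (translated from the Chinese): In data publishing, the quasi-identifier attributes of the data must be anonymized to prevent privacy leakage, which reduces the risk of linking attacks and protects the sensitive attributes of data owners. Existing data anonymization methods all assume that the data has no missing values; when values are missing, they simply discard the affected records, so the characteristics of the data differ before and after anonymization. This paper studies anonymization of incomplete data and, based on the k-anonymity model, proposes KAIM (k-anonymity for incomplete microdata), a data anonymization method for incomplete data. KAIM retains the records that contain missing values and tries to assign records that are missing the same attribute to the same group for generalization. The method uses the change in information entropy before and after a group is generalized as the distance, clusters the data into groups with an improved k-member algorithm, and finally generalizes the data within each group with a local generalization algorithm based on generalization hierarchies. Extensive experiments on a real dataset show that KAIM incurs only 43.8% of the information loss of existing algorithms and preserves the characteristics of the data across anonymization to the greatest extent.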
Keywords: data anonymization; incomplete data; clustering; k-anonymity
Abstract: To protect privacy against linking attacks, the quasi-identifier attributes of microdata should be anonymized in privacy-preserving data publishing. Although many algorithms have been proposed in this area, few of them can handle incomplete microdata. Most existing algorithms simply delete records with missing values, causing large information loss. This paper proposes a novel data anonymization approach for incomplete microdata, called KAIM (k-anonymity for incomplete microdata), based on the k-member algorithm and an information entropy distance. Instead of deleting any records, KAIM clusters records with similar characteristics together to minimize information loss, and then generalizes all records with a local recoding scheme. Results of extensive experiments based on a real dataset show that KAIM incurs only 43.8% of the information loss of previous algorithms for incomplete microdata, indicating that KAIM preserves the utility of the anonymized dataset much better than existing algorithms.
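The grouping idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not define the entropy-based distance or the generalization hierarchies, so the sketch substitutes a simple count of suppressed cells (`loss`) and flat suppression to `*`, and it seeds each group with the next unassigned record. All function names are hypothetical, and `None` is assumed to mark a missing quasi-identifier value.

```python
from typing import List, Optional, Tuple

Record = Tuple[Optional[str], ...]  # None marks a missing quasi-identifier value


def generalize(group: List[Record]) -> Record:
    """Keep a column's value when the whole group agrees on it (including
    'all missing'); otherwise suppress it to '*'. This is a stand-in for
    hierarchy-based local recoding."""
    return tuple(
        col[0] if all(v == col[0] for v in col) else "*"
        for col in zip(*group)
    )


def loss(group: List[Record]) -> int:
    """Stand-in distance: number of cells overwritten by '*' when the group
    is generalized (the paper uses an entropy-based measure instead)."""
    return len(group) * sum(1 for v in generalize(group) if v == "*")


def cluster(records: List[Record], k: int) -> List[List[Record]]:
    """Greedy k-member-style grouping: grow each group with the record that
    adds the least loss, then spread leftovers over existing groups."""
    if len(records) < k:
        raise ValueError("need at least k records")
    remaining = list(records)
    groups: List[List[Record]] = []
    while len(remaining) >= k:
        group = [remaining.pop(0)]  # simplification: seed with the next record
        while len(group) < k:
            best = min(remaining, key=lambda r: loss(group + [r]))
            remaining.remove(best)
            group.append(best)
        groups.append(group)
    for r in remaining:  # fewer than k records are left over
        target = min(groups, key=lambda g: loss(g + [r]) - loss(g))
        target.append(r)
    return groups


def anonymize(records: List[Record], k: int) -> List[Record]:
    """Return one generalized row per input record, ordered by group."""
    return [generalize(g) for g in cluster(records, k) for _ in g]


# Toy data: (age, sex, zip code); two records are missing the zip code.
records = [
    ("25", "M", None),
    ("25", "F", "4350"),
    ("30", "M", None),
    ("30", "F", "4350"),
]
groups = cluster(records, 2)
result = anonymize(records, 2)
# result -> [('*', 'M', None), ('*', 'M', None), ('*', 'F', '4350'), ('*', 'F', '4350')]
```

In this toy run the two records that lack a zip code end up in the same group, so no record is dropped and only the age column has to be suppressed. This mirrors the abstract's point that keeping records with the same missing attribute together limits information loss.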
Foundation items: National Natural Science Foundation of China (61272054, 61202449, 61003257, 61320106007); National Basic Research Program of China (973) (2010CB328104); National High Technology Research and Development Program of China (863) (2013AA013503); National Key Technology R&D Program of China (2010BAI88B03, 2011BAK21B02); Specialized Research Fund for the Doctoral Program of Higher Education (20110092130002); Jiangsu Provincial Key Laboratory of Network and Information Security (BM2003201); Key Laboratory of Network and Information Integration, Ministry of Education (93K-9)
Reference text:

GONG Qi-Yuan, YANG Ming, LUO Jun-Zhou. Data Anonymization Approach for Incomplete Microdata. Journal of Software, 2013, 24(12): 2883-2896