Privacy-preserving data publishing has attracted considerable research interest in recent years. Because clustering and obfuscation rest on fundamentally different principles, obfuscation that keeps data useful for clustering is difficult to achieve. Most existing methods, which preserve either distances or distributions, cannot balance the clustering utility of the data against its privacy. To address this shortcoming, a neighborhood-preserving perturbation algorithm, VecREP (vector equivalent replacing based perturbing method), is proposed; it maintains clustering utility by preserving the nearest neighborhood of each data point. By analyzing the composition of each data point's neighborhood, the notion of a safe neighborhood is introduced to keep that composition stable. Building on the ideas of vector offset and composition, an equivalent replacing arc is then constructed to preserve the distribution characteristics of the data within the neighborhood. Each data point is obfuscated by replacing it with a point chosen at random on its equivalent replacing arc inside its safe neighborhood. VecREP is compared experimentally with the existing RBT, TDR, Camp-crest, and NeNDS algorithms. The results show that VecREP achieves clustering utility close to that of the distance-preserving algorithm RBT and better than that of the other algorithms, while providing stronger privacy protection than the other algorithms and effectively resisting attacks that try to reverse the perturbation.
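The notion of neighborhood preservation used above can be made concrete with a small measurement. The following Python sketch is not the VecREP algorithm; it only illustrates, under stated assumptions, how one might quantify the fraction of each point's k nearest neighbors that survive a perturbation. The function names (`knn_sets`, `neighborhood_preservation`, `rotate`, `add_noise`), the synthetic 2D data, and the parameter values are all hypothetical choices made for illustration. A plain rotation stands in for a distance-preserving transform, and additive Gaussian noise stands in for a perturbation that does not protect neighborhoods.

```python
import math
import random


def knn_sets(points, k):
    """For each point, return the index set of its k nearest neighbors (Euclidean)."""
    result = []
    for i, p in enumerate(points):
        others = (j for j in range(len(points)) if j != i)
        order = sorted(others, key=lambda j: math.dist(p, points[j]))
        result.append(set(order[:k]))
    return result


def neighborhood_preservation(original, perturbed, k):
    """Mean fraction of each point's k nearest neighbors kept after perturbation."""
    before = knn_sets(original, k)
    after = knn_sets(perturbed, k)
    kept = sum(len(b & a) / k for b, a in zip(before, after))
    return kept / len(original)


def rotate(points, theta):
    """Rotate 2D points about the origin; Euclidean distances are unchanged."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y, s * x + c * y) for x, y in points]


def add_noise(points, sigma, rng):
    """Add independent Gaussian noise to every coordinate (illustrative baseline)."""
    return [(x + rng.gauss(0, sigma), y + rng.gauss(0, sigma)) for x, y in points]


# Synthetic data: an assumption for illustration, not the paper's datasets.
rng = random.Random(0)
data = [(rng.uniform(0, 10), rng.uniform(0, 10)) for _ in range(60)]

rot_score = neighborhood_preservation(data, rotate(data, math.pi / 3), k=5)
noise_score = neighborhood_preservation(data, add_noise(data, 2.0, rng), k=5)
print(f"rotation: {rot_score:.3f}, gaussian noise: {noise_score:.3f}")
```

Because a rotation leaves all pairwise distances unchanged, it should keep essentially every nearest-neighbor set intact, whereas heavy noise scrambles them. A neighborhood-preserving method in the sense described above would aim for a score near that of the rotation while moving points in a way that is harder to reverse.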