RECONSTRUCTION OF PERTURBED DATA USING K-MEANS
Prasannta Tiwari*1 and Hitesh Gupta2
Corresponding author: Prasannta Tiwari, E-mail: [email protected]
A key element in preserving the privacy and confidentiality of sensitive data is the ability to evaluate the extent of potential disclosure of that data. In other words, we need to determine to what extent confidential information in a perturbed database can be compromised by attackers or snoopers. Several randomized techniques have been proposed for privacy-preserving data mining of continuous data. These approaches generally hide the sensitive data by randomly modifying the data values with additive noise, while aiming to allow the original distribution to be reconstructed closely at an aggregate level. Many efforts have addressed the reconstruction of the community distribution. The main contribution of this paper is an algorithm that accurately reconstructs the community joint density from perturbed multidimensional stream data. Any statistical question about the community can then be answered using the reconstructed joint density. Our research objective is to determine whether the distributions of the original and recovered data are close enough to each other regardless of the nature of the noise applied. We consider an ensemble clustering method to reconstruct the initial data distribution. As the tool for implementing the algorithm we chose MATLAB, the “language of choice in industrial world”.
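The perturb-then-reconstruct idea can be illustrated with a minimal sketch. It is not the authors' algorithm: it is written in Python rather than MATLAB, it uses a single plain k-means run instead of an ensemble, it handles one-dimensional data only, and it assumes zero-mean Gaussian noise whose standard deviation is known to the party doing the reconstruction. The function names `perturb`, `kmeans_1d` and `reconstruct`, and all numeric values, are hypothetical and chosen for illustration.

```python
import random
import statistics


def perturb(values, noise_std, rng):
    """Additive-noise perturbation: release x + e with e ~ N(0, noise_std^2)."""
    return [x + rng.gauss(0.0, noise_std) for x in values]


def kmeans_1d(values, k, iters=50):
    """Plain Lloyd's k-means on scalars; returns (centroids, labels)."""
    ordered = sorted(values)
    # Deterministic start: place the initial centroids at evenly spaced quantiles.
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda j: (x - centroids[j]) ** 2) for x in values]
        # Update step: each centroid becomes the mean of its members.
        updated = []
        for j in range(k):
            members = [x for x, lab in zip(values, labels) if lab == j]
            updated.append(statistics.fmean(members) if members else centroids[j])
        if updated == centroids:
            break
        centroids = updated
    return centroids, labels


def reconstruct(perturbed, k, noise_std):
    """Estimate (weight, mean, std) of each cluster of the ORIGINAL data."""
    centroids, labels = kmeans_1d(perturbed, k)
    summary = []
    for j in range(k):
        members = [x for x, lab in zip(perturbed, labels) if lab == j]
        if len(members) < 2:
            continue
        # Independent noise adds its variance to each cluster, so subtract the
        # known noise variance (assumption) and floor the result at zero.
        var = max(statistics.pvariance(members) - noise_std ** 2, 0.0)
        summary.append((len(members) / len(perturbed), centroids[j], var ** 0.5))
    return summary


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical original data: two well-separated groups.
    original = [rng.gauss(20, 3) for _ in range(2000)]
    original += [rng.gauss(60, 3) for _ in range(2000)]
    released = perturb(original, noise_std=4.0, rng=rng)
    for weight, mean, std in reconstruct(released, k=2, noise_std=4.0):
        print(f"weight={weight:.2f} mean={mean:.2f} std={std:.2f}")
```

The variance correction is only reasonable when the clusters are well separated relative to the noise. With overlapping clusters, hard assignment truncates each cluster's tails and the per-cluster estimates become biased, which is one motivation for combining several clusterings as the abstract proposes.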