Use kmap to visualize anonymity/uniqueness in your data

2016.06.01. 11:51:38  Gulyás Gábor


While working on a paper recently, I asked myself how to visualize uniqueness (or anonymity set sizes) in data. The only visualization I am aware of so far is Fig. 3 in the Panopticlick experiment, which shows the anonymity set sizes created by each value of each attribute. This is it:

[Figure: Fig. 3 of the Panopticlick experiment]

While this is a nice figure, it is quite hard to read quantitatively, and it gets even more complicated if you want to use it to compare different datasets. Still, it would be useful to understand the state of uniqueness in a dataset, especially when you consider different sets of attributes, or apply anonymization or other countermeasures to decrease uniqueness.

This is why I started looking for another option, which finally led to a simple but heavily customizable plotting function I call kmap [code]. The tool serves multiple purposes, whether you are a data scientist running experiments or you need an explanatory visualization for non-experts. It is useful to

  • visualize how different attributes partition your data into anonymity sets,
  • compare how different anonymization schemes or their settings affect uniqueness,
  • show how precise a certain fingerprinting method is.
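To make "anonymity set" concrete, here is a minimal sketch of the underlying computation. It is not kmap's code or API: the function names and the toy records are invented for illustration. Records that share the same values for the selected attributes fall into the same anonymity set, and the size of that set is the record's k.

```python
from collections import Counter

def anonymity_sets(records, attributes):
    """Count how many records share each combination of attribute values."""
    return Counter(tuple(rec[a] for a in attributes) for rec in records)

def k_per_record(records, attributes):
    """For every record, return the size of the anonymity set it falls into."""
    sets = anonymity_sets(records, attributes)
    return [sets[tuple(rec[a] for a in attributes)] for rec in records]

# Made-up toy records, for illustration only (not taken from a real dataset).
records = [
    {"age": 39, "sex": "Male", "workclass": "State-gov"},
    {"age": 39, "sex": "Male", "workclass": "Private"},
    {"age": 50, "sex": "Male", "workclass": "Private"},
    {"age": 39, "sex": "Female", "workclass": "Private"},
]

print(k_per_record(records, ["sex"]))         # [3, 3, 3, 1]
print(k_per_record(records, ["age", "sex"]))  # [2, 2, 1, 1]
```

A record with k = 1 is unique under the chosen attributes. Adding an attribute can only split anonymity sets further, so k never grows when more attributes are considered.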

Let's see a nice example based on the UCI Adult Data Set. This tabular dataset contains attributes such as age, sex or workclass for more than 30k adults. Let's pretend that we are considering releasing this dataset, and we would like to know how many (and which) attributes could be safely released. To get a better understanding, let's visualize the level of identification (uniqueness) if we release only 3, 6 or 9 attributes of each user. With kmap it looks like this:

[Figures: kmap plots for 3 attributes, 6 attributes and 9 attributes]

It is quite easy to tell the differences by looking at the figures: releasing only 3 or 6 attributes is relatively safe (*), as less than 25% of the dataset can be uniquely identified. On the other hand, if 9 attributes are released, almost 75% of the users in the release become unique.
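The percentages in such a comparison come down to one number per attribute set: the share of records that are alone in their anonymity set. The sketch below shows how this could be computed. It does not use kmap or the Adult data; the function name, the toy records and the release order are assumptions made up for the example.

```python
from collections import Counter

def unique_fraction(records, attributes):
    """Share of records whose attribute combination appears exactly once."""
    keys = [tuple(rec[a] for a in attributes) for rec in records]
    counts = Counter(keys)
    return sum(1 for k in keys if counts[k] == 1) / len(keys)

# Made-up toy records, for illustration only.
toy = [
    {"age": 39, "sex": "Male", "workclass": "Private", "education": "Bachelors"},
    {"age": 39, "sex": "Male", "workclass": "Private", "education": "Masters"},
    {"age": 39, "sex": "Male", "workclass": "State-gov", "education": "Bachelors"},
    {"age": 50, "sex": "Female", "workclass": "Private", "education": "Bachelors"},
]

# Hypothetical order in which attributes would be added to the release.
release_order = ["sex", "age", "workclass", "education"]

for n in range(1, len(release_order) + 1):
    share = unique_fraction(toy, release_order[:n])
    print(f"{n} attribute(s): {share:.0%} unique")
# 1 attribute(s): 25% unique
# 2 attribute(s): 25% unique
# 3 attribute(s): 50% unique
# 4 attribute(s): 100% unique
```

On real data the same loop would be run over the candidate attribute sets (for example 3, 6 and 9 attributes), and the resulting shares compared against whatever threshold is acceptable for the release.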

If you would like to try out kmap yourself, you can find the code and the files for the example in this git repository. In addition, our paper that uses this visualization has been accepted, so more useful examples can be expected.

I would like to hereby thank Gergely Acs, Claude Castelluccia, Amrit Kumar and Luca Melis for their comments while I was developing kmap.


(*) What counts as safe is another question; in some scenarios even having 6% of the users identifiable can be considered a problem.

Tags: anonymity, visualization, plot, kmap, anonymity set, uniqueness






© International PET Portal, 2010 | Imprint | Terms of Use | Privacy Policy