Measuring Local Topological Anonymity in Social Networks (2013.06.03.)

User Tracking on the Web via Cross-Browser Fingerprinting (2012.02.20.)

StegoWeb: Towards the Ideal Private Web Content Publishing Tool (2011.09.05.)

» Go to the full list

Recent comments

No new comments.

View study

Measuring Local Topological Anonymity in Social Networks

Back to studies


Social network-based services play an essential part in the everyday life of many. One common feature of these services is that they all have an underlying graph structure, where nodes and edges can have different interpretations depending on the type of the service. For instance, in case of an online social networking service, a user can create a profile for herself (i.e., a node), and mark other users as her acquaintances (i.e., creating edges). Some services do not reveal directly the graph structure beneath them, though they should be considered social networks similarly, as in the case of mobile phone use or e-mail correspondence.

These services may be valuable sources of information for several related parties, such as application developers, researchers, or business partners. Therefore, if the graph structure of such a service is released in anonymized form but with additional private information, it might be in the interest of a related party (who is now considered as an attacker since she violates user privacy) to re-identify users in the anonymized data set. Additionally, users may create instances in numerous services in parallel, and the attacker may try to link related instances. In both cases, the attacker builds the de-anonymization attack on his a priori knowledge (or auxiliary information), which can be some properties of the targeted users or even partial crawls of networks.

However, de-anonymization is not a trivial task, as user instances could be published with differing or contradictory profile information under unrelated identifiers. In order to avoid false identification, it has been shown that structural properties can be used as alternative source of correlation [1-6], which methods are called structural re-identification or de-anonymization attacks. By considering the extent of graph structure modification prior to sanitization, re-identification attacks can be categorized as active [1] or passive [2-6]. In active attacks, the malicious party modifies the network structure to find the specific injected subgraph (and the nodes it is connected to), while passive attacks use auxiliary information solely. For structural passive re-identification attacks, auxiliary information can be degree values of the node’s neighborhood [3], the structure of the neighborhood [4], or another graph that overlaps with the sanitized one [5, 6].

Concerning the strategy of the attack, re-identification can be carried out in two sequential phases: the global and the local re-identification phases (also called seed identification and propagation, respectively in [5, 6]). In the first phase, nodes (or subgraphs) with globally unique structural properties are re-identified, and after the appropriate number of nodes are de-anonymized, the second phase can be started. In this phase, nodes connected to already re-identified nodes are being analyzed, and the ones with locally outstanding structural properties are re-identified.

For example, an attacker tries to de-anonymize users based on their degree values in a sanitized graph, assuming her inputs are as it is depicted on Fig. 1. Regarding global re-identification, in the sanitized graph nodes form anonymity sets {4, 5, 8}, {6, 1}, {7, 2}, {3}, so the attacker who is looking for Harry, cannot re-identify him directly from {6, 1}. Therefore, she starts with global re-identification, and de-anonymizes node 3 as Dave. Then she proceeds with local re-identification, as she knows that Harry is connected to Dave and his degree value is 2, therefore she re-identifies node 1 as Harry, from the candidate set of {1, 4, 5, 7, 8}, i.e., local nodes around re-identified node 3.

Figure 1. The perturbed, anonymized graph to be de-anonymized (right) and the attacker knowledge crawled from another service (left).

Most structural de-anonymization techniques only implement the global re-identification phase [1-4], limiting their success rate to small scales, as it is not computationally feasible to de-anonymize hundreds of thousands of nodes this way. However, it has been shown that adding the second phase opens the way to efficient large-scale re-identification [5, 6]. Similarly as in the previous example, these attacks use two overlapping graphs as their input, a sanitized graph as their target (i.e., the target graph), and an auxiliary graph containing identifying information (i.e., the source graph), and they flag a target node as re-identified if it is assigned to a source node. 

In this paper, our focus is on the second, local re-identification phase, and as our main contribution, we introduce a novel anonymity measure called Local Topological Anonymity (LTA) for this phase, expressing the node’s resistance level against local re-identification techniques. The LTA measure is beneficial for users to apply privacy-enhancing software to establish their privacy status more accurately than simply considering their global structural re-identifiability (which app can also suggest making modifications in their relationships to achieve stronger privacy); by measuring their level of discoverability against the state-of-the-art structural re-identification attacks, i.e., ones implementing a local re-identification phase. In addition, the LTA gives a fast to calculate a posteriori anonymity measure, and it requires inputs that can be reasonably assumed to be available for users in most services: the neighbors and neighbors of neighbors of nodes.

Besides, the LTA measure also allows data providers and attackers to estimate the possible success of attacks; e.g., in deciding whether a dataset (a network graph) is ready for release or needs modification (e.g., low median LTA value), or worth attacking. In case of particular nodes that cannot be identified globally, LTA values also offer the possibility to check prior the release or the attack. Additionally, an attacker can calculate the LTA values of multiple users to locate anonymity and similarity sets, and check whether certain users are part of them or not. In this paper we focus on the anonymity measure of nodes, and leave the detailed analysis of measures calculating an LTA value of the entire network for future work.

The paper is structured as follows. In Section II, we discuss the related literature of structural anonymity and re-identification. In Section III, we introduce the concept of LTA, discuss the related theoretical foundations, and propose three variants of LTA measures. We present the evaluation of the measure afterwards in Section IV, first by visually comparing the results of the measures given in small networks, and then by simulation in larger networks. Finally, in Section V, we conclude our work and discuss issues for future work.

Read the rest of the paper

Download article from here or below. The original publication is available at

Source: Data Mining Workshops (ICDMW), 2012 IEEE 12th International Conference on. IEEE, 2012.



Back to studies



No comments.

Post new comment

Anyone can comment, in case of unregistered senders all fields are optional. Comment can be anonymous.

Confirmation code. (Generate new confirmation code.)

BBCode is a simple markup language used for formatting comments. Valid codes are:

bold: [b]Maecenas at nisl.[/b]
italics: [i]Maecenas at nisl.[/i]
underline: [u]Maecenas at nisl.[/u]
url: [url][/url], [url=]Maecenas at nisl.[/url]
Picture: [img][/img]
quote: [quote]Maecenas at nisl.[/quote]
code: [code]Maecenas at nisl.[/code]
size: [size=12]Maecenas at nisl.[/size]
color: [color=#FF0000]Maecenas at nisl.[/color]


© International PET Portal, 2010 | Imprint | Terms of Use | Privacy Policy