3 results.

On the impact of machine learning (on privacy)

2016.02.23. 05:02:01, Gulyás Gábor

I recently read an article in which the author pictured a future where Google Glass-like products support our decisions through face recognition and similar techniques. While the author clearly aimed to paint a 'bright new future', she remained silent about potential abuses and privacy issues. And although technology is definitely heading in the direction she described, the piece remains science fiction for the moment. But where exactly are we now, and for how long can we feel relieved?

Today, using machine learning (ML) is hard. First, you need vast amounts of quality data; then picking the proper algorithm, training it and using it are also highly non-trivial. Not to mention the hardware requirements: training requires a lot of computing power, and it takes a while until your application learns the task it is designed for. This might sound comforting from a privacy perspective, but taking comfort in it would be a mistake.

I see three major issues whose resolution could change the state of the art, and I think that, for some of these, we are already in the shifting phase:

  1. Machine-learning-based applications should fit on an average smartphone. Last year, we saw a nice pioneering example of real-time (pretrained) machine learning in Google Translate: it could detect text from the camera in real time (text recognition), translate the given text (with deep neural networks), and then replace the original text with its translation. This kind of application should fit on low-end phones, too. This is likely to happen in one or two years.
  2. Currently, programs are trained remotely due to resource constraints. The training phase needs to shift to the consumer side, to be done on smartphones. In a couple of years we might have specialized chips in smartphones that enable this, opening the way to new types of applications.
  3. Developing applications that use machine learning should be easier. There is a lot of research and educational activity around machine learning nowadays, but we can't say that machine learning will become a simple import-and-use tool in general. For some specific tasks and data types it might, but that is all we can see now.

It is easy to imagine that such ML could enable a rapidly growing number of privacy-infringing uses (*). However, we should not forget that data-driven businesses already fuel machine learning research and application development today. Thus, there are already thousands of services built around data and machine learning. As many of these companies use data that was gathered without user consent (to mention just one possible privacy violation), ML is already here to further erode our privacy.

Let's look at some examples. BlueCava, a company that uses fingerprinting to track people on the web, is using machine learning to connect devices that belong to the same person. This is just one example; with little effort we could find a myriad of other companies that analyse user behavior, buying intent, fields of interest, etc. with similar techniques. The data that we generate is also at stake: we could think of smartphones and wearable devices, but also of the posts we write.

To conclude briefly, machine learning already has a huge impact, which should grow considerably in the next few years. All big companies have their own research groups in the field, and if we are honest with ourselves, we know this is for a simple reason: to use machine learning in their products in order to increase their revenues.


(*) I intentionally did not comment on whether machines could come alive. I think here you can read a realistic opinion on the topic.


This post originally appeared in the professional blog of Gábor Gulyás.

Tags: google, privacy, web privacy, google glass, data privacy, machine learning


Predicting anonymity with neural networks

2015.10.16. 05:01:02, Gulyás Gábor

In a previous blog entry, I described how random forests could be used to predict the level of empirical identifiability. I have also been experimenting with neural networks, and with how this approach could be used to solve the problem. As there is a myriad of great tutorials and ebooks on the topic, I'll just continue from the previous post. Here, instead of using the scikit-learn package, I used the keras package for modeling artificial neural networks, which relies on theano. (Theano allows efficient execution by using the GPU. Currently only NVIDIA CUDA is supported.)

The current setting is the same as described in the previous post: node neighborhood-degree fingerprints are used to predict how likely it is that the nodes would be re-identified by the state-of-the-art attack. As I had seen examples using raw image data for character classification (as for the MNIST dataset) with a Multi-Layer Perceptron structure, I decided to use a simple, fully connected MLP network, where the whole node fingerprint is fed to the net. Thus the network consists of an input layer of 251 neurons (with rectified linear unit activation, or relu for short) and a hidden layer of 128 neurons (with relu). To achieve classification, I used 21 output neurons to cover all possible score cases in the range -10, ..., 10. Here, I used a softmax layer, as a distribution-like output is easier to handle for classification. See the image below for a visual recap.
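To make the layer sizes concrete, here is a minimal sketch of the forward pass of such a network in plain Python. It is not the keras code used for the experiments: the weights are random and untrained, the fingerprint length (taken here to be 251, the same as the first layer width) is an assumption, and all function names are hypothetical.

```python
import math
import random

FINGERPRINT_LEN = 251          # assumption: fingerprint length equals the first layer width
LAYER_SIZES = [251, 128, 21]   # input layer, hidden layer, softmax output
SCORES = list(range(-10, 11))  # the 21 possible score classes: -10, ..., 10


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases for one fully connected layer."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x, weights, biases):
    """Fully connected layer: one weighted sum plus bias per output neuron."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def relu(values):
    return [max(0.0, v) for v in values]


def softmax(values):
    """Turn raw outputs into a probability distribution (numerically stable)."""
    top = max(values)
    exps = [math.exp(v - top) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def build_network(seed=0):
    """Untrained network with the layer widths described in the text."""
    rng = random.Random(seed)
    sizes = [FINGERPRINT_LEN] + LAYER_SIZES
    return [init_layer(n_in, n_out, rng)
            for n_in, n_out in zip(sizes, sizes[1:])]


def predict_distribution(network, fingerprint):
    """relu on the first two layers, softmax on the 21 output neurons."""
    activation = list(fingerprint)
    for weights, biases in network[:-1]:
        activation = relu(dense(activation, weights, biases))
    weights, biases = network[-1]
    return softmax(dense(activation, weights, biases))


def predict_score(network, fingerprint):
    """Map the most probable output neuron back to a score in -10..10."""
    probs = predict_distribution(network, fingerprint)
    best = max(range(len(probs)), key=probs.__getitem__)
    return SCORES[best]
```

Calling `predict_score(build_network(), fingerprint)` on a 251-element fingerprint returns one of the 21 scores. With untrained weights the result is meaningless; the sketch only shows how the softmax output maps to the score classes.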

I did all the training and testing as last time: the perturbed Slashdot networks were used for training, and perturbations of the Epinions network served as test data. In each round with a different level of perturbation (i.e., a different level of anonymization or strength of attacker background knowledge) I retrained the network with Stochastic Gradient Descent (SGD), using the dropout technique; you can find more of the details in the python code. As the figure below shows, this simple construction (hint: it was also the first successful try) could beat previous results, though with some constraints.
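For readers unfamiliar with the two techniques named above, the following sketch shows what dropout and a single SGD update do in isolation. It is an illustration only, not the training code from the post: the function names are hypothetical, and any dropout rate or learning rate passed in is an assumed value, since the post does not state the ones used.

```python
import random


def dropout(activations, rate, rng, training=True):
    """Inverted dropout: zero each unit with probability `rate` during
    training and rescale the survivors so the expected sum is unchanged."""
    if not training or rate == 0.0:
        return list(activations)
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


def sgd_step(params, grads, learning_rate):
    """One plain SGD update: move each parameter against its gradient."""
    return [p - learning_rate * g for p, g in zip(params, grads)]
```

During training, dropout would be applied to a layer's activations before they are passed on; at test time it is switched off (`training=False`), which is why the surviving activations are rescaled during training.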

In the high-recall region this simple MLP-based prediction approach proved to be better than all previous ones, while for the simulation of weak attackers (i.e., small recall, where perturbation overlaps are small), random forests are clearly the best choice. You can grab the new code here (you will also need the datasets from here).


This post originally appeared in the professional blog of Gábor Gulyás.

Tags: privacy, anonymity, machine learning, anonymity measure, neural networks


Predicting anonymity with machine learning in social networks

2015.08.18. 05:29:12, Gulyás Gábor

Measuring the level of anonymity is not an easy task. It can be easier in some exceptional cases, but that is not true in general. For example, in an anonymized database, we could measure the level of anonymity with anonymity set sizes: how many user records share the same properties, which could make them identifiable. (And here is the point where differential privacy fans raise their voices, but that story is worth another post.) However, this is much harder if we think about high-dimensional datasets where you have hundreds of attributes for a single user (think of movie ratings, for example).
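As a small illustration of the anonymity set idea, the sketch below counts, for each record, how many records share the same values on a chosen set of attributes. The function name, the attribute names and the toy records are all hypothetical.

```python
from collections import Counter


def anonymity_set_sizes(records, attributes):
    """For each record, return how many records (itself included)
    share the same values on the given attributes."""
    keys = [tuple(record[a] for a in attributes) for record in records]
    counts = Counter(keys)
    return [counts[key] for key in keys]


# Toy, made-up records: two users are indistinguishable, one is unique.
records = [
    {"zip": "1111", "age": 30},
    {"zip": "1111", "age": 30},
    {"zip": "2222", "age": 41},
]
sizes = anonymity_set_sizes(records, ["zip", "age"])
```

A record with an anonymity set of size 1 is unique on those attributes. With hundreds of attributes per user almost every record becomes unique, which is why this simple measure stops being informative for high-dimensional data.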


Tags: anonymity, machine learning, anonymity measure


© International PET Portal, 2010 | Imprint | Terms of Use | Privacy Policy