Email is now a part of our everyday life, and many people are reachable by email almost 24/7. However, what most of us don't consider is the privacy of emails: we think of them as closed envelopes traveling to the recipient. In fact, unless we use PGP or other encryption tools, our emails can easily be surveilled on their way.
This motivated the creation of TracEmail, a Thunderbird Addon that helps to understand the problem. TracEmail analyzes the source code of the email and estimates the path the email may have taken. Then it puts this path on an interactive map, where the data surveillance regulation of each country can also be inspected. The tool is now available in Thunderbird Addons (at the moment it is under review), so if you are using a Mac or Windows, you can just give it a try. If you are a Linux user, you can contribute to the source to make it available on Linux.
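To give a feel for what "analyzing the source code of the email" can mean, here is a minimal sketch that reads the Received headers of a raw message and lists the relay addresses in travel order. It is not TracEmail's actual code: the sample message, the host names, the function name estimate_path and the simple IPv4 pattern are all assumptions made for illustration, and the geolocation step that would place each hop on a map is left out.

```python
import re
from email import message_from_string

# Hypothetical raw message; host names are made up and the IP addresses
# come from ranges reserved for documentation.
RAW_EMAIL = """\
Received: from mx.recipient.example (mx.recipient.example [203.0.113.7])
\tby inbox.recipient.example with ESMTP; Mon, 1 Jan 2024 10:00:03 +0000
Received: from relay.transit.example (relay.transit.example [198.51.100.23])
\tby mx.recipient.example with ESMTP; Mon, 1 Jan 2024 10:00:02 +0000
Received: from laptop.sender.example ([192.0.2.10])
\tby relay.transit.example with ESMTP; Mon, 1 Jan 2024 10:00:01 +0000
From: alice@sender.example
To: bob@recipient.example
Subject: hello

Hi Bob!
"""

# Simplified pattern: an IPv4 address in square brackets (an assumption;
# real headers vary a lot and may carry IPv6 addresses or no address at all).
IPV4_IN_BRACKETS = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")


def estimate_path(raw_message):
    """Return the relay IPs found in Received headers, sender side first."""
    msg = message_from_string(raw_message)
    received = msg.get_all("Received") or []
    path = []
    # Each relay prepends its own header, so the oldest hop is the last one.
    for header in reversed(received):
        match = IPV4_IN_BRACKETS.search(header)
        if match:
            path.append(match.group(1))
    return path


if __name__ == "__main__":
    for hop_number, ip in enumerate(estimate_path(RAW_EMAIL), start=1):
        print(hop_number, ip)
```

Received headers can be forged or stripped by intermediate servers, which is one reason any path derived this way is only an estimate.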
I've recently read an article where the author pictured a future where Google-Glass-like products can support our decisions by using face recognition and similar techniques. While the author definitely aimed to picture a 'new bright future', she remained silent about potential abuses and privacy issues. Plus, while the technology is clearly heading in the direction she described, for now this still leaves the writing as a piece of science fiction. But where exactly are we now, and for how long can we feel relieved?
Today, using machine learning (ML) is a hard task. First, you need to get vast amounts of quality data; then picking the proper algorithm, training it and using it are also highly non-trivial. Not to mention hardware requirements, as the training requires a lot of computation power, and it takes a while until your application learns to understand the task it is designed to do. This might sound comforting from a privacy point of view, but it would be a mistake to take comfort in it.
I see three major issues that could result in a change in the state of the art, and I think that, for some of these, we are already in the shifting phase:
It is easy to imagine that such ML could enable a rapidly growing number of privacy-infringing uses (*). However, we should not forget that today data-driven businesses fuel machine learning research and application development. Thus, there are already thousands of services that are built around data and machine learning. As many of these companies use data that was not gathered with user consent (to mention just one possible privacy violation), ML is already here to further erode our privacy.
Let's have some examples. BlueCava, a company that uses fingerprinting to track people on the web, is using machine learning to connect devices that belong to the same person. This is just an example; with little effort we could find a myriad of other companies who analyse user behavior, buying intent, fields of interest, etc. with similar techniques. Data that we generate is also at stake: we could think about smartphones and wearable devices, but also the posts we write.
To conclude shortly, machine learning already has a huge impact, and that impact is likely to grow considerably in the next few years. All big companies have their own research groups in the field, and if we are honest with ourselves, we know this is for a simple reason: to use machine learning in their products in order to increase their revenues.
(*) I intentionally did not want to comment on whether machines could become alive. I think here you can read a realistic opinion on the topic.
This post originally appeared in the professional blog of Gábor Gulyás.
Traditional privacy-enhancing technologies were born in a context where users were exposed to pervasive surveillance. The TOR browser could be thought of as a nice textbook example: in a world where webizens are monitored and tracked by thousands of trackers (a.k.a. web bugs), TOR aims to provide absolute anonymity to its users. However, these approaches had two shortcomings right from the start. First, sometimes it would be acceptable to sacrifice a small piece of our privacy to support or use a service; second, as privacy offers freedom, it could also be abused (think of the 'dark web'). While there have been many proposals to remedy these issues, none of the implementations was able to attract a large user base. In fact, in recent years privacy research has quite rarely reached practical usability or even the implementation phase. (Have you ever counted the number of services using differential privacy?)
For these reasons, it is nice to see that things are changing. Neura, a company that made it to CES this year, aims to provide a finer-grained and stricter personal information sharing model, where control stays in the hands of the users:
[...] firm has created smartphone software that sucks in data from many of the apps a person uses as well as their location data. [...] The screen he showed me displayed a week in the life of Neura employee Andrew - detailing all of his movements and activities via the myriad of devices - phones, tablets and activity trackers - that we all increasingly carry with us. [...] But the firm's ultimate goal is to offer its service to other apps, and act as a single secure channel for all of a user's personal data rather than having it handled by multiple parties, as is currently the case. [...] We are like PayPal for the internet of things. We facilitate transactions, and our currency is your digital identity.
I am a bit sceptical about this privacy-selling approach: that much data could give too much power to the company, and it is not clear what happens if the data is resold (which happens a lot today). It would be a bit more convincing if you could really own the data, and had cryptographic guarantees for that. Until we have that, I prefer technology where you could buy your privacy back directly. Returning to the example of web tracking, there are interesting projects (like Google Contributor or Mozilla Subscribe2Web) that would allow you to make micropayments to news sites instead of being tracked and targeted with advertisements.
Another recent development, called PrivaTegrity, addresses accountability for abuses. The project is led by David Chaum, the inventor of the MIX technology, which is an underlying concept in digital privacy. While not all details are disclosed yet, it seems Chaum's team is working on a strong online anonymity solution that could be used for a variety of applications, would be fast and resource-efficient (so it could work on mobile devices), and would have a controlled backdoor to discourage abusers. I am sure that this latter feature will spark a large number of disputes, but Chaum claims that the power to revoke anonymity would not rest in the hands of a single government; nine administrators from different countries would be required to reveal the real identity behind a transaction. Let's wait and see how things develop; however, this is definitely a challenging argument for those who argue for erasing privacy.
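The idea that all nine administrators must cooperate can be illustrated with a toy all-or-nothing secret sharing scheme. This is emphatically not PrivaTegrity's protocol (its details are not disclosed in this post); it is only a sketch of the general principle, with hypothetical function names, in which an identity is split into nine XOR shares and any eight of them reveal nothing useful.

```python
import secrets


def xor_bytes(a, b):
    """XOR two byte strings of equal length."""
    return bytes(x ^ y for x, y in zip(a, b))


def split_secret(secret, n=9):
    """Split `secret` into n shares; all n are needed to rebuild it."""
    # n - 1 shares are pure randomness...
    shares = [secrets.token_bytes(len(secret)) for _ in range(n - 1)]
    # ...and the last one is the secret XORed with all of them.
    last = secret
    for share in shares:
        last = xor_bytes(last, share)
    shares.append(last)
    return shares


def combine(shares):
    """XOR the given shares together."""
    result = bytes(len(shares[0]))
    for share in shares:
        result = xor_bytes(result, share)
    return result


if __name__ == "__main__":
    identity = b"alice@example.org"  # hypothetical identity record
    shares = split_secret(identity)  # one share per administrator
    print(combine(shares) == identity)      # all nine cooperate
    print(combine(shares[:8]) == identity)  # one administrator refuses
```

With this construction a single refusing administrator is enough to keep the identity hidden, which mirrors the claim that no single government could revoke anonymity on its own.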
Here is their paper on the underlying technology.
This post originally appeared in the professional blog of Gábor Gulyás.
In a previous blog entry, I described how random forests could be used to predict the level of empirical identifiability. I have also been experimenting with neural networks, and with how this approach could be used to solve the problem. As there is a myriad of great tutorials and ebooks on the topic, I'll just continue the previous post. Here, instead of using the scikit-learn package, I used the keras package for modeling artificial neural networks, which relies on theano. (Theano allows efficient execution by using the GPU. Currently only NVIDIA CUDA is supported.)
The current setting is the same as described in the previous post: node neighborhood-degree fingerprints are used to predict how likely it is that they would be re-identified by the state-of-the-art attack. As I've seen examples using raw image data for character classification (as for the MNIST dataset) with a Multi-Layer Perceptron structure, I decided to use a simple, fully connected MLP network, where the whole node-fingerprint is fed to the net. Thus the network consists of an input layer of 251 neurons (with rectified linear unit activation, or relu for short), a hidden layer of 128 neurons (also with relu), and an output layer of 21 neurons. The 21 output neurons achieve classification by covering all possible score cases in the range -10, ..., 10. For the output I used a softmax layer, as a distribution-like output is easier to handle for classification. See the image below for a visual recap.
I did all the training and testing as last time: the perturbed Slashdot networks were used for training, and perturbations of the Epinions network served as test data. In each round with a different level of perturbation (i.e., different level of anonymization or strength of attacker background knowledge) I retrained the network with Stochastic Gradient Descent (SGD), using the dropout technique; you can find more of the details in the python code. As the figure below shows, this simple construction (hint: it was also the first successful try) could beat previous results, although with some constraints.
In the high recall region this simple MLP-based prediction approach proved to be better than all previous ones. However, for the simulation of weak attackers (i.e., small recall, where perturbation overlaps are small), random forests are clearly still the best choice. You can grab the new code here (you will also need the datasets from here).
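For readers who prefer code to pictures, the forward pass of such a network can be sketched in plain Python. This is not the keras code from the post, only an illustration of the described layer sizes (251 relu, 128 relu, 21 softmax). It assumes that the fingerprint is a vector of length 251, uses random untrained weights, and leaves out SGD training and dropout; all function names are made up.

```python
import math
import random

# Assumed shapes: fingerprint (251) -> 251 relu -> 128 relu -> 21 softmax.
LAYER_SIZES = [251, 251, 128, 21]
SCORES = list(range(-10, 11))  # one class per score in -10, ..., 10


def init_layer(n_in, n_out, rng):
    """Random, untrained weights; a trained model would load learned values."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(x, layer):
    """Fully connected layer: one weighted sum per output neuron."""
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def relu(values):
    return [max(0.0, v) for v in values]


def softmax(values):
    """Turn raw outputs into a probability distribution."""
    peak = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def predict(fingerprint, layers):
    """Return the most likely score and the full class distribution."""
    hidden = relu(dense(fingerprint, layers[0]))
    hidden = relu(dense(hidden, layers[1]))
    probs = softmax(dense(hidden, layers[2]))
    best = max(range(len(probs)), key=probs.__getitem__)
    return SCORES[best], probs


if __name__ == "__main__":
    rng = random.Random(0)
    layers = [init_layer(a, b, rng)
              for a, b in zip(LAYER_SIZES, LAYER_SIZES[1:])]
    fingerprint = [rng.random() for _ in range(251)]  # made-up input
    score, probs = predict(fingerprint, layers)
    print(score, round(sum(probs), 6))
```

The softmax output sums to one over the 21 classes, so picking the largest entry gives the predicted score while the rest of the vector shows how confident the network is.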
This post originally appeared in the professional blog of Gábor Gulyás.
A full-time position for doctoral students has been announced by the University of Regensburg for the period 2014-2018 in the research area "Security and Privacy in Smart Environments". The project is run by FORSEC, a research consortium of German universities and research institutions. Although no deadline has been announced, the job already starts in the summer of 2014, so if you are interested, submit your application soon to heike.gorski@wiwi.uni-regensburg.de
We have discussed in our previous posts that companies track their users and the traffic on their websites. There are also companies offering solutions to track visitors. So the question is: how much would you charge for a list of all the sites you visited in the past two weeks? At first glance, this might seem a simple question. We have some surprising data for you!
In recent years the majority of websites adopted a business model in which you get a seemingly free service, but in exchange you give up your privacy. The model is simple: while you enjoy surfing freely, you are also being monitored and profiled in order to get advertisements and prices tailored to your interests. For example, Orbitz steered Mac users to pricier hotels. This can also happen to you in other contexts, depending on how much advertisers estimate you can afford.
Auctions, where you are the product
When you open a website which has advertisement slots, there is a chance that your browsing history will be sold to advertisers at an auction, and your device will be involved in a real-time bidding (RTB) procedure. Do you remember our question in the previous section?
Have you considered your price?
Well, just to help you position yourself, it is estimated that most of us would trade our privacy only in exchange for 7 EUR on average. Sounds nice, right? Unfortunately, this is just a dream: our browsing history is typically being sold for less than 0.0005 USD, as French researchers revealed in their recent study.
Who is making business and how?
When you open a website that earns its income from advertisements (for instance nytimes.com), a slot on the site can invoke an auction. Next, an ad exchange (e.g., DoubleClick or Facebook) will invite bidders to propose a price for placing their advertisements. The ad exchange identifies you with a tracking cookie (on nytimes.com), and distributes your browsing history among bidders, who will then have a chance to merge it with what they already know about you (tracked with another cookie). Thereafter, bidders have all the information to consider a price tag for you, and the bidder offering the highest price gets the chance to display the actual advertisement. This is a well-designed system, right? Also note that even losing parties get a copy of your browsing history.
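The flow described above can be condensed into a small simulation. It is a toy model under stated assumptions, not how any real ad exchange works: the Bidder class, the pricing rule (a base price plus a bonus for each interesting site) and all names and numbers are invented for illustration. What it does preserve from the description is that every bidder receives and keeps the history, and that the highest bid wins the slot.

```python
from dataclasses import dataclass, field


@dataclass
class Bidder:
    name: str
    # What this bidder already knows about the visitor (via its own cookie).
    known_history: set = field(default_factory=set)
    # Hypothetical valuation: extra price per interesting site, in USD.
    interest_prices: dict = field(default_factory=dict)
    base_price: float = 0.0001

    def bid(self, shared_history):
        # Merge the history received from the exchange with existing
        # knowledge; the merged profile is kept even if the auction is lost.
        self.known_history |= set(shared_history)
        bonus = sum(self.interest_prices.get(site, 0.0)
                    for site in self.known_history)
        return self.base_price + bonus


def run_auction(shared_history, bidders):
    """The exchange shares the history with all bidders; highest bid wins."""
    bids = {bidder.name: bidder.bid(shared_history) for bidder in bidders}
    winner = max(bids, key=bids.get)
    return winner, bids[winner], bids


if __name__ == "__main__":
    jeweller = Bidder("jeweller",
                      interest_prices={"jewelry.example": 0.0005})
    generic = Bidder("generic")
    history = ["nytimes.com", "jewelry.example"]
    winner, price, all_bids = run_auction(history, [jeweller, generic])
    print(winner, price)
    print(sorted(generic.known_history))  # the loser also got a copy
```

Note how the losing bidder ends up with the visitor's history in known_history, which is exactly the side effect pointed out in the text.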
Price-tag sensitivity
Olejnik created tools with his collaborators to detect RTB and analyze winner prices. It may be impossible to get a global overview, as in many cases the winner prices are encrypted; their analysis is based on the rest, i.e., on the unencrypted prices. It turns out that different visitor properties steer prices significantly. Location is one of the strongest factors, e.g., a profile located in the US had a price of 0.00069 USD, much higher than others located in France (0.00036 USD) or in Japan (0.00024 USD). They also discovered that profiles are worth more in the morning. For instance, in their investigation a US profile was worth 0.00075 USD in the morning and 0.00062 USD in the evening. Not surprisingly, browsing history also altered prices significantly. New profiles with no records are worth the least, while others with an interesting history of visiting webshops (e.g., a jewelry site) are worth more.
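To put these tiny numbers into perspective, a few lines of arithmetic on the prices quoted above are enough. The helper names are made up, and the "profiles per dollar" figure is simply the inverse of the quoted price, not a number taken from the study.

```python
# Prices per profile in USD, as quoted in the text above.
PRICES_BY_COUNTRY = {"US": 0.00069, "France": 0.00036, "Japan": 0.00024}
US_MORNING = 0.00075
US_EVENING = 0.00062


def price_ratio(price_a, price_b):
    """How many times more profile A is worth than profile B."""
    return price_a / price_b


def profiles_per_dollar(price):
    """How many profiles a single dollar buys at the given price."""
    return 1.0 / price


if __name__ == "__main__":
    us = PRICES_BY_COUNTRY["US"]
    for country, price in PRICES_BY_COUNTRY.items():
        print(country,
              f"US/{country} ratio: {price_ratio(us, price):.2f}",
              f"profiles per USD: {profiles_per_dollar(price):.0f}")
    print(f"morning/evening ratio: {price_ratio(US_MORNING, US_EVENING):.2f}")
```

A US profile comes out at roughly twice the price of a French one and nearly three times that of a Japanese one, yet even at the US price a single dollar covers well over a thousand profiles.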
What can I do about this?
Using an ad blocker is only a partial solution. Use a web bug killer instead. Web bugs are small programs advertisers use to detect user presence and to monitor activities. If you use Firefox, for instance, you could install Ghostery.
This post originally appeared in the Tresorit Blog.
Trilateral Research & Consulting, a London-based consultancy specialising in research and the provision of strategic, policy and regulatory advice on new technologies, is seeking to engage a Senior Research Analyst. The candidate will be expected to work on Trilateral projects in both the public and private sectors.
Key topics within these projects will include issues of privacy, trust, surveillance, risk and security as they pertain to cutting-edge innovative developments in ICTs and related technologies.
More about the position can be read here, or write to info@trilateralresearch.com