The company’s new TensorFlow Privacy module lets devs safeguard data with differential privacy
Google has announced a new module for its machine learning framework, TensorFlow, that lets developers improve the privacy of their AI models with just a few lines of extra code.
TensorFlow is one of the most popular tools for building machine learning applications, and it’s used by developers around the world to create programs like text, audio, and image recognition algorithms. With the introduction of TensorFlow Privacy, these developers will be able to safeguard users’ data with a statistical technique known as “differential privacy.”
Introducing this tool is in keeping with Google’s principles for responsible AI development, Google product manager Carey Radebaugh tells The Verge. “If we don’t get something like differential privacy into TensorFlow, then we just know it won’t be as easy for teams inside and outside of Google to make use of it,” says Radebaugh. “So for us it’s important to get it into TensorFlow, to open source it, and to start to create this community around it.”
The mechanics of differential privacy are somewhat complex, but it is essentially a mathematical approach that means AI models trained on user data can’t encode personally identifiable information. It’s a common way to safeguard the personal information needed to create AI models: Apple introduced it for its AI services with iOS 10, and Google uses it for a number of its own AI features like Gmail’s Smart Reply.
To understand the dangers to privacy posed by these sorts of services, consider how Smart Reply relies on data collected from more than a billion Gmail users to make its suggested replies. This data obviously includes extremely personal information (basically anything you've ever put in an email), and if Smart Reply surfaced it, for example by suggesting a reply that repeats word for word what another user wrote, it would be disastrous.
Differential privacy eliminates this possibility with “mathematical certainty,” says Úlfar Erlingsson, a research scientist at Google who’s been working in the field of data privacy for 20 years. It’s a technique that removes identifiable outliers from datasets without changing the aggregate meaning of that data, Erlingsson tells The Verge. “You have an outcome that is independent of any one person’s [data] but that is still a good outcome.”
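TensorFlow Privacy itself applies differential privacy during model training, but the guarantee Erlingsson describes (an outcome that is independent of any one person's data) is easiest to see in the classic Laplace mechanism. The sketch below is purely illustrative and is not TensorFlow Privacy's API: it answers a counting query after adding noise calibrated so that adding or removing any single person's record barely changes the answer.

```python
import random

def dp_count(records, predicate, epsilon=1.0):
    """Return a differentially private count of records matching predicate.

    Adding or removing any one record changes the true count by at most 1
    (the query's "sensitivity"), so Laplace noise with scale 1/epsilon is
    enough to mask every individual's contribution. Smaller epsilon means
    more noise and stronger privacy.
    """
    true_count = sum(1 for r in records if predicate(r))
    # The difference of two exponential draws is Laplace-distributed.
    noise = random.expovariate(epsilon) - random.expovariate(epsilon)
    return true_count + noise
```

In aggregate the noisy answer stays useful (asking how many of a million emails mention "meeting" is off by only a handful), but an observer can no longer tell whether any particular user's email is in the dataset, which is the mathematical certainty Erlingsson refers to.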
There are some downsides to using differential privacy, though. “By masking outliers, it can sometimes remove relevant or interesting data, especially in varied datasets, like those involving language,” says Erlingsson. “Differential privacy literally means that it’s impossible for the system to learn about anything that happens just once in the dataset, and so you have this tension. Do you have to go get more data of a certain type? How relevant or useful are those unique properties in the dataset?”
But Google hopes that by releasing TensorFlow Privacy, more AI developers around the world will start using this technique and these problems can be ameliorated. "There's work to do to make it easier to figure out this tradeoff," says Radebaugh.
Ultimately, says Google, it’s better to have more brains involved, and releasing new open-source tools increases the pool of available talent. Plus, being able to add differential privacy to an AI model using just “four or five lines [of code] and some hyper-parameter tuning” is a big step forward in its own right, says Erlingsson. “This is a very different sort of world to what we were in even just a few months ago, so we’re quite proud of that.”
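Those "four or five lines" amount to swapping a model's standard optimizer for one of TensorFlow Privacy's differentially private variants, which implement a technique known as DP-SGD. As a rough, library-free sketch (the function name and plain-Python setting are illustrative, not TensorFlow Privacy's actual API), here is the core step such an optimizer performs on each batch: clip every example's gradient so no one user can influence the update too much, then add noise scaled to that clipping bound.

```python
import math
import random

def dp_gradient(per_example_grads, l2_norm_clip=1.0, noise_multiplier=1.1):
    """Sketch of the clip-and-noise step at the heart of DP-SGD.

    Each example's gradient is rescaled to have L2 norm at most
    l2_norm_clip (bounding any single user's influence), then Gaussian
    noise proportional to that bound is added to the sum before averaging.
    """
    clipped = []
    for grad in per_example_grads:
        norm = math.sqrt(sum(g * g for g in grad))
        scale = min(1.0, l2_norm_clip / norm) if norm > 0 else 1.0
        clipped.append([g * scale for g in grad])

    dim = len(per_example_grads[0])
    noisy_sum = [
        sum(grad[i] for grad in clipped)
        + random.gauss(0.0, noise_multiplier * l2_norm_clip)
        for i in range(dim)
    ]
    n = len(per_example_grads)
    return [s / n for s in noisy_sum]
```

The hyperparameter tuning Erlingsson mentions is the tradeoff discussed above: the clipping norm and noise multiplier trade accuracy for privacy, and turning the noise up too far is exactly how the model loses the rare, unique examples in the data.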