How Federated Machine Learning Keeps Data In Your Pocket, Not In the Cloud
Let's look at what happens when you search and are shown autocomplete suggestions. How is that data generated, and why should we pay attention to it?
When we pull our phones out of our pockets, it is often to search for information. A tap in the Google search box presents us with suggested searches, then ultimately the results of our intended query. In this innocuous interaction, a series of connected events plays out, all putting our personal data at risk of exposure. While that might imply some sinister intent, the truth is that Google (and, more generally, other providers of digital information and services) have created these systems to better understand their users. These AI-based systems enable quicker interactions with our devices and more convenient user interfaces. But there is a catch: whether you search, watch videos, or play games, these systems collect data that exposes our most intimate and private details. There is a better way.
Let us illustrate this for search and first discuss what happens when you search for something and are presented with autocomplete suggestions. All the data users generate through forms and browsers is collected and stored centrally. A massive AI modeling system indexes and sorts this information from millions of users, and the results of its predictive-search model are then pushed back to your personal device. The output of this AI-based inference becomes the suggested queries you see as you type. The key to changing how our data is used is changing the methodology of this process.
However, companies operating personalization services (search, film recommendations, business news, and so on) have learned that AI models can live on the devices themselves rather than on the operating company's servers, starting from a base model supplied to each device. Our interaction histories would then stay on our own devices, and the AI model would become more personalized based on our behavior. To make this even more productive, we have to consider ways to then share this knowledge with the devices in other users' pockets without revealing the personal data that resides within each individual AI model.
That is the problem Federated Machine Learning seeks to solve: how can we still receive the full benefits of predictive AI modeling without sharing personal data among users or exposing it to operating companies?
Federated Machine Learning Seeks To Redefine Control Over Personal Information
To preserve personal, private data while still using it to improve AI modeling, we create personalized models on user devices and transform them into encrypted number vectors, so that each user's encrypted vector can be combined with similar vectors from other users. Within this process, a protocol removes the encryption from the computed result once the data has been combined. At that point, we have an aggregated AI model that carries no personally identifying data points: aggregated data stripped of any personal connection but still serving a greater analytical purpose.
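The combination step can be sketched with a simple masking trick, which is the core idea behind secure aggregation protocols: each device hides its model update behind pairwise random masks that cancel when everything is summed, so the server only ever learns the aggregate. This is a minimal illustration with integer vectors and a shared seed; real protocols derive masks from pairwise key agreement and use real cryptography, and all names and values here are hypothetical.

```python
import random

MOD = 2**32  # work in a finite ring so masks wrap around cleanly

def pairwise_masks(num_clients, dim, seed=0):
    """Generate cancelling mask pairs: what is added for client i
    is subtracted for client j, so all masks sum to zero mod MOD."""
    rng = random.Random(seed)
    masks = [[0] * dim for _ in range(num_clients)]
    for i in range(num_clients):
        for j in range(i + 1, num_clients):
            m = [rng.randrange(MOD) for _ in range(dim)]
            for k in range(dim):
                masks[i][k] = (masks[i][k] + m[k]) % MOD
                masks[j][k] = (masks[j][k] - m[k]) % MOD
    return masks

def mask_update(update, mask):
    """What a device actually sends: its update plus its mask."""
    return [(u + m) % MOD for u, m in zip(update, mask)]

# Three devices, each with a 4-dimensional model update (ints for simplicity)
updates = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
masks = pairwise_masks(3, 4)
masked = [mask_update(u, m) for u, m in zip(updates, masks)]

# The server sums the masked vectors; the masks cancel,
# revealing only the total, never any individual update.
aggregate = [sum(col) % MOD for col in zip(*masked)]
print(aggregate)  # [15, 18, 21, 24]
```

Each individual `masked` vector looks like random noise to the server; only the sum is meaningful, which is exactly the property the article describes.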
This aggregated model, when properly implemented, no longer allows any inference of the personal data of any participant from any of their devices. Once that global AI model is computed, it can be pushed back to users' personal devices, where it may be further personalized before use. As users continue to interact with their devices (a news feed, a video platform, etc.), the global model continues to train on those interactions. This round trip of training on local data, combining the locally trained AI models, and distributing the combined global model back to all devices can be repeated any number of times until the quality of the AI model has sufficiently improved.
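The round trip described above is essentially federated averaging (FedAvg). The sketch below shows the loop structure only, with a toy one-parameter linear model and without the encryption layer discussed earlier; the data, learning rate, and function names are all illustrative assumptions.

```python
# Minimal federated-averaging round: local training, then server-side
# averaging of the resulting models. Only model weights leave a device.

def local_train(global_model, local_data, lr=0.1):
    """One pass of gradient descent on a 1-D linear model y = w * x."""
    w = global_model
    for x, y in local_data:
        grad = 2 * (w * x - y) * x  # derivative of squared error
        w -= lr * grad
    return w

def federated_round(global_model, devices):
    # Each device trains on data that never leaves it...
    local_models = [local_train(global_model, data) for data in devices]
    # ...and the server averages the models into a new global model.
    return sum(local_models) / len(local_models)

# Three devices whose private data all follows y = 2x
devices = [[(1.0, 2.0)], [(2.0, 4.0)], [(3.0, 6.0)]]

w = 0.0
for _ in range(50):  # repeat rounds until the global model converges
    w = federated_round(w, devices)
print(round(w, 2))  # converges toward 2.0
```

Each round here is one turn of the loop the article describes: train locally, combine centrally, redistribute, repeat.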
What Federated Machine Learning aims to show is that use cases can be solved intelligently for users' own needs, be it a voice assistant or anything else, without the need to share personal data; and it provides a framework that software developers and AI engineers can use to build such solutions. With this framework, raw data never leaves users' devices; only encrypted AI models reach a central server for consolidation. The central server combines these encrypted models without learning any individual model, yet the decryption of that combination is a global AI model containing the insights of all the local models. Users retain their privacy while the AI system continues to learn and improve its predictive power.
The techniques used by Federated Machine Learning keep your data in your pocket, not on a server, while still giving you access to predictive modeling. But they also lend themselves readily to data analytics. Suppose you wanted to compute fairly sensitive metrics across users: the median daily heart rate, the average number of steps walked in a day, or the maximum length of a menstrual cycle in a year. These techniques allow such statistics to be computed while the privacy wall remains intact (an important distinction under EU privacy laws, and also when reflecting that some of this information is rather intimate). At a basic level, such analysis could cover how many times users clicked on a specific page or how many minutes a day they spent in a certain app.
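As a sketch of such a federated-analytics query, consider computing an average step count without the server ever seeing an individual's number. The masking trick mirrors the aggregation idea above; in this simplified version the devices coordinate so their masks sum to zero, and all values and names are illustrative, not from any real deployment.

```python
import random

MOD = 10**9  # finite ring for the masks

def masked_shares(values, seed=42):
    """Each device submits its value plus a mask; the masks sum to zero,
    so only the total survives aggregation."""
    rng = random.Random(seed)
    masks = [rng.randrange(MOD) for _ in values[:-1]]
    masks.append((-sum(masks)) % MOD)  # final mask cancels the others
    return [(v + m) % MOD for v, m in zip(values, masks)]

daily_steps = [8234, 10492, 6120, 12033]  # stays on each user's device
submissions = masked_shares(daily_steps)  # what the server receives

total = sum(submissions) % MOD            # masks cancel in the sum
average = total / len(daily_steps)
print(average)  # 9219.75 — the aggregate, with no individual value revealed
```

The server answers the analytics question ("what is the average daily step count?") without ever holding any user's raw data, which is the distinction the article draws for EU privacy law.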
For example, consider Clara Federated Learning from NVIDIA. Clara FL is used by participating hospitals to label their patient data. Using pre-trained models and transfer learning techniques, the AI assists hospitals in labeling, reducing the time needed for complex studies from hours to minutes. The underlying point is that solving complex medical problems and completing research benefits from combining data, which privacy laws generally do not allow. Federated Machine Learning would let hospitals perform analytical functions, such as finding the median heart rate of female patients with a certain condition, without revealing anything about the heart rate of any individual patient.
Federated Machine Learning provides this methodology, called Federated Analytics, for bringing together all this information and aggregating it into actionable statistics, recognizing that the analytics don't care about the individual, but rather about patterns in the user population. Applied to data queries, this technique produces analytics results without transferring raw user data to a central site, maintaining the privacy (and avoiding the IP issues) that such a transfer would generally compromise.
The goal here is a system in which users retain full ownership of their personal data from creation to analytics and everything in between. Federated Machine Learning is building a system in which users and operating companies have access to the same data analytics and aggregation that power the convenience of predictive modeling in maps apps, voice assistants, and so forth, while never compromising privacy. This brings more security to the everyday activities that draw users to their apps. It creates a scenario in which big tech companies can no longer claim ownership of your personal data, because there is, in fact, a better way.
Opinions expressed by DZone contributors are their own.