Federated learning is a governance problem
RegulAItion’s data science team take a moment to chat through why governance is key to the application of federated learning.
Federated learning is a powerful technological solution for preserving privacy and data integrity while developing AI models. However, this comes at the expense of reduced visibility into the system and data, making it harder to diagnose and fix problems with the AI.
Managing these problems requires not just technological expertise, but a robust governance framework. This is an area to which the team at RegulAItion have been giving a lot of thought...
The data control issue
In normal AI applications, a single party would aggregate data centrally from several sources in order to build a model. In this paradigm, individuals and organisations handing over their data to the AI developer lose control of it; this carries privacy, security and legislative risks. Federated learning is one of several nascent technologies that aims to preserve privacy during the machine learning process.
Federated learning… enables data collaboration
Under federated learning, data remains on the device of its owner. Several data-owning parties collaborate to train a shared model, each training a copy of the model on their own data. The AI developer, instead of aggregating data, aggregates these models, combining insights from the data without it ever having to be shared between parties.
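To make the idea concrete, here is a minimal sketch in Python of a federated-averaging style round. It assumes a simple linear model and weighted parameter averaging; the function names, data shapes and training loop are illustrative rather than a description of any particular federated learning framework.

```python
import numpy as np

def local_update(weights, X, y, lr=0.1, epochs=5):
    """One party trains its local copy of a linear model on its own data.
    Only the updated weights leave the device -- the data (X, y) never does."""
    w = weights.copy()
    for _ in range(epochs):
        preds = X @ w
        grad = X.T @ (preds - y) / len(y)   # gradient of mean squared error
        w -= lr * grad
    return w

def aggregate(local_weights, sample_counts):
    """The AI developer combines the parties' models by a weighted average,
    giving parties with more data proportionally more influence."""
    total = sum(sample_counts)
    return sum(w * (n / total) for w, n in zip(local_weights, sample_counts))

# Illustrative scheme with three data-holding parties and ten training rounds.
rng = np.random.default_rng(0)
parties = [(rng.normal(size=(50, 3)), rng.normal(size=50)) for _ in range(3)]
global_weights = np.zeros(3)

for round_num in range(10):
    updates = [local_update(global_weights, X, y) for X, y in parties]
    global_weights = aggregate(updates, [len(y) for _, y in parties])
```

The developer only ever sees the parties’ model weights and how much data each party holds, never the underlying records.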
At RegulAItion, we deal with problems involving a few known data holders, each with a significant amount of data. However, federated learning can scale to millions of data holders, as happens with predictive text models on iPhones. Parties might not have the technical capability to develop models in-house, and parties within a federated learning scheme may be competitors or work on adjacent problems.
Orchestrating network collaboration
Developing a federated learning system certainly presents a technological challenge: communication must be set up between parties across the world, who may be working on different systems; training must be scheduled between those parties; and drop-outs must be handled gracefully, to name but a few elements. All of this requires technical effort, but these are eminently surmountable problems.
Governance is the key
The challenge in running a successful federated learning scheme in practice, however, is one of process and governance. AI projects are never just about the AI: a trained model is deployed in the real world to impact real people, using real and often personal data. Equitable use of the model is difficult to get right and can be disastrous when it goes wrong; there have been several high-profile instances in recent years where predictive models have exhibited racist or sexist behaviour.
What about dataset bias?
These biases are imbued into AI models largely by the data on which they’re built. If the dataset is imbalanced in certain characteristics, the AI will amplify those differences and find a pattern which either doesn’t really exist or which we don’t wish to see exhibited in a fair society. In a federated learning scheme you can’t see the other parties’ data, so you cannot tell how biased their datasets are. A rotten dataset spoils the whole model.
Technological solutions might be developed which mitigate some of the data bias in federated learning, but they will not solve it completely. Instead, participants need agreements in place which guarantee particular characteristics of each party’s data and which mitigate the reputational and financial damage to the other parties if those guarantees aren’t met. The quality of a dataset should be auditable and trackable: if an organisation has found a problem with bias in a dataset, other organisations should be able to search for that evaluation before agreeing to make use of the dataset. These problems, and their potential governance solutions, also arise with explainability, model robustness and concept drift in the data.
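As a rough illustration of what an auditable, searchable dataset evaluation might look like, the sketch below defines a hypothetical record that a party could publish after checking its own data, together with a simple local imbalance check. The record structure, field names and tolerance are assumptions made for the sake of the example, not a description of any existing standard or of the RegulAItion platform.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class DatasetEvaluation:
    """A hypothetical, shareable record of a dataset quality audit."""
    dataset_id: str
    owner: str
    evaluated_on: date
    issues: list[str] = field(default_factory=list)

def check_group_balance(labels_by_group, tolerance=0.1):
    """Flag groups whose positive rate deviates from the overall rate.
    Each party can run this locally and publish only the findings."""
    all_labels = [l for labels in labels_by_group.values() for l in labels]
    overall = sum(all_labels) / len(all_labels)
    issues = []
    for group, labels in labels_by_group.items():
        rate = sum(labels) / len(labels)
        if abs(rate - overall) > tolerance:
            issues.append(f"positive rate for '{group}' is {rate:.2f} vs overall {overall:.2f}")
    return issues

# Example: a party audits its own data and records the outcome for others to search.
local_labels = {"group_a": [1, 1, 1, 0, 1], "group_b": [0, 0, 1, 0, 0]}
evaluation = DatasetEvaluation(
    dataset_id="claims-2023-q4",
    owner="Party B",
    evaluated_on=date(2024, 1, 15),
    issues=check_group_balance(local_labels),
)
```

The point is not the specific check but the principle: the evaluation, not the data, is what gets shared and searched.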
Auditability of AI decision making
Consider a party which needs to report on a decision made in part by AI. The decision must be linked to a specific datum and model version so that it can be verified after the fact. To better understand why a decision was made, the model version used should also be linked to the data on which it was trained, the organisations which own each bit of data, and the outcomes of model tests.
To make matters more complicated, federated learning can be a compounding and dynamic process, with models retrained at regular intervals on new data, producing an ever-growing dependency chain of datasets, each of which impacts to some degree the outcome of any particular use of the model. Without a robust governance chain linking data, models and training actions, good AI reporting is impossible.
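One way to picture that governance chain is as a linked record of model versions, each pointing back to the datasets it was trained on, their owners, its test results, and the version it was retrained from. The sketch below is a simplified, assumed structure rather than a prescribed schema; in practice such records would also need to be tamper-evident and shared between the parties.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ModelVersion:
    """A single entry in the governance chain: which datasets (and owners)
    a model version was trained on, and which version it was retrained from."""
    version: str
    trained_on: dict[str, str]         # dataset_id -> owning organisation
    test_results: dict[str, float]
    parent: Optional["ModelVersion"] = None

def lineage(model: ModelVersion):
    """Walk back through retraining rounds to recover every dataset that has
    ever influenced this model version -- the dependency chain grows each round."""
    datasets = {}
    while model is not None:
        datasets.update(model.trained_on)
        model = model.parent
    return datasets

# A decision made with v3 can be traced back through v2 and v1 to all
# contributing datasets and their owners.
v1 = ModelVersion("v1", {"claims-2022": "Party A"}, {"auc": 0.81})
v2 = ModelVersion("v2", {"claims-2023": "Party B"}, {"auc": 0.84}, parent=v1)
v3 = ModelVersion("v3", {"loans-2023": "Party C"}, {"auc": 0.86}, parent=v2)
print(lineage(v3))   # every dataset, and its owner, behind a v3 decision
```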
Ecosystem governance
Governance is also needed during the training process. The parties taking part in a federated learning scheme, while collaborating temporarily, have their own motivations and act independently. A data-holding party may decide during training that it no longer wishes to take part (for example, due to budgetary constraints), in which case the parties must have a process in place for how to proceed: are the other parties allowed to use the AI model now that a party has withdrawn? Do they continue training without that party, or is the entire collaboration void?
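One pragmatic option is to agree the answers to such questions up front, in a form the scheme’s coordinator can apply mechanically. The snippet below is purely illustrative: the field names and options are assumptions about what such an agreement might capture, not features of any real system.

```python
# Hypothetical pre-agreed rules for handling a party's mid-training withdrawal.
withdrawal_policy = {
    "min_parties_to_continue": 3,          # below this, the collaboration is void
    "withdrawn_party_keeps_model": False,  # may the leaver use the trained model?
    "remaining_parties_keep_model": True,
    "contributions_removed": "from_next_round",  # vs. "retrain_from_scratch"
    "notice_period_days": 30,
}

def on_withdrawal(active_parties, policy=withdrawal_policy):
    """Decide how the scheme proceeds when a party drops out."""
    if len(active_parties) < policy["min_parties_to_continue"]:
        return "void collaboration"
    return "continue training with remaining parties"
```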
What happens if (or when) something goes wrong?
Crucially, what happens when something goes wrong? For example, if the final, trained model doesn’t perform well, are the parties compelled to provide evidence that their data was of sufficient quality? This may be necessary in high-value situations, as a federated learning scheme could be derailed intentionally by a malicious party. And if a party is found to have poor data, how does the scheme proceed? Who has the power to remove a party from the scheme? What are the threshold criteria for doing so?
Data governance is not just a technical consideration… and some good news
All of these questions and hypotheticals might paint the picture that federated learning is a fraught process, but that is far from the case. Federated learning offers huge value to organisations, allowing them to gain insights from data and AI and to collaborate to magnify the power of the AI, without having to negotiate dense data-sharing agreements or run the risk of data leaks. What this does show, however, is that you can’t just consider the technical side of federated learning; you also need governance.
At RegulAItion, we have been thinking at length about these problems and the implications that governance has for scaling federated learning to industrial machine learning use cases. RegulAItion’s AIR platform provides an automated framework to manage collaborative governance with accountability.