This work package will start with an extensive requirements-gathering phase focused on the two initial cases (cybersecurity and industrial IoT). For the federated learning of cybersecurity services, the team members will take the lead in identifying requirements for the federated learning method and process. They will do this by analyzing a wide range of cybersecurity services and identifying those that could benefit most from federated learning. Potential stakeholders in the application of the technology, such as the BCC, will assist in prioritizing the cybersecurity services.
For the industrial IoT case, requirements can be extracted and extended from previous projects on predictive maintenance (e.g., the Smart Maintenance living lab focused on bearings, and offshore wind projects) that concentrate on non-federated machine learning methods.
Next, this WP will develop an architecture for federated learning, including APIs and specific interfaces to enable the deployment and monitoring of the various tasks involved in the federated learning process. This work will start from existing solutions in the literature and available libraries.
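As a purely illustrative starting point for this architecture work, the sketch below shows what such component interfaces might look like. All class and method names (LocalTrainer, Aggregator, DeploymentMonitor) are hypothetical placeholders, not part of any of the libraries discussed below; the actual APIs will be derived from the requirements phase and the selected libraries.

```python
# Hypothetical interface sketch for the federated learning architecture.
# All names are illustrative placeholders, not an existing library API.
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class LocalTrainer(ABC):
    """Runs the local learning stage on data that never leaves the client."""

    @abstractmethod
    def train(self, global_model: Any, config: Dict[str, Any]) -> Any:
        """Return a local model update computed from local data."""


class Aggregator(ABC):
    """Combines local updates into a new global model (server side)."""

    @abstractmethod
    def aggregate(self, updates: List[Any]) -> Any:
        """Return the aggregated global model."""


class DeploymentMonitor(ABC):
    """Deploys global models for inference and monitors the running tasks."""

    @abstractmethod
    def deploy(self, global_model: Any) -> None:
        """Push a global model to the inference endpoints."""

    @abstractmethod
    def status(self) -> Dict[str, Any]:
        """Report health and progress of the federated learning tasks."""
```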
A preliminary comparison has led to a selection of viable candidates: mature federated learning stacks that differ in complexity, in the underlying deep learning framework (e.g., TensorFlow, PyTorch), in support for other machine learning frameworks (e.g., traditional machine learning with scikit-learn) and in the security/privacy primitives offered out of the box (e.g., secure aggregation, differential privacy (DP), secure multi-party computation (SMC), homomorphic encryption (HE), …). The selected candidates are listed below, followed by a minimal sketch of the federated averaging scheme they all implement in some form:
- PySyft (https://github.com/OpenMined/PySyft, https://arxiv.org/abs/1811.04017) is built on the PyTorch deep learning framework and supports HE, SMC and DP.
- TensorFlow Federated (https://www.tensorflow.org/federated, https://arxiv.org/abs/1902.01046) supports only DP on top of the TensorFlow deep learning stack. It is well suited to prototyping new algorithms in a simulated setting, but deploying it in a truly distributed mode is more complex than with the other candidates.
- IBM’s federated learning library (https://github.com/IBM/federated-learning-lib, https://ibmfl.mybluemix.net/) supports DP and SMC. It can also run on traditional ML pipelines (e.g., SVM, decision trees, etc.) and features fairness algorithms to mitigate bias.
- Intel offers the OpenFL stack (https://github.com/intel/openfl, https://arxiv.org/abs/2105.06413), although its documentation is less clear about which advanced security and privacy modules are available out of the box.
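To make the comparison concrete, the following is a minimal sketch of the basic federated averaging (FedAvg) round that the stacks above implement in some form. It uses plain NumPy on a toy linear-regression model and is deliberately not tied to any of the listed libraries; no security or privacy primitives are applied.

```python
# Minimal federated averaging (FedAvg) round on a toy linear-regression model.
# Plain NumPy sketch of the scheme the stacks above implement; no secure
# aggregation, DP, SMC or HE is applied here.
import numpy as np


def local_update(weights, X, y, lr=0.1, epochs=5):
    """One client's local training: a few epochs of gradient descent."""
    w = weights.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w


def fedavg_round(global_w, client_data):
    """Server-side aggregation: weighted average of the client updates."""
    updates, sizes = [], []
    for X, y in client_data:
        updates.append(local_update(global_w, X, y))
        sizes.append(len(y))
    return np.average(np.stack(updates), axis=0, weights=np.array(sizes, float))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    true_w = np.array([2.0, -1.0])
    # Three clients with differently sized local datasets.
    clients = []
    for n in (50, 80, 120):
        X = rng.normal(size=(n, 2))
        y = X @ true_w + 0.1 * rng.normal(size=n)
        clients.append((X, y))
    w = np.zeros(2)
    for _ in range(10):
        w = fedavg_round(w, clients)
    print("estimated weights:", w)
```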
There are other open-source frameworks for federated learning. Their feature sets and documentation vary in comprehensiveness, and their developer communities are smaller than those of the frameworks above. Still, some may be more flexible for research and validation purposes; a minimal client sketch based on Flower follows this list:
- Flower: https://flower.dev/, https://github.com/adap/flower, https://arxiv.org/abs/2007.14390
- XayNet: https://www.xaynet.dev/, https://github.com/xaynetwork/xaynet
- PaddleFL: https://github.com/PaddlePaddle/PaddleFL
- FATE: https://github.com/FederatedAI/FATE
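As an example of how lightweight these research-oriented frameworks can be, the sketch below defines a client using Flower's NumPyClient abstraction. It assumes the Flower 1.x API (entry-point names may differ in other versions), and the toy linear model and data are our own illustration, not part of Flower.

```python
# Minimal Flower client sketch (Flower 1.x API) for a toy NumPy linear model.
# Illustrative only: the model, data and hyper-parameters are placeholders.
import numpy as np
import flwr as fl

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))                       # local (private) data
y = X @ np.array([2.0, -1.0]) + 0.1 * rng.normal(size=100)


class ToyClient(fl.client.NumPyClient):
    def __init__(self):
        self.w = np.zeros(2)

    def get_parameters(self, config):
        return [self.w]

    def fit(self, parameters, config):
        # Local training: a few gradient-descent steps on the private data.
        self.w = parameters[0]
        for _ in range(5):
            self.w -= 0.1 * 2 * X.T @ (X @ self.w - y) / len(y)
        return [self.w], len(y), {}

    def evaluate(self, parameters, config):
        loss = float(np.mean((X @ parameters[0] - y) ** 2))
        return loss, len(y), {"mse": loss}


if __name__ == "__main__":
    # Connects to a Flower server started elsewhere, e.g. with
    # fl.server.start_server(config=fl.server.ServerConfig(num_rounds=3)).
    fl.client.start_numpy_client(server_address="127.0.0.1:8080",
                                 client=ToyClient())
```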
Another key objective is to achieve robust and fair (unbiased) federated learning. The fairness properties considered in this project are model fairness, resource fairness and collaboration fairness. We will create, adapt or extend state-of-the-art algorithms and systematically analyze and compare different approaches in order to identify the most suitable solutions for the use cases identified in this project. Possible candidate solutions with open-source implementations that can serve as a starting point are listed below, followed by a sketch of a simple robust aggregation rule:
- Robust and fair FL: https://github.com/XinyiYS/Robust-and-Fair-Federated-Learning
- Fair and consistent FL: https://github.com/cuis15/FCFL, https://arxiv.org/abs/2108.08435
- easyFL – FL with fair averaging: https://github.com/WwZzz/easyFL
- Fair resource allocation in FL: https://github.com/litian96/fair_flearn
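These repositories implement different robustness and fairness notions. As a simple illustration of the robust-aggregation side only (not the exact algorithm of any repository above), the sketch below replaces plain averaging with a coordinate-wise trimmed mean, which limits the influence of outlying or malicious client updates.

```python
# Coordinate-wise trimmed-mean aggregation: a classic robust alternative to
# plain federated averaging. Illustrative sketch only; the repositories above
# implement their own, more elaborate robustness and fairness mechanisms.
import numpy as np


def trimmed_mean(updates, trim_ratio=0.2):
    """Average client updates after discarding, per coordinate, the most
    extreme values (a fraction trim_ratio from each end)."""
    stacked = np.stack(updates)                  # shape: (n_clients, n_params)
    n_clients = stacked.shape[0]
    k = int(np.floor(trim_ratio * n_clients))    # number trimmed per side
    ordered = np.sort(stacked, axis=0)           # sort each coordinate
    kept = ordered[k:n_clients - k] if k > 0 else ordered
    return kept.mean(axis=0)


if __name__ == "__main__":
    honest = [np.array([1.0, 1.1]), np.array([0.9, 1.0]), np.array([1.1, 0.9])]
    malicious = [np.array([100.0, -100.0])]      # a poisoned update
    print("plain mean  :", np.mean(np.stack(honest + malicious), axis=0))
    print("trimmed mean:", trimmed_mean(honest + malicious, trim_ratio=0.25))
```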
The subsequent design of the architecture involves the following specific objectives:
(1) Define the federated learning components and extensions needed to enable a federated learning architecture.
(2) Define components for the three major stages of federated learning: learning local models, aggregating local models into more complete global models, and deploying global models for inference.
(3) Define meta-learning components that continue the learning phase on new operational data, in parallel with the inference phase, and adapt by deploying updated global models.
(4) Define mechanisms to enforce sharing policies that specify what information can be shared in local models while respecting privacy, ethics and fundamental rights; this includes exploring options for computing on encrypted data, such as multi-party computation (MPC) and fully homomorphic encryption (FHE), as well as differential privacy (a sketch of a differentially private aggregation step is given below).
(5) Identify workflows, component relationships and interactions to enable secure, reliable and scalable deployment of the federated learning components.
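To illustrate objective (4), the sketch below shows one common way differential privacy can be layered on top of aggregation: each client update is clipped to a bounded L2 norm and Gaussian noise is added before averaging. It is a minimal, generic illustration; calibrating the noise multiplier to a concrete (ε, δ) budget with a proper privacy accountant, and combining this with MPC/FHE, is part of the work described above.

```python
# Sketch of a differentially private aggregation step: clip each client
# update to a fixed L2 norm and add calibrated Gaussian noise to the sum.
# Illustrative only; the noise multiplier must be derived from the chosen
# (epsilon, delta) privacy budget with a proper accountant.
import numpy as np


def dp_aggregate(updates, clip_norm=1.0, noise_multiplier=1.0, rng=None):
    rng = rng or np.random.default_rng()
    clipped = []
    for u in updates:
        norm = np.linalg.norm(u)
        # Scale down any update whose L2 norm exceeds the clipping bound.
        clipped.append(u * min(1.0, clip_norm / (norm + 1e-12)))
    total = np.sum(np.stack(clipped), axis=0)
    noise = rng.normal(scale=noise_multiplier * clip_norm, size=total.shape)
    return (total + noise) / len(updates)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    updates = [rng.normal(size=4) for _ in range(10)]
    print(dp_aggregate(updates, clip_norm=1.0, noise_multiplier=0.5, rng=rng))
```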
The architecture takes into account the "Ethics Guidelines for Trustworthy Artificial Intelligence" published by the EU High-Level Expert Group on AI in April 2019 [EUAI19]. It will ensure that the solutions developed meet the guidelines' main requirements: "(1) lawful – compliance with all applicable laws and regulations; (2) ethical – respect for ethical principles and values; and (3) robust – both from a technical perspective and taking into account its social environment."
Key Performance Indicators (KPI) | Leader | Contributors | Timeline |
---|---|---|---|
● Publications on the method: at least 2 publications defining the AIDE methodology. ● Publications on fundamental rights: at least 1 publication on fundamental rights and 1 on the right to privacy. ● Publications on adaptation: at least 1 publication on the meta-learning part and the adaptation possibilities. | KULEUVEN | UCLOUVAIN, CETIC, IMEC | ● End 2022: define the needs. ● End 2023: first version of the architecture. ● End 2024: policy mechanisms for privacy, ethics and fundamental rights. ● End 2025 (if extended): continuous improvement using the Deming wheel. |