AIDE: A federal project for the development of artificial intelligence in Belgium

WP4 – Application


The federated learning architecture will be deployed on a distributed testbed infrastructure and demonstrated in two application cases. The Place Of Digital (POD) will host the application testbed. The testbed will comprise several distributed servers, enabling us to demonstrate the federated learning technology in case studies. The distributed testbed will allow us to evaluate and test the various components under more realistic conditions. Operating the testbed will involve the following tasks:

  1. Building, maintaining and operating a federated testbed infrastructure.
  2. Designing new development techniques and best practices to take advantage of federated learning.
  3. Developing specific use cases using the project results.

Several case studies will validate the federated learning architecture and its implementation. A steering committee will select these case studies. Based on the needs of the socio-economic fabric, we have already identified two essential case studies described below.

Case studies thus far include (1) predictive maintenance of ball bearings in Industry 4.0 production lines and (2) information sharing in cybersecurity.

The analysis will include a detailed description of the case study, its federated learning dimension, the definition of sharing policies, the services used in federated learning, and the data sources used and produced. These datasets will be created, used for learning purposes, and shared with the community (e.g., https://data.europa.eu/en).

Case 1: The first case study applies federated learning to the predictive maintenance of ball bearings in production machines. Bearings are ubiquitous in machines with moving parts to reduce friction but wear out over time. An earlier project between Flanders Make and IMEC (Smart Maintenance) confirmed that the time to maintenance or replacement could be accurately modelled for these parts using parameters such as vibration, temperature and revolutions per minute. However, the scale of the experiment was limited, partly due to the lack of large-scale testing infrastructure.
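The local modelling step in this case can be illustrated with a minimal sketch: each production site fits a model on its own sensor readings (vibration, temperature, RPM) and only the fitted coefficients, never the raw data, would be shared for federated aggregation. The data, coefficients and the linear model form below are illustrative assumptions, not results from the Smart Maintenance project:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical local sensor readings for one production site:
# columns = vibration (mm/s), temperature (degrees C), speed (RPM).
X = rng.uniform([0.5, 40.0, 1000.0], [5.0, 90.0, 3000.0], size=(200, 3))

# Synthetic ground truth: remaining useful life (hours) degrades with
# vibration, temperature and speed; coefficients are illustrative only.
rul = 5000.0 - 600.0 * X[:, 0] - 20.0 * X[:, 1] - 0.5 * X[:, 2]
rul += rng.normal(scale=50.0, size=200)  # measurement noise

# Fit a local linear model by least squares; in a federated setting only
# these coefficients would leave the site for aggregation.
A = np.column_stack([X, np.ones(len(X))])  # add intercept column
coef, *_ = np.linalg.lstsq(A, rul, rcond=None)

def predict_rul(vibration, temperature, rpm):
    """Predict remaining useful life (hours) from one sensor reading."""
    return float(np.dot(coef, [vibration, temperature, rpm, 1.0]))
```

A real deployment would use richer models (e.g. on vibration spectra), but the privacy boundary is the same: the site keeps `X` and `rul` local and exports only the model.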

Case 2: The second case study, "federated learning for cybersecurity services", will focus on sharing cyberthreat information using federated learning. According to [JBWS22], "cyber threat information is any information that can help an organization identify, assess, monitor and respond to cyber threats". Federated learning aids in tailoring cybersecurity services to the NIST functions [BARR18]: identify, protect, detect, respond and recover. In this case study, we will evaluate the benefits of using federated learning for the following cybersecurity services: penetration testing/code repair, malware analysis, threat detection, and intrusion detection. We will adapt selected cybersecurity services for use in a federated learning context. This process will require adjusting the services so that they can be deployed in a federated learning platform and are compatible with orchestrating the different phases of federated learning: service selection, learning, inference and meta-learning.

Selecting the most promising cybersecurity services for this second case study will help determine precisely what information should be shared using federated learning. Federated learning for penetration testing/code repair involves learning sequences of program behaviours, system interactions leading to a vulnerability, and attack trees that combine various components into an attack. This flexible modelling approach can be adapted to the specific needs of each organization. Sharing policies must ensure that local test models do not reveal any confidential information. Model aggregation involves combining vulnerability and penetration information into attack models. Model updates trigger when models learn new exploits/vulnerabilities or improved attack/penetration models. Inputs can be data, source code or the execution environment. Federated learning for malware analysis involves learning a convolutional neural network trained on image-based malware datasets. Sharing policies verify that local models do not refer to privacy aspects. Local models are aggregated into a global model according to different possible approaches [MDAL18, XKSU92]. The local model will be updated when new malware is detected or if an improved aggregated model trained on a shared dataset becomes available.
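The aggregation step mentioned above can be sketched with federated averaging (FedAvg), one common aggregation approach: each client contributes its model weights in proportion to its local dataset size. The two-client example below is purely illustrative:

```python
import numpy as np

def federated_average(client_weights, client_sizes):
    """Aggregate per-client model weights into one global model.

    client_weights: one list of numpy arrays (layers) per client.
    client_sizes:   number of local training samples per client.
    Each client's contribution is weighted by its share of the data
    (federated averaging / FedAvg).
    """
    total = sum(client_sizes)
    n_layers = len(client_weights[0])
    global_weights = []
    for layer in range(n_layers):
        layer_avg = sum(
            (size / total) * weights[layer]
            for weights, size in zip(client_weights, client_sizes)
        )
        global_weights.append(layer_avg)
    return global_weights

# Illustrative example: two clients with a one-layer "model".
w_a = [np.array([1.0, 3.0])]  # client A, trained on 100 local samples
w_b = [np.array([3.0, 5.0])]  # client B, trained on 300 local samples
global_w = federated_average([w_a, w_b], [100, 300])
print(global_w[0])  # → [2.5 4.5]
```

Only the weight arrays cross organizational boundaries; sharing policies would be enforced on these updates before aggregation.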

Federated threat detection involves learning recurrent neural networks (RNNs) and long short-term memory (LSTM) models for anomaly and threat detection. Sharing policies verify that detection agents deployed in multiple networks and clouds comply with local guidelines. Local monitoring agents upload their detection models to the backend for aggregation, where size-based aggregation can be applied. Local models are updated with new global models upon detecting new types of anomalies. Federated learning-based intrusion detection involves training an autoencoder with models based on recurrent neural networks (RNNs) for anomaly and threat detection. Sharing policies verify that local models do not reveal private information or make false statements. Local models are updated when new types of intrusion are detected.
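The autoencoder-based detection principle can be sketched without a deep learning framework: train an encoder/decoder on normal traffic features, then flag inputs whose reconstruction error exceeds a threshold calibrated on normal data. The sketch below uses a linear autoencoder (a PCA projection) and synthetic features; a real deployment would use an RNN-based autoencoder as described above:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "normal" network-feature vectors: 4 features that really
# live on a 2-dimensional latent structure plus small noise.
latent = rng.normal(size=(500, 2))
mixing = np.array([[1.0, 0.5, 0.0, 0.2],
                   [0.0, 1.0, 0.7, 0.1]])
normal = latent @ mixing + 0.05 * rng.normal(size=(500, 4))

mean = normal.mean(axis=0)
# Linear "autoencoder": principal components via SVD of centred data.
_, _, vt = np.linalg.svd(normal - mean, full_matrices=False)
components = vt[:2]  # encoder keeps 2 of 4 dimensions

def reconstruction_error(x):
    code = (x - mean) @ components.T   # encode
    recon = code @ components + mean   # decode
    return float(np.sum((x - recon) ** 2))

# Calibrate the threshold on normal traffic (99th percentile of errors).
errors = [reconstruction_error(x) for x in normal]
threshold = float(np.percentile(errors, 99))

def is_anomaly(x):
    """Flag inputs the autoencoder cannot reconstruct well."""
    return reconstruction_error(x) > threshold

# A vector off the learned normal subspace is flagged.
print(is_anomaly(np.array([3.5, -7.0, 10.0, 0.0])))  # → True
```

In the federated setting, each agent would learn `mean`, `components` and `threshold` locally and share only those model parameters for aggregation.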

Prospective cases. Other case studies will supplement the two case studies above. Each year, we will select a new case study with the help of the steering committee. For the cybersecurity case, each year, a new cybersecurity service will be considered for federated learning based on societal interest and available data. Other prospective cases include sharing medical data to train machine learning algorithms that support tumour treatment or predict the outcome of medical treatments (based on a potential collaboration with Sciensano and the results of the European CORDIAL-S project), predictive maintenance of offshore wind farms, smart home activity monitoring and energy optimization, etc.

Under the forthcoming European Digital Identity Regulation, personal digital wallets will contain highly sensitive digital data. The data contained in these wallets could enable innovative services and applications to be developed. As user confidentiality is essential, there is excellent potential for privacy-friendly federated learning solutions. In the same identity management context, SolidLab Flanders will develop a Flemish instantiation of the Solid architecture (https://solidproject.org/). This architecture empowers users by storing their data securely in decentralized data stores called Pods. The federated learning techniques developed in AIDE will support analyzing user data in a privacy-preserving manner.

Potential users will be able to access the federated learning technology in the POD. This opportunity will give greater visibility to the benefits of sharing data using federated learning technologies. Federated learning is based on sharing privacy-preserving models instead of data. The application cases will make these benefits explicit, particularly in terms of performance, confidentiality and privacy compliance, and the demonstrators in the POD will offer an easy-to-understand experience. The two application cases will enable potential technology users to transpose the technology’s advantages to other contexts where sharing confidential and private data is necessary.

Key Performance Indicators (KPI):
● an open source implementation of the federated learning architecture applied to the application cases.
● a demonstration of each application deployed in the POD and shown to potential users of the technology.

Leader: CETIC
Contributors: IMEC, UCLOUVAIN, KULEUVEN

Chronology:
● March 2023: needs analysis of both applications.
● End 2023: first version of the two cases demonstrating federated learning.
● End 2025 (if extended): second version showing the sharing policies and deployment of the initial models learned for inference.
● End 2025 (if extended): final version of several cases with demonstration of meta-learning and updating of deployed inference models.