Towards privacy-preserving cybersafety tools by using Federated Learning
Date Issued
August 2021
Author(s)
Advisor
Abstract
In the digital era, people can access a huge amount of online
content every day. Online services can benefit humanity by easing many
everyday tasks. However, people, and especially minors, can encounter
many threats while they are online. Despite various cybersafety
tools and applications, the number of minors experiencing online threats
is not decreasing. This work focuses on cybersafety tools that use Machine
Learning algorithms to automatically detect inappropriate content. Such
tools require collecting large amounts of data that are often sensitive.
Additionally, keeping these datasets up to date and retraining the
models can be challenging. We propose using Federated Learning (FL)
training to overcome these challenges. FL allows training a model on
distributed data without transferring the data to a central unit. We provide a
conceptual mapping between the components of a cybersafety framework
architecture and the actors in the FL communication protocol to explain
how FL can be applied in the context of cybersafety tools. We design and
implement a TensorFlow Federated simulation to explore FL training of
a text classification model that detects aggressive text. We experimented
with a centralized dataset of aggressive tweets to assess the performance
of the model trained with FL compared to a model trained with the
centralized approach, and to explore how the number of clients
participating in FL affects the model's performance. Additionally, we
experimented with local model training to assess the client device's CPU
utilization, memory consumption, and execution time. The results show
that the performance of the model trained with FL can approach that of
the model trained with the traditional, centralized approach. The model's
performance also improves as more clients participate in FL training.
Regarding the client device, the results show that local model training
completes in a short time and does not overconsume the device's
resources. The findings indicate that cybersafety tools are an applicable
use case for FL training.
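To make the idea of training on distributed data without moving it more concrete, here is a minimal plain-Python sketch of Federated Averaging (FedAvg), a common FL aggregation scheme. The abstract does not state which aggregation algorithm the thesis uses, and this is not its TensorFlow Federated simulation. Assumptions: a logistic-regression classifier stands in for the text classification model, each example is reduced to two hypothetical numeric features (for instance, normalised counts of aggressive and neutral words), and the toy data is randomly generated. Each simulated client trains on its own data and returns only model weights, which the server averages in proportion to each client's dataset size.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict(w, x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))

def local_train(weights, data, epochs=5, lr=0.5):
    """One client's update: SGD on the logistic loss over its private data.
    Only the resulting weights are returned; the data never leaves here."""
    w = list(weights)
    for _ in range(epochs):
        for x, y in data:
            err = predict(w, x) - y
            for i, xi in enumerate(x):
                w[i] -= lr * err * xi
    return w

def fed_avg(global_w, clients, rounds=10):
    """Server loop: broadcast weights, collect local updates, and average
    them weighted by each client's number of examples."""
    total = sum(len(c) for c in clients)
    for _ in range(rounds):
        new_w = [0.0] * len(global_w)
        for data in clients:
            local_w = local_train(global_w, data)
            for i, wi in enumerate(local_w):
                new_w[i] += wi * len(data) / total
        global_w = new_w
    return global_w

def accuracy(w, data):
    hits = sum((predict(w, x) >= 0.5) == (y == 1) for x, y in data)
    return hits / len(data)

def make_toy_data(n, rng):
    """Hypothetical data: [bias, feature_1, feature_2], label 1 = 'aggressive'.
    The two classes are well-separated Gaussian clusters."""
    data = []
    for _ in range(n):
        y = rng.randint(0, 1)
        c1, c2 = (2.0, 0.0) if y == 1 else (0.0, 2.0)
        data.append(([1.0, rng.gauss(c1, 0.5), rng.gauss(c2, 0.5)], y))
    return data

rng = random.Random(0)
clients = [make_toy_data(50, rng) for _ in range(5)]  # 5 simulated clients
test_set = make_toy_data(200, rng)

w_fl = fed_avg([0.0, 0.0, 0.0], clients)
pooled = [example for data in clients for example in data]  # centralized baseline
w_central = local_train([0.0, 0.0, 0.0], pooled, epochs=10)

print(f"FedAvg accuracy:      {accuracy(w_fl, test_set):.3f}")
print(f"Centralized accuracy: {accuracy(w_central, test_set):.3f}")
```

On this easy toy data both the federated and the centrally trained weights should separate the classes well; the sketch illustrates the protocol only and is not evidence for the thesis's reported results. Changing the number of entries in `clients` is one way to mimic an experiment on the number of participating clients.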
File(s)
Name
thesis_Pantelitsa_Leonidou_Towards_Privacy_Preserving_Cybersafety_Tools_By_Using_Federated_Learning_Abstract.pdf
Size
164.4 KB
Format
Adobe PDF
Checksum (MD5)
16e5d426fd7487fb6ca18bc517b01fe8

