Privacy-Preserving Online Content Moderation with Federated Learning
Date Issued
April 30, 2023
DOI
10.1145/3543873.3587366
Abstract
Users of social network platforms are exposed daily to a large volume of harmful content. One way to protect them is to develop online moderation tools that use Machine Learning (ML) techniques for automatic detection or content filtering. However, processing user data requires compliance with privacy policies. This paper proposes a privacy-preserving Federated Learning (FL) framework for online content moderation that incorporates Central Differential Privacy (CDP). We simulate the FL training of a classifier for detecting tweets with harmful content and show that the performance of the FL framework can be close to that of the centralized approach. Moreover, the framework performs well even when only a small number of clients, each holding a small number of tweets, are available for FL training. When reducing the number of clients (from fifty to ten) or the tweets per client (from 1K to 100), the classifier can still achieve AUC. Furthermore, we extend the evaluation to four other Twitter datasets that capture different types of user misbehavior and still obtain promising performance (61%-80% AUC).
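The abstract does not specify how CDP is applied during aggregation. The following is a minimal sketch, assuming a FedAvg-style server that clips each client's model update to a fixed L2 norm and then adds Gaussian noise once to the sum (the DP-FedAvg pattern). It shows where central DP enters a training round. The function names and the `clip_norm` and `noise_multiplier` parameters are hypothetical and are not taken from the paper. Privacy accounting (computing the resulting epsilon) is not shown.

```python
import math
import random
from typing import List

Vector = List[float]


def clip(update: Vector, clip_norm: float) -> Vector:
    """Scale a client update so that its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(x * x for x in update))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [x * scale for x in update]


def dp_fedavg_round(global_model: Vector, client_updates: List[Vector],
                    clip_norm: float, noise_multiplier: float,
                    rng: random.Random) -> Vector:
    """One server round of FedAvg with central (server-side) Gaussian noise.

    Hypothetical sketch: each client update is clipped, the clipped updates
    are summed, noise with std = noise_multiplier * clip_norm is added once
    to the sum, and the noisy average is applied to the global model.
    """
    n = len(client_updates)
    clipped = [clip(u, clip_norm) for u in client_updates]
    # Coordinate-wise sum of the clipped client updates.
    summed = [sum(col) for col in zip(*clipped)]
    sigma = noise_multiplier * clip_norm
    noisy_mean = [(s + rng.gauss(0.0, sigma)) / n for s in summed]
    return [w + d for w, d in zip(global_model, noisy_mean)]


if __name__ == "__main__":
    rng = random.Random(0)
    model = [0.0, 0.0]
    updates = [[3.0, 4.0], [0.0, 1.0]]  # toy client updates
    print(dp_fedavg_round(model, updates, clip_norm=1.0,
                          noise_multiplier=1.0, rng=rng))
```

In this design, the server is trusted to see the clipped updates and adds the noise itself. In local DP, by contrast, each client would perturb its own update before sending it. Clipping bounds each client's contribution, which is what lets a single noise draw, scaled by `clip_norm`, mask any one client.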

