Physicochemical-Based Deep Learning for Allergenicity Prediction
Date Issued
August 20, 2025
Author(s)
DOI
10.1007/978-3-032-00652-3_28
Abstract
Accurately predicting protein allergenicity is crucial for food safety and biopharmaceutical development, yet it remains a significant challenge. This paper introduces a novel deep learning framework for allergenicity prediction based on a one-dimensional convolutional neural network (CNN). Protein sequences are represented using an extensive set of 611 amino acid physicochemical properties, which are reduced via Principal Component Analysis (PCA) to a compact set of informative features. On a dataset curated from multiple allergen databases, the model that encodes sequences with the first three principal components (PCA-3) performed best. On an independent test set it achieved an accuracy of 97.24%, sensitivity of 96.26%, specificity of 97.67%, a Matthews Correlation Coefficient (MCC) of 0.93, and an Area Under the Receiver Operating Characteristic Curve (AUC-ROC) of 0.97. These results indicate that PCA-distilled physicochemical features, combined with a CNN architecture, support robust, high-accuracy allergenicity prediction and offer a promising advance in the field.
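The abstract gives no implementation details, so the following is only a minimal sketch of the PCA-based residue encoding it describes. It uses plain Python (standard library only) and rests on several assumptions that do not come from the paper. First, the property table is a hypothetical dict mapping each of the 20 standard amino acids to its 611 property values. Second, each property is z-scored across the 20 amino acids before PCA, a common choice that the abstract does not confirm. Third, unknown residues and padding positions are encoded as zero vectors, and `max_len` is a hypothetical fixed input length. PCA is computed from the 20×20 Gram matrix by power iteration with deflation, which gives the same component scores (up to sign) as PCA on the 20×611 matrix. The resulting `max_len × 3` matrix is the kind of input a 1D CNN would convolve along the sequence axis. A helper for the threshold-based metrics reported above (accuracy, sensitivity, specificity, MCC) is included; AUC-ROC is omitted.

```python
import math
import random


def standardize(rows):
    """Z-score each property (column) across the amino acids (rows). Assumed preprocessing."""
    n, p = len(rows), len(rows[0])
    out = [[0.0] * p for _ in range(n)]
    for j in range(p):
        col = [row[j] for row in rows]
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / n)
        for i in range(n):
            out[i][j] = (col[i] - mean) / sd if sd > 0 else 0.0
    return out


def top_eigenpairs(g, k, iters=500, seed=0):
    """Top-k eigenpairs of a symmetric PSD matrix (power iteration + deflation)."""
    n = len(g)
    g = [row[:] for row in g]  # work on a copy; deflation modifies it
    rng = random.Random(seed)
    pairs = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(g[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # remaining matrix is (numerically) zero
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue for the unit vector v.
        lam = sum(v[i] * sum(g[i][j] * v[j] for j in range(n)) for i in range(n))
        pairs.append((lam, v))
        for i in range(n):  # deflate: remove the component just found
            for j in range(n):
                g[i][j] -= lam * v[i] * v[j]
    return pairs


def pca_encoding(properties, k=3):
    """Map each amino acid to its scores on the first k principal components."""
    aas = sorted(properties)
    x = standardize([properties[a] for a in aas])
    n = len(aas)
    # Gram matrix X X^T (n x n) stays small even with 611 properties.
    gram = [[sum(a * b for a, b in zip(x[i], x[j])) for j in range(n)]
            for i in range(n)]
    pairs = top_eigenpairs(gram, k)
    # PC scores = U * sqrt(eigenvalue), since X = U S V^T gives X V = U S.
    return {aa: [vec[i] * math.sqrt(max(lam, 0.0)) for lam, vec in pairs]
            for i, aa in enumerate(aas)}


def encode_sequence(seq, table, max_len):
    """Encode a protein as a max_len x k matrix; truncate or zero-pad (assumed)."""
    k = len(next(iter(table.values())))
    # Unknown residues (e.g. 'X') map to a zero vector: an assumption.
    rows = [list(table.get(res, [0.0] * k)) for res in seq[:max_len]]
    rows += [[0.0] * k for _ in range(max_len - len(rows))]
    return rows


def metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity, specificity and MCC from a confusion matrix."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return acc, sens, spec, mcc


if __name__ == "__main__":
    rng = random.Random(42)
    # Placeholder values only: real inputs would be the 611 measured properties.
    props = {aa: [rng.gauss(0.0, 1.0) for _ in range(611)]
             for aa in "ACDEFGHIKLMNPQRSTVWY"}
    table = pca_encoding(props, k=3)
    x = encode_sequence("MKTLLA", table, max_len=10)
    print(len(x), len(x[0]))  # 10 3
    print(metrics(tp=90, tn=95, fp=5, fn=10))
```

Working with the 20×20 Gram matrix rather than a 611×611 covariance matrix keeps the eigenproblem small. In practice a linear-algebra routine such as NumPy's `numpy.linalg.svd` would replace the hand-written power iteration.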
File(s)
Name
978-3-032-00652-3_28.pdf
Size
748.56 KB
Format
Adobe PDF
Checksum (MD5)
43bb8916b6c26c201f9d9ed360638947

