From prediction to design: Revealing the mechanisms of umami peptides using interpretable deep learning, quantum chemical simulations, and module substitution
TL;DR Summary
This study uses interpretable deep learning and module substitution to efficiently screen and design umami peptides, achieving 0.94 accuracy. It identifies various umami peptides, explores module substitution mechanisms, and highlights essential amino acids for taste enhancement.
Abstract
This study screened and designed umami peptides using a deep learning model and module substitution strategies. The predictive model, which integrates pre-training, enhanced feature, and contrastive learning modules, achieved an accuracy of 0.94, outperforming other models by 2–9 %. Umami peptides were identified through virtual hydrolysis, model predictions, and sensory evaluation. Peptides EN, ETR, GK4, RK5, ER6, EF7, IL8, VR9, DL10, and PK14 demonstrated umami taste and exhibited umami-enhancing effects with MSG. The module substitution strategy, in which highly contributive modules from umami peptides replace the corresponding modules in bitter peptides, facilitates peptide design and modification. The mechanisms underlying module substitution and taste presentation were elucidated via molecular docking and active site analysis, revealing that substituted peptides form more hydrogen bonds and hydrophobic interactions with T1R1/T1R3. Amino acids D, E, Q, K, and R were critical for umami taste. This study provides an efficient tool for rapid umami peptide screening and expands the umami peptide repository.
In-depth Reading
English Analysis
1. Bibliographic Information
1.1. Title
From prediction to design: Revealing the mechanisms of umami peptides using interpretable deep learning, quantum chemical simulations, and module substitution
1.2. Authors
Lijun Su, Zhenren Ma, Huizhuo Ji, Jianlei Kong, Wenjing Yan, Qingchuan Zhang, Jian Li, Min Zuo. The affiliations indicate that the research was conducted at the School of Food and Health, Beijing Technology and Business University, and the School of Information, Beijing Wuzi University, suggesting a multidisciplinary approach combining food science and information technology.
1.3. Journal/Conference
The paper is published in Food Chemistry. This is a highly reputable journal in the field of food science, known for publishing high-impact research on the chemical and biochemical aspects of food. Its influence is significant for studies related to food quality, safety, and functional ingredients. The specific reference in the acknowledgements points to https://doi.org/10.1016/j.foodchem.2025.144301, indicating it is either published or accepted for publication in 2025.
1.4. Publication Year
2025 (as indicated by the DOI in the supplementary data section).
1.5. Abstract
This study focused on screening and designing umami peptides using a novel deep learning model and module substitution strategies. The predictive model, which integrates pre-training, enhanced feature, and contrastive learning modules, achieved an accuracy of 0.94, outperforming existing models by 2–9%. Through virtual hydrolysis, model predictions, and sensory evaluation, several umami peptides (EN, ETR, GK4, RK5, ER6, EF7, IL8, VR9, DL10, and PK14) were identified that exhibited umami taste and enhanced umami effects synergistically with monosodium glutamate (MSG). A module substitution strategy, replacing high-contributory bitter peptide modules with high-contributory umami peptide modules, facilitated peptide design. The underlying mechanisms of module substitution and taste presentation were elucidated using molecular docking and active site analysis, revealing that substituted peptides form more hydrogen bonds and hydrophobic interactions with the T1R1/T1R3 taste receptor. Amino acids D, E, Q, K, and R were identified as critical for umami taste. This research provides an efficient tool for rapid umami peptide screening and expands the repository of known umami peptides.
1.6. Original Source Link
/files/papers/69135ac4430ad52d5a9ef421/paper.pdf

Publication Status: The presence of a DOI suggests that the paper has been accepted for publication and will appear in Food Chemistry in 2025. The provided link is an internal file path, likely to a pre-publication version.
2. Executive Summary
2.1. Background & Motivation
The core problem this paper aims to solve is the inefficient and resource-intensive traditional methods for screening and designing umami peptides. Umami peptides are a type of flavor enhancer that can reduce the need for traditional sodium-rich enhancers like MSG (monosodium glutamate), aligning with growing consumer demand for healthy, natural, and nutritious food. Traditional screening involves complex multi-step chromatographic separation, chemical synthesis, and sensory evaluation, which are time-consuming and costly, severely limiting high-throughput screening and industrial application.
This problem is important because umami peptides offer a healthier alternative for flavor enhancement, potentially mitigating risks associated with high sodium intake (e.g., hypertension) and stimulating appetite. The existing methods pose a significant bottleneck for their widespread adoption and further research.
The paper's entry point is the development of an accurate and efficient in silico (computational) method to rapidly screen and design umami peptides. It leverages interpretable deep learning and a module substitution strategy to overcome the limitations of previous machine learning models, which often struggled with effective peptide feature representation and reliance on manually extracted features. The innovative idea is to combine advanced deep learning architectures with a biologically informed module substitution strategy to not only predict but also intelligently design peptides, while simultaneously elucidating the underlying molecular mechanisms.
2.2. Main Contributions / Findings
The paper makes several primary contributions:

- Development of an Advanced Deep Learning Model: It proposes a novel predictive model for umami peptides that integrates a pre-training module (using BERT for feature encoding), an enhanced feature module (incorporating various physicochemical and structural properties), and a contrastive learning module. This model achieved state-of-the-art performance with an accuracy of 0.94, outperforming other models by 2–9 percentage points.
- Identification of Novel Umami Peptides: Using the developed model, virtual hydrolysis of Tenebrio molitor protein, and subsequent sensory evaluation, ten previously unreported umami peptides (EN, ETR, GK4, RK5, ER6, EF7, IL8, VR9, DL10, and PK14) were identified. These peptides exhibited strong umami taste and synergistic umami-enhancing effects with MSG, with detection thresholds lower than that of MSG.
- Introduction of a Module Substitution Strategy: The study proposed and demonstrated a novel module substitution strategy, in which highly contributive dipeptide fragments from umami peptides replace the corresponding highly contributive fragments in bitter peptides. This strategy was shown to successfully convert non-umami peptides into umami peptides, enabling precise peptide design and modification.
- Elucidation of Umami Taste Mechanisms: Through interpretable deep learning (attention value analysis), quantum chemical simulations (HOMO/LUMO analysis), and molecular docking experiments, the study identified critical amino acid residues (D, E, Q, K, and R) for umami taste and elucidated the molecular mechanisms of taste presentation and of how module substitution alters taste characteristics (e.g., increased hydrogen bonds and hydrophobic interactions with the T1R1/T1R3 receptor).

The key conclusions and findings are:

- The integrated deep learning model is a highly accurate and robust tool for predicting umami peptides and their thresholds.
- Tenebrio molitor protein is a promising source of novel umami peptides.
- Specific amino acids (D, E, Q, K, R) and dipeptide fragments (EE, DE, EK, EL, EA) play crucial roles in umami taste.
- Module substitution is a viable and effective strategy for rational peptide design to engineer taste properties.
- The mechanism of umami taste involves specific interactions (hydrogen bonds, hydrophobic interactions) between peptides and the T1R1/T1R3 receptor, which can be modulated by changes to the peptide sequence.

These findings address the challenge of rapid and precise umami peptide discovery and design, offering a powerful computational framework that reduces reliance on laborious experimental methods and provides mechanistic insights.
3. Prerequisite Knowledge & Related Work
3.1. Foundational Concepts
To fully grasp this paper, a novice reader should understand the following fundamental concepts:
- Peptides: Short chains of amino acids linked by peptide bonds. They are smaller than proteins and play various biological roles, including acting as hormones, enzymes, or, in this context, flavor compounds. Their sequence of amino acids dictates their structure and function.
- Umami Taste: Umami (often translated as "savory") is one of the five basic tastes, alongside sweet, sour, bitter, and salty. It is typically described as a pleasant, savory, or meaty taste and is associated with glutamate, aspartate, and certain nucleotides.
- Umami Peptides: Specific peptides that elicit or enhance the umami taste. They are a focus of research for their potential as natural flavor enhancers and sodium-reduction agents.
- Deep Learning (DL): A subfield of machine learning that uses artificial neural networks with multiple layers (hence "deep") to learn complex patterns from data. Unlike traditional machine learning, deep learning models can automatically extract features from raw data, reducing the need for manual feature engineering.
- Neural Networks: Computational models inspired by the structure and function of the human brain. They consist of interconnected nodes (neurons) organized in layers. Each connection has a weight, which is adjusted during training to minimize the difference between predicted and actual outputs.
- BERT (Bidirectional Encoder Representations from Transformers): A powerful pre-trained deep learning model for natural language processing (NLP). BERT understands the context of a word in a sentence by looking at the words before and after it simultaneously. In this paper, peptide sequences are treated like "sentences" of amino acids, and BERT is used to learn rich, contextual feature representations of these sequences.
- Pre-training: The process of training a large model (like BERT) on a massive, general dataset (e.g., text from the internet, or diverse peptide sequences in this case) to learn general representations. The pre-trained model can then be fine-tuned on smaller, task-specific datasets (e.g., predicting umami peptides), a form of transfer learning.
- Contrastive Learning: A self-supervised learning technique in which a model learns representations by comparing samples: similar samples (positive pairs) are pulled together in a learned latent space, while dissimilar samples (negative pairs) are pushed apart. This helps the model distinguish subtle differences and similarities in the data. The InfoNCE loss is a common objective function for contrastive learning.
- Molecular Docking: A computational technique that predicts the preferred orientation of one molecule (e.g., a peptide) relative to another (e.g., a taste receptor) when they bind to form a stable complex. It helps characterize ligand–receptor interactions at the atomic level, including hydrogen bonds and hydrophobic interactions.
- Quantum Chemical Simulations: Computational methods rooted in quantum mechanics for studying the electronic structure and properties of molecules. In this paper, HOMO (Highest Occupied Molecular Orbital) and LUMO (Lowest Unoccupied Molecular Orbital) analyses are performed. These orbitals describe where a molecule is most likely to donate and accept electrons, respectively, providing insight into its reactivity and active sites.
- T1R1/T1R3 Receptor: A G protein-coupled receptor (GPCR) complex found on taste bud cells that is primarily responsible for detecting umami taste. Peptides (or MSG) bind to this receptor to trigger the umami sensation.
- Module Substitution: A strategy for designing or modifying peptides by replacing specific fragments (modules) of the amino acid sequence with other fragments, aiming to alter or enhance a desired property (e.g., taste or bioactivity).
3.2. Previous Works
The paper builds upon a foundation of previous research in umami peptide prediction and design:
- Traditional Machine Learning (ML) Models:
  - iUmami-SCM (Charoenkwan et al., 2020): Used machine learning algorithms combined with a scoring card method (SCM) based on dipeptide propensity scores to predict umami peptides, achieving an accuracy of 0.824. This illustrates both the promise and the accuracy limitations of early computational methods.
  - Gradient boosting decision tree (Cui et al., 2023): An ML model that predicted umami peptides from molecular descriptors. While it performed well, ML models of this kind often rely on manually extracted features, which can introduce noise and redundancy.
- Early Deep Learning (DL) Models:
  - Umami-MRNN (Qi et al., 2023): Combined a multi-layer perceptron (MLP) and a recurrent neural network (RNN) using six feature vectors, achieving 90.5% accuracy on the UMP499 dataset and demonstrating the potential of DL.
  - Two-stage training strategy (Zhang et al., 2023): Used bidirectional encoder representations from transformers (BERT) with an inception network for umami peptide prediction, achieving 93.23% accuracy on a balanced dataset; this work was a direct predecessor in applying BERT.
  - UMPred-FRL (Charoenkwan et al., 2021), Umami-YYDS (Cui et al., 2023), Jiang's method (Jiang et al., 2023), and IUP-BERT (Jiang et al., 2022) are other state-of-the-art models used for comparison, reflecting a continuous effort to improve prediction accuracy with various ML and DL techniques.
- Peptide Feature Representation: Previous research often relied on limited sets of feature descriptors, sometimes neglecting crucial physicochemical properties and structural information of peptides, which restricted model accuracy and generalization. Using pre-training and diverse learning strategies to extract significant features (Lv et al., 2021) was identified as an effective remedy.
- Peptide Modification Strategies:
  - Fragment substitution (Meng et al., 2024): Used to enhance the activity of angiotensin-converting enzyme inhibitory peptides.
  - Single amino acid substitutions (Jia et al., 2024): Showed that removing bitter amino acids from umami peptides could increase umami intensity. However, these approaches were limited to single amino acids, which rarely act as functional modules.
3.3. Technological Evolution
The field has evolved from laborious wet-lab experimental screening methods to in silico approaches. Initially, traditional machine learning models were employed, but they faced challenges in effectively representing peptide sequence features and often relied on manually extracted features prone to noise. The advent of deep learning brought capabilities like automatic data processing and hidden feature discovery, significantly accelerating screening. More recently, pre-trained models like BERT, inspired by natural language processing, have shown promise due to the analogy between peptide sequences and text sequences. This paper represents a further evolution by integrating pre-training, enhanced feature modules, and contrastive learning to not only improve prediction but also incorporate interpretability and a novel module substitution strategy for rational design, moving beyond mere prediction to active design. Quantum chemical simulations and molecular docking represent the integration of computational chemistry to understand the underlying molecular mechanisms, further advancing the design aspect.
3.4. Differentiation Analysis
Compared to the main methods in related work, this paper's core innovations are:
- Integrated Model Architecture: While previous works used BERT alone (e.g., Zhang et al., IUP-BERT) or combined different neural networks (Umami-MRNN), this paper combines a BERT-based pre-training module, an enhanced feature module (integrating four sequence representation methods: DistancePair, CKSAAGP, QSOrder, and DDE), and a contrastive learning module. This multi-faceted feature extraction and learning strategy yields significantly better predictive performance (outperforming other models by 2–9%).
- Contrastive Learning for Feature Enhancement: The explicit incorporation of contrastive learning is a key differentiator. By minimizing distances between positive pairs and maximizing distances between negative pairs, the model learns more robust and discriminative latent-space representations, which is crucial for distinguishing umami, bitter, and non-taste peptides.
- Beyond Prediction to Design (Module Substitution): Unlike most previous models, which focus solely on prediction, this study introduces a module substitution strategy that enables precise design and modification of peptides by replacing high-contribution bitter modules with high-contribution umami modules identified through model interpretability. This is a significant step toward rational peptide engineering.
- Mechanistic Elucidation through Interpretability: The paper uses attention value analysis from BERT to determine which amino acids and dipeptides matter most for umami taste, and combines this with quantum chemical simulations (HOMO/LUMO) and molecular docking to explain the molecular basis of umami perception and the effect of module substitution on receptor binding. Such interpretability and mechanistic understanding are often lacking in black-box deep learning models.
- Focus on Tenebrio molitor: Applying the model to virtually hydrolyze Tenebrio molitor protein and validating novel umami peptides from this specific, protein-rich source adds practical value and expands the umami peptide repository.

In essence, this paper moves beyond simply predicting umami peptides by offering a comprehensive framework that combines enhanced prediction, interpretable insights into taste mechanisms, and a practical strategy for rational peptide design.
4. Methodology
4.1. Principles
The core idea of this method is to leverage the power of advanced deep learning to accurately predict umami peptides and their thresholds, and then to use the insights gained from the interpretable nature of the deep learning model to inform a module substitution strategy for rational peptide design. This approach aims to accelerate the discovery and optimization of umami peptides while also elucidating the molecular mechanisms of umami taste perception. The theoretical basis lies in treating peptide sequences as analogous to natural language, allowing Transformer-based models (like BERT) to learn contextual features, augmenting these with established physicochemical properties, and refining representations using contrastive learning. The design principle relies on identifying highly contributive peptide modules for umami and bitter tastes and rationally swapping them to alter taste profiles. Finally, quantum chemical simulations and molecular docking provide a theoretical foundation for understanding molecular interactions with taste receptors.
4.2. Core Methodology In-depth (Layer by Layer)
The overall framework of the umami peptide prediction model consists of four main modules: (i) a BERT-based pre-training module; (ii) a features-enhanced module; (iii) a contrastive learning module; and (iv) a prediction module.
The following figure (Figure 2 from the original paper) shows the framework of the umami peptide prediction model:
This figure is a schematic of the umami peptide prediction model framework. It shows the structures of the pre-training, feature-enhancement, and contrastive learning modules; BERT performs feature encoding, and feature fusion yields the final prediction, with the goal of identifying umami, bitter, and other-taste peptides.
4.2.1. Benchmark Dataset
High-quality datasets are crucial for building robust predictive models. The study uses a multi-stage data collection and preparation process.
- Pre-training Stage: A large collection of biopeptide sequences was gathered from various public datasets to enable the BERT model to learn general sequence characteristics. These datasets included:
  - 1850 anticancer peptides from the UCI Machine Learning Repository.
  - 847 neuropeptides from NeuroPedia.
  - 1010 antituberculosis peptides from AntiTbPdb.
  - 2325 fermentation-derived peptides from FermFooDb.
  - 6289 food-derived bioactive peptides from DFBP.
  - 20,027 naturally occurring signal peptides from PeptideDB.

  Sequences were filtered to lengths between 2 and 50 amino acids, and duplicates were removed.
- Retraining Stage (UMP1080 Dataset): For umami peptide prediction, a specific dataset, UMP1080, was constructed:
  - Umami Peptides: 360 experimentally verified umami peptides collected from Web of Science (up to May 2024), TastepeptidesDB, and the BIOPEP-UWM database.
  - Bitter Peptides: 360 bitter peptides collected from TastepeptidesDB, BIOPEP-UWM, and other research studies.
  - Neither Umami nor Bitter: 360 peptides randomly selected from the pre-training dataset that exhibit neither umami nor bitter taste.

  This balanced UMP1080 dataset totals 1080 peptides (360 per class).
- Dataset Split: The UMP1080 dataset was randomly divided into:
  - Training Set: 300 umami, 300 bitter, and 300 neither-umami-nor-bitter peptides (900 total).
  - Test Set: 60 umami, 60 bitter, and 60 neither-umami-nor-bitter peptides (180 total).
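The 900/180 balanced split described above can be sketched as follows; the data layout, seed, and function name are illustrative assumptions, not the authors' code:

```python
import random

def balanced_split(by_class, n_train=300, n_test=60, seed=42):
    """Split each taste class into train/test with fixed per-class counts,
    mirroring the UMP1080 900/180 split (layout and seed are illustrative)."""
    rng = random.Random(seed)
    train, test = [], []
    for label, seqs in by_class.items():
        seqs = list(seqs)
        rng.shuffle(seqs)  # random division, as described in the paper
        train += [(s, label) for s in seqs[:n_train]]
        test += [(s, label) for s in seqs[n_train:n_train + n_test]]
    return train, test

# Toy stand-in: 360 dummy "sequences" per class
classes = {lab: [f"{lab}_{i}" for i in range(360)]
           for lab in ("umami", "bitter", "neither")}
train, test = balanced_split(classes)
print(len(train), len(test))  # 900 training and 180 test samples
```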
4.2.2. Pre-trained BERT Model for Feature Encoding
BERT (Bidirectional Encoder Representations from Transformers) is employed to convert peptide sequences into high-dimensional feature sets.
- Mechanism: BERT uses a Transformer architecture with self-attention mechanisms to process input sequences. It considers all surrounding "words" (here, amino acids) to generate a contextual representation of each amino acid, capturing complex dependencies and avoiding the redundancy of manually crafted features.
- Model Architecture: A BERT model with 12 Transformer encoders was constructed, each containing 12 multi-head attention mechanisms.
- Input/Output: Peptide sequences are input directly. The model accepts sequences up to 512 amino acids long and outputs 768-dimensional feature vectors, which serve as input for the downstream tasks.
4.2.3. Enhanced Feature Construction
To enrich the feature representation beyond what BERT alone provides, four additional sequence representation methods are utilized, focusing on physicochemical properties and structural information. These are referred to as the enhanced feature module.
- DistancePair: Integrates Pseudo Amino Acid Composition (PseAAC) with distance-pair information. PseAAC extends traditional amino acid composition with position- and sequence-specific information, while DistancePair computes distances between amino acid pairs over a reduced alphabet to capture structural and functional characteristics.
- CKSAAGP (Composition of k-Spaced Amino Acid Group Pairs): Considers the composition of amino acid group pairs separated by k positions in the sequence. It computes the frequency of all k-spaced pairs, capturing long-distance interaction information.
- QSOrder (Quasi-Sequence Order): Characterizes sequences by analyzing order relationships between amino acids, combining composition and order information to reflect both local and global structural characteristics.
- DDE (Dipeptide Deviation from Expected Mean): Represents sequences by the frequency of all dipeptide combinations and their deviation from the expected frequency.
- Output: Together, these methods transform each sequence into a 562-dimensional feature vector. This enhanced feature set is then combined with the BERT-generated features.
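Of the four descriptors, DDE is the most self-contained to sketch. The following is a generic implementation of the published DDE definition (codon-derived expected dipeptide frequencies), not the paper's code, which relied on a feature-extraction package:

```python
from itertools import product

# Number of codons per amino acid (standard genetic code); DDE uses these
# to derive the theoretical mean frequency of each dipeptide.
CODONS = {"A": 4, "C": 2, "D": 2, "E": 2, "F": 2, "G": 4, "H": 2, "I": 3,
          "K": 2, "L": 6, "M": 1, "N": 2, "P": 4, "Q": 2, "R": 6, "S": 6,
          "T": 4, "V": 4, "W": 1, "Y": 2}
CN = sum(CODONS.values())  # 61 sense codons

def dde(seq):
    """Dipeptide Deviation from Expected mean for one peptide sequence."""
    n_pairs = len(seq) - 1
    counts = {"".join(p): 0 for p in product(CODONS, repeat=2)}
    for i in range(n_pairs):
        counts[seq[i:i + 2]] += 1
    features = {}
    for dip, c in counts.items():
        dc = c / n_pairs                                     # observed frequency
        tm = (CODONS[dip[0]] / CN) * (CODONS[dip[1]] / CN)   # theoretical mean
        tv = tm * (1 - tm) / n_pairs                         # theoretical variance
        features[dip] = (dc - tm) / tv ** 0.5                # standardized deviation
    return features

feats = dde("EEAGK")
print(len(feats))  # 400 dipeptide features (20 x 20)
```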
4.2.4. Contrastive Learning Strategy
Contrastive learning is applied to refine the feature representations by focusing on similarities and dissimilarities between samples. This is part of the contrastive learning module.
- Purpose: To learn robust feature representations in which positive sample pairs are drawn closer together in the latent space and negative sample pairs are pushed further apart.
- Data Augmentation: Noise is added to improve the uniformity of the combined features (from BERT and the enhanced feature module).
- Loss Function: The InfoNCE (Info Noise-Contrastive Estimation) contrastive loss is used during training. It is defined as:

$$\mathcal{L}_{\mathrm{InfoNCE}} = -\frac{1}{N}\sum_{i=1}^{N}\log\frac{\exp\big(\mathrm{sim}(z_i, z_i^{+})/\tau\big)}{\exp\big(\mathrm{sim}(z_i, z_i^{+})/\tau\big) + \sum_{j=1}^{K}\exp\big(\mathrm{sim}(z_i, z_j^{-})/\tau\big)}$$

Where:
- $N$: the total number of samples in the batch.
- $\mathcal{L}_{\mathrm{InfoNCE}}$: the contrastive loss for the batch.
- $z_i$: the feature representation (embedding) of the $i$-th sample.
- $z_i^{+}$: the feature representation of a positive sample corresponding to $z_i$ (e.g., an augmented version of the sample, or another sample from the same class).
- $z_j^{-}$: the feature representation of the $j$-th negative sample corresponding to $z_i$ (e.g., samples from different classes).
- $K$: the number of negative samples.
- $\mathrm{sim}(\cdot,\cdot)$: a similarity function (e.g., cosine similarity) between two feature vectors.
- $\tau$: a temperature parameter that controls the sharpness of the probability distribution and the shape of the loss; a smaller $\tau$ makes the model more sensitive to small differences in similarity.
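A minimal NumPy sketch of this loss for a single anchor, assuming cosine similarity as the similarity function (the paper's implementation is not published):

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def info_nce(z, z_pos, z_negs, tau=0.1):
    """InfoNCE loss for one anchor embedding z with one positive and K negatives."""
    pos = np.exp(cosine(z, z_pos) / tau)
    negs = sum(np.exp(cosine(z, zn) / tau) for zn in z_negs)
    return float(-np.log(pos / (pos + negs)))

z = np.array([1.0, 0.0])
negatives = [np.array([0.0, 1.0])]
# Loss is small when the positive is aligned with the anchor,
# large when it points away from it.
print(info_nce(z, np.array([1.0, 0.0]), negatives))
print(info_nce(z, np.array([-1.0, 0.0]), negatives))
```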
4.2.5. Prediction Module
This module takes the refined features from contrastive learning and performs either classification or regression.
4.2.5.1. Classifier of umami peptides
- Input: Noise-free features extracted by the contrastive learning module.
- Architecture: A two-layer fully connected neural network.
- Task: Three-class classification into umami, bitter, or neither taste attributes.
- Loss Function: The Softmax cross-entropy loss is used for convergence training. It is formulated as:

$$L_{CE} = -\sum_{c=1}^{C} y_c \log(p_c)$$

Where:
- $L_{CE}$: the cross-entropy loss.
- $C$: the number of classes (here 3: umami, bitter, neither).
- $y_c$: the true label for class $c$, which is 1 if the sample belongs to class $c$ and 0 otherwise.
- $p_c$: the predicted probability that the sample belongs to class $c$, output by the Softmax function.
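The per-sample objective can be sketched in NumPy (illustrative only; the paper's classifier is a trained neural network):

```python
import numpy as np

def softmax(logits):
    """Convert raw scores to class probabilities."""
    e = np.exp(logits - np.max(logits))  # shift for numerical stability
    return e / e.sum()

def cross_entropy(logits, true_class):
    """Softmax cross-entropy for one sample over C classes."""
    p = softmax(logits)
    return float(-np.log(p[true_class]))

# Three classes: 0 = umami, 1 = bitter, 2 = neither (hypothetical logits)
logits = np.array([2.0, 0.5, -1.0])
print(cross_entropy(logits, 0))  # small loss: the model favors class 0
print(cross_entropy(logits, 2))  # large loss: the model disfavors class 2
```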
4.2.5.2. Regressor of umami peptides
This part predicts the umami threshold values.
- Outlier Handling: Outliers are identified with the interquartile range (IQR) method. Values outside the range $[Q_1 - 1.5\,\mathrm{IQR},\ Q_3 + 1.5\,\mathrm{IQR}]$ are replaced with the mean of the data to mitigate their impact.
- Clustering: The K-nearest neighbors (KNN) algorithm is applied to cluster the data into three categories based on a defined threshold.
- Regression Model: An AdaBoost regressor is employed for prediction:
  - A classification model (a fully connected neural network) is first trained with labels derived from the clustering results.
  - For each class identified by the classifier, a separate AdaBoost regressor is trained.
  - During prediction, a sample is first classified, and the corresponding AdaBoost regressor is then selected for regression analysis.

The AdaBoost regression prediction is:

$$H(x) = \sum_{t=1}^{T} \alpha_t h_t(x)$$

and the squared loss for AdaBoost is defined as:

$$L = \big(y - H(x)\big)^2$$

Where:
- $H(x)$: the final regression prediction for input $x$.
- $T$: the number of weak regressors (individual models) in the ensemble.
- $\alpha_t$: the weight of the $t$-th weak regressor, representing its importance in the ensemble.
- $h_t(x)$: the prediction of the $t$-th weak regressor for input $x$.
- $y$: the true value (the actual umami threshold).
- $L$: the loss function, specifically the squared loss, used to evaluate the error between the true and predicted values.

Training Parameters:
- Epochs: 100.
- Initial learning rate: 0.0001.
- Learning-rate scheduler: multiplies the learning rate by 0.8 if accuracy does not improve for 5 consecutive epochs.
- Cross-validation: 5-fold cross-validation for parameter optimization.
- Optimizer: Adam.
- Regularization: an early-stopping strategy to reduce overfitting and a dropout mechanism to improve generalization.
- Software: Experiments were conducted in Python with PyTorch and CUDA. The iFeatureOmega package was used to obtain the sequence representations (DistancePair, CKSAAGP, QSOrder).
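As a worked example of the outlier-handling step, the IQR rule (values outside [Q1 − 1.5·IQR, Q3 + 1.5·IQR] replaced with the mean) can be sketched as:

```python
import numpy as np

def replace_outliers_iqr(values):
    """Replace values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] with the mean,
    as described for the threshold-regression preprocessing."""
    v = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(v, [25, 75])
    iqr = q3 - q1
    lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    mask = (v < lo) | (v > hi)
    v[mask] = v.mean()  # replace outliers with the pre-replacement mean
    return v

# Hypothetical threshold values (mmol/L); 9.0 is an obvious outlier
data = [0.5, 0.6, 0.55, 0.62, 0.58, 9.0]
print(replace_outliers_iqr(data))
```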
4.2.6. Attention Value Analysis
The self-attention mechanism within BERT is leveraged to understand the importance of individual amino acids and dipeptides.
4.2.6.1. Amino acid importance analysis
- Mechanism: BERT's self-attention captures complex dependencies among input features by computing attention scores between different parts of the input.
- Attention Formula (Scaled Dot-Product Attention):

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$$

Where:
- $Q$: the Query matrix, derived from the input sequence.
- $K$: the Key matrix, derived from the input sequence.
- $V$: the Value matrix, derived from the input sequence.
- $QK^{T}$: the dot product of the Query and Key matrices, indicating the similarity between each query and key.
- $d_k$: the dimensionality of the key vectors, used to scale the dot product so that large values do not push the softmax into regions with tiny gradients.
- $\mathrm{softmax}$: a function that normalizes the scores into a probability distribution, indicating how much attention each value should receive; the Value matrix, weighted by these softmax scores, produces the final output.
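The scaled dot-product attention defined above is only a few lines of NumPy (toy dimensions, illustrative only):

```python
import numpy as np

def softmax(x, axis=-1):
    """Row-wise softmax with a shift for numerical stability."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # query-key similarity, scaled
    weights = softmax(scores)        # each row is a probability distribution
    return weights @ V, weights

rng = np.random.default_rng(1)
Q = rng.normal(size=(4, 8))  # 4 tokens, d_k = 8
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
out, w = scaled_dot_product_attention(Q, K, V)
print(out.shape, w.sum(axis=1))  # output is (4, 8); weight rows sum to 1
```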
Amino Acid Attention Value Calculation: The
attention valuesfor individual amino acids withinumami peptidesare extracted. This is done by consideringattention scoresin various configurations:[CLS]amino acid, amino acid[CLS],[SEP]amino acid, and amino acid[SEP].[CLS](classifier token) and[SEP](separator token) are special tokens used in BERT to mark the beginning of a sequence and the separation of segments, respectively. Theattention valueof an amino acid (AttentionAA) is defined as: Where: -
: The
layer numberof the Transformer (e.g., 12 in the BERT model). -
: The
head numberwithin each Transformer layer (e.g., 12 in the multi-head attention). -
: The
attention scorebetween token and token , derived from theAttention (Q, K, V)calculation. -
: Attention from the
[CLS]token to anamino acid (AA). -
: Attention from an
amino acid (AA)to the[CLS]token. -
: Attention from the
[SEP]token to anamino acid (AA). -
: Attention from an
amino acid (AA)to the[SEP]token. This sum and averaging across layers and heads provides a comprehensive measure of how much "attention" the model pays to a specific amino acid in relation to these special tokens, indicating its importance for classification.
4.2.6.2. Score of amino acid pair analysis
- Calculation: The
pairwise amino acid scoreis defined as the product of theaverage attention valueof eachdipeptide(two-amino acid fragment) and itsfrequency of occurrencein the dataset. - Data Source: Average attention values for dipeptides are obtained from the
BERT pre-trained model. Frequencies are calculated by counting occurrences inumamiandbitter peptidedatasets. - Purpose: To identify
dipeptide segmentswithhigh contributionorsignificant characteristicsforumamiandbittertastes.
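A toy sketch of the pair score: count dipeptide frequencies in a peptide set and multiply by average attention values. The attention numbers below are made up; the paper reads them from its BERT model:

```python
from collections import Counter

def dipeptide_pair_scores(peptides, attention_avg):
    """Score each dipeptide as (average attention value) x (occurrence frequency).
    `attention_avg` maps dipeptides to model-derived average attention values
    (hypothetical numbers in this sketch)."""
    counts = Counter()
    for seq in peptides:
        for i in range(len(seq) - 1):
            counts[seq[i:i + 2]] += 1
    total = sum(counts.values())
    return {dip: attention_avg.get(dip, 0.0) * (c / total)
            for dip, c in counts.items()}

umami = ["EE", "DEE", "EKG", "EEL"]  # toy umami-peptide set
attn = {"EE": 0.9, "DE": 0.8, "EK": 0.7, "KG": 0.2, "EL": 0.6}
scores = dipeptide_pair_scores(umami, attn)
top = max(scores, key=scores.get)
print(top)  # the highest-scoring dipeptide in this toy set
```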
4.2.7. Preparation and Taste Characteristics Prediction of Peptides
This section describes the process for obtaining potential umami peptides from a biological source.
- Protein Source: Tenebrio molitor (yellow mealworm) protein (GenBank accession number: KAH0814361.1) was chosen due to its high protein content and richness in umami amino acids.
- Virtual Hydrolysis:
In silico(computational) hydrolysis of the Tenebrio molitor protein sequence was performed using thePeptideCutteronline program (http://web.expasy.org/peptide_cutter).- Enzymes: Two specific enzymes,
pepsin(pH 2) (EC: 3.4.23.1) andtrypsin(EC: 3.4.21.4), were selected to simulate enzymatic digestion.
- Enzymes: Two specific enzymes,
- Preliminary Screening: The resulting peptides were screened for
water solubilityandtoxicity.Water Solubility: Predicted using thePeptide Property Calculator(https://www.innovagen.com/proteomics-tools). Results are "Good water solubility" or "Poor water solubility."Toxicity: Evaluated using theToxinPredprogram (http://www.imtech.res.in/raghava/toxinpred/).
- Model Prediction: Peptides with good solubility and non-toxicity were then subjected to prediction by the developed deep learning model to determine their umami characteristics and thresholds.
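The virtual hydrolysis step can be illustrated with a much-simplified cleavage simulator. PeptideCutter implements fuller position-specific rules; the cleavage sets and protein fragment below are assumptions for illustration only:

```python
def digest(seq, cleave_after, no_cut_before="P"):
    """Cut a protein after the given residues, skipping sites that are
    immediately followed by a blocking residue (a simplified rule set)."""
    peptides, start = [], 0
    for i in range(len(seq) - 1):
        if seq[i] in cleave_after and seq[i + 1] not in no_cut_before:
            peptides.append(seq[start:i + 1])
            start = i + 1
    peptides.append(seq[start:])
    return peptides

protein = "MKWVTFLLLLFISGSAFSR"               # illustrative fragment
tryptic = digest(protein, cleave_after="KR")   # trypsin: after K/R, not before P
peptic = digest(protein, cleave_after="FL")    # pepsin (pH 2), much simplified
print(tryptic)
print(peptic)
```

Each resulting fragment would then pass through the solubility and toxicity filters before reaching the taste-prediction model.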
4.2.8. Sensory Evaluation
To validate the model's predictions, sensory evaluation was conducted.
- Ethics Approval: Approved by the Ethics Committee of Beijing Technology and Business University (No.2023050).
- Assessors: 15 trained assessors (7 male, 8 female, aged 23-31) from Beijing Technology and Business University, healthy, non-smokers, no taste/olfactory disorders, with prior sensory assessment training.
- Method: Standardized sip-and-spit method.
- Taste Characteristics: Descriptive analysis of the peptide solutions, using criteria from Song et al. (2023) and Gu et al. (2024).
- Detection Threshold: Determined using the three-alternative forced-choice (3-AFC) method. Test samples were serially diluted at 1:1 (V/V), sigmoid curve analysis was applied to the probability detection results, and the threshold was defined as the concentration corresponding to a 50% detection probability (according to ASTM E1432 and Tempere, 2011).
- Interaction with MSG: Sigmoid curve analysis was used to elucidate synergistic, additive, or masking effects with monosodium glutamate (MSG) solutions. The R-value is defined as the ratio of the experimentally determined threshold to the theoretically predicted threshold:
  - R < 0.5: Synergistic effect.
  - 0.5 ≤ R < 1: Additive effect.
  - R = 1: No interaction.
  - R > 1: Masking effect.
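A sketch of the threshold and R-value logic. The detection-probability data here are hypothetical, and the R-value cut-offs follow the interpretation implied by the paper's reported results rather than a formula quoted from it:

```python
import numpy as np

def detection_threshold(log_conc, p_detect):
    """Interpolate the log-concentration at which the detection probability
    crosses 50% (an ASTM E1432-style threshold). Assumes p_detect is
    monotonically increasing over sorted log_conc."""
    return float(np.interp(0.5, p_detect, log_conc))

def msg_interaction(r):
    """Classify an R-value (experimental / theoretical threshold)."""
    if r < 0.5:
        return "synergistic"
    if r < 1.0:
        return "additive"
    if r == 1.0:
        return "no interaction"
    return "masking"

log_c = np.array([-2.0, -1.5, -1.0, -0.5, 0.0])  # log10 concentrations
p = np.array([0.05, 0.20, 0.55, 0.85, 0.98])     # hypothetical 3-AFC data
thr_log = detection_threshold(log_c, p)
print(10 ** thr_log)          # threshold back on the concentration scale
print(msg_interaction(0.69))  # EN's reported R-value -> additive
```

A full treatment would fit a parametric sigmoid (and correct for the 1/3 guessing rate of the 3-AFC design) rather than interpolate linearly, but the 50%-crossing idea is the same.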
4.2.9. Active Site Analysis of Peptides Based on Quantum Chemical Computing
To understand the molecular-level mechanisms of taste.
- Software: Gaussian 16 and GaussView 6.0.
- 3D Structure Construction: GaussView 6.0 was used to build the umami peptide structures.
- Geometric Optimization: Density functional theory (DFT) with the B3LYP/6-311G(d,p) basis set in Gaussian 16, which locates the minimum-energy structure.
- Vibrational Frequency Calculations: Performed to confirm that the optimized geometry is a minimum-energy structure.
- Frontier Molecular Orbitals (FMOs): The HOMO (Highest Occupied Molecular Orbital) and LUMO (Lowest Unoccupied Molecular Orbital) were calculated using the Molekel program. The HOMO indicates the regions most likely to donate electrons, and the LUMO the regions most likely to accept them. The HOMO-LUMO energy gap reflects molecular reactivity: a smaller gap suggests higher reactivity and a greater propensity to interact with other molecules (e.g., taste receptors).
- Purpose: Identifying active sites in umami peptide molecules to elucidate their taste-presenting mechanisms.
4.2.10. Precise Design and Modification of Peptides
This section outlines the module substitution strategy for rational peptide design.
- Strategy: Replacing highly contributive dipeptide fragments in bitter peptides with highly contributive fragments from umami peptides.
- Module Selection: EE (glutamic acid-glutamic acid), identified as a high-umami-activity module through interpretability analysis of the deep learning model, was used to replace high-contribution bitterness modules such as PF, FP, GP, PP, and PG.
- Prediction: The taste characteristics and thresholds of the substituted peptides were predicted using the developed deep learning model.
- Purpose: To enhance the functional properties and taste characteristics of peptides, moving from prediction to active peptide design and modification.
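The substitution itself is a simple sequence operation; a minimal sketch (the input peptide below is illustrative, not one of the paper's sequences):

```python
# Bitter modules and the umami module named in the paper's strategy.
BITTER_MODULES = ["PF", "FP", "GP", "PP", "PG"]
UMAMI_MODULE = "EE"

def substitute_modules(seq):
    """Replace the first high-contribution bitter module found with EE."""
    for module in BITTER_MODULES:
        if module in seq:
            return seq.replace(module, UMAMI_MODULE, 1)
    return seq  # no bitter module present; leave the peptide unchanged

print(substitute_modules("VPFGK"))  # PF -> EE gives VEEGK
```

In the paper's pipeline, each substituted candidate is then re-run through the predictive model and the solubility/toxicity filters before being considered a designed umami peptide.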
4.2.11. Molecular Docking of Peptides and Taste Receptor T1R1/T1R3
To understand how module substitution affects taste characteristics at a molecular level.
- Method: Semiflexible molecular docking.
- Target Receptor: T1R1/T1R3 (the umami taste receptor).
- Receptor Structure: The 3D structure of T1R1/T1R3 was constructed using homology modeling, with the fish taste receptor T1R2a-T1R3 (PDB ID: 5X2M) as the template.
- Docking Software: AutoDock Vina.
- Docking Box Parameters:
- Center coordinates: , , .
- Box dimensions: , , and .
- Analysis: Visualization software was used to analyze the optimal docking poses, identifying the key amino acid residues and interaction forces (e.g., hydrogen bonds, hydrophobic interactions) involved in peptide binding with T1R1/T1R3.
- Purpose: To elucidate the molecular mechanism by which module substitution alters the taste characteristics of peptides.
4.2.12. Statistical Analysis
- Software: Microsoft Office Excel 2019 and Origin 2024.
- Significance Testing: Independent-samples t-test.
- Significance Level: P < 0.05 was considered statistically significant.
5. Experimental Setup
5.1. Datasets
The study utilized a comprehensive set of datasets for pre-training and fine-tuning the deep learning model.
- Pre-training Dataset:
  - Purpose: To enable the BERT model to learn general contextual features from diverse biopeptide sequences.
  - Sources:
    - 1850 anticancer peptides (UCI Machine Learning Repository).
    - 847 neuropeptides (NeuroPedia).
    - 1010 antituberculosis peptides (AntiTbPdb).
    - 2325 fermentation-derived peptides (FermFooDb).
    - 6289 food-derived bioactive peptides (DFBP).
    - 20,027 naturally occurring signal peptides (PeptideDB).
  - Characteristics: Included sequences with lengths between 2 and 50 amino acids, with duplicate sequences removed. This large, diverse dataset allows the BERT model to develop a robust understanding of peptide sequence patterns before being applied to the specific task of umami peptide prediction.
- UMP1080 Benchmark Dataset (for Fine-tuning and Evaluation):
  - Purpose: To train and test the umami peptide prediction model for classifying umami, bitter, and neither tastes, and for umami threshold regression.
  - Sources: Experimentally verified umami peptides from Web of Science, TastepeptidesDB, and BIOPEP-UWM; bitter peptides from TastepeptidesDB, BIOPEP-UWM, and prior research; peptides that are neither umami nor bitter, randomly selected from the pre-training dataset.
  - Scale: A balanced dataset comprising 360 umami peptides, 360 bitter peptides, and 360 peptides that are neither umami nor bitter, totaling 1080 peptides.
  - Split:
    - Training Set: 900 peptides (300 umami, 300 bitter, 300 neither).
    - Test Set: 180 peptides (60 umami, 60 bitter, 60 neither).
  - Domain: These datasets are specific to peptide taste characteristics, providing concrete examples of peptides associated with umami and bitter tastes, which are essential for validating the model's performance in this domain.
5.2. Evaluation Metrics
The model's performance was evaluated using standard classification and regression metrics.
5.2.1. Classification Metrics
- Accuracy (ACC):
  - Conceptual Definition: Measures the proportion of correctly predicted instances (both true positives and true negatives) out of the total number of instances. It indicates the overall correctness of the model's predictions.
  - Mathematical Formula: $\mathrm{ACC} = \frac{TP + TN}{TP + TN + FP + FN}$
  - Symbol Explanation:
    - $TP$: True Positives, instances correctly predicted as positive.
    - $TN$: True Negatives, instances correctly predicted as negative.
    - $FP$: False Positives, instances incorrectly predicted as positive (Type I error).
    - $FN$: False Negatives, instances incorrectly predicted as negative (Type II error).
- Precision:
  - Conceptual Definition: Measures the proportion of correctly predicted positive instances out of all instances predicted as positive. It indicates the model's ability to avoid false positives.
  - Mathematical Formula: $\mathrm{Precision} = \frac{TP}{TP + FP}$
  - Symbol Explanation:
    - $TP$: True Positives.
    - $FP$: False Positives.
- Recall:
  - Conceptual Definition: Measures the proportion of correctly predicted positive instances out of all actual positive instances. It indicates the model's ability to find all positive instances (sensitivity).
  - Mathematical Formula: $\mathrm{Recall} = \frac{TP}{TP + FN}$
  - Symbol Explanation:
    - $TP$: True Positives.
    - $FN$: False Negatives.
- F1 score:
  - Conceptual Definition: The harmonic mean of Precision and Recall. It provides a single score that balances both, which is particularly useful when there is an uneven class distribution.
  - Mathematical Formula: $F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$
  - Symbol Explanation:
    - $\mathrm{Precision}$: The precision value defined above.
    - $\mathrm{Recall}$: The recall value defined above.
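The four classification metrics can be computed together from the confusion counts; a small self-contained sketch with toy counts (not the paper's actual confusion matrix):

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute ACC, Precision, Recall and F1 from confusion counts,
    following the formulas above."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return acc, precision, recall, f1

# Toy confusion counts for a binary umami-vs-not split
acc, p, r, f1 = classification_metrics(tp=55, tn=110, fp=5, fn=10)
print(round(acc, 3), round(p, 3), round(r, 3), round(f1, 3))
```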
5.2.2. Regression Metrics
- R-squared ($R^2$):
  - Conceptual Definition: Measures the proportion of the variance in the dependent variable that is predictable from the independent variables. It indicates how well the model's predictions fit the observed data. A value closer to 1 indicates a better fit.
  - Mathematical Formula: $R^2 = 1 - \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2}$
  - Symbol Explanation:
    - $n$: The total number of samples.
    - $y_i$: The true value of the dependent variable for the $i$-th sample.
    - $\hat{y}_i$: The predicted value of the dependent variable for the $i$-th sample.
    - $\bar{y}$: The mean of the true dependent variable values.
- Mean Absolute Error (MAE):
  - Conceptual Definition: Measures the average magnitude of the errors in a set of predictions, without considering their direction. It is the average of the absolute differences between prediction and actual observation.
  - Mathematical Formula: $\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$
  - Symbol Explanation:
    - $n$: The number of samples.
    - $y_i$: The true value for the $i$-th sample.
    - $\hat{y}_i$: The predicted value for the $i$-th sample.
    - $\left|\cdot\right|$: The absolute value operator.
- Mean Squared Error (MSE):
  - Conceptual Definition: Measures the average of the squares of the errors. It gives a relatively high weight to large errors, meaning it is most useful when large errors are particularly undesirable.
  - Mathematical Formula: $\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$
  - Symbol Explanation:
    - $n$: The number of samples.
    - $y_i$: The true value for the $i$-th sample.
    - $\hat{y}_i$: The predicted value for the $i$-th sample.
- Root Mean Squared Error (RMSE):
  - Conceptual Definition: The square root of the MSE. It has the advantage of being in the same units as the target variable, making it easier to interpret than MSE.
  - Mathematical Formula: $\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}$
  - Symbol Explanation:
    - $n$: The number of samples.
    - $y_i$: The true value for the $i$-th sample.
    - $\hat{y}_i$: The predicted value for the $i$-th sample.
    - $\sqrt{\cdot}$: The square root operator.
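The four regression metrics can likewise be computed from paired true and predicted threshold values; the numbers below are toy data, not the paper's results:

```python
import math

def regression_metrics(y_true, y_pred):
    """R^2, MAE, MSE and RMSE as defined above."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    ss_tot = sum((yt - mean_y) ** 2 for yt in y_true)
    r2 = 1 - ss_res / ss_tot
    mae = sum(abs(yt - yp) for yt, yp in zip(y_true, y_pred)) / n
    mse = ss_res / n
    return r2, mae, mse, math.sqrt(mse)

r2, mae, mse, rmse = regression_metrics([0.1, 0.2, 0.4], [0.12, 0.18, 0.41])
print(r2, mae, mse, rmse)
```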
5.3. Baselines
The paper compared its proposed model against several state-of-the-art models in umami peptide prediction, representing different algorithmic approaches and feature engineering strategies:
- UMPred-FRL (Charoenkwan et al., 2021): A model using feature representation learning.
- Umami-YYDS (Cui et al., 2023): A model based on gradient boosting decision trees and molecular descriptors.
- Umami-MRNN (Qi et al., 2023): A model combining a multi-layer perceptron and a recurrent neural network.
- LSTM (Long Short-Term Memory): A type of recurrent neural network commonly used for sequence data.
- IUP-BERT (Jiang et al., 2022): A BERT-based model specifically for umami peptide prediction.
- Jiang's method (Jiang et al., 2023): A machine learning method using multiplicative LSTM embedded features.

These baselines are representative because they cover a range of machine learning and deep learning techniques applied to umami peptide prediction, including traditional ML, RNNs, and earlier Transformer-based approaches. This allows a comprehensive comparison to demonstrate the advancements made by the proposed integrated model.
6. Results & Analysis
6.1. Amino Acid Composition and Distribution Analysis
The study began by analyzing the amino acid composition and distribution patterns in umami and bitter peptide datasets, as taste characteristics are closely related to these features.
The following figure (Figure 1 from the original paper) shows the frequency of individual amino acids in different segments of bitter peptides:
This figure visualizes multiple amino acid peptides and their characteristics: the taste presentation of different peptide segments and their bioactivity comparison, polar plots of amino acid combination features and interactions, and bubble plots of peptide effects highlighting the importance of amino acids for umami.
- Peptide Length Distribution (Fig. 1A): Both umami and bitter peptides primarily had lengths below 10 amino acids, with 86.7% of umami peptides and 85.8% of bitter peptides falling within this range. This indicates that shorter peptides are highly relevant for taste perception in both categories.
- Amino Acid Frequency in Umami Peptides (Fig. 1B): Glutamic acid (E), aspartic acid (D), leucine (L), alanine (A), glycine (G), and lysine (K) were significantly more frequent in umami peptides. This aligns with previous findings that E and D are key umami amino acids, and that A and G can enhance umami taste synergistically with D or E.
- Amino Acid Frequency in Bitter Peptides (Fig. 1C): Hydrophobic amino acids, particularly proline (P) and phenylalanine (F), were highly prevalent in bitter peptides, consistent with the general understanding that hydrophobicity is often associated with bitter taste.
- N- and C-terminal Frequencies (Fig. 1D):
  - Umami Peptides: and were enriched at the N-terminal, while and were relatively high at the C-terminal; residues at the C-terminal have been linked to enhanced umami expression.
  - Bitter Peptides: , , and were primary at the N-terminal, and and dominated at the C-terminal.
- Terminal Frequencies by Peptide Length:
  - Short Umami Peptides (2-3 amino acids, Fig. 1E): and were significantly more frequent at both N- and C-terminals, with higher occurrence at the N-terminal, reinforcing their role in umami perception. For 4-7 amino acid peptides, , , and were higher at the N-terminal, while and were more frequent at the C-terminal. For 8-10 amino acid peptides, and were higher at the N-terminal, with and remaining predominant at the C-terminal.
  - Long Umami Peptides (>10 residues): The amino acid distribution was less clear, but and were high-frequency residues, particularly at the C-terminus (57.3% combined). No or were found at the termini of these longer umami peptides, suggesting that for longer peptides taste might be influenced by complex spatial conformations rather than individual amino acids.
  - Short Bitter Peptides (<10 amino acids, Fig. 1F): Hydrophobic residues like , , and were more frequent at the N-terminal, and also appeared. At the C-terminal, and predominated. This supports the observation that peptides with combinations of these residues are often bitter. and were absent at the termini of bitter peptides.
- Overall Amino Acid Composition by Length:
  - Short Umami Peptides (<10 amino acids, Fig. 1G): and were identified as key components influencing umami characteristics.
  - Long Umami Peptides (>10 amino acids, Fig. 1G): No clear pattern, supporting the idea of complex spatial conformations for longer peptides.
  - Short Bitter Peptides (<10 amino acids, Fig. 1H): Hydrophobic residues , , and were predominant.
  - Long Bitter Peptides (>10 amino acids, Fig. 1H): The frequency of was significantly higher than that of other amino acids.

In summary, this analysis provides a foundation for understanding the molecular characteristics of taste peptides, highlighting the importance of specific amino acids and their positions for umami and bitter tastes, and setting the stage for deep learning model development and interpretability.
6.2. Performance Comparisons of Different Sequence Encoding Methods
To determine the most effective way to represent peptide sequences for the model, a comparison of various sequence encoding methods was conducted using 5-fold cross-validation. The performance was assessed based on Accuracy (ACC), Precision, Recall, and F1 score.
The paper refers to supplementary data for a detailed table of these comparisons (Table S2). Although Table S2 is not provided in the main text, the key finding is stated:
- The combination model of BERT and the four sequence encodings (DistancePair, CKSAAGP, QSOrder, DDE), termed feature fusion, achieved significant improvements.
- Its performance metrics (ACC, Precision, Recall, and F1 score; values in Table S2) underscore the superiority of integrating BERT's contextual embeddings with physicochemical and structural features.
6.3. Performance Comparisons with State-of-the-Art Models
The proposed model's performance was rigorously evaluated against existing state-of-the-art models.
The following are the results from Table 1 of the original paper:
| Algorithms | Samples | ACC | Precision | Recall | F1 score |
|---|---|---|---|---|---|
| UMPred-FRL | 140 umami and 340 bitter | 0.860 | 0.786 | ||
| Umami-YYDS | 198 umami and 215 bitter | 0.896 | 0.913 | 0.875 | 0.894 |
| Umami-MRNN | 212 umami and 287 bitter | 0.915 | 0.879 | - | |
| LSTM | 140 umami and 304 bitter | 0.921 | − | 0.821 | |
| IUP-BERT | 140 umami and 302 bitter | 0.923 | 0.888 | ||
| Ours | 360 umami, 360 bitter, and 360 others | 0.93981 | 0.94366 | 0.93056 | 0.93706 |
- Classification Performance (Table 1): The proposed model achieved the highest ACC of 0.93981, outperforming other models by 2 to 9 percentage points. It also demonstrated superior Precision (0.94366), Recall (0.93056), and F1 score (0.93706). This indicates the model's strong ability to correctly identify umami peptides and distinguish them from non-umami peptides. The comparison with IUP-BERT (0.923 ACC) highlights the incremental benefit of combining BERT with enhanced features and contrastive learning.

The following figure (Figure 3 from the original paper) shows the characteristics and relations of bitter and umami peptides:
This figure combines several panels showing the characteristics and relationships of bitter and umami peptides. Panel A is a 3D scatter plot distinguishing bitter (red) from umami (green) peptides; panel B shows predicted versus actual values; panels C and D are a 3D surface plot and violin plots for different amino acids; panels E and F are heatmaps of amino acid interactions, with numbers indicating correlation strength.
-
Feature Space Visualization (Fig. 3A): A Uniform Manifold Approximation and Projection (UMAP) method was used to visualize the feature space. The distinct separation between umami (green) and bitter (red) peptides in the 3D scatter plot visually confirms the model's high classification accuracy and its ability to learn discriminative features.
- Umami Threshold Prediction (Fig. 3B): For the regression task of predicting umami thresholds, the model showed excellent performance: the plot of actual versus predicted threshold values exhibited a strong correlation, with an R² value of 0.98.
- Regression Error Metrics: The MSE, RMSE, and MAE were 0.0013, 0.036, and 0.031, respectively. These values are lower than those reported in comparable studies (e.g., Guo et al., 2023, with MSE = 0.103, RMSE = 0.321, and MAE = 0.235 for astringency threshold prediction), indicating superior predictive performance and robustness.
- Umami Threshold Feature Space (Fig. 3C): UMAP was again used to visualize the feature vectors, clustering the umami threshold data into three categories. The strong correlation between these features and the umami thresholds in 3D space further validates the model's predictive capability.

The superior performance is attributed to:
- Pre-trained BERT: Effectively captures rich contextual feature representations from large-scale bioactive peptide datasets, facilitating transfer learning.
- Multi-feature Fusion: Integrates peptide sequence information, amino acid composition, physicochemical properties, structural features, and evolutionary information, providing multi-dimensional input.
- Contrastive Learning: Enables the model to learn subtle yet critical differences by comparing similar and dissimilar peptide instances, enhancing discriminative power.
6.4. Model Interpretation
Understanding how the model makes predictions (interpretability) is crucial for rational design.
- Amino Acid Importance (Fig. 3D): Analysis of attention values (how much the model "focuses" on each amino acid) revealed that the residues D (aspartic acid) and E (glutamic acid) had higher attention values, indicating their significant role in the model's accurate prediction of umami peptides and reinforcing their known importance from the amino acid frequency analysis. Interestingly, Q (glutamine), M (methionine), S (serine), P (proline), and H (histidine) also showed high attention values, suggesting that their position within the peptide chain or their interaction context might be important even if their overall frequency is not the highest.
- Dipeptide Pair Scores (Figs. 3E and 3F): Amino acid pair scores (the product of average attention value and frequency) were calculated to identify high-contribution dipeptide fragments.
  - Umami Peptides (Fig. 3E): EE (glutamic acid-glutamic acid), DE (aspartic acid-glutamic acid), EK (glutamic acid-lysine), EL (glutamic acid-leucine), and EA (glutamic acid-alanine) exhibited high scores (1.496, 1.042, 0.892, 0.845, and 0.797, respectively). These are identified as potential key determinants of umami characteristics.
  - Bitter Peptides (Fig. 3F): PF (proline-phenylalanine), FP (phenylalanine-proline), GP (glycine-proline), PP (proline-proline), and PG (proline-glycine) showed high scores (3.603, 2.370, 2.152, 2.146, and 1.570, respectively), indicative of bitterness-contributing modules.

This interpretability provides direct guidance for the module substitution strategy.
6.5. Identification of Umami Peptides
The developed deep learning model was applied to screen peptides derived from Tenebrio molitor protein.
- Virtual Hydrolysis: In silico hydrolysis of Tenebrio molitor protein (using pepsin and trypsin) generated 1469 peptides.
- Pre-screening:
  - All 1469 peptides were predicted to be non-toxic.
  - 1316 peptides demonstrated good water solubility.
- Taste Prediction: The model predicted the taste characteristics and thresholds for these 1469 peptides:
  - 1237 were predicted as umami peptides.
  - 202 were predicted as bitter peptides.
  - 30 were predicted as other types of peptides.

These findings confirm Tenebrio molitor as a rich source for umami peptide discovery.
6.6. Taste Characteristics of Synthetic Peptides
Ten previously unreported peptides, selected based on model predictions and covering dipeptides to decapeptides and longer, were synthesized and subjected to sensory evaluation.
The paper refers to supplementary data for a detailed table of taste properties (Table S3). Although Table S3 is not provided in the main text, the key findings are stated:
- Validation: The actual taste perception of the synthesized peptides (EN, ETR, GK4, RK5, ER6, EF7, IL8, VR9, DL10, and PK14) was highly consistent with the model's predictions; all primarily exhibited umami taste.
- Detection Thresholds: The detection thresholds started from 0.02446 and were all significantly lower than the threshold of MSG, highlighting the peptides' potent umami characteristics.
  - The threshold for ECQVEGF was not measured because its sulfur-containing amino acid (C, cysteine) produces a pungent odor that interferes with threshold determination.
This figure shows the probability of correct selection versus log-concentration for the various peptides (e.g., EN, ETR, GVVK) at different concentrations. Each subplot includes the amino acid structure and a fitted curve, showing the distinct thresholds and correlation coefficients.
- Figure 4 visually presents the detection thresholds for the individual synthesized peptides. The sigmoid curves show the probability of correct selection by assessors as a function of log-concentration; the concentration at which the probability reaches 50% (the threshold) is clearly identifiable for each peptide.
- Other Basic Tastes: Besides umami, some peptides also exhibited other basic tastes such as sweetness, sourness, and astringency. The synergistic interaction between sweetness and umami is noted as a potential enhancer, while sourness and astringency might be artifacts of solvent residues from synthesis.
- Amino Acid Composition Consistency: These validated umami peptides frequently contained the key umami amino acids and often carried them at the C-terminal, consistent with the interpretability results of the deep learning model.

The following figure (Figure 5 from the original paper) shows a series of curve plots for the probability of correct selection for different peptides in combination with MSG:
This figure shows multiple curves of the probability of a correct judgment for different peptides combined with MSG, with theoretical fits and experimental data relating judgment probability to MSG concentration; the thresholds and R² values reveal each peptide's influence on umami.
- Interaction with MSG (Fig. 5): The interaction between nine of the synthesized peptides and MSG was investigated using the R-value.
  - The R values for EN, ETR, GK4, RK5, ER6, IL8, VR9, DL10, and PK14 were 0.69, 0.84, 0.91, 0.80, 0.72, 0.77, 0.83, 0.74, and 0.72, respectively.
  - Since all R values lie between 0.5 and 1, the peptides show an additive effect when combined with MSG. This implies that they can enhance umami and potentially reduce reliance on MSG, contributing to sodium-reduction strategies.
6.7. Active Site Analysis of Umami Peptides
Quantum chemical calculations, specifically frontier molecular orbital (FMO) analysis, were performed to understand the active sites and taste mechanisms of the umami peptides.
The paper refers to supplementary data for a table of HOMO-LUMO energy gaps (Table S4). Although Table S4 is not provided in the main text, the key finding is stated:
- HOMO-LUMO Energy Gap: The energy gap between the HOMO and LUMO reflects a molecule's chemical reactivity; a smaller gap generally indicates higher reactivity and a greater propensity to interact with taste receptors.
  - RPIEK exhibited the lowest HOMO-LUMO energy gap, which correlates with its lower umami threshold (higher potency).

The following figure (Figure 6 from the original paper) shows the LUMO and HOMO states of different peptides (EN, ETR, GVVK, RPIEK, EDAQDR):
This figure shows molecular orbital diagrams of the LUMO and HOMO states of different peptides (EN, ETR, GVVK, RPIEK, EDAQDR). The electron distributions, represented by colored spheres, help in understanding the electronic properties underlying their taste potency.
- The following figure (Figure 7 from the original paper) shows the relationship between different peptide chains and their corresponding LUMO and HOMO orbitals (ECQVEGF, IKPTVVEL, VLGHELPER, DDDGQPIPEL, PEIEAQPIEEQK):
This figure shows the molecular structures of the five peptide chains (ECQVEGF, IKPTVVEL, VLGHELPER, DDDGQPIPEL, PEIEAQPIEEQK) and the electron-cloud distributions of their corresponding LUMO and HOMO orbitals, allowing comparison of their specific electronic properties.
- HOMO/LUMO Orbital Analysis (Figs. 6 and 7): The active sites of the umami peptides were primarily distributed on the amino acid residues D, E, Q, K, and R.
  - Significantly, when K or R was present as the C-terminal residue, its occurrence as an active site was higher. This supports previous research linking C-terminal K or R to enhanced umami taste expression, and confirms D and E as key umami amino acids.
- Consistency with Deep Learning Interpretability: The results from the quantum chemical simulations align with the interpretability outcomes of the deep learning model, strongly validating the model's insights into the taste-presenting mechanisms.
6.8. Umami Evaluation of Design and Modification of Peptides
The module substitution strategy was employed to demonstrate its effectiveness in precise peptide design and modification.
- Strategy: The EE dipeptide module, identified as highly umami-active by the model, was used to replace high-contribution bitterness modules (PF, FP, GP, PP, PG) in bitter peptides.
- Application: Among the peptides derived from Tenebrio molitor protein, 27 non-umami peptides contained these bitter modules; 20 had good water solubility, and all were non-toxic.
- Results of Substitution: After replacing the bitter modules with EE, all resulting peptides were predicted to be converted into umami peptides while maintaining good water solubility and non-toxicity.
- Alignment with Previous Studies: This finding is consistent with prior research on fragment substitution to enhance peptide activities, such as XOD inhibitory and ACE inhibitory peptide modifications (Mirzaei et al., 2019; Zhao et al., 2023; Meng et al., 2024). Specifically, Meng et al. (2024) showed that replacing the low-contribution GP with the high-contribution KE or KN enhanced XOD inhibitory activity.
- Significance: This demonstrates that modular substitution is a feasible and effective strategy for improving peptide flavor, enabling precise peptide design and modification. The modified umami peptide sequences can then inform the selection of enzymes and hydrolysis conditions for targeted preparation, or guide the choice of protein sources.
6.9. Mechanism of Module Substitution Altering Peptide Taste Characteristics
To elucidate the molecular mechanism behind the module substitution strategy, molecular docking experiments were conducted. Due to computational cost, peptides shorter than 10 amino acids were chosen for this analysis.
The following figure (Figure 8 from the original paper) shows the interaction between peptides and the umami receptors T1R1/T1R3 before and after module substitution:
This figure shows several amino acid sequences and their corresponding molecular structures, annotating the key hydrogen bonds and hydrophobic interactions; each structure represents a specific taste peptide and reveals its relationship with T1R1/T1R3.
The following figure (Figure 7, continued from the original paper) shows the molecular structure diagram of DQTEEIQR:
This figure shows the molecular structures of the peptide chains TPPSEEIN, DQTPGIPQR, and DQTEEIQR, with their hydrogen bonds and hydrophobic interactions; amino acids are marked with different colors and symbols, emphasizing the key residues for taste presentation.
- Molecular Docking Results (Fig. 8 and Fig. S1, not provided): The results revealed that peptides after module substitution formed more hydrogen bonds and hydrophobic interactions with the umami receptor T1R1/T1R3 than their unmodified counterparts.
- Interaction Sites:
  - Modified Peptides: Primarily interacted with Arg151, Asp147, Arg277, His71, Ser146, and Ala302 on the receptor.
  - Unmodified Peptides: Mainly interacted with Asp147, Ala302, and His71.
- Key Interaction Forces and Residues: This finding is consistent with previous studies demonstrating that hydrogen bonds and hydrophobic interactions are crucial for the binding of umami peptides to T1R1/T1R3. The residues Arg151, Asp147, Gln52, Glu277, Arg277, His71, Ser146, and Ala302 have been identified as critical for these interactions.
- Conclusion: The increased number and strength of interactions, particularly with key residues in the T1R1/T1R3 binding pocket, explain how module substitution enables modified peptides to bind the receptor effectively and elicit an umami taste. This confirms that altering peptide sequence composition through module substitution directly influences taste characteristics by modulating receptor binding.
7. Conclusion & Reflections
7.1. Conclusion Summary
This study successfully developed and validated a powerful computational framework for the rapid screening and rational design of umami peptides. The proposed deep learning model, integrating pre-training, enhanced features, and contrastive learning, achieved an impressive accuracy of 0.93981, significantly outperforming existing models. Through virtual hydrolysis of Tenebrio molitor protein and sensory evaluation, ten novel umami peptides were identified, demonstrating potent umami taste and additive umami-enhancing effects with MSG, with detection thresholds lower than MSG. Crucially, the research introduced a module substitution strategy that enabled the successful conversion of bitter peptides into umami peptides. Interpretability analyses (attention values, HOMO/LUMO) identified key amino acid residues (D, E, Q, K, R) for umami taste, and molecular docking elucidated the mechanism: module substitution enhances hydrogen bonding and hydrophobic interactions with the T1R1/T1R3 receptor. This comprehensive approach provides an efficient tool for discovery and design, significantly expanding the umami peptide repository and deepening the understanding of umami taste presentation mechanisms.
7.2. Limitations & Future Work
The authors implicitly highlight some limitations and suggest future directions:
- Peptide Length and Conformational Complexity: The amino acid composition analysis showed that for longer peptides (>10 amino acids), taste characteristics are determined less by individual amino acids than by complex spatial conformations. The current, primarily sequence-based model may therefore not fully capture the taste of very long peptides; future work could incorporate more sophisticated structural prediction or molecular dynamics simulations for longer sequences.
- Other Taste Attributes: While the model successfully classifies umami, bitter, and neither, sensory evaluation noted that umami peptides can also exhibit sweetness, sourness, or astringency. Future work could expand the model to predict a wider range of taste attributes and their interactions.
- Synthesis and Preparation Costs: Module substitution is a powerful design tool, but the paper acknowledges that peptide synthesis can still be costly. Future work could use bioinformatics analysis to map designed peptides back to protein sequences, enabling the selection of suitable enzymes and optimized hydrolysis conditions for cost-effective preparation from natural protein sources. This could also guide the choice of protein sources for targeted peptide production.
- Unverified Assumptions: The module substitution strategy is validated using predicted taste profiles. Sensory evaluation confirmed the initial umami peptides, but comprehensive experimental validation of all module-substituted peptides is still needed.
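The module substitution idea discussed above can be illustrated with a minimal sketch: slide over a bitter peptide, swap each dipeptide window for a high-contribution umami module, and collect the candidates for re-prediction. The module "EE" and the example sequence here are illustrative assumptions, not the paper's exact data.

```python
# Hedged sketch of module substitution: replace each dipeptide window of a
# bitter peptide with an assumed umami module and collect candidate variants.

def substitute_module(peptide: str, umami_module: str = "EE") -> list[str]:
    """Return all variants with one dipeptide window swapped for the module."""
    candidates = []
    for i in range(len(peptide) - len(umami_module) + 1):
        variant = peptide[:i] + umami_module + peptide[i + len(umami_module):]
        if variant != peptide:  # skip no-op substitutions
            candidates.append(variant)
    return candidates

print(substitute_module("LFGK"))  # → ['EEGK', 'LEEK', 'LFEE']
```

In the paper's pipeline, each candidate would then be scored by the trained taste-prediction model, and only variants predicted umami would be retained.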
7.3. Personal Insights & Critique
This paper presents a highly innovative and comprehensive approach to umami peptide discovery and design. The integration of advanced deep learning (BERT, enhanced features, contrastive learning) with interpretable analysis and a practical module substitution strategy is particularly compelling. It represents a significant step forward from purely predictive models to truly rational peptide engineering.
Key strengths:
- Holistic Approach: It covers prediction, design, and mechanistic elucidation, providing a full pipeline from in silico screening to understanding molecular interactions.
- State-of-the-Art Performance: The model's accuracy and low error rates are impressive, demonstrating the power of the combined feature engineering and learning strategies.
- Interpretability: The use of attention value analysis and quantum chemical simulations to identify critical amino acids and active sites adds significant scientific value, moving beyond black-box models. This interpretability is crucial for building trust in AI-driven design.
- Practical Application: The module substitution strategy is a tangible method for modifying peptides to achieve desired taste profiles, with direct applications in food science and product development. The identification of umami peptides from Tenebrio molitor is also a valuable practical outcome given the interest in alternative protein sources.
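The attention-value interpretation mentioned above can be sketched as follows: average the attention a model's heads direct at each residue and rank residues by that score. The attention matrix below is seeded random stand-in data; in the paper it would come from the trained BERT-style model.

```python
# Minimal sketch of attention-based residue importance, assuming an
# attention tensor of shape (heads, seq_len, seq_len) with softmax-normalized
# rows. The data here is a random stand-in, not real model output.
import numpy as np

def residue_importance(attn: np.ndarray, sequence: str) -> list[tuple[str, float]]:
    """Rank residues by mean attention received across heads and query positions."""
    received = attn.mean(axis=(0, 1))  # (seq_len,) mean attention per residue
    return sorted(zip(sequence, received), key=lambda pair: -pair[1])

rng = np.random.default_rng(0)
seq = "EDGKR"
attn = rng.random((4, len(seq), len(seq)))
attn /= attn.sum(axis=-1, keepdims=True)  # normalize rows like softmax
for aa, score in residue_importance(attn, seq):
    print(aa, round(float(score), 3))
```

Aggregating such rankings over many predicted umami peptides is one plausible way the paper's finding that D, E, Q, K, and R are critical could be reproduced.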
Potential areas for improvement or further research:
- Generalizability of Module Substitution: While the EE module successfully converted bitter peptides to umami, further research could explore a wider library of umami modules and their effectiveness in different peptide contexts. The specific bitter modules targeted may also vary depending on the bitter peptide's structure.
- Beyond Dipeptides: Module substitution focused on dipeptide fragments. Investigating larger functional modules (e.g., tripeptides or longer motifs) could yield even more precise and potent modifications, though this would increase complexity.
- Computational Cost: Quantum chemical simulations and molecular docking for mechanistic studies can be computationally intensive, especially for longer peptides. More efficient computational methods or AI-accelerated simulations could make these analyses practical for high-throughput design.
- Multi-Taste Design: While the model can distinguish umami and bitter, designing peptides with a desired combination of tastes (e.g., umami with a hint of sweetness, or masking unwanted bitterness while retaining umami) remains a complex challenge. Future models could aim for multi-label prediction across all basic tastes.
- Experimental Validation Scope: While 10 peptides were validated, confirming the taste of all module-substituted peptides experimentally would be the ultimate proof of concept for the design strategy.

The methods and conclusions can certainly be transferred to other domains of bioactive peptide research. For instance, similar frameworks could be used to design antihypertensive, antioxidant, or antimicrobial peptides by identifying key functional modules and substituting them into inactive sequences. The interpretability framework is particularly valuable for accelerating understanding in areas where experimental characterization is slow and expensive. This paper provides a strong blueprint for AI-driven rational design in peptide science.
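The multi-label extension raised under Multi-Taste Design could be sketched by replacing a single softmax over umami/bitter/neither with an independent sigmoid per taste, so one peptide can carry several labels at once. The taste list and logits below are illustrative assumptions, not outputs of the paper's model.

```python
# Hedged sketch of multi-label taste prediction: a sigmoid per taste (rather
# than one softmax over mutually exclusive classes) lets a peptide be, e.g.,
# umami with a hint of sweetness. Logits are illustrative stand-ins.
import numpy as np

TASTES = ["umami", "bitter", "sweet", "sour", "astringent"]

def predict_tastes(logits: np.ndarray, threshold: float = 0.5) -> list[str]:
    """Return every taste whose sigmoid probability clears the threshold."""
    probs = 1.0 / (1.0 + np.exp(-logits))  # independent sigmoid per taste
    return [taste for taste, p in zip(TASTES, probs) if p >= threshold]

print(predict_tastes(np.array([2.1, -1.5, 0.3, -2.0, -0.8])))  # → ['umami', 'sweet']
```

Training such a head would use a per-label binary loss (e.g., binary cross-entropy) instead of categorical cross-entropy, which also allows the "neither" case to fall out naturally when no label clears the threshold.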