FoldamerDB: a database of peptidic foldamers
TL;DR Summary
FoldamerDB is an open-source, fully annotated database of peptidic foldamers, containing information on 1319 species and their biological activities, collected from over 160 papers. The user-friendly interface allows for comprehensive searching and filtering, addressing a gap in
Abstract
Foldamers are non-natural oligomers that mimic the structural behaviour of natural peptides, proteins and nucleotides by folding into a well-defined 3D conformation in solution. Since their first description about two decades ago, numerous studies have been undertaken dealing with the design, synthesis, characterization and application of foldamers. They have huge application potential as antimicrobial, anticancer and anti-HIV agents and in materials science. Despite their importance, there is no publicly available web resource providing comprehensive information on these compounds. Here we describe FoldamerDB, an open-source, fully annotated and manually curated database of peptidic foldamers. FoldamerDB holds the information about the sequence, structure and biological activities of the foldamer entries. It contains the information on over 1319 species and 1018 activities, collected from more than 160 research papers. The web-interface is designed to be clutter-free, user-friendly and it is compatible with devices of different screen sizes. The interface allows the user to search the database, browse and filter the foldamers using multiple criteria. It also offers a detailed help page to assist new users. FoldamerDB is hoped to bridge the gap in the freely available web-based resources on foldamers and will be of interest to diverse groups of scientists from chemists to biologists. The database can be accessed at http://foldamerdb.ttk.hu/.
1. Bibliographic Information
1.1. Title
The central topic of the paper is "FoldamerDB: a database of peptidic foldamers."
1.2. Authors
The authors are Bilal Nizami, Dorottya Bereczki-Szakál, Nikolett Varró, Kamal el Battioui, Vignesh U. Nagaraj, Imola Cs. Szigyártó, István Mándity, and Tamás Beke-Somfai. Their affiliations include the MTA TTK Lendület Biomolecular Self-Assembly Research Group, Institute of Materials and Environmental Chemistry, Research Centre for Natural Sciences, Hungarian Academy of Sciences, located in Budapest, Hungary.
1.3. Journal/Conference
The paper was published in Nucleic Acids Research (NAR). NAR is a highly reputable peer-reviewed scientific journal in the fields of molecular biology, biochemistry, and bioinformatics. It is particularly well-known for its annual database issue, which features comprehensive descriptions of biological databases, making it an influential venue for this type of research.
1.4. Publication Year
The paper was published on 17 October 2019.
1.5. Abstract
The paper introduces FoldamerDB, an open-source, fully annotated, and manually curated database dedicated to peptidic foldamers. Foldamers are synthetic oligomers designed to mimic the structural properties of natural biopolymers by folding into well-defined 3D conformations. Despite their growing importance and potential in various applications (e.g., antimicrobial, anticancer agents, materials science), the authors highlight a significant gap: the absence of a publicly available, comprehensive web resource for these compounds.
FoldamerDB addresses this gap by providing detailed information on the sequence, structure, and biological activities of foldamer entries. It currently contains information on over 1319 species and 1018 activities, meticulously gathered from more than 160 research papers. The database features a user-friendly, clutter-free, and responsive web interface that allows users to search, browse, and filter foldamers using multiple criteria. A detailed help page is also available to assist new users. The authors anticipate that FoldamerDB will be a valuable resource for a diverse scientific community, ranging from chemists to biologists, facilitating research and design in the field of foldamers.
1.6. Original Source Link
The original source link is /files/papers/69120bd4b150195a0db74a26/paper.pdf. It is an officially published paper.
2. Executive Summary
2.1. Background & Motivation
The core problem the paper aims to solve is the lack of a centralized, comprehensive, and publicly accessible database for foldamers, specifically peptidic foldamers. Foldamers are an important class of non-natural oligomers that can fold into defined 3D structures, mimicking natural peptides, proteins, and nucleotides. They have demonstrated significant potential in diverse applications, including antimicrobial, anticancer, and anti-HIV agents, as well as in materials science.
Despite their importance and widespread research, information about foldamers—their sequences, structures, and biological activities—is scattered across numerous scientific publications. This fragmented knowledge base makes it challenging for researchers, particularly those involved in computer-aided design, machine learning, molecular graphics, protein structure prediction, and drug design, to efficiently access and utilize this data. The growing sophistication of computational methods in these fields necessitates a dedicated and focused data resource.
The paper's entry point is to create FoldamerDB, the first open-source and comprehensive database for peptidic foldamers, thereby centralizing this critical information and facilitating advanced research and design.
2.2. Main Contributions / Findings
The primary contributions and findings of the paper are:
- Establishment of FoldamerDB: The creation and public release of FoldamerDB, the first open-source, fully annotated, and manually curated database specifically for peptidic foldamers. This fills a critical gap in publicly available web resources for this important class of synthetic macromolecules.
- Comprehensive Data Collection: The database currently hosts information on over 1319 peptidic foldamer species and 1018 associated biological activities, gathered from more than 160 research papers.
- Detailed Foldamer Information: Each entry in FoldamerDB provides 2D and 3D models, molecular properties (e.g., LogP, H-bond donors/acceptors, rotatable bonds, polar surface area), compound identifiers (e.g., SMILES, InChIKey), structural information (e.g., method of analysis, links to CSD and PDB), external database IDs (Reaxys ID, NCBI accession number), biological activities, and bibliographic references.
- User-Friendly Web Interface: The database features an intuitive, clutter-free, and responsive web interface compatible with various screen sizes. It offers multiple search options (simple, complex, and substructure search, the last using the Tanimoto coefficient for similarity), browsing capabilities (by foldamer type, article, structure, or activity), and a detailed single foldamer view.
- Community Engagement: FoldamerDB includes features for user feedback and encourages contributions of new data from the scientific community, aiming for continuous expansion and maintenance.
- Enabling Future Research: The database is expected to serve as a foundational tool for novel design projects, particularly those involving machine learning techniques, by providing easily accessible structural, chemical, and biological information on foldamers.
3. Prerequisite Knowledge & Related Work
3.1. Foundational Concepts
To understand the FoldamerDB paper, it's helpful to be familiar with the following concepts:
- Foldamers: These are non-natural oligomers (molecules made of repeating units) that are designed to mimic the structural behavior of natural biopolymers like peptides, proteins, and nucleotides. Unlike their natural counterparts, foldamers are typically synthesized from non-natural building blocks. Their defining characteristic is their ability to fold into a well-defined, stable 3D conformation in solution, which is crucial for their function. This folding behavior is often stabilized by hydrogen bonds or other intramolecular interactions.
- Peptidic Foldamers: A specific and important class of foldamers where the building blocks are amino acid analogs or peptidomimetics. They are designed to mimic peptide structures. Examples mentioned in the paper include:
  - α-peptide: Composed of natural alpha-amino acids.
  - β-peptide: Composed of beta-amino acids, which have an extra carbon atom in the backbone compared to alpha-amino acids. They can form stable secondary structures like helices.
  - γ-peptide: Composed of gamma-amino acids, which have two extra carbon atoms in the backbone.
  - α/β-peptide, α/γ-peptide, β/γ-peptide, α/β/γ-peptide: Hybrid foldamers incorporating a mix of different amino acid types in their backbone.
  - Peptoids: N-substituted glycines where the side chain is attached to the backbone nitrogen atom rather than the alpha-carbon. This modification often increases protease resistance.
  - Aib foldamer: Contains Aib (α-aminoisobutyric acid) residues, which are non-proteinogenic alpha-amino acids known for inducing helical conformations due to their steric bulk.
- Relational Database Management System (RDBMS): A software system used to create and manage relational databases. In an RDBMS, data is organized into tables (relations), which are linked to each other by common fields (keys), establishing parent-child relationships. This structure allows for efficient storage, retrieval, and management of large amounts of structured data. MySQL is a popular open-source RDBMS.
- Molecular Properties:
  - LogP: A measure of a compound's lipophilicity (fat-solubility) or hydrophobicity (water-repelling nature). It is the logarithm of the partition coefficient (P) of a compound between two immiscible solvents, typically octanol and water. Higher LogP values indicate greater lipophilicity.
  - H-bond donors and acceptors: The number of atoms in a molecule that can form hydrogen bonds. Hydrogen bond donors typically contain hydrogen atoms bonded to highly electronegative atoms (like oxygen or nitrogen), while hydrogen bond acceptors typically contain highly electronegative atoms with lone pairs of electrons. These are important for molecular interactions and solubility.
  - Rotatable bonds: Covalent bonds that allow free rotation of the groups attached to them. The number of rotatable bonds is an indicator of a molecule's flexibility, which can influence its binding affinity to targets and its ability to adopt different conformations.
  - Polar Surface Area (PSA): The sum of the surface areas of all polar atoms (typically oxygen, nitrogen, and attached hydrogen atoms) in a molecule. It is often used to predict drug absorption, blood-brain barrier penetration, and cell permeability, as highly polar molecules tend to have poorer membrane penetration.
- Chemical Fingerprints (FP2): A computational representation of a molecule's structure using a fixed-length binary string (a series of 0s and 1s). Each bit in the fingerprint corresponds to the presence or absence of a specific structural feature or substructure within the molecule. FP2 is the path-based fingerprint of Open Babel, which indexes linear fragments of up to seven atoms; it is widely used in chemoinformatics for similarity and substructure searching.
- Tanimoto Coefficient (or Jaccard Index): A commonly used metric to quantify the similarity between two sets, or in chemoinformatics, between two chemical fingerprints. It is calculated as the ratio of the number of common features (intersection) to the total number of features (union) present in both molecules' fingerprints. A value of 1 indicates identical fingerprints (not necessarily identical molecules), while 0 indicates no common features.

  $ J(A, B) = \frac{|A \cap B|}{|A \cup B|} $

  where:
  - $J(A, B)$ is the Tanimoto coefficient between fingerprint sets $A$ and $B$.
  - $|A \cap B|$ is the number of bits set to 1 in both fingerprints $A$ and $B$ (the intersection).
  - $|A \cup B|$ is the number of bits set to 1 in either fingerprint $A$ or $B$ (the union).
  - $A$ and $B$ are the fingerprint sets of the query and hit molecules, respectively.
- Web Technologies:
  - Apache HTTP server: A popular open-source web server.
  - MySQL: An open-source relational database management system.
  - PHP: A server-side scripting language designed for web development.
  - HTML5, CSS, JavaScript: Core technologies for building web pages (structure, styling, interactivity).
  - Bootstrap3: A popular front-end framework for developing responsive and mobile-first websites.
  - jQuery: A fast, small, and feature-rich JavaScript library that simplifies HTML document traversal and manipulation, event handling, animation, and Ajax.
  - Jmol: An open-source Java viewer for chemical structures in 3D.
3.2. Previous Works
The paper explicitly states that, despite the importance and high potential of foldamers, "there is no publicly available web resource providing comprehensive information on these compounds." This highlights the primary gap that FoldamerDB aims to fill.
However, the authors mention several existing databases that were used as sources for data collection and cross-referencing, indicating their general relevance to chemical and biological information:
- Reaxys: A chemical database system by Elsevier, providing extensive information on chemical reactions, compounds, and bibliographic data. It was used to extract chemical and other information and for cross-referencing.
- PubChem (51): A public repository for chemical information, including structures, identifiers, properties, and biological activities of small molecules. Used for extracting chemical and other information.
- ChEMBL (52): A large-scale bioactivity database curated by the European Bioinformatics Institute (EBI), containing information on compounds with drug-like properties and their biological activities. Used for extracting chemical and other information.
- NCBI databases: A suite of databases from the National Center for Biotechnology Information, including sequence databases, protein databases, and literature databases like PubMed and PMC. Used for extracting chemical and other information.
- CSD (Cambridge Structural Database): A comprehensive collection of experimentally determined small-molecule organic and metal-organic crystal structures. Used to extract experimental structural information.
- PDB (Protein Data Bank): A worldwide repository for the 3D structural data of large biological molecules, such as proteins and nucleic acids, determined by experimental methods. Used to extract experimental structural information.

The paper also cites examples of specialized databases for antimicrobial peptides (AMPs) to illustrate the growing need for focused resources in related fields:
- CAMP (Collection of sequences and structures of antimicrobial peptides) (45): An early database for antimicrobial peptides.
- CAMPR3 (46): The third release of CAMP, covering sequences, structures, and signatures of antimicrobial peptides.

These examples reinforce the argument that specialized databases are crucial for advancing research, especially when sophisticated algorithms and machine learning are applied to specific types of biomolecules. The absence of such a resource for foldamers was a significant impediment that FoldamerDB addresses.
3.3. Technological Evolution
The field of foldamer chemistry, pioneered roughly two decades prior to this paper's publication by groups like Gellman and Seebach, has seen continuous growth in the design, synthesis, characterization, and application of these unique oligomers. This evolution has led to a vast body of literature describing numerous foldamer species, their diverse structures (e.g., various helices, sheets, turns, and higher-ordered assemblies), and their potential biological and materials science applications.
Concurrently, the broader field of chemoinformatics and bioinformatics has advanced significantly, with increasing reliance on computational tools for drug discovery, protein structure prediction, and materials design. This includes the development of sophisticated algorithms for molecular graphics, similarity searching, and machine learning.
The FoldamerDB paper fits within this technological timeline as a response to the maturation of foldamer chemistry and the increasing computational demands of chemical research. As the volume and complexity of foldamer data grew, the need for a structured, searchable, and easily accessible repository became critical. FoldamerDB represents a crucial step in the evolution of foldamer research, moving from purely experimental, paper-based data dissemination to a centralized digital resource that can enable higher-throughput computational analysis and design. It leverages modern web development technologies (PHP, MySQL, HTML5, CSS, JavaScript, Bootstrap) and chemoinformatics tools (JSME, Open Babel, Tanimoto coefficient) to achieve its goals.
3.4. Differentiation Analysis
Compared to the main methods and resources available in related work (primarily general chemical databases and specialized peptide databases), FoldamerDB offers several core differences and innovations:
- Specialized Focus: The most significant differentiation is its exclusive and dedicated focus on foldamers, particularly peptidic foldamers. While general chemical databases like PubChem or Reaxys might contain some foldamer entries, they do not provide the specialized curation, classification, and structural context specific to foldamers (e.g., specific foldamer types, detailed structural analysis methods like NMR or X-ray crystallography for folding confirmation).
- Comprehensive Curation for Foldamers: FoldamerDB goes beyond simply listing compounds. Each entry is manually curated and annotated specifically for foldamer characteristics. This includes confirming whether a compound is described as a foldamer or experimentally shown to fold into a specific 3D structure, which is a key criterion for inclusion. This level of foldamer-specific expert annotation is not available in broader databases.
- Integration of Structural and Biological Data: It integrates sequence, 2D/3D structure, molecular properties, and biological activity data in one place, specifically tailored for foldamers. Links to external structural databases like CSD and PDB are provided for deeper structural insights, a feature crucial for understanding foldamer behavior.
- Classification by Backbone Type: FoldamerDB classifies peptidic foldamers into distinct categories based on their backbone types (α-peptide, β-peptide, γ-peptide, peptoids, Aib foldamers, and various hybrids), which is essential for foldamer chemists and provides a structured way to navigate this diverse class of molecules (as shown in Figure 1).
- User-Oriented Search and Browsing: The web interface is designed with foldamer research in mind, offering specialized search options like substructure search (using JSME and the Tanimoto coefficient) and filtering by foldamer type, publication year, and biological activity. This makes it more efficient for foldamer researchers to find relevant information than with generic chemical search engines.
- Open-Source and Freely Available: It is explicitly highlighted as an "open-source" and "freely available" resource, removing barriers to access for the global scientific community.
- Enabling Computational Design: By providing structured and curated data, FoldamerDB directly supports the application of machine learning and other computational design techniques for foldamers, a growing area of research that cannot be effectively supported by unstructured literature or general databases.

In essence, while general chemical databases provide breadth, FoldamerDB provides the depth, specificity, and foldamer-centric organization that was previously missing.
4. Methodology
4.1. Principles
The core principle behind FoldamerDB is to centralize and standardize the dispersed information regarding peptidic foldamers into a single, comprehensive, user-friendly, and open-source web-based database. This aims to bridge the gap in available resources for foldamer research, providing a foundation for design, synthesis, characterization, and application studies, especially those leveraging computational methods like machine learning. The database is built on the premise of manual curation and annotation to ensure data quality and relevance to the foldamer concept (i.e., compounds explicitly described as foldamers or experimentally confirmed to fold into a specific 3D structure).
4.2. Core Methodology In-depth (Layer by Layer)
The methodology of FoldamerDB involves several key stages, from data collection and processing to database design, implementation, and user interface development.
4.2.1. Data Collection and Processing
The process of populating FoldamerDB begins with identifying relevant foldamer compounds from scientific literature and then meticulously extracting and annotating their associated data. The workflow is schematically represented in Figure 2.

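The image is a schematic of the FoldamerDB data collection and processing workflow. It shows how data from different sources (such as Reaxys, CSD, PDB, and PubChem) are integrated, stored in FoldamerDB, and served through a web interface by an Apache HTTP server. Within the database, information is organized into sequences and activities.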
Figure 2. Workflow of data collection and processing as well as information flow in FoldamerDB.
- Literature Search:
  - Keywords: Keywords such as 'foldamer', 'non-natural peptide', 'peptide', and 'folding' are used to perform comprehensive searches in scientific databases like SCOPUS, PubMed, and PMC.
  - Inclusion Criteria: A compound is included in FoldamerDB only if it is explicitly described as a foldamer in the literature or if experimental evidence demonstrates its ability to fold into a specific 3D structure. This ensures the relevance of the entries to the core definition of a foldamer.
- Data Extraction and Cross-referencing:
  - For each identified foldamer, detailed chemical and other relevant information is extracted.
  - External Databases: The literature entries are cross-referenced with various established external databases to gather additional data and validate existing information. These include:
    - Reaxys (https://www.reaxys.com/): For general chemical information.
    - PubChem (51): A public repository for chemical structures and biological activities.
    - ChEMBL (52): A bioactivity database for compounds with drug-like properties.
    - NCBI databases: For biological sequences and related information.
    - CSD (Cambridge Structural Database): For experimental crystal structural information.
    - PDB (Protein Data Bank): For 3D structural data of larger biomolecules, including some foldamers.
  - Information Categories: The extracted data covers 2D and 3D models, molecular properties (e.g., LogP, number of H-bond donors and acceptors, rotatable bonds, polar surface area), compound identifiers (e.g., SMILES, InChIKey), structural information (e.g., method of analysis like NMR or X-ray crystallography), external database IDs (Reaxys ID, NCBI accession number, CCDC number, PDB ID), biological activities, applications, type of foldamer, and bibliography.
- Data Processing and Annotation:
  - Software Tools: Python3 (https://www.python.org/) and KNIME Analytics Platform version 3.6.2 (53) are used for processing and annotating the collected data. KNIME is an open-source data analytics, reporting, and integration platform.
  - Structure Correction: Marvin by ChemAxon (https://chemaxon.com/products/marvin) is employed to correct any erroneous chemical structures identified during the process.
  - Foldamer Classification: Each collected foldamer is classified according to its backbone type into one of the following predefined groups: α-peptide, β-peptide, γ-peptide, α/β-peptide, α/γ-peptide, α/β/γ-peptide, β/γ-peptide, Aib foldamer, or peptoids. This classification is crucial for organizing and searching the database, as illustrated in Figure 1A. While the database primarily focuses on peptidic foldamers, some natural α-peptides are also included, particularly if they served as starting sequences for modification with non-natural insertions, especially for machine learning purposes. All entries are generally referred to as foldamers for simplicity.
  - Subtype Assignment: Wherever feasible, each foldamer entry is also assigned a subtype based on the specific chemical structure of its building blocks, as described in the original research articles.
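The backbone classification step above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `classify_backbone`, the per-residue labels, and the rule that any peptoid or Aib residue decides the group are all hypothetical, since the paper describes the groups but not an explicit decision procedure (the curation is manual).

```python
# Hypothetical helper: the returned labels mirror the backbone groups named in
# the text, but the decision rules are assumptions made for illustration only.

def classify_backbone(residue_classes):
    """Map a list of per-residue labels to a backbone type.

    Each label must be one of 'alpha', 'beta', 'gamma', 'aib', or 'peptoid'.
    """
    present = set(residue_classes)
    if not present:
        raise ValueError("empty sequence")
    unknown = present - {"alpha", "beta", "gamma", "aib", "peptoid"}
    if unknown:
        raise ValueError(f"unknown residue classes: {sorted(unknown)}")
    # Assumed special cases: a peptoid or Aib residue decides the group.
    if "peptoid" in present:
        return "peptoids"
    if "aib" in present:
        return "Aib foldamer"
    greek = {"alpha": "α", "beta": "β", "gamma": "γ"}
    # Keep the conventional α, β, γ order when naming hybrids.
    parts = [greek[key] for key in ("alpha", "beta", "gamma") if key in present]
    return "/".join(parts) + "-peptide"


print(classify_backbone(["alpha", "beta", "alpha", "beta"]))
print(classify_backbone(["beta"] * 6))
```

A rule-based helper like this could at most pre-sort candidates; borderline cases would still need the manual review the paper describes.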
4.2.2. Database Design and Implementation
FoldamerDB is designed as a relational database to ensure efficient data retrieval and scalability, accommodating a growing number of entries.
- Backend Infrastructure:
  - The database is hosted on an Apache HTTP server 2.4, which serves the web pages and handles HTTP requests.
  - The data itself is stored in a MySQL server 5.7 instance, serving as the relational database management system (RDBMS). An RDBMS is chosen for its widespread use, robustness, and ability to manage structured data efficiently through interconnected tables with parent-child relationships.
- Frontend Development:
  - The dynamic front-end (user interface) is developed using PHP 7.2, a server-side scripting language, along with standard web technologies: HTML5 (for structure), CSS (for styling), and JavaScript (for interactivity).
  - Responsive Design: The Bootstrap3 and jQuery libraries are used to create a responsive and mobile-first front-end, ensuring compatibility with devices of different screen sizes and providing a consistent user experience.
  - Visualizations: The JpGraph library (https://jpgraph.net/) is used for plotting charts, such as the distribution of foldamer types. Jmol (http://www.jmol.org/) is employed to render interactive 3D models of the foldamers directly within the web interface.
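To make the parent-child table idea concrete, here is a minimal sketch of a relational layout. It is not the actual FoldamerDB schema: the table names, column names, and rows are hypothetical placeholders, and Python's built-in `sqlite3` stands in for the MySQL server named above so that the example runs without any setup.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption for portability).
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical schema: one parent table of foldamers, one child table of activities.
conn.executescript("""
CREATE TABLE foldamer (
    fold_id INTEGER PRIMARY KEY,
    type    TEXT NOT NULL,
    smiles  TEXT
);
CREATE TABLE activity (
    activity_id INTEGER PRIMARY KEY,
    fold_id     INTEGER NOT NULL REFERENCES foldamer(fold_id),
    description TEXT NOT NULL
);
""")

# Placeholder rows, invented purely for illustration.
conn.execute("INSERT INTO foldamer (fold_id, type) VALUES (1, 'β-peptide')")
conn.execute("INSERT INTO foldamer (fold_id, type) VALUES (2, 'peptoid')")
conn.executemany(
    "INSERT INTO activity (fold_id, description) VALUES (?, ?)",
    [(1, "antimicrobial"), (1, "anticancer"), (2, "antimicrobial")],
)


def foldamers_with_activity(connection, description):
    """Return (fold_id, type) for every foldamer linked to the given activity."""
    rows = connection.execute(
        "SELECT f.fold_id, f.type FROM foldamer AS f "
        "JOIN activity AS a ON a.fold_id = f.fold_id "
        "WHERE a.description = ? ORDER BY f.fold_id",
        (description,),
    )
    return rows.fetchall()


print(foldamers_with_activity(conn, "antimicrobial"))
```

Keeping activities in a child table lets one foldamer carry any number of activity records, which fits the paper's separate counts of species and activities.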
4.2.3. Substructure Search
A key feature of FoldamerDB is its substructure search capability, enabling users to find foldamers containing a specific chemical moiety.
- Query Input:
  - JSME (a free molecule editor written in JavaScript (54)) is integrated into the web interface, allowing users to draw a query chemical structure directly in their browser. Alternatively, users can paste SMILES strings for their query molecule.
- Fingerprint Generation:
  - For all existing FoldamerDB entries, FP2 fingerprints are pre-calculated and stored in the database.
  - For the user's query structure, the FP2 fingerprint is calculated on-the-fly using the Open Babel Package, version 2.4.1 (http://openbabel.sourceforge.net/). Open Babel is a chemoinformatics toolkit for converting, analyzing, and storing chemical data.
- Similarity Assessment:
  - The similarity between the query molecule's fingerprint and the fingerprints of all database entries is assessed using the Tanimoto coefficient. This coefficient quantifies the degree of overlap between the fingerprint bit strings.
  - The Tanimoto coefficient is calculated as follows:

    $ J(A, B) = \frac{|A \cap B|}{|A \cup B|} $

    where:
    - $J(A, B)$ is the Tanimoto coefficient between two fingerprint sets, $A$ and $B$.
    - $A$ is the fingerprint (the set of bits set to 1) of the query molecule.
    - $B$ is the fingerprint of the hit molecule (an entry in FoldamerDB).
    - $|A \cap B|$ is the size of the intersection of $A$ and $B$, i.e., the count of features (bits) present in both fingerprints.
    - $|A \cup B|$ is the size of the union of $A$ and $B$, i.e., the count of features (bits) present in either fingerprint.
  - The Tanimoto coefficient ranges from 0 to 1, where a value of 1 indicates maximum similarity (identical fingerprints), and a value of 0 indicates no common features. Search results are displayed with their corresponding Tanimoto distance, allowing users to prioritize more similar hits.
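The similarity assessment above can be sketched without any chemistry toolkit by representing each fingerprint as the set of its on-bit positions. The bit sets, entry identifiers, and the `threshold` parameter below are made-up illustrations, not real FP2 fingerprints or FoldamerDB records, and the ranking function is an assumption about how such hits might be ordered.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(union)


def rank_hits(query_bits, database, threshold=0.0):
    """Return (entry_id, score) pairs with score >= threshold, best first."""
    scored = [(entry_id, tanimoto(query_bits, bits))
              for entry_id, bits in database.items()]
    hits = [(entry_id, score) for entry_id, score in scored if score >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Toy bit sets standing in for pre-calculated fingerprints (hypothetical entries).
database = {
    "entry_1": {1, 2, 3, 4},
    "entry_2": {3, 4, 5, 6, 7, 8},
    "entry_3": {9},
}
query = {1, 2, 3, 4}

for entry_id, score in rank_hits(query, database, threshold=0.2):
    print(entry_id, round(score, 2))
```

Here `entry_2` shares 2 of 8 distinct bits with the query, giving 2/8 = 0.25, while `entry_3` shares none and falls below the threshold.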
4.2.4. Database Content
FoldamerDB provides comprehensive information for each of its 1319+ peptidic foldamer entries.
- Core Information:
  - Chemical diagram (2D and 3D models).
  - Chemical name, sequence, SMILES, InChIKey.
  - Molecular weight, molecular formula.
  - Source (publication details).
- Identifiers and Structural Data:
  - Internal ID (FoldDB ID).
  - External database IDs: Reaxys ID, NCBI accession number, CCDC number, PDB ID.
  - Methods of structural analysis: Indicates whether NMR or X-ray crystallography was used, with links to CSD and PDB where available.
  - NMR solvent (if NMR data is present).
- Functional and Property Data:
  - Application.
  - Biological activity.
  - Type of foldamer (e.g., β-peptide, peptoid).
  - Calculated properties: LogP, number of H-bond donors, number of H-bond acceptors, rotatable bonds, polar surface area (PSA).
- Bibliographic Information:
  - References to the original research articles from which the data was collected.
4.2.5. User Interface Layout
The FoldamerDB web interface is designed for intuition and user-friendliness, providing multiple navigation and search options.
- Main Pages:
  - Home: The landing page, offering a brief introduction to FoldamerDB and a statistical overview.
  - Search: Provides comprehensive search options (Figure 3E):
    - Simple search: Allows keyword-based search across fields like FoldDB ID, Reaxys ID, application, article title, author name, journal, sequence (one- or three-letter codes), chemical name, molecular formula, solvent, type, and PDB ID. Supports logical operators (+ for AND, - for NOT, no operator for OR).
    - Complex search: Not described separately here; it appears to cover the more advanced filtering options described under 'Browse Foldamers' and 'Browse activity'.
    - Substructure search: Enables users to draw a molecule or paste a SMILES string to find foldamers containing the query substructure.
  - Browse Foldamers: An interactive table listing all foldamers, with filtering options by backbone type and publication year (Figure 3B). Clicking on a FoldDB ID leads to the Single foldamer view page.
  - Single foldamer view: Displays detailed information for a specific foldamer (Figure 3A, 3F). This includes 2D and interactive 3D models (rendered by Jmol), identification details (chemical name, sequence, SMILES, InChIKey, molecular properties), external IDs, structural data, application, foldamer type, calculated properties, biological activity (with links to other activities in the same reference), and citations.
  - Browse article: Lists all articles from which data has been included, showing article title, authors, journal, year, and the number of foldamers from each article (Figure 3C).
  - Browse structure: Lists foldamers with experimentally determined crystal structures (X-ray crystallography) available in PDB or CSD.
  - Browse activity: Provides a list of all reported biological activities for the foldamers in the database (Figure 3D).
  - Glossary: Contains structures and chemical names of common non-natural amino acids and foldamer building blocks.
  - Feedback: Provides contact details for feedback and error reporting, and templates for contributing new data, which is reviewed by the FoldamerDB team.
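The operator convention of the simple search (+ for AND, - for NOT, no operator for OR) can be illustrated with a small matcher. This is a sketch under assumptions, not the site's implementation: the function names are hypothetical, matching is plain case-insensitive substring search, and the rule that unprefixed terms only matter when no + term is present is a choice made here for the example.

```python
def parse_query(query):
    """Split a query into required (+), excluded (-), and optional terms."""
    required, excluded, optional = [], [], []
    for token in query.lower().split():
        if token.startswith("+") and len(token) > 1:
            required.append(token[1:])
        elif token.startswith("-") and len(token) > 1:
            excluded.append(token[1:])
        else:
            optional.append(token)
    return required, excluded, optional


def matches(text, query):
    """Return True if the text satisfies the query under the assumed rules."""
    required, excluded, optional = parse_query(query)
    text = text.lower()
    if any(term in text for term in excluded):
        return False  # a NOT term is present
    if not all(term in text for term in required):
        return False  # an AND term is missing
    # Unprefixed terms act as OR; assumed to matter only without '+' terms.
    if optional and not required:
        return any(term in text for term in optional)
    return True


print(matches("antimicrobial beta-peptide helix", "+helix -peptoid"))
print(matches("gamma-peptide turn", "helix turn"))
```

A production search would more likely delegate this to the database's full-text engine, but the sketch shows how the three operator classes interact.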
4.2.6. Analysis of Foldamer Types
The paper also presents an analysis of the foldamer types included in the database.

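The image is a chart showing the different types of peptide backbones in FoldamerDB (panel A) and a pie chart of their distribution (panel B). Panel A lists the chemical structures of α-peptides, β-peptides, γ-peptides, peptide chains, and Aib; panel B shows the number of entries of each type, including α/β-peptides, β-peptides, and Aib backbones.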
Figure 1. (A) Different types of peptide backbones in FoldamerDB. (B) Pie chart of distribution of peptide backbone types in FoldamerDB.
As shown in Figure 1B, the distribution of peptide backbone types in FoldamerDB is as follows:
- α/β-peptides: The most common type, with 383 entries.
- β-peptides: Second most common, with 312 entries, including β2- and β3-types.
- Aib foldamers: 181 entries, containing Aib residues, categorized separately because Aib is a non-proteinogenic α-amino acid.
- α-peptides: 156 entries, consisting of only natural α-amino acids, often included as starting points for modifications.
- Peptoids: 78 entries, characterized by side chains attached to the N-atom of the backbone.
- γ-peptide: 31 entries.
- α/γ-peptide: 20 entries.
- β/γ-peptide: 22 entries.
- α/β/γ-peptide: 23 entries.
- Others: 113 entries, including rare types, one of which has only two entries.

Each foldamer entry is also assigned a subtype if possible, based on the specific chemical structure of its building blocks as detailed in the original research articles.
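As a quick consistency check, the counts listed above sum to 1319, which matches the total number of entries reported for the database. The short sketch below recomputes that total and converts each count into a percentage share; the dictionary keys are just labels chosen for this example.

```python
# Entry counts per backbone type, as listed for Figure 1B.
counts = {
    "α/β-peptide": 383,
    "β-peptide": 312,
    "Aib foldamer": 181,
    "α-peptide": 156,
    "peptoid": 78,
    "γ-peptide": 31,
    "α/β/γ-peptide": 23,
    "β/γ-peptide": 22,
    "α/γ-peptide": 20,
    "others": 113,
}

total = sum(counts.values())
shares = {name: 100 * n / total for name, n in counts.items()}

# Print the types from most to least common with their share of all entries.
for name, n in sorted(counts.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:15s} {n:5d} {shares[name]:5.1f}%")
print("total", total)
```

The two largest groups, α/β-peptides and β-peptides, together account for a little over half of all entries (695 of 1319).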
5. Experimental Setup
This section describes the content and features of FoldamerDB as the "experimental setup," given that the paper introduces a database rather than a traditional experimental methodology. The focus is on what the database contains and how its functionalities are demonstrated.
5.1. Datasets
The "dataset" for this paper is the FoldamerDB itself.
- Source: All foldamer entries are manually curated and annotated from published scientific literature (over 160 research papers).
- Scale and Characteristics:
  - Total entries: Over 1319 peptidic foldamer species.
  - Activities: 1018 reported biological activities.
  - Structural Data: 166 entries are reported with experimental crystal structures.
  - Diversity: The database covers a wide range of foldamer types, with the most common being α/β-peptides (383 entries), β-peptides (312 entries), and Aib foldamers (181 entries), as detailed in Figure 1B.
  - Content per entry: Each entry provides comprehensive information including:
    - 2D and 3D models
    - Molecular properties (e.g., LogP, number of H-bond donors and acceptors, rotatable bonds, polar surface area (PSA))
    - Compound identifiers (SMILES, InChIKey)
    - Structural information (method of analysis like NMR or X-ray crystallography, CCDC number, PDB ID)
    - External database IDs (Reaxys ID, NCBI accession number)
    - Biological activities
    - Application
    - Type of foldamer
    - Bibliography
- Data Sample Example: The paper provides examples of peptide backbones (Figure 1A) and a substructure search query (Figure 4A). For instance, an α-peptide entry would represent a standard alpha-amino acid chain, while a β-peptide entry would feature beta-amino acids. A specific entry might have a SMILES string like the one shown in the substructure search example, representing its chemical structure, along with its reported antimicrobial activity and details about its helical conformation determined by X-ray crystallography.
- Choice of Dataset: These entries were chosen because they meet the criteria of being described as foldamers or experimentally confirmed to fold, directly addressing the goal of creating a dedicated foldamer resource. The diversity of foldamer types ensures the database is broadly applicable to the field.
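To make the per-entry content listed above more concrete, here is a minimal sketch of how one record could be modelled. The class name, field names, and the example values are all hypothetical; the paper does not describe FoldamerDB's actual schema, and the example record is a placeholder rather than a real database entry.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FoldamerEntry:
    """Hypothetical record mirroring the per-entry fields listed above."""

    foldamer_id: int
    foldamer_type: str                      # e.g. "β-peptide"
    smiles: str
    inchikey: Optional[str] = None
    logp: Optional[float] = None
    h_bond_donors: Optional[int] = None
    h_bond_acceptors: Optional[int] = None
    rotatable_bonds: Optional[int] = None
    polar_surface_area: Optional[float] = None
    structure_method: Optional[str] = None  # e.g. "NMR" or "X-ray crystallography"
    pdb_id: Optional[str] = None
    ccdc_number: Optional[str] = None
    applications: List[str] = field(default_factory=list)
    activities: List[str] = field(default_factory=list)
    references: List[str] = field(default_factory=list)

    def has_crystal_structure(self) -> bool:
        """True if the entry links to a deposited structure (PDB or CSD)."""
        return self.pdb_id is not None or self.ccdc_number is not None


# Placeholder record: the SMILES is β-alanine, used only to fill the field.
example = FoldamerEntry(foldamer_id=0, foldamer_type="β-peptide", smiles="NCCC(=O)O")
print(example.has_crystal_structure())
```

Optional fields default to `None` because, as described above, not every entry has a crystal structure or external identifiers.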
5.2. Evaluation Metrics
The paper describes a database and its functionalities rather than presenting experimental results that would typically require formal evaluation metrics for performance. However, the internal mechanisms and design choices can be "evaluated" based on their utility and adherence to chemoinformatics best practices.
- Tanimoto Coefficient: This is the primary metric used within the substructure search functionality of FoldamerDB.
  - Conceptual Definition: The Tanimoto coefficient quantifies the similarity between two chemical fingerprints, which are binary representations of molecular structures. It measures the degree of overlap between the structural features present in a query molecule and a molecule in the database. A higher Tanimoto coefficient indicates greater structural similarity.
  - Mathematical Formula: $ J(A, B) = \frac{|A \cap B|}{|A \cup B|} $
  - Symbol Explanation:
    - $J(A, B)$: The Tanimoto coefficient (also known as the Jaccard index) between two fingerprint sets $A$ and $B$.
    - $A$: The set of fingerprint bits representing the query molecule.
    - $B$: The set of fingerprint bits representing a hit molecule from the database.
    - $|A \cap B|$: The cardinality (number of elements) of the intersection of sets $A$ and $B$, which corresponds to the number of structural features common to both the query and the hit molecule.
    - $|A \cup B|$: The cardinality of the union of sets $A$ and $B$, which corresponds to the total number of unique structural features present in either the query or the hit molecule.
- Qualitative Metrics (implied for database design): While not explicitly stated as metrics, the paper emphasizes several qualitative aspects of the database's design and utility:
  - User-friendliness: Assessed by the intuitive navigation, clutter-free interface, and detailed help page.
  - Responsiveness: Compatibility with different screen sizes (mobile-first design).
  - Comprehensiveness: The number of entries, activities, and detailed information provided per entry.
  - Manual curation and annotation: Ensuring data quality and relevance.
  - Open-source and freely available: Accessibility to the scientific community.
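The Tanimoto coefficient defined in this section can be computed directly on sets of "on" fingerprint bits. The sketch below is a plain-Python illustration of the formula only; the bit positions are made up, and returning 0.0 for two empty fingerprints is a convention assumed here, not something the paper specifies.

```python
from typing import AbstractSet


def tanimoto(a: AbstractSet[int], b: AbstractSet[int]) -> float:
    """Tanimoto (Jaccard) coefficient |A ∩ B| / |A ∪ B| for two sets of set bits."""
    union = len(a | b)
    if union == 0:
        # Assumed convention: two empty fingerprints are treated as dissimilar.
        return 0.0
    return len(a & b) / union


# Illustrative fingerprints given as the indices of bits that are switched on.
query_bits = {1, 4, 7, 9}
hit_bits = {1, 4, 8, 9, 12}

score = tanimoto(query_bits, hit_bits)
print(f"Tanimoto coefficient: {score:.2f}")  # 3 shared bits out of 6 distinct bits
```

Here the two fingerprints share three bits (1, 4, 9) out of six distinct bits overall, so the coefficient is 0.5; identical fingerprints give 1 and disjoint ones give 0.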
5.3. Baselines
Since FoldamerDB is presented as the first open-source, comprehensive database specifically for foldamers, there are no direct competing foldamer databases mentioned as baselines. The implicit baseline is the previous state of scattered information across scientific literature and general chemical/biological databases.
The paper argues that existing resources like PubChem, ChEMBL, CSD, and PDB, while valuable, do not offer the specialized focus, foldamer-specific curation, and integrated information that FoldamerDB provides. Thus, the database is designed to fill a unique niche rather than outperform existing, directly comparable tools. The need for such a database is justified by the success of specialized databases in related fields, such as CAMP and CAMPR3 for antimicrobial peptides, which serve as conceptual "baselines" demonstrating the utility of focused resources.
6. Results & Analysis
6.1. Core Results Analysis
The primary "result" of this paper is the successful development and deployment of FoldamerDB, a novel web-based resource that addresses a significant gap in foldamer research. The database centralizes and curates a vast amount of foldamer-specific information, making it readily accessible to the scientific community.
The key aspects demonstrating the effectiveness and utility of FoldamerDB are:
- Comprehensive Content: The database successfully compiled information on over 1319 peptidic foldamer species and 1018 biological activities from more than 160 research papers. This extensive collection represents a substantial effort in manual curation and data integration, providing a rich, unified resource that was previously unavailable. Of particular note, 166 entries include experimental crystal structures, which are crucial for understanding the precise 3D conformations of foldamers.
- Detailed Information Per Entry: Each foldamer entry in FoldamerDB is richly annotated, providing 2D and interactive 3D models, molecular properties (LogP, H-bond donors/acceptors, rotatable bonds, PSA), chemical identifiers (SMILES, InChIKey), structural determination methods (NMR, X-ray crystallography with links to CSD and PDB), external database IDs, applications, biological activities, and bibliographic references. This level of detail empowers researchers to gain a holistic understanding of each foldamer.
- Intuitive and Functional User Interface: The web interface is designed to be clutter-free, user-friendly, and responsive, adapting to various screen sizes. This is crucial for broad accessibility and usability.
  - Search Capabilities: The search page (Figure 3E) offers flexible simple search options (by ID, application, author, journal, etc.) and a powerful substructure search feature (Figure 4). The substructure search allows users to draw a molecule using JSME and find similar foldamers based on Tanimoto coefficients, directly supporting computational design efforts.
  - Browsing Options: Users can easily browse foldamers (Figure 3B), articles (Figure 3C), structures (those with crystal structures), and activities (Figure 3D), providing multiple entry points to explore the data.
  - Single Foldamer View: The single foldamer view page (Figure 3A, 3F) is particularly effective, presenting all gathered information in a structured manner, including interactive 3D models, enabling in-depth analysis of individual compounds.
- Addressing a Critical Gap: FoldamerDB fills the identified void in specialized foldamer databases, positioning it as a foundational resource for the burgeoning foldamer field. Its open-source nature further ensures wide adoption and utility.
- Enabling Future Research: By centralizing and structuring foldamer data, FoldamerDB lays the groundwork for advanced computational studies, particularly in machine learning for foldamer design and property prediction.

The analysis of foldamer types (Figure 1B) within the database also reveals the current landscape of research in the field, showing the prevalence of α/β-peptides and β-peptides. This provides valuable meta-information for researchers on active areas of foldamer synthesis and study.
6.2. Data Presentation (Tables)
The paper does not present explicit results in the form of comparative tables against baselines in the traditional sense, as it introduces a new database. However, it implicitly presents "results" through the description of its content and features.
The following illustrates an example of a substructure search and its output as described in the paper, which serves as a demonstration of the database's functionality rather than a quantitative result table:
Example of Substructure Search Output
The paper provides an example of a substructure search using a query molecule (Figure 4A). The output of such a search would typically list matching foldamers and their Tanimoto coefficients, indicating similarity to the query.

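The image shows the search interface: on the left, a tool for drawing molecular structures; on the right, information on the foldamers matching the drawn molecule, including their FoldDB ID and similarity. Users can search for different peptides with this tool.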
Figure 4. Similarity between the query and hit molecules is measured in terms of the Tanimoto index, which ranges from 0 to 1.
The SMILES string for the example query molecule (Figure 4A) is:
When this query is submitted, FoldamerDB identifies entries that share structural similarity. Figure 4B illustrates a partial output of such a search, showing the FoldDB ID of the hit molecules and their respective Tanimoto coefficients (ranging from 0 to 1). A higher Tanimoto coefficient indicates greater similarity between the hit molecule and the query molecule. This allows users to quickly identify foldamers that are structurally related to their molecule of interest.
The following are the results from Figure 4B of the original paper, illustrating the output format for a substructure search:
| FoldDB ID | Chemical Name | Source | Tanimoto Coefficient |
|---|---|---|---|
| 170 | Aib foldamer | PMID:26613945 | 0.44 |
| 594 | β-peptide | PMID:26613945 | 0.43 |
| 792 | β-peptide | PMID:26613945 | 0.43 |
| 1201 | β-peptide | PMID:26613945 | 0.43 |
| 566 | α/β-peptide | PMID:26613945 | 0.42 |
This table demonstrates the search's ability to retrieve foldamers with varying degrees of similarity to the query and links them back to their source publications.
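The output format described above (hit ID plus Tanimoto score, best match first) can be sketched as a small ranking routine. This is an illustration under stated assumptions, not FoldamerDB's implementation: fingerprints are modelled as Python integers used as bit vectors, the library contents are toy values, and the names `rank_hits` and `min_score` are hypothetical.

```python
from typing import Dict, List, Tuple


def popcount(x: int) -> int:
    """Number of bits set to 1 in a non-negative integer."""
    return bin(x).count("1")


def tanimoto_bits(a: int, b: int) -> float:
    """Tanimoto coefficient of two fingerprints stored as integer bit vectors."""
    union = popcount(a | b)
    return popcount(a & b) / union if union else 0.0


def rank_hits(query: int, library: Dict[int, int],
              min_score: float = 0.0) -> List[Tuple[int, float]]:
    """Score every library fingerprint against the query and sort best-first."""
    scored = [(entry_id, round(tanimoto_bits(query, fp), 2))
              for entry_id, fp in library.items()]
    hits = [(entry_id, s) for entry_id, s in scored if s >= min_score]
    return sorted(hits, key=lambda h: h[1], reverse=True)


# Toy library: entry ID -> fingerprint (made-up values, not database content).
toy_library = {1: 0b101101, 2: 0b111000, 3: 0b000111}
query_fp = 0b101100

ranked = rank_hits(query_fp, toy_library, min_score=0.3)
for entry_id, s in ranked:
    print(f"entry {entry_id}: {s:.2f}")
```

With these toy fingerprints, entry 1 scores 0.75 and entry 2 scores 0.50, while entry 3 (0.20) falls below the cut-off and is dropped. A threshold of this kind is one simple way to keep only meaningfully similar hits.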
6.3. Ablation Studies / Parameter Analysis
The paper, being a description of a newly developed database and its features, does not include ablation studies or parameter analysis in the conventional sense (i.e., breaking down a model or tuning its hyperparameters). Such analyses are typical for algorithmic or model-centric research papers. Instead, the "analysis" in this context refers to the categorization and statistical overview of the data contained within FoldamerDB and the demonstration of its functional capabilities.
The primary "analysis" presented is the distribution of foldamer types (Figure 1B), which provides insights into the database's content and the general landscape of foldamer research at the time of publication. This is an overview of the collected data rather than an experimental evaluation of the database's underlying components.
7. Conclusion & Reflections
7.1. Conclusion Summary
The paper successfully introduces FoldamerDB, an innovative and much-needed open-source database dedicated to peptidic foldamers. It effectively bridges a significant gap in publicly available resources by centralizing, meticulously curating, and annotating comprehensive information on foldamer sequences, structures, and biological activities. With over 1319 foldamer entries and 1018 associated activities gathered from more than 160 research articles, FoldamerDB provides a rich dataset for the scientific community. The database's user-friendly, responsive web interface, equipped with robust search (including substructure search via Tanimoto coefficient) and browsing functionalities, ensures easy access and utility. By offering detailed molecular properties, 2D and 3D models, and links to external structural databases, FoldamerDB is poised to be a foundational tool for diverse scientific groups, from synthetic chemists to biologists, facilitating novel design projects, particularly those employing machine learning techniques in the rapidly evolving field of foldamer research.
7.2. Limitations & Future Work
The authors acknowledge that FoldamerDB is a "first milestone" and has potential for further expansion. The primary limitation mentioned is that the current version focuses predominantly on peptidic compounds, with the majority of entries belonging to the largest foldamer subtype produced up to that point.
Based on this, the authors outline clear directions for future work:
- Expansion to Exotic Foldamer Types: The main goal for the future is to expand FoldamerDB to include information about a broader range of foldamer types beyond peptidic foldamers, specifically mentioning aromatic oligoamides as an example. This would make the database even more comprehensive for the entire foldamer field.
- Community Contribution: To achieve this expansion and ensure the database remains up-to-date, the authors actively encourage the scientific community to contribute new data to the FoldamerDB project. A feedback page and direct contact options are provided for this purpose, with a review process in place to maintain data quality.
7.3. Personal Insights & Critique
FoldamerDB represents a crucial development for the foldamer research community. My personal insights and critique are as follows:
- Importance for Emerging Fields: The paper highlights the critical need for specialized databases as scientific fields mature. Foldamer chemistry, though relatively young, has generated substantial data. Without such a centralized resource, valuable information remains siloed in publications, hindering systematic analysis and computational design. FoldamerDB sets an excellent example of how to consolidate knowledge in a niche but growing area.
- Value of Manual Curation: The emphasis on manual curation is a strong point. While automated data extraction can offer scale, manual annotation by experts ensures the quality, accuracy, and foldamer-specific context of the entries, which is vital for building trust and reliability in the data, especially for a relatively complex and diverse class of molecules.
- Enabling Computational Design: The integration of features like substructure search with Tanimoto coefficient and the structured availability of molecular properties directly supports the application of chemoinformatics and machine learning for de novo foldamer design and property prediction. This positions FoldamerDB as more than just a repository; it's a computational enabler. This is particularly valuable as machine learning models require large, well-structured datasets for training.
- Sustainability and Community Engagement: The call for community contributions is a pragmatic approach to ensure the long-term sustainability and growth of the database. However, manual curation of user-submitted data can be resource-intensive. The paper doesn't detail the mechanisms or capacity for reviewing potentially large volumes of user contributions, which could become a bottleneck.
- Potential for Integration: While FoldamerDB provides links to CSD and PDB, future enhancements could explore deeper integration with other relevant resources (e.g., more direct links to bioactivity assays in ChEMBL or PubChem where applicable) or even virtual screening tools.
- Unverified Assumptions/Areas for Improvement:
  - Data Completeness: While impressive, the database might not be exhaustive of all published foldamers. Ongoing manual curation is a continuous challenge.
  - Scope Definition: The "peptidic" focus, while justified by prevalence, limits the database's reach for other exciting foldamer types like aromatic oligoamides. The stated future work addresses this, but the initial scope could be seen as a limitation for researchers working outside the peptidic domain.
  - User Analytics: The paper doesn't discuss any user analytics or feedback loop for improving the database interface or content prioritization based on actual user behavior. Understanding what foldamer types are most searched or which features are most used could guide future development.
  - Definition of "Foldamer": The inclusion criterion of "described as 'foldamer' or shown experimentally to fold into a specific 3D structure" is good. However, the definition of "well-defined 3D conformation" can sometimes be subjective or context-dependent in the literature, which manual curation helps mitigate but doesn't entirely remove ambiguity.

Overall, FoldamerDB is a highly valuable contribution that will undoubtedly accelerate research in foldamer chemistry and biology, providing a foundation for future discoveries and applications.