FoldamerDB: a database of peptidic foldamers

Published:10/17/2019
This analysis is AI-generated and may not be fully accurate. Please refer to the original paper.

TL;DR Summary

FoldamerDB is an open-source, fully annotated database of peptidic foldamers, containing information on 1319 species and their biological activities, collected from over 160 papers. The user-friendly interface allows for comprehensive searching and filtering, addressing a gap in

Abstract

Foldamers are non-natural oligomers that mimic the structural behaviour of natural peptides, proteins and nucleotides by folding into a well-defined 3D conformation in solution. Since their first description about two decades ago, numerous studies have been undertaken dealing with the design, synthesis, characterization and application of foldamers. They have huge application potential as antimicrobial, anticancer and anti-HIV agents and in materials science. Despite their importance, there is no publicly available web resource providing comprehensive information on these compounds. Here we describe FoldamerDB, an open-source, fully annotated and manually curated database of peptidic foldamers. FoldamerDB holds the information about the sequence, structure and biological activities of the foldamer entries. It contains the information on over 1319 species and 1018 activities, collected from more than 160 research papers. The web-interface is designed to be clutter-free, user-friendly and it is compatible with devices of different screen sizes. The interface allows the user to search the database, browse and filter the foldamers using multiple criteria. It also offers a detailed help page to assist new users. FoldamerDB is hoped to bridge the gap in the freely available web-based resources on foldamers and will be of interest to diverse groups of scientists from chemists to biologists. The database can be accessed at http://foldamerdb.ttk.hu/.

1. Bibliographic Information

1.1. Title

The central topic of the paper is "FoldamerDB: a database of peptidic foldamers."

1.2. Authors

The authors are Bilal Nizami, Dorottya Bereczki-Szakál, Nikolett Varró, Kamal el Battioui, Vignesh U. Nagaraj, Imola Cs. Szigyártó, István Mándity, and Tamás Beke-Somfai. Their affiliations include the MTA TTK Lendület Biomolecular Self-Assembly Research Group, Institute of Materials and Environmental Chemistry, Research Centre for Natural Sciences, Hungarian Academy of Sciences, located in Budapest, Hungary.

1.3. Journal/Conference

The paper was published in Nucleic Acids Research (NAR). NAR is a highly reputable peer-reviewed scientific journal in the fields of molecular biology, biochemistry, and bioinformatics. It is particularly well-known for its annual database issue, which features comprehensive descriptions of biological databases, making it an influential venue for this type of research.

1.4. Publication Year

The paper was published on 17 October 2019.

1.5. Abstract

The paper introduces FoldamerDB, an open-source, fully annotated, and manually curated database dedicated to peptidic foldamers. Foldamers are synthetic oligomers designed to mimic the structural properties of natural biopolymers by folding into well-defined 3D conformations. Despite their growing importance and potential in various applications (e.g., antimicrobial, anticancer agents, materials science), the authors highlight a significant gap: the absence of a publicly available, comprehensive web resource for these compounds.

FoldamerDB addresses this gap by providing detailed information on the sequence, structure, and biological activities of foldamer entries. It currently contains information on over 1319 species and 1018 activities, meticulously gathered from more than 160 research papers. The database features a user-friendly, clutter-free, and responsive web interface that allows users to search, browse, and filter foldamers using multiple criteria. A detailed help page is also available to assist new users. The authors anticipate that FoldamerDB will be a valuable resource for a diverse scientific community, ranging from chemists to biologists, facilitating research and design in the field of foldamers.

The original source link is /files/papers/69120bd4b150195a0db74a26/paper.pdf. It is an officially published paper.

2. Executive Summary

2.1. Background & Motivation

The core problem the paper aims to solve is the lack of a centralized, comprehensive, and publicly accessible database for foldamers, specifically peptidic foldamers. Foldamers are an important class of non-natural oligomers that can fold into defined 3D structures, mimicking natural peptides, proteins, and nucleotides. They have demonstrated significant potential in diverse applications, including antimicrobial, anticancer, and anti-HIV agents, as well as in materials science.

Despite their importance and widespread research, information about foldamers—their sequences, structures, and biological activities—is scattered across numerous scientific publications. This fragmented knowledge base makes it challenging for researchers, particularly those involved in computer-aided design, machine learning, molecular graphics, protein structure prediction, and drug design, to efficiently access and utilize this data. The growing sophistication of computational methods in these fields necessitates a dedicated and focused data resource.

The paper's entry point is to create FoldamerDB, the first open-source and comprehensive database for peptidic foldamers, thereby centralizing this critical information and facilitating advanced research and design.

2.2. Main Contributions / Findings

The primary contributions and findings of the paper are:

  • Establishment of FoldamerDB: The creation and public release of FoldamerDB, the first open-source, fully annotated, and manually curated database specifically for peptidic foldamers. This fills a critical gap in publicly available web resources for this important class of synthetic macromolecules.
  • Comprehensive Data Collection: The database currently hosts information on over 1319 peptidic foldamer species and 1018 associated biological activities, meticulously gathered from more than 160 research papers. This extensive collection provides a rich data source for the scientific community.
  • Detailed Foldamer Information: Each entry in FoldamerDB provides comprehensive details, including 2D and 3D models, molecular properties (e.g., LogP, H-bond donors/acceptors, rotatable bonds, polar surface area), compound identifiers (e.g., SMILES, InchiKey), structural information (e.g., method of analysis, links to CSD and PDB), external database IDs (Reaxys ID, NCBI accession number), biological activities, and bibliographic references.
  • User-Friendly Web Interface: The database features an intuitive, clutter-free, and responsive web interface compatible with various screen sizes. It offers multiple search options (simple, complex, substructure search using Tanimoto coefficient for similarity), browsing capabilities (by foldamer type, article, structure, or activity), and a detailed single foldamer view.
  • Community Engagement: FoldamerDB includes features for user feedback and encourages contributions of new data from the scientific community, aiming for continuous expansion and maintenance.
  • Enabling Future Research: The database is expected to serve as a foundational tool for novel design projects, particularly those involving machine learning techniques, by providing easily accessible structural, chemical, and biological information on foldamers.

3. Prerequisite Knowledge & Related Work

3.1. Foundational Concepts

To understand the FoldamerDB paper, it's helpful to be familiar with the following concepts:

  • Foldamers: These are non-natural oligomers (molecules made of repeating units) that are designed to mimic the structural behavior of natural biopolymers like peptides, proteins, and nucleotides. Unlike their natural counterparts, foldamers are typically synthesized from non-natural building blocks. Their defining characteristic is their ability to fold into a well-defined, stable 3D conformation in solution, which is crucial for their function. This folding behavior is often stabilized by hydrogen bonds or other intramolecular interactions.

  • Peptidic Foldamers: A specific and important class of foldamers where the building blocks are amino acid analogs or peptidomimetics. They are designed to mimic peptide structures. Examples mentioned in the paper include:

    • α-peptide: Composed of natural alpha-amino acids.
    • β-peptide: Composed of beta-amino acids, which have an extra carbon atom in the backbone compared to alpha-amino acids. They can form stable secondary structures like helices.
    • γ-peptide: Composed of gamma-amino acids, having two extra carbon atoms in the backbone.
    • α/β-peptide, α/γ-peptide, β/γ-peptide, α/β/γ-peptide: Hybrid foldamers incorporating a mix of different amino acid types in their backbone.
    • Peptoids: N-substituted glycines where the side chain is attached to the backbone nitrogen atom rather than the alpha-carbon. This modification often increases protease resistance.
    • Aib foldamer: Contains α-aminoisobutyric acid (Aib) residues, which are non-proteinogenic α-amino acids known for inducing helical conformations due to their steric bulk.
  • Relational Database Management System (RDBMS): A software system used to create and manage relational databases. In an RDBMS, data is organized into tables (relations), which are linked to each other by common fields (keys), establishing parent-child relationships. This structure allows for efficient storage, retrieval, and management of large amounts of structured data. MySQL is a popular open-source RDBMS.

  • Molecular Properties:

    • LogP: A measure of a compound's lipophilicity (fat-solubility) or hydrophobicity (water-repelling nature). It is the logarithm of the partition coefficient (P) of a compound between two immiscible solvents, typically octanol and water. Higher LogP values indicate greater lipophilicity.
    • H-bond donors and acceptors: Refers to the number of atoms in a molecule that can form hydrogen bonds. Hydrogen bond donors typically contain hydrogen atoms bonded to highly electronegative atoms (like oxygen or nitrogen), while hydrogen bond acceptors typically contain highly electronegative atoms with lone pairs of electrons. These are important for molecular interactions and solubility.
    • Rotatable bonds: Covalent bonds that allow free rotation of the groups attached to them. The number of rotatable bonds is an indicator of a molecule's flexibility, which can influence its binding affinity to targets and its ability to adopt different conformations.
    • Polar Surface Area (PSA): The sum of the surface areas of all polar atoms (typically oxygen, nitrogen, and attached hydrogen atoms) in a molecule. It's often used to predict drug absorption, blood-brain barrier penetration, and cell permeability, as highly polar molecules tend to have poorer membrane penetration.
  • Chemical Fingerprints (FP2): A computational representation of a molecule's structure using a fixed-length binary string (a series of 0s and 1s). Each bit in the fingerprint corresponds to the presence or absence of a specific structural feature or substructure within the molecule. FP2 is Open Babel's path-based fingerprint, which indexes linear fragments of up to seven atoms into a 1024-bit string, and is commonly used for similarity and substructure searching.

  • Tanimoto Coefficient (or Jaccard Index): A commonly used metric to quantify the similarity between two sets, or in chemoinformatics, between two chemical fingerprints. It is calculated as the ratio of the number of common features (intersection) to the total number of features (union) present in both molecules' fingerprints. A value of 1 indicates identical fingerprints, while 0 indicates no common features. $ J(A, B) = \frac{|A \cap B|}{|A \cup B|} $ where:

    • $J(A, B)$ is the Tanimoto coefficient between fingerprint sets $A$ and $B$.
    • $|A \cap B|$ is the number of bits set to 1 in both fingerprints $A$ and $B$ (the intersection).
    • $|A \cup B|$ is the number of bits set to 1 in either fingerprint $A$ or $B$ (the union).
    • $A$ and $B$ are the fingerprint sets of the query and hit molecules, respectively.
  • Web Technologies:

    • Apache HTTP server: A popular open-source web server software.
    • MySQL: An open-source relational database management system.
    • PHP: A server-side scripting language designed for web development.
    • HTML5, CSS, JavaScript: Core technologies for building web pages (structure, styling, interactivity).
    • Bootstrap3: A popular front-end framework for developing responsive and mobile-first websites.
    • jQuery: A fast, small, and feature-rich JavaScript library that simplifies HTML document traversal and manipulation, event handling, animation, and Ajax.
    • Jmol: An open-source Java viewer for chemical structures in 3D.

3.2. Previous Works

The paper explicitly states that, despite the importance and high potential of foldamers, "there is no publicly available web resource providing comprehensive information on these compounds." This highlights the primary gap that FoldamerDB aims to fill.

However, the authors mention several existing databases that were used as sources for data collection and cross-referencing, indicating their general relevance to chemical and biological information:

  • Reaxys: A chemical database system by Elsevier, providing extensive information on chemical reactions, compounds, and bibliographic data. It was used to extract chemical and other information and for cross-referencing.

  • PubChem (51): A public repository for chemical information, including structures, identifiers, properties, and biological activities of small molecules. Used for extracting chemical and other information.

  • ChEMBL (52): A large-scale bioactivity database curated by the European Bioinformatics Institute (EBI), containing information on compounds with drug-like properties and their biological activities. Used for extracting chemical and other information.

  • NCBI databases: A suite of databases from the National Center for Biotechnology Information, including sequence databases, protein databases, and literature databases like PubMed and PMC. Used for extracting chemical and other information.

  • CSD (Cambridge Structural Database): A comprehensive collection of experimentally determined small-molecule organic and metal-organic crystal structures. Used to extract experimental structural information.

  • PDB (Protein Data Bank): A worldwide repository for the 3D structural data of large biological molecules, such as proteins and nucleic acids, determined by experimental methods. Used to extract experimental structural information.

    The paper also cites examples of specialized databases for antimicrobial peptides (AMPs) to illustrate the growing need for focused resources in related fields:

  • CAMP (Collection of sequences and structures of antimicrobial peptides) (45): An early database for antimicrobial peptides.

  • CAMPR3 (46): The third release of CAMP, covering sequences, structures, and signatures of antimicrobial peptides.

    These examples reinforce the argument that specialized databases are crucial for advancing research, especially when sophisticated algorithms and machine learning are applied to specific types of biomolecules. The absence of such a resource for foldamers was a significant impediment that FoldamerDB addresses.

3.3. Technological Evolution

The field of foldamer chemistry, pioneered roughly two decades prior to this paper's publication by groups like Gellman and Seebach, has seen continuous growth in the design, synthesis, characterization, and application of these unique oligomers. This evolution has led to a vast body of literature describing numerous foldamer species, their diverse structures (e.g., various helices, sheets, turns, and higher-ordered assemblies), and their potential biological and materials science applications.

Concurrently, the broader field of chemoinformatics and bioinformatics has advanced significantly, with increasing reliance on computational tools for drug discovery, protein structure prediction, and materials design. This includes the development of sophisticated algorithms for molecular graphics, similarity searching, and machine learning.

The FoldamerDB paper fits within this technological timeline as a response to the maturation of foldamer chemistry and the increasing computational demands of chemical research. As the volume and complexity of foldamer data grew, the need for a structured, searchable, and easily accessible repository became critical. FoldamerDB represents a crucial step in the evolution of foldamer research, moving from purely experimental, paper-based data dissemination to a centralized digital resource that can enable higher-throughput computational analysis and design. It leverages modern web development technologies (PHP, MySQL, HTML5, CSS, JavaScript, Bootstrap) and chemoinformatics tools (JSME, Open Babel) together with Tanimoto-based similarity scoring to achieve its goals.

3.4. Differentiation Analysis

Compared to the main methods and resources available in related work (primarily general chemical databases and specialized peptide databases), FoldamerDB offers several core differences and innovations:

  • Specialized Focus: The most significant differentiation is its exclusive and dedicated focus on foldamers, particularly peptidic foldamers. While general chemical databases like PubChem or Reaxys might contain some foldamer entries, they do not provide the specialized curation, classification, and structural context specific to foldamers (e.g., specific foldamer types, detailed structural analysis methods like NMR or X-ray crystallography for folding confirmation).

  • Comprehensive Curation for Foldamers: FoldamerDB goes beyond simply listing compounds. Each entry is manually curated and annotated specifically for foldamer characteristics. This includes confirming if a compound is described as a foldamer or experimentally shown to fold into a specific 3D structure, which is a key criterion for inclusion. This level of foldamer-specific expert annotation is not available in broader databases.

  • Integration of Structural and Biological Data: It integrates sequence, 2D/3D structure, molecular properties, and biological activity data in one place, specifically tailored for foldamers. Links to external structural databases like CSD and PDB are provided for deeper structural insights, a feature crucial for understanding foldamer behavior.

  • Classification by Backbone Type: FoldamerDB classifies peptidic foldamers into distinct categories based on their backbone types (α-peptide, β-peptide, γ-peptide, peptoids, Aib foldamers, and various hybrids), which is essential for foldamer chemists and provides a structured way to navigate this diverse class of molecules (as shown in Figure 1).

  • User-Oriented Search and Browsing: The web interface is designed with foldamer research in mind, offering specialized search options like substructure search (using JSME and Tanimoto coefficient) and filtering by foldamer type, publication year, and biological activity. This makes it much more efficient for foldamer researchers to find relevant information compared to generic chemical search engines.

  • Open-Source and Freely Available: It is explicitly highlighted as an "open-source" and "freely available" resource, removing barriers to access for the global scientific community.

  • Enabling Computational Design: By providing structured and curated data, FoldamerDB directly supports the application of machine learning and other computational design techniques for foldamers, which is a growing area of research that cannot be effectively supported by unstructured literature or general databases.

    In essence, while general chemical databases provide breadth, FoldamerDB provides the necessary depth, specificity, and foldamer-centric organization that was previously missing.

4. Methodology

4.1. Principles

The core principle behind FoldamerDB is to centralize and standardize the dispersed information regarding peptidic foldamers into a single, comprehensive, user-friendly, and open-source web-based database. This aims to bridge the gap in available resources for foldamer research, providing a foundation for design, synthesis, characterization, and application studies, especially those leveraging computational methods like machine learning. The database is built on the premise of manual curation and annotation to ensure data quality and relevance to the foldamer concept (i.e., compounds explicitly described as foldamers or experimentally confirmed to fold into a specific 3D structure).

4.2. Core Methodology In-depth (Layer by Layer)

The methodology of FoldamerDB involves several key stages, from data collection and processing to database design, implementation, and user interface development.

4.2.1. Data Collection and Processing

The process of populating FoldamerDB begins with identifying relevant foldamer compounds from scientific literature and then meticulously extracting and annotating their associated data. The workflow is schematically represented in Figure 2.

Figure 2. Workflow of data collection and processing as well as information flow in FoldamerDB.
The figure is a schematic of the FoldamerDB data collection and processing workflow. It shows how data from different sources (such as Reaxys, CSD, PDB, and PubChem) are integrated, stored in FoldamerDB, and served through a web interface by the Apache HTTP server. Within the database, information is categorized into sequences and activities.

  1. Literature Search:

    • Keywords: Keywords such as 'foldamer', 'non-natural peptide', 'peptide', and 'folding' are used to perform comprehensive searches in scientific databases like SCOPUS, PubMed, and PMC.
    • Inclusion Criteria: A compound is included in FoldamerDB only if it is explicitly described as a foldamer in the literature or if experimental evidence demonstrates its ability to fold into a specific 3D structure. This ensures the relevance of the entries to the core definition of a foldamer.
  2. Data Extraction and Cross-referencing:

    • For each identified foldamer, detailed chemical and other relevant information is extracted.
    • External Databases: The literature entries are cross-referenced with various established external databases to gather additional data and validate existing information. These include:
      • Reaxys (https://www.reaxys.com/): For general chemical information.
      • PubChem (51): A public repository for chemical structures and biological activities.
      • ChEMBL (52): A bioactivity database for compounds with drug-like properties.
      • NCBI databases: For biological sequences and related information.
      • CSD (Cambridge Structural Database): For experimental crystal structural information.
      • PDB (Protein Data Bank): For 3D structural data of larger biomolecules, including some foldamers.
    • Information Categories: The extracted data covers various aspects, including 2D and 3D models, molecular properties (e.g., LogP, number of H-bond donors and acceptors, rotatable bonds, polar surface area), compound identifiers (e.g., SMILES, InchiKey), structural information (e.g., method of analysis like NMR or X-ray crystallography), external database IDs (Reaxys ID, NCBI accession number, CCDC number, PDB ID), biological activities, applications, type of foldamer, and bibliography.
  3. Data Processing and Annotation:

    • Software Tools: Python3 (https://www.python.org/) and KNIME Analytics Platform version 3.6.2 (53) are utilized for processing and annotating the collected data. KNIME is an open-source data analytics, reporting, and integration platform.
    • Structure Correction: Marvin by ChemAxon (https://chemaxon.com/products/marvin) is employed to correct any erroneous chemical structures identified during the process, ensuring data accuracy.
    • Foldamer Classification: Each collected foldamer is classified according to its backbone type into one of the following predefined groups: α-peptide, β-peptide, γ-peptide, α/β-peptide, α/γ-peptide, α/β/γ-peptide, β/γ-peptide, Aib foldamer, or peptoids. This classification is crucial for organizing and searching the database, as illustrated in Figure 1A. While the database primarily focuses on peptidic foldamers, some natural α-peptides are also included, particularly if they served as starting sequences for modification with non-natural insertions, especially for machine learning purposes. All entries are generally referred to as foldamers for simplicity.
    • Subtype Assignment: Wherever feasible, each foldamer entry is also assigned a subtype based on the specific chemical structure of its building blocks, as described in the original research articles.
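
The backbone classification step above can be illustrated with a minimal sketch. The function name, its arguments, and the precedence of the peptoid and Aib checks are assumptions made for illustration only; the paper describes the curation as manual and does not publish classification code.

```python
# Hypothetical helper: maps the residue classes found in a sequence to one of
# the backbone groups named in the text. Not FoldamerDB's actual code.
SYMBOLS = {"alpha": "α", "beta": "β", "gamma": "γ"}


def classify_backbone(residue_classes, is_peptoid=False, contains_aib=False):
    """Return a backbone group label for a set of residue classes.

    Assumption: peptoid and Aib flags take precedence over the hybrid labels.
    """
    if is_peptoid:
        return "peptoid"
    if contains_aib:
        return "Aib foldamer"
    classes = set(residue_classes)
    # Anything outside alpha/beta/gamma (or an empty set) falls into "other".
    if not classes or classes - set(SYMBOLS):
        return "other"
    # Keep a fixed alpha, beta, gamma order so labels are stable.
    present = [SYMBOLS[c] for c in ("alpha", "beta", "gamma") if c in classes]
    return "/".join(present) + "-peptide"
```

For instance, `classify_backbone({"alpha", "beta"})` yields the hybrid label `α/β-peptide`, while an unrecognized class such as `"epsilon"` falls into the `other` group.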

4.2.2. Database Design and Implementation

FoldamerDB is designed as a relational database to ensure efficient data retrieval and scalability, accommodating a growing number of entries.

  1. Backend Infrastructure:

    • The database is hosted on an Apache HTTP server 2.4, which serves the web pages and handles HTTP requests.
    • The data itself is stored in a MySQL server 5.7 instance, serving as the relational database management system (RDBMS). An RDBMS was chosen for its widespread use, robustness, and ability to manage structured data efficiently through interconnected tables with parent-child relationships.
  2. Frontend Development:

    • The dynamic front-end (user interface) is developed using PHP 7.2, a server-side scripting language, along with standard web technologies: HTML5 (for structure), CSS (for styling), and JavaScript (for interactivity).
    • Responsive Design: Bootstrap3 and jQuery libraries are utilized to create a responsive and mobile-first front-end, ensuring compatibility with devices of different screen sizes and providing a consistent user experience.
    • Visualizations:
      • JpGraph library (https://jpgraph.net/) is used for plotting various charts, such as the distribution of foldamer types.
      • Jmol (http://www.jmol.org/) is employed to render interactive 3D models of the foldamers directly within the web interface, allowing users to visualize molecular structures.
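
To make the idea of interconnected tables with parent-child relationships concrete, here is a small sketch. It uses Python's built-in `sqlite3` as a stand-in for MySQL, and every table name, column name, and row is hypothetical; the paper does not publish the actual FoldamerDB schema.

```python
import sqlite3

# Hypothetical schema: article is the parent of foldamer, which is the parent
# of activity. Names are illustrative and not taken from FoldamerDB.
SCHEMA = """
CREATE TABLE article (
    article_id INTEGER PRIMARY KEY,
    title      TEXT NOT NULL,
    year       INTEGER
);
CREATE TABLE foldamer (
    foldamer_id   INTEGER PRIMARY KEY,
    article_id    INTEGER NOT NULL REFERENCES article(article_id),
    backbone_type TEXT NOT NULL,
    smiles        TEXT
);
CREATE TABLE activity (
    activity_id INTEGER PRIMARY KEY,
    foldamer_id INTEGER NOT NULL REFERENCES foldamer(foldamer_id),
    description TEXT NOT NULL
);
"""


def build_demo_db():
    """Create an in-memory database holding a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO article VALUES (1, 'Example article', 2015)")
    conn.executemany(
        "INSERT INTO foldamer VALUES (?, ?, ?, ?)",
        [(1, 1, "β-peptide", None), (2, 1, "peptoid", None)],
    )
    conn.execute("INSERT INTO activity VALUES (1, 1, 'antimicrobial')")
    conn.commit()
    return conn


def foldamers_per_article(conn):
    """Count child foldamer rows for each parent article."""
    return conn.execute(
        "SELECT a.title, COUNT(f.foldamer_id) "
        "FROM article a LEFT JOIN foldamer f USING (article_id) "
        "GROUP BY a.article_id"
    ).fetchall()
```

A query of this shape is the kind of join that a page such as "Browse article" would need in order to show the number of foldamers per article.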

4.2.3. Substructure Search

A key feature of FoldamerDB is its substructure search capability, enabling users to find foldamers containing a specific chemical moiety.

  1. Query Input:

    • JSME (a free molecule editor written in JavaScript (54)) is integrated into the web interface, allowing users to draw a query chemical structure directly in their browser. Alternatively, users can paste SMILES strings for their query molecule.
  2. Fingerprint Generation:

    • For all existing FoldamerDB entries, FP2 fingerprints are pre-calculated and stored in the database.
    • For the user's query structure, the FP2 fingerprint is calculated on-the-fly using the Open Babel Package, version 2.4.1 (http://openbabel.sourceforge.net/). Open Babel is a chemoinformatics toolkit for converting, analyzing, and storing chemical data.
  3. Similarity Assessment:

    • The similarity between the query molecule's fingerprint and the fingerprints of all database entries is assessed using the Tanimoto coefficient. This coefficient quantifies the degree of overlap between the fingerprint bit strings.
    • The Tanimoto coefficient is calculated as follows: $ J(A, B) = \frac{|A \cap B|}{|A \cup B|} $ where:
      • $J(A, B)$ represents the Tanimoto coefficient between two fingerprint sets, $A$ and $B$.
      • $A$ is the fingerprint (set of bits) of the query molecule.
      • $B$ is the fingerprint (set of bits) of the hit molecule (an entry in FoldamerDB).
      • $|A \cap B|$ denotes the size of the intersection of fingerprint sets $A$ and $B$, which means the count of features (bits) that are present in both fingerprints.
      • $|A \cup B|$ denotes the size of the union of fingerprint sets $A$ and $B$, which means the count of features (bits) that are present in either fingerprint $A$ or fingerprint $B$.
    • The Tanimoto coefficient ranges from 0 to 1, where a value of 1 indicates maximum similarity (identical fingerprints), and a value of 0 indicates no common features. Search results are displayed with their corresponding Tanimoto distance, allowing users to prioritize more similar hits.
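
The search flow described above (stored fingerprints, a query fingerprint, Tanimoto scoring, ranked hits) can be sketched as follows. This is an illustration under stated assumptions, not the site's implementation: fingerprints are modelled as plain sets of on-bit positions instead of real FP2 bit strings from Open Babel, and the entry IDs, the `threshold` parameter, and the function names are invented for the example.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit positions."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(union)


def rank_hits(query_fp, stored_fps, threshold=0.0):
    """Return (entry_id, score) pairs with score >= threshold, best first."""
    scored = [(eid, tanimoto(query_fp, fp)) for eid, fp in stored_fps.items()]
    hits = [(eid, score) for eid, score in scored if score >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Toy pre-computed fingerprints keyed by made-up entry IDs.
STORED = {
    "entry-1": {1, 3, 5, 7},
    "entry-2": {3, 5, 8},
    "entry-3": {10, 11},
}
QUERY = {3, 5, 7}

print(rank_hits(QUERY, STORED, threshold=0.4))
# [('entry-1', 0.75), ('entry-2', 0.5)]
```

Here `entry-1` shares three of four distinct bits with the query (3/4 = 0.75), `entry-2` shares two of four (0.5), and `entry-3` shares none, so it falls below the threshold.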

4.2.4. Database Content

FoldamerDB provides comprehensive information for each of its 1319+ peptidic foldamer entries.

  1. Core Information:

    • Chemical diagram (2D and 3D models).
    • Chemical name, sequence, SMILES, InchiKey.
    • Molecular weight, molecular formula.
    • Source (publication details).
  2. Identifiers and Structural Data:

    • Internal ID (FoldDB ID).
    • External database IDs: Reaxys ID, NCBI accession number, CCDC number, PDB ID.
    • Methods of structural analysis: Indicates whether NMR or X-ray crystallography was used, with links to CSD and PDB where available.
    • NMR solvent (if NMR data is present).
  3. Functional and Property Data:

    • Application.
    • Biological activity.
    • Type of foldamer (e.g., β-peptide, peptoid).
    • Calculated properties: LogP, number of H-bond donors, number of H-bond acceptors, rotatable bonds, polar surface area (PSA).
  4. Bibliographic Information:

    • References to the original research articles from which the data was collected.

4.2.5. User Interface Layout

The FoldamerDB web interface is designed to be intuitive and user-friendly, providing multiple navigation and search options.

  1. Main Pages:
    • Home: The landing page, offering a brief introduction to FoldamerDB and statistical overview.
    • Search: Provides comprehensive search options (Figure 3E):
      • Simple search: Allows keyword-based search across fields like FoldDB ID, Reaxys ID, application, article title, author name, journal, sequence (one- or three-letter codes), chemical name, molecular formula, solvent, type, and PDB ID. Supports logical operators (+ for AND, - for NOT, no operator for OR).
      • Complex search: (Implicitly covers more advanced filtering options described in 'Browse Foldamers' and 'Browse Activity').
      • Substructure search: Enables users to draw a molecule or paste a SMILES string to find foldamers containing the query substructure.
    • Browse Foldamers: An interactive table listing all foldamers, with filtering options by backbone type and publication year (Figure 3B). Clicking on a FoldDB ID leads to the Single foldamer view page.
    • Single foldamer view: Displays detailed information for a specific foldamer (Figure 3A, 3F). This includes 2D and interactive 3D models (rendered by Jmol), identification details (chemical name, sequence, SMILES, InchiKey, molecular properties), external IDs, structural data, application, foldamer type, calculated properties, biological activity (with links to other activities in the same reference), and citations.
    • Browse article: Lists all articles from which data has been included, showing article title, authors, journal, year, and the number of foldamers from each article (Figure 3C).
    • Browse structure: Lists foldamers with experimentally determined crystal structures (X-ray crystallography) available in PDB or CSD.
    • Browse activity: Provides a list of all reported biological activities for the foldamers in the database (Figure 3D).
    • Glossary: Contains structures and chemical names of common non-natural amino acids and foldamer building blocks.
    • Feedback: Provides contact details for feedback, error reporting, and templates for contributing new data, which is reviewed by the FoldamerDB team.
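
The operator convention listed for the simple search (+ for AND, - for NOT, no operator for OR) can be illustrated with a short sketch. How the site actually combines these operators is not specified in the analysis, so the rules below are assumptions: excluded terms always veto a match, all required terms must be present, and unprefixed terms only need one match when no required terms are given. The function names are hypothetical, and matching is on whole whitespace-separated words.

```python
def parse_query(query):
    """Split a query into required (+), excluded (-) and optional terms."""
    required, excluded, optional = [], [], []
    for token in query.lower().split():
        if token.startswith("+") and len(token) > 1:
            required.append(token[1:])
        elif token.startswith("-") and len(token) > 1:
            excluded.append(token[1:])
        else:
            optional.append(token)
    return required, excluded, optional


def matches(query, text):
    """Return True if text satisfies the query under the assumed rules."""
    required, excluded, optional = parse_query(query)
    words = set(text.lower().split())
    if any(term in words for term in excluded):
        return False  # a NOT term is present
    if not all(term in words for term in required):
        return False  # an AND term is missing
    if required or not optional:
        return True   # required terms satisfied, or nothing else to check
    return any(term in words for term in optional)  # OR over bare terms
```

Under these assumptions, `matches("+beta +helix", "beta peptide helix")` is true, `matches("beta -peptoid", "beta peptoid")` is false, and `matches("beta gamma", "gamma peptide")` is true because one of the bare terms is present.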

4.2.6. Analysis of Foldamer Types

The paper also presents an analysis of the foldamer types included in the database.

Figure 1. (A) Different types of peptide backbones in FoldamerDB. (B) Pie chart of distribution of peptide backbone types in FoldamerDB.
The image is a chart showing the different types of peptide backbones in FoldamerDB (part A) and a pie chart of their distribution (part B). Part A shows the chemical structures of α-peptides, β-peptides, γ-peptides, peptoids, and Aib; part B shows the number of entries of each type, including α/β-peptides, β-peptides, and Aib backbones.

As shown in Figure 1B, the distribution of peptide backbone types in FoldamerDB is as follows:

  • α/β-peptides: The most common type, with 383 entries.

  • β-peptides: Second most common, with 312 entries, encompassing β2- and β3-types and cyclic β-peptides.

  • Aib foldamers: 181 entries, containing α-aminoisobutyric acid (Aib) residues, categorized separately because Aib is a non-proteinogenic α-amino acid.

  • α-peptides: 156 entries, consisting of only natural α-amino acids, often included as starting points for modifications.

  • Peptoids: 78 entries, characterized by side chains attached to the N-atom of the backbone.

  • γ-peptide: 31 entries.

  • α/γ-peptide: 20 entries.

  • β/γ-peptide: 22 entries.

  • α/β/γ-peptide: 23 entries.

  • Others: 113 entries, including specific rare types such as two α/ε hybrid peptide entries.

    Each foldamer entry is also assigned a subtype if possible, based on the specific chemical structure of its building blocks as detailed in the original research articles.
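
As a quick consistency check on the counts listed above, the following sketch sums the per-type entries and converts them to percentage shares. The counts are taken from the list; the dictionary keys and the `share` helper are illustrative names only.

```python
# Entry counts per backbone type, as listed for Figure 1B.
backbone_counts = {
    "α/β-peptide": 383,
    "β-peptide": 312,
    "Aib foldamer": 181,
    "α-peptide": 156,
    "peptoid": 78,
    "γ-peptide": 31,
    "α/γ-peptide": 20,
    "β/γ-peptide": 22,
    "α/β/γ-peptide": 23,
    "other": 113,
}

total = sum(backbone_counts.values())


def share(count: int, total: int) -> float:
    """Percentage of the total, rounded to one decimal place."""
    return round(100 * count / total, 1)


shares = {name: share(n, total) for name, n in backbone_counts.items()}

# Print the types from most to least common, followed by the total.
for name, n in sorted(backbone_counts.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<14} {n:>4}  {shares[name]:>5.1f}%")
print(f"{'total':<14} {total:>4}")
```

The ten categories sum to 1319, which matches the number of foldamer species quoted for the database.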

5. Experimental Setup

This section describes the content and features of FoldamerDB as the "experimental setup," given that the paper introduces a database rather than a traditional experimental methodology. The focus is on what the database contains and how its functionalities are demonstrated.

5.1. Datasets

The "dataset" for this paper is the FoldamerDB itself.

  • Source: All foldamer entries are manually curated and annotated from published scientific literature (over 160 research papers).

  • Scale and Characteristics:

    • Total entries: Over 1319 peptidic foldamer species.
    • Activities: 1018 reported biological activities.
    • Structural Data: 166 entries are reported with experimental crystal structures.
    • Diversity: The database covers a wide range of foldamer types, with the most common being α/β-peptides (383 entries), β-peptides (312 entries), and Aib foldamers (181 entries), as detailed in Figure 1B.
    • Content per entry: Each entry provides comprehensive information including:
      • 2D and 3D models
      • Molecular properties (e.g., LogP, number of H-bond donors and acceptors, rotatable bonds, polar surface area (PSA))
      • Compound identifiers (SMILES, InChIKey)
      • Structural information (method of analysis like NMR or X-ray crystallography, CCDC number, PDB ID)
      • External database IDs (Reaxys ID, NCBI accession number)
      • Biological activities
      • Application
      • Type of foldamer
      • Bibliography
  • Data Sample Example: The paper provides examples of peptide backbones (Figure 1A) and a substructure search query (Figure 4A). For instance, an α-peptide entry would represent a standard alpha-amino acid chain, while a β-peptide entry would feature beta-amino acids. A specific entry might have a SMILES string like the one shown in the substructure search example, representing its chemical structure, along with its reported antimicrobial activity and details about its helical conformation determined by X-ray crystallography.

  • Choice of Dataset: These entries were chosen because they meet the criteria of being described as foldamers or experimentally confirmed to fold, directly addressing the goal of creating a dedicated foldamer resource. The diversity of foldamer types ensures the database is broadly applicable to the field.
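
To make the per-entry content listed above more concrete, here is a minimal sketch of how one record could be modelled. It is not the database's actual schema: the class name, every field name, and the example values are hypothetical placeholders chosen only to mirror the categories in the list.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FoldamerEntry:
    """Hypothetical record mirroring the per-entry content described above."""
    folddb_id: int
    foldamer_type: str
    smiles: str
    inchikey: Optional[str] = None
    # Molecular properties
    logp: Optional[float] = None
    h_bond_donors: Optional[int] = None
    h_bond_acceptors: Optional[int] = None
    rotatable_bonds: Optional[int] = None
    polar_surface_area: Optional[float] = None
    # Structural information
    analysis_methods: list[str] = field(default_factory=list)  # e.g. "NMR"
    ccdc_number: Optional[str] = None
    pdb_id: Optional[str] = None
    # External IDs, activities, application, bibliography
    reaxys_id: Optional[str] = None
    activities: list[str] = field(default_factory=list)
    application: Optional[str] = None
    references: list[str] = field(default_factory=list)

    def has_crystal_structure(self) -> bool:
        """True if the entry links to a CSD (CCDC number) or PDB structure."""
        return bool(self.ccdc_number or self.pdb_id)


# Placeholder values for illustration only; this is not a real database entry.
example = FoldamerEntry(
    folddb_id=0,
    foldamer_type="β-peptide",
    smiles="NCCC(=O)O",  # β-alanine, used purely as a stand-in structure
    analysis_methods=["NMR"],
)
print(example.has_crystal_structure())
```

Optional fields default to `None` because, as described above, not every entry carries crystal-structure identifiers or external IDs.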

5.2. Evaluation Metrics

The paper describes a database and its functionalities rather than presenting experimental results that would typically require formal evaluation metrics for performance. However, the internal mechanisms and design choices can be "evaluated" based on their utility and adherence to chemoinformatics best practices.

  • Tanimoto Coefficient: This is the primary metric used within the substructure search functionality of FoldamerDB.

    • Conceptual Definition: The Tanimoto coefficient quantifies the similarity between two chemical fingerprints, which are binary representations of molecular structures. It measures the degree of overlap between the structural features present in a query molecule and a molecule in the database. A higher Tanimoto coefficient indicates greater structural similarity.
    • Mathematical Formula: $ J ( A , B ) = { \frac { | A \cap B | } { | A \cup B | } } $
    • Symbol Explanation:
      • J(A, B): The Tanimoto coefficient (also known as the Jaccard index) between two fingerprint sets A and B.
      • A: The set of fingerprint bits representing the query molecule.
      • B: The set of fingerprint bits representing a hit molecule from the database.
      • |A ∩ B|: The cardinality (number of elements) of the intersection of sets A and B, which corresponds to the number of structural features common to both the query and the hit molecule.
      • |A ∪ B|: The cardinality of the union of sets A and B, which corresponds to the total number of unique structural features present in either the query or the hit molecule.
  • Qualitative Metrics (implied for database design): While not explicitly stated as metrics, the paper emphasizes several qualitative aspects of the database's design and utility:

    • User-friendliness: Assessed by the intuitive navigation, clutter-free interface, and detailed help page.
    • Responsiveness: Compatibility with different screen sizes (mobile-first design).
    • Comprehensiveness: The number of entries, activities, and detailed information provided per entry.
    • Manual curation and annotation: Ensuring data quality and relevance.
    • Open-source and freely available: Accessibility to the scientific community.
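
The Tanimoto coefficient defined above can be computed directly from two sets of "on" fingerprint bits. The sketch below uses made-up bit positions rather than real molecular fingerprints, and returning 0.0 for two empty fingerprints is an assumed convention, not something the paper specifies.

```python
def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto (Jaccard) coefficient |A ∩ B| / |A ∪ B| of two bit sets."""
    union = a | b
    if not union:
        return 0.0  # assumed convention when both fingerprints are empty
    return len(a & b) / len(union)


# Made-up bit positions standing in for query and hit fingerprints.
query_bits = {1, 4, 7, 9, 12}
hit_bits = {1, 4, 8, 9, 15, 20}

score = tanimoto(query_bits, hit_bits)
print(score)  # 3 shared bits out of 8 distinct bits -> 0.375
```

Identical fingerprints give 1.0 and fingerprints with no bits in common give 0.0, consistent with the 0 to 1 range of the index.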

5.3. Baselines

Since FoldamerDB is presented as the first open-source, comprehensive database specifically for foldamers, there are no direct competing foldamer databases mentioned as baselines. The implicit baseline is the previous state of scattered information across scientific literature and general chemical/biological databases.

The paper argues that existing resources like PubChem, ChEMBL, CSD, and PDB, while valuable, do not offer the specialized focus, foldamer-specific curation, and integrated information that FoldamerDB provides. Thus, the database is designed to fill a unique niche rather than outperform existing, directly comparable tools. The need for such a database is justified by the success of specialized databases in related fields, such as CAMP and CAMPR3 for antimicrobial peptides, which serve as conceptual "baselines" demonstrating the utility of focused resources.

6. Results & Analysis

6.1. Core Results Analysis

The primary "result" of this paper is the successful development and deployment of FoldamerDB, a novel web-based resource that addresses a significant gap in foldamer research. The database centralizes and curates a vast amount of foldamer-specific information, making it readily accessible to the scientific community.

The key aspects demonstrating the effectiveness and utility of FoldamerDB are:

  • Comprehensive Content: The database successfully compiled information on over 1319 peptidic foldamer species and 1018 biological activities from more than 160 research papers. This extensive collection represents a substantial effort in manual curation and data integration, providing a rich, unified resource that was previously unavailable. Of particular note, 166 entries include experimental crystal structures, which are crucial for understanding the precise 3D conformations of foldamers.

  • Detailed Information Per Entry: Each foldamer entry in FoldamerDB is richly annotated, providing 2D and interactive 3D models, molecular properties (LogP, H-bond donors/acceptors, rotatable bonds, PSA), chemical identifiers (SMILES, InChIKey), structural determination methods (NMR, X-ray crystallography with links to CSD and PDB), external database IDs, applications, biological activities, and bibliographic references. This level of detail gives researchers a complete picture of each foldamer in one place.

  • Intuitive and Functional User Interface: The web interface is designed to be clutter-free, user-friendly, and responsive, adapting to various screen sizes. This is crucial for broad accessibility and usability.

    • Search Capabilities: The search page (Figure 3E) offers flexible simple search options (by ID, application, author, journal, etc.) and a powerful substructure search feature (Figure 4). The substructure search allows users to draw a molecule using JSME and find similar foldamers based on Tanimoto coefficients, directly supporting computational design efforts.
    • Browsing Options: Users can easily browse foldamers (Figure 3B), articles (Figure 3C), structures (those with crystal structures), and activities (Figure 3D), providing multiple entry points to explore the data.
    • Single Foldamer View: The single foldamer view page (Figure 3A, 3F) is particularly effective, presenting all gathered information in a structured manner, including interactive 3D models, enabling in-depth analysis of individual compounds.
  • Addressing a Critical Gap: The very existence of FoldamerDB successfully fills the identified void in specialized foldamer databases, positioning it as a foundational resource for the burgeoning foldamer field. Its open-source nature further ensures wide adoption and utility.

  • Enabling Future Research: By centralizing and structuring foldamer data, FoldamerDB lays the groundwork for advanced computational studies, particularly in machine learning for foldamer design and property prediction.

    The analysis of foldamer types (Figure 1B) within the database also reveals the current landscape of research in the field, showing the prevalence of α/β-peptides and β-peptides. This provides valuable meta-information for researchers on active areas of foldamer synthesis and study.

6.2. Data Presentation (Tables)

The paper does not present explicit results in the form of comparative tables against baselines in the traditional sense, as it introduces a new database. However, it implicitly presents "results" through the description of its content and features.

The following illustrates an example of a substructure search and its output as described in the paper, which serves as a demonstration of the database's functionality rather than a quantitative result table:

Example of Substructure Search Output

The paper provides an example of a substructure search using a query molecule (Figure 4A). The output of such a search would typically list matching foldamers and their Tanimoto coefficients, indicating similarity to the query.

The image shows an interface: on the left, a molecular structure drawing tool; on the right, information on the foldamers that match the drawn molecule, including their FoldDB ID and similarity. Users can search for different peptides with this tool.

Figure 4. Substructure search in FoldamerDB. The similarity between the query and hit molecules is measured in terms of the Tanimoto index, which ranges from 0 to 1.

The SMILES string for the example query molecule (Figure 4A) is: CC(C)C[C@@H](CC(=O)N[C@@H](CO)CC(=O)N[C@@H](CC(C)C)CC(=O)N[C@H]1CNC[C@@H]1C(=O)N[C@@H](CC(C)C)CC(=O)N[C@@H](CO)CC(=O)N[C@H]1CNC[C@@H]1C(N)=O)NC(=O)C[C@H](CC(C)C)NC(=O)[C@H]1CNC[C@@H]1NC(=O)C[C@H](CC(C)C)NC(=O)[C@H]1CCC[C@@H]1NC(=O)[C@H]1CCC[C@@H]1N

When this query is submitted, FoldamerDB identifies entries that share structural similarity. Figure 4B illustrates a partial output of such a search, showing the FoldDB ID of the hit molecules and their respective Tanimoto coefficients (ranging from 0 to 1). A higher Tanimoto coefficient indicates greater similarity between the hit molecule and the query molecule. This allows users to quickly identify foldamers that are structurally related to their molecule of interest.

The following are the results from Figure 4B of the original paper, illustrating the output format for a substructure search:

| FoldDB ID | Chemical Name | Source | Tanimoto Distance |
|---|---|---|---|
| 170 | Aib foldamer | PMID:26613945 | 0.44 |
| 594 | β-peptide | PMID:26613945 | 0.43 |
| 792 | β-peptide | PMID:26613945 | 0.43 |
| 1201 | β-peptide | PMID:26613945 | 0.43 |
| 566 | α/β-peptide | PMID:26613945 | 0.42 |

This table demonstrates the search's ability to retrieve foldamers with varying degrees of similarity to the query and links them back to their source publications.
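
A client-side way to work with such a hit list is to rank it by score and apply a similarity cutoff. The sketch below reuses the five rows reported above; the `Hit` type, the `rank_hits` helper, and the 0.43 cutoff are hypothetical and are not part of the FoldamerDB interface.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    folddb_id: int
    label: str
    source: str
    tanimoto: float


# The five hits reported above, deliberately entered out of order.
hits = [
    Hit(594, "β-peptide", "PMID:26613945", 0.43),
    Hit(170, "Aib foldamer", "PMID:26613945", 0.44),
    Hit(566, "α/β-peptide", "PMID:26613945", 0.42),
    Hit(1201, "β-peptide", "PMID:26613945", 0.43),
    Hit(792, "β-peptide", "PMID:26613945", 0.43),
]


def rank_hits(hits: list[Hit], min_score: float = 0.0) -> list[Hit]:
    """Keep hits scoring at least min_score, highest score first, ties by ID."""
    kept = [h for h in hits if h.tanimoto >= min_score]
    return sorted(kept, key=lambda h: (-h.tanimoto, h.folddb_id))


for hit in rank_hits(hits, min_score=0.43):
    print(hit.folddb_id, hit.label, hit.tanimoto)
```

Breaking ties by FoldDB ID keeps the ordering deterministic when several hits share the same score, as three of the five do here.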

6.3. Ablation Studies / Parameter Analysis

The paper, being a description of a newly developed database and its features, does not include ablation studies or parameter analysis in the conventional sense (i.e., breaking down a model or tuning its hyperparameters). Such analyses are typical for algorithmic or model-centric research papers. Instead, the "analysis" in this context refers to the categorization and statistical overview of the data contained within FoldamerDB and the demonstration of its functional capabilities.

The primary "analysis" presented is the distribution of foldamer types (Figure 1B), which provides insights into the database's content and the general landscape of foldamer research at the time of publication. This is an overview of the collected data rather than an experimental evaluation of the database's underlying components.

7. Conclusion & Reflections

7.1. Conclusion Summary

The paper successfully introduces FoldamerDB, an innovative and much-needed open-source database dedicated to peptidic foldamers. It effectively bridges a significant gap in publicly available resources by centralizing, meticulously curating, and annotating comprehensive information on foldamer sequences, structures, and biological activities. With over 1319 foldamer entries and 1018 associated activities gathered from more than 160 research articles, FoldamerDB provides a rich dataset for the scientific community. The database's user-friendly, responsive web interface, equipped with robust search (including substructure search via Tanimoto coefficient) and browsing functionalities, ensures easy access and utility. By offering detailed molecular properties, 2D and 3D models, and links to external structural databases, FoldamerDB is poised to be a foundational tool for diverse scientific groups, from synthetic chemists to biologists, facilitating novel design projects, particularly those employing machine learning techniques in the rapidly evolving field of foldamer research.

7.2. Limitations & Future Work

The authors acknowledge that FoldamerDB is a "first milestone" and has potential for further expansion. The primary limitation mentioned is that the current version focuses predominantly on peptidic compounds, with a majority of entries being mixed β-peptides, the largest subtype produced until then.

Based on this, the authors outline clear directions for future work:

  • Expansion to Exotic Foldamer Types: The main goal for the future is to expand FoldamerDB to include information about a broader range of foldamer types beyond peptidic foldamers, specifically mentioning aromatic oligoamides as an example. This would make the database even more comprehensive for the entire foldamer field.
  • Community Contribution: To achieve this expansion and ensure the database remains up-to-date, the authors actively encourage the scientific community to contribute new data to the FoldamerDB project. A feedback page and direct contact options are provided for this purpose, with a review process in place to maintain data quality.

7.3. Personal Insights & Critique

FoldamerDB represents a crucial development for the foldamer research community. My personal insights and critique are as follows:

  • Importance for Emerging Fields: The paper highlights the critical need for specialized databases as scientific fields mature. Foldamer chemistry, though relatively young, has generated substantial data. Without such a centralized resource, valuable information remains siloed in publications, hindering systematic analysis and computational design. FoldamerDB sets an excellent example of how to consolidate knowledge in a niche but growing area.
  • Value of Manual Curation: The emphasis on manual curation is a strong point. While automated data extraction can offer scale, manual annotation by experts ensures the quality, accuracy, and foldamer-specific context of the entries, which is vital for building trust and reliability in the data, especially for a relatively complex and diverse class of molecules.
  • Enabling Computational Design: The integration of features like substructure search with Tanimoto coefficient and the structured availability of molecular properties directly supports the application of chemoinformatics and machine learning for de novo foldamer design and property prediction. This positions FoldamerDB as more than just a repository; it's a computational enabler. This is particularly valuable as machine learning models require large, well-structured datasets for training.
  • Sustainability and Community Engagement: The call for community contributions is a pragmatic approach to ensure the long-term sustainability and growth of the database. However, manual curation of user-submitted data can be resource-intensive. The paper doesn't detail the mechanisms or capacity for reviewing potentially large volumes of user contributions, which could become a bottleneck.
  • Potential for Integration: While FoldamerDB provides links to CSD and PDB, future enhancements could explore deeper integration with other relevant resources (e.g., more direct links to bioactivity assays in ChEMBL or PubChem where applicable) or even virtual screening tools.
  • Unverified Assumptions/Areas for Improvement:
    • Data Completeness: While impressive, the database might not be exhaustive of all published foldamers. Ongoing manual curation is a continuous challenge.

    • Scope Definition: The "peptidic" focus, while justified by prevalence, limits the database's reach for other exciting foldamer types like aromatic oligoamides. The stated future work addresses this, but the initial scope could be seen as a limitation for researchers working outside the peptidic domain.

    • User Analytics: The paper doesn't discuss any user analytics or feedback loop for improving the database interface or content prioritization based on actual user behavior. Understanding what foldamer types are most searched or which features are most used could guide future development.

    • Definition of "Foldamer": The inclusion criterion of "described as 'foldamer' or shown experimentally to fold into a specific 3D structure" is good. However, the definition of "well-defined 3D conformation" can sometimes be subjective or context-dependent in the literature, which manual curation helps mitigate but doesn't entirely remove ambiguity.

      Overall, FoldamerDB is a highly valuable contribution that will undoubtedly accelerate research in foldamer chemistry and biology, providing a foundation for future discoveries and applications.
