ESCAPE: A Standardized Benchmark for Multilabel Antimicrobial Peptide Classification

Universidad de Los Andes, Colombia
The Thirty-ninth Annual Conference on Neural Information Processing Systems (NeurIPS 2025)
Graphical Abstract

ESCAPE provides a unified dataset, tasks, and evaluation for multilabel AMP classification.


The Expanded Standardized Collection for Antimicrobial Peptide Evaluation (ESCAPE) is an experimental framework for multilabel antimicrobial peptide classification. It combines a large-scale curated dataset, a benchmark for evaluating models, and a transformer-based baseline that integrates both sequence and structural information.


Abstract

Antimicrobial peptides have emerged as promising molecules to combat antimicrobial resistance. However, fragmented datasets, inconsistent annotations, and the lack of standardized benchmarks hinder computational approaches and slow down the discovery of new candidates. To address these challenges, we present the Expanded Standardized Collection for Antimicrobial Peptide Evaluation (ESCAPE), an experimental framework integrating over 80,000 peptides from 27 validated repositories. Our dataset separates antimicrobial peptides from negative sequences and incorporates their functional annotations into a biologically coherent multilabel hierarchy, capturing activities across antibacterial, antifungal, antiviral, and antiparasitic classes. Building on ESCAPE, we propose a transformer-based model that leverages sequence and structural information to predict multiple functional activities of peptides. Our method achieves up to a 2.56% relative average improvement in mean Average Precision over the second-best method adapted for this task, establishing a new state of the art in multilabel peptide classification. ESCAPE provides a comprehensive and reproducible evaluation framework to advance AI-driven antimicrobial peptide research. The ESCAPE dataset is available on Harvard Dataverse, and the baseline code is publicly released.


ESCAPE Database

The ESCAPE Dataset integrates over 80,000 peptide sequences from 27 validated public repositories to address critical limitations in existing AMP resources, including data fragmentation, inconsistent annotations, and limited functional coverage. It distinguishes antimicrobial peptides from negative sequences and organizes their functional annotations into a biologically meaningful multilabel hierarchy, covering antibacterial, antifungal, antiviral, and antiparasitic activities. The dataset comprises 21,409 experimentally validated AMPs and 60,950 non-AMPs filtered from unrelated sources.
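
As a rough illustration of how such a two-level multilabel annotation can be encoded, the sketch below maps a peptide's annotated activities to a binary AMP flag plus a four-element multi-hot vector. The label order, the helper name `encode_labels`, and the rule that a peptide counts as an AMP whenever it has at least one activity are simplifying assumptions made for illustration; they are not taken from the ESCAPE release. The counts are the ones reported above.

```python
# Illustrative sketch only: label order and helper names are assumptions,
# not the ESCAPE file format.
ACTIVITIES = ("antibacterial", "antifungal", "antiviral", "antiparasitic")

# Counts reported for the dataset.
AMP_COUNT = 21_409
NON_AMP_COUNT = 60_950
TOTAL = AMP_COUNT + NON_AMP_COUNT  # 82,359 peptides, consistent with "over 80,000"


def encode_labels(activities):
    """Return (is_amp, multi_hot) for a collection of activity names."""
    unknown = set(activities) - set(ACTIVITIES)
    if unknown:
        raise ValueError(f"unknown activities: {sorted(unknown)}")
    multi_hot = [int(name in activities) for name in ACTIVITIES]
    # Assumed first level of the hierarchy: any activity implies an AMP.
    is_amp = int(any(multi_hot))
    return is_amp, multi_hot


print(encode_labels({"antibacterial", "antiviral"}))  # (1, [1, 0, 1, 0])
print(encode_labels(set()))                            # (0, [0, 0, 0, 0])
```

A negative sequence is then simply the all-zero vector with the AMP flag unset, which keeps AMP detection and activity prediction in one label space.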

Figure: ESCAPE dataset statistics.

ESCAPE integrates peptides from 27 curated databases across antibacterial, antifungal, antiviral, and antiparasitic functions, reflecting mechanisms such as membrane disruption and inhibition of cell-wall biosynthesis. Non-AMP sequences are drawn from UniProt and from curated negative sets, following a TransImbAMP-style filter that excludes entries annotated with antimicrobial-related keywords in order to build a high-confidence negative set. The figure summarizes the activity distribution, sequence-length profiles for AMPs vs. non-AMPs, and split counts (train/validation/test).
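
A keyword-exclusion filter of this kind could look roughly like the following. This is a minimal sketch, not the filter used to build ESCAPE: the term list, the record layout, and the function name `is_candidate_negative` are all hypothetical, and the actual keyword set is not given here.

```python
# Hypothetical keyword list; the real filter's terms are not specified in this page.
EXCLUDED_TERMS = (
    "antimicrobial",
    "antibacterial",
    "antifungal",
    "antiviral",
    "antiparasitic",
    "antibiotic",
)


def is_candidate_negative(annotation: str) -> bool:
    """True if the free-text annotation mentions none of the excluded terms."""
    text = annotation.lower()
    return not any(term in text for term in EXCLUDED_TERMS)


# Toy records with made-up identifiers and annotations.
records = [
    {"id": "P1", "annotation": "Ribosomal protein fragment"},
    {"id": "P2", "annotation": "Antimicrobial peptide, defensin family"},
    {"id": "P3", "annotation": "Putative antifungal protein"},
]

negatives = [r["id"] for r in records if is_candidate_negative(r["annotation"])]
print(negatives)  # ['P1']
```

Substring matching errs on the side of exclusion, which suits a negative set where false negatives (hidden AMPs) are more harmful than discarding a few harmless sequences.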

The ESCAPE dataset is available on Harvard Dataverse.


ESCAPE Benchmark

The ESCAPE Benchmark enforces a unified multilabel evaluation protocol on a shared label space covering antibacterial, antifungal, antiviral, and antiparasitic activities, with fixed public train, validation, and test splits. Each method is trained under the same preprocessing and split definitions, and all reported results correspond to the held-out test set.
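
For readers unfamiliar with the two reported metrics, the sketch below computes mean Average Precision and F1 on a toy multilabel problem in plain Python. It assumes macro averaging over the four labels and a 0.5 decision threshold for F1; neither choice is stated on this page, so treat both as assumptions. Ties in scores are broken by input order, which is a simplification, and the toy labels and scores are made up.

```python
def average_precision(y_true, y_score):
    """AP for one label: mean precision at the ranks where a positive appears."""
    order = sorted(range(len(y_score)), key=lambda i: -y_score[i])
    hits, precisions = 0, []
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / hits if hits else 0.0


def f1_score(y_true, y_pred):
    """Binary F1 = 2TP / (2TP + FP + FN); 0.0 when undefined."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_scores(Y_true, Y_score, threshold=0.5):
    """Return (mAP, macro-F1) averaged over label columns."""
    n_labels = len(Y_true[0])
    aps, f1s = [], []
    for j in range(n_labels):
        col_true = [row[j] for row in Y_true]
        col_score = [row[j] for row in Y_score]
        aps.append(average_precision(col_true, col_score))
        f1s.append(f1_score(col_true, [int(s >= threshold) for s in col_score]))
    return sum(aps) / n_labels, sum(f1s) / n_labels


# Toy data: rows are peptides, columns are the four activity labels.
Y_true = [[1, 0, 0, 0], [1, 1, 0, 0], [0, 0, 1, 0], [0, 1, 0, 1]]
Y_score = [
    [0.9, 0.5, 0.1, 0.3],
    [0.8, 0.7, 0.2, 0.1],
    [0.3, 0.1, 0.6, 0.2],
    [0.4, 0.4, 0.3, 0.8],
]
mean_ap, macro_f1 = macro_scores(Y_true, Y_score)
print(f"mAP={mean_ap:.4f}  F1={macro_f1:.4f}")  # mAP=0.9583  F1=0.8750
```

mAP is threshold-free and rewards good ranking per label, whereas F1 depends on the chosen cut-off, which is why the two can order methods differently.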

AMP classification methods fall into two main categories. Sequence-based models learn directly from amino acid sequences; examples include AMPlify (Bi-LSTM with attention) as well as TransImbAMP and AMP-BERT (pretrained protein language models). Feature-augmented models incorporate computed descriptors, including physicochemical and structural features; examples include amPEPpy (CTD features with Random Forest), AMPs-Net (graph neural networks), PEP-Net, and AVP-IFT (contrastive Transformer).

To assess robustness, each method is trained with multiple random seeds differing in initialization and shuffling, and we report the mean ± standard deviation across seeds. The ESCAPE Baseline extends prior designs by jointly encoding peptide sequences and 3D distance maps through bidirectional cross-attention, unifying structural and sequential cues for state-of-the-art multilabel performance.
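
The seed-level aggregation can be sketched in a few lines. The per-seed values below are made up and are not benchmark results, and the use of the sample standard deviation (rather than the population one) is an assumption, since the convention is not stated here.

```python
from statistics import mean, stdev


def summarize(scores):
    """Format per-seed scores as 'mean ± std' using the sample standard deviation."""
    return f"{mean(scores):.1f} ± {stdev(scores):.2f}"


# Made-up per-seed mAP values, not results from the benchmark.
example_scores = [70.0, 71.0, 72.0]
summary = summarize(example_scores)
print(summary)  # 71.0 ± 1.00
```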

Method | Primary Architecture | GitHub Repository | F1-score (%) | mAP (%)
AMPs-Net | GCN | GitHub | 57.7 ± 0.70 | 54.6 ± 0.86
TransImbAMP | Transformer-Based | GitHub | 62.0 ± 0.70 | 64.9 ± 1.11
AMP-BERT | BERT | GitHub | 64.7 ± 0.64 | 66.9 ± 1.17
amPEPpy | Random Forest (RF) | GitHub | 66.5 ± 0.37 | 68.5 ± 0.48
PEP-Net | Transformer-Based | GitHub | 65.5 ± 0.61 | 68.4 ± 0.53
AVP-IFT | Contrastive-Learning + Transformer | GitHub | 66.5 ± 0.59 | 68.8 ± 0.50
AMPlify | Bi-LSTM with attention layers | GitHub | 68.5 ± 0.77 | 70.3 ± 0.87
ESCAPE Baseline (ours) | Dual-branch transformer | GitHub | 69.8 ± 0.43 | 72.1 ± 0.60
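
One plausible reading of the 2.56% relative improvement quoted in the abstract is that it compares the mean mAP of the ESCAPE Baseline with that of the second-best method in the table, AMPlify. The short check below works through that arithmetic; the function name is ours and the interpretation is an assumption.

```python
def relative_improvement(new, reference):
    """Relative gain of `new` over `reference`, in percent."""
    return (new - reference) / reference * 100.0


baseline_map = 72.1  # ESCAPE Baseline, mean mAP (%) from the table
amplify_map = 70.3   # AMPlify, mean mAP (%) from the table

gain = relative_improvement(baseline_map, amplify_map)
print(f"{gain:.2f}%")  # 2.56%
```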

ESCAPE Baseline

The ESCAPE Baseline is a dual-branch transformer that classifies antimicrobial peptides using both sequence and structural information.

Figure: ESCAPE baseline architecture.

Sequence branch. Amino-acid tokens are embedded and processed by a Transformer encoder to model local and long-range dependencies across the peptide chain.
Structure branch. We derive pairwise distance matrices from available 3D information and tokenize them into patches that a parallel Transformer encoder processes to capture spatial organization.
Multimodal fusion. A bidirectional cross-attention layer lets sequence attend to structure and vice versa, producing fused representations for multilabel classification across antibacterial, antifungal, antiviral, and antiparasitic activities.
Training & evaluation. We use AdamW, standard regularization, and the same folds/metrics defined by the ESCAPE benchmark; ensembling across folds yields the reported scores. In ablations, the multimodal model consistently outperforms sequence-only or structure-only variants on both F1 and mAP, highlighting the complementary value of structural cues.
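
The data flow described in the branches above can be sketched with NumPy as follows. This is not the released model: it uses random weights, a single attention head, no residual connections or normalization, random vectors standing in for the two encoders' outputs, and mean pooling before the classifier. The patch size, model width, and all function names are assumptions chosen for illustration.

```python
import numpy as np


def distance_map(coords):
    """Pairwise Euclidean distances between residue coordinates, shape (L, L)."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def patchify(dmap, patch):
    """Zero-pad a square map and split it into flattened patch x patch tiles."""
    pad = (-dmap.shape[0]) % patch
    padded = np.pad(dmap, ((0, pad), (0, pad)))
    side = padded.shape[0] // patch
    tiles = padded.reshape(side, patch, side, patch).transpose(0, 2, 1, 3)
    return tiles.reshape(side * side, patch * patch)


def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


def cross_attention(queries, context, w_q, w_k, w_v):
    """Single-head scaled dot-product attention from `queries` to `context`."""
    q, k, v = queries @ w_q, context @ w_k, context @ w_v
    weights = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    return weights @ v


rng = np.random.default_rng(0)
length, d_model, patch = 10, 16, 4


def rand_proj(rows, cols):
    """Random projection standing in for learned weights."""
    return rng.normal(size=(rows, cols)) / np.sqrt(rows)


coords = rng.normal(size=(length, 3))            # stand-in 3D residue positions
seq_tokens = rng.normal(size=(length, d_model))  # stand-in sequence-branch output

dmap = distance_map(coords)
patches = patchify(dmap, patch)                  # 10 -> padded to 12 -> 3 x 3 tiles
struct_tokens = patches @ rand_proj(patch * patch, d_model)

# Bidirectional fusion: each modality queries the other.
seq_fused = cross_attention(
    seq_tokens, struct_tokens,
    rand_proj(d_model, d_model), rand_proj(d_model, d_model), rand_proj(d_model, d_model),
)
struct_fused = cross_attention(
    struct_tokens, seq_tokens,
    rand_proj(d_model, d_model), rand_proj(d_model, d_model), rand_proj(d_model, d_model),
)

# Pool both streams and apply one sigmoid per activity label (multilabel output).
pooled = np.concatenate([seq_fused.mean(axis=0), struct_fused.mean(axis=0)])
probs = 1.0 / (1.0 + np.exp(-(pooled @ rand_proj(2 * d_model, 4))))
print(probs.shape)  # (4,)
```

Independent sigmoids, rather than a softmax over classes, are what allow a single peptide to be assigned several activities at once.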


BibTeX

@article{escape2025,
  author  = {Sebastian Ojeda and Rafael Velasquez and Nicolás Aparicio and Juanita Puentes and Paula Cárdenas and Nicolás Andrade and Gabriel González and Sergio Rincón and Carolina Muñoz-Camargo and Pablo Arbeláez},
  title   = {ESCAPE: A Standardized Benchmark for Multilabel Antimicrobial Peptide Classification},
  journal = {Preprint},
  year    = {2025},
  url     = {https://bcv-uniandes.github.io/escape-wp/}
}