Papers

Privacy-Preserving Action Recognition via Motion Difference Quantization
Published: 8/4/2022
Privacy-Preserving Human Action Recognition, Motion Difference Quantization, Adversarial Training Optimization, Privacy and Security Issues in Computer Vision, Image Blurring and Difference Processing
This paper presents BDQ, a privacy-preserving encoder for human action recognition, which utilizes blur, difference, and quantization to suppress privacy information while retaining recognition performance, achieving state-of-the-art results in experiments across three benchmark
X-VLA: Soft-Prompted Transformer as Scalable Cross-Embodiment Vision-Language-Action Model
Published: 10/12/2025
Vision-Language-Action Model, Cross-Embodiment Learning, Soft Prompt Learning, Generalist Robotic Platforms, Large-Scale Heterogeneous Datasets
The paper introduces X-VLA, a scalable Vision-Language-Action model utilizing a soft-prompted Transformer architecture. By integrating learnable embeddings for diverse robot data sources, X-VLA achieves state-of-the-art performance across simulations and real robots, demonstratin
Deciphering the biosynthetic potential of microbial genomes using a BGC language processing neural network model
Published: 4/10/2025
Microbial Genomic Biosynthetic Potential Analysis, Biosynthetic Gene Cluster Prediction Model, Transformer-based Gene Location Relationship Capture, Ultrahigh-Throughput BGC Screening Tool, Study of Microbial Secondary Metabolites
This study presents BGC-Prophet, a transformer-based model for predicting and classifying biosynthetic gene clusters in microbial genomes. It enhances efficiency and accuracy, analyzing over 85,000 genomes to reveal BGC distribution patterns and environmental influences, aiding r
StreamDiffusionV2: A Streaming System for Dynamic and Interactive Video Generation
Published: 11/11/2025
video diffusion models, Real-Time Interactive Video Generation, Streaming Content Creation, Low-Latency Video Generation, Multi-GPU Real-Time Streaming Service
StreamDiffusionV2 is introduced as a streaming system for dynamic and interactive video generation, addressing temporal consistency and low latency issues in live streaming. It integrates SLO-aware schedulers and other optimizations for training-free real-time service, enhancing
Reinforcement Learning for Versatile, Dynamic, and Robust Bipedal Locomotion Control
Published: 1/30/2024
Dynamic Bipedal Robot Control, Application of Deep Reinforcement Learning, Robot Adaptivity and Robustness, Diverse Locomotion Skills, Dual-History Control Architecture
This paper develops dynamic locomotion controllers for bipedal robots using deep reinforcement learning, surpassing single-skill limitations with a novel dual-history architecture that enhances adaptivity and robustness. The controllers show superior performance in diverse skills
ExBody2: Advanced Expressive Humanoid Whole-Body Control
Published: 12/18/2024
Humanoid Whole-Body Control, Expressive Dynamic Motion Generation, Motion Capture-Based Control Strategy, Kinematic Adaptation Optimization for Robots, Whole-Body Motion Tracking Algorithm
The paper presents ExBody2, an advanced control method enabling humanoid robots to perform expressive whole-body movements while maintaining stability. It employs a training approach based on human motion capture and simulations, addressing trade-offs between versatility and spec
SIDA: Social Media Image Deepfake Detection, Localization and Explanation with Large Multimodal Model
Social Media Image Deepfake Detection, Large Multimodal Model, Deepfake Localization and Explanation, Deepfake Detection Dataset, Image Authenticity Verification
The SIDA framework utilizes large multimodal models for detecting, localizing, and explaining deepfakes in social media images. It also introduces the SID-Set dataset, comprising 300K diverse synthetic and authentic images with high realism and thorough annotations, enhancing det
FoldamerDB: a database of peptidic foldamers
Published: 10/17/2019
Foldamer Database, Antimicrobial and Anticancer Foldamers, Biologically Active Foldamers, Publicly Accessible Compound Database, Structural and Sequence Information of Foldamers
FoldamerDB is an open-source, fully annotated database of peptidic foldamers, containing information on 1319 species and their biological activities, collected from over 160 papers. The user-friendly interface allows for comprehensive searching and filtering, addressing a gap in
Explainable Machine Learning and Deep Learning Models for Predicting TAS2R-Bitter Molecule Interactions
Published: 10/9/2025
Explainable Machine Learning Models, TAS2R-Bitter Molecule Interaction Prediction, Deep Learning for Ligand Recognition, G Protein-Coupled Receptor Function Research, Molecular Characteristics and Drug Design
This study developed explainable machine learning and deep learning models to predict interactions between bitter molecules and TAS2R receptors, enhancing ligand selection and understanding of receptor functions, with significant implications for drug design and disease research.
Identifying Sequential Residue Patterns in Bitter and Umami Peptides
Published: 11/9/2022
Sequential Pattern Identification in Bitter and Umami Peptides, Coarse-Graining of Peptide Sequence Space, Quantitative Structure-Activity Relationship for Taste Features, Amino Acid Pattern Extraction, Systematic Improvements for Bitter and Umami Peptide Features
This study explores how amino acid sequences in peptides affect taste, introducing a coarse-graining method to systematically identify optimal residue patterns for bitter and umami peptides, showing significant improvements over random and baseline models.
A Challenging Benchmark of Anime Style Recognition
Published: 6/1/2022
Anime Style Recognition Benchmark, Large-Scale Anime Style Recognition Dataset, Cross-Role Anime Style Evaluation, Abstract Painting Style Learning, Transformer Models for Anime Recognition
This paper presents a challenging benchmark for Anime Style Recognition (ASR), aiming to determine if two images of different characters are from the same work. A large dataset (LSASRD) is introduced, alongside a cross-role evaluation protocol, revealing the need for deeper ASR r
Modality-Independent Graph Neural Networks with Global Transformers for Multimodal Recommendation
Published: 12/19/2024
Multimodal Recommendation Systems, Graph Neural Networks, Modality-Independent Receptive Fields, Global Transformer, User-Item Graph Modeling
This study presents modality-independent GNNs to enhance multimodal recommendation performance by utilizing separate GNNs for different modalities. A sampling-based global transformer effectively integrates global information, addressing limitations of existing methods, with supe
LITA: LMM-Guided Image-Text Alignment for Art Assessment
Published: 12/30/2024
Multimodal Artistic Image Aesthetics Assessment, LMM-Guided Image-Text Alignment, LLaVA Model Application, Artistic Style and Aesthetic Semantic Analysis, Image Feature and Text Comment Alignment
To address the growing need for Artistic Image Aesthetics Assessment (AIAA), the authors propose LITA, an LMM-guided image-text alignment model utilizing pretrained LLaVA comments for rich feature extraction. LITA effectively captures artistic style and semantics, outperforming
Generation of Clothing Patterns Based on Impressions Using Stable Diffusion
Text-to-Image Generation Based on Visual Impressions, Stable Diffusion Model with Multi-Modal Input Extension, Personalized Product Generation, Impression Preservation in Image Generation, AI-Driven Personalization
This paper presents a personalized clothing pattern generation model that integrates visual impressions into the Stable Diffusion architecture through a multimodal input system, demonstrating strong positive correlations in impression metrics between generated and original image
Explainable AI for Image Aesthetic Evaluation Using Vision-Language Models
Published: 2/3/2025
Vision-Language Models, Explainable AI, Image Aesthetic Evaluation
This study enhances image aesthetic evaluation using vision-language models by proposing an interpretable method. It explores feature importance through SHAP analysis and predicts quality scores with LightGBM, demonstrating high correlation with human judgment, thus advancing obj
A Multi-modal Large Language Model with Graph-of-Thought for Effective Recommendation
Published: 1/1/2025
Multimodal Large Language Model, Graph-of-Thought Prompting Technique, Personalized Recommendation System, Multimodal Recommendation Tasks, User-Item Interaction Graphs
The GollaRec model integrates a Multimodal Large Language Model and Graph-of-Thought to enhance user-item interaction for effective recommendations, combining visual and textual data. It utilizes text-graph alignment and tuning, outperforming 12 existing models in multimodal tas
FiLM: Visual Reasoning with a General Conditioning Layer
Published: 9/23/2017
Feature-wise Linear Modulation Layers, Visual Reasoning Tasks, CLEVR Benchmark, Multi-Step Reasoning, Neural Network Conditioning Methods
This study introduces FiLM, a general conditioning method that enhances neural network computation. FiLM layers significantly improve visual reasoning, halving error rates on the CLEVR benchmark and demonstrating robustness to architectural changes as well as good few-shot and ze
ITMPRec: Intention-based Targeted Multi-round Proactive Recommendation
Published: 4/22/2025
Proactive Recommendation Systems, Intention-Based Recommendation Method, Multi-Round Recommendation Strategy, LLM-based User Feedback Simulation, Personalized Recommendation Optimization
ITMPRec is a novel intention-based targeted multi-round proactive recommendation method that addresses passive acceptance in personalized systems by selecting target items through pre-matching, utilizing multi-round nudging, and simulating user feedback with an LLM agent, outperf
LLM-PS: Empowering Large Language Models for Time Series Forecasting with Temporal Patterns and Semantics
Published: 3/12/2025
LLM Time Series Forecasting, Pattern and Semantic Learning from Time Series Data, Multi-Scale Convolutional Neural Network, Temporal Dependency Modeling, Short- and Long-Term Forecasting
LLM-PS is a new framework enhancing large language models for time series forecasting by learning fundamental patterns and meaningful semantics from time series data, utilizing a multi-scale convolutional neural network and a time-to-text module for improved accuracy.
WebMall -- A Multi-Shop Benchmark for Evaluating Web Agents
Published: 8/18/2025
LLM-based Online Shopping Performance Evaluation, Multi-Shop Comparison-Shopping Benchmark, Cross-Shop Task Suite, Simulated Online Shopping Behavior, Real-World Product Offer Dataset
WebMall is a new benchmark for evaluating LLM-based web agents in multi-shop comparison-shopping scenarios, featuring four simulated shops and 91 tasks that enhance online shopping research by offering authentic product diversity.