Papers
MotionLCM: Real-time Controllable Motion Generation via Latent Consistency Model
Published: 5/1/2024
Motion Generation Model, Controllable Motion Generation, Motion Latent Consistency Model, Real-Time Motion Generation, Text-Conditioned Motion Control
This paper introduces MotionLCM, a framework that significantly improves runtime efficiency in text-conditioned motion generation using a motion latent consistency model, enabling real-time, controllable human motion generation with explicit control signals and high-quality out
Qwen2 Technical Report
Published: 7/15/2024
Qwen2 Large Language Model Series, Instruction-Tuned Language Model, Multilingual Proficiency, Open-Source Model Weights, Performance Benchmarking
The Qwen2 Technical Report presents the Qwen2 series, showcasing foundational and instruction-tuned models with parameters ranging from 0.5 to 72 billion, surpassing most open-source models, including Qwen1.5, and demonstrating strong multilingual capabilities in about 30 languag
Mastering Diverse Domains through World Models
Published: 1/11/2023
DreamerV3 Algorithm, Multitask Reinforcement Learning, Self-Imagination Behavior Optimization, Robust Learning Techniques, Open-World Control Problems
The study introduces DreamerV3, a general algorithm outperforming specialized methods in over 150 tasks using a single configuration. By learning environment models and imagining future scenarios, it successfully collected diamonds in Minecraft from scratch, demonstrating stable
Memformer: A Memory-Augmented Transformer for Sequence Modeling
Published: 10/14/2020
Memory-Augmented Transformer, Sequence Modeling Optimization, External Dynamic Memory Network, Long Sequence Processing, Memory Replay Back-Propagation
Memformer is an augmented Transformer that addresses efficiency issues in long sequence modeling by using external dynamic memory. It achieves linear time complexity and constant memory space complexity while reducing memory usage by 8.1x and improving speed by 3.2x during infere
VoCo-LLaMA: Towards Vision Compression with Large Language Models
Published: 6/18/2024
Vision Compression with Large Language Models, Attention-Based Vision Instruction Tuning, Temporal Compression for Video Frames, Computational Efficiency Improvement for Multimodal Tasks, Context Utilization in Vision-Language Models
VoCo-LLaMA introduces a method for vision compression using LLMs, achieving a 576× compression ratio with minimal performance loss and reducing FLOPs by 94.8% during inference.
A generalized e-value feature detection method with FDR control at multiple resolutions
Published: 9/25/2024
Multiresolution Feature Detection, False Discovery Rate Control, Stabilized Flexible e-Filter Procedure, Spatial Genome-Wide Association Studies, Simulation Studies
This paper introduces the Stabilized Flexible e-Filter Procedure (SFEFP) for detecting significant features and groups across multiple resolutions while controlling the false discovery rate (FDR). SFEFP outperforms existing methods by flexibly integrating detection processes and
Model-Free Assessment of Simulator Fidelity via Quantile Curves
Published: 12/5/2025
Model-Free Assessment of Simulator Fidelity, Quantile Curve Method, Comparison of Simulated and Ground Truth Distributions, LLM Simulation Evaluation, Output Uncertainty Handling
This paper introduces a model-free method using quantile functions to assess discrepancies between complex system simulations and ground truths, focusing on output uncertainty. It supports confidence intervals, risk-aware summaries, and simulator performance comparisons, validate
GraphBench: Next-generation graph learning benchmarking
Published: 12/4/2025
Graph Learning Benchmarking, Graph Neural Networks, Message-Passing Neural Networks, Standardized Evaluation Protocols, Generative Graph Learning
GraphBench is a comprehensive benchmarking suite for graph learning, addressing fragmented practices. It covers various tasks and provides standardized evaluation protocols, dataset splits, and hyperparameter tuning, benchmarking with message-passing networks and graph transforme
FlexiAct: Towards Flexible Action Control in Heterogeneous Scenarios
Published: 5/7/2025
Flexible Action Control, Action Transfer from Reference Videos, Spatial Structure Adaptation, Frequency-Aware Action Extraction, Video Generation and Customization
FlexiAct is a novel method for flexible action control in heterogeneous scenarios, enabling action transfer to arbitrary target images while maintaining identity consistency. It utilizes a lightweight adapter for spatial adaptation and frequency-aware extraction for effective mot
ScoreHOI: Physically Plausible Reconstruction of Human-Object Interaction via Score-Guided Diffusion
Published: 9/10/2025
Diffusion-based Physically Plausible Reconstruction, Human-Object Interaction Reconstruction, Score-Guided Sampling, Contact-Driven Iterative Refinement, Human-Object Interaction Optimization Methods
ScoreHOI is a novel optimizer that leverages score-guided diffusion for physically plausible human-object interaction reconstruction from a single image. By incorporating diffusion priors and physical constraints, it improves reconstruction quality and contact plausibility, showi
A Survey of Generative Recommendation from a Tri-Decoupled Perspective: Tokenization, Architecture, and Optimization
Generative Recommendation Systems, Model Optimization Methods, Recommendation System Architecture, Tokenization Techniques
This survey explores three key aspects of generative recommendation systems: tokenization, architecture, and optimization, highlighting how generative methods mitigate error propagation, enhance hardware utilization, and extend beyond local user behavior, while tracing the evolut
Unsupervised Degradation Representation Learning for Unpaired Restoration of Images and Point Clouds
Published: 10/30/2024
Unpaired Image and Point Cloud Restoration, Degradation Representation Learning, Unsupervised Restoration Methods, Degradation-Aware Convolutions, Low-Quality Data Restoration
This paper presents an unsupervised degradation representation learning scheme to address challenges in unpaired restoration of images and point clouds, utilizing degradation-aware convolutions for flexible adaptation, ultimately establishing a generic framework that demonstrates
Tongyi DeepResearch Technical Report
Published: 10/29/2025
Agentic Large Language Model for Deep Research, Long-Horizon Information-Seeking Tasks, Automated Data Synthesis Pipeline, Deep Research Benchmarking, Enhanced Autonomous Research Capability
The report presents Tongyi DeepResearch, an agentic large language model designed for long-horizon research tasks. It employs an end-to-end training framework combining mid- and post-training to foster autonomous capabilities, achieving state-of-the-art performance across various
VQRAE: Representation Quantization Autoencoders for Multimodal Understanding, Generation and Reconstruction
Published: 11/29/2025
Multimodal Understanding and Generation, Unified Representation Model, Vector Quantization Autoencoders, Vision Foundation Models, Self-Distillation Constraints
VQRAE is a Vector Quantization autoencoder addressing unified representation for multimodal understanding, generation, and reconstruction. It utilizes a single tokenizer for continuous semantic features and discrete tokens, ensuring minimal semantic information loss while demonst
UniSearch: Rethinking Search System with a Unified Generative Architecture
Published: 9/9/2025
Unified Generative Search Framework, Integration of Search Generator and Video Encoder, Search Preference Optimization Method, Short Video Search System, Generative Recommendation Systems
UniSearch introduces a unified generative search framework for Kuaishou, replacing the traditional cascaded architecture. By integrating a Search Generator and Video Encoder, it achieves end-to-end optimization, addressing objective inconsistency and limited generalization, thus
NeuDATool: An Open Source Neutron Data Analysis Tools, Supporting GPU Hardware Acceleration, and Across-computer Cluster Nodes Parallel
Published: 4/12/2019
Open Source Neutron Data Analysis Tool, GPU Hardware Acceleration, Cluster Node Parallelism, Neutron Scattering Data Analysis, Microstructure Reconstruction
NeuDATool is an open-source neutron data analysis tool that enhances speed and scalability over traditional EPSR. Written in C++, it supports GPU acceleration and parallelism across cluster nodes, achieving over 400 times speedup compared to CPU, effectively reconstructing the mi
Momentum-GS: Momentum Gaussian Self-Distillation for High-Quality Large Scene Reconstruction
Published: 12/6/2024
Momentum-Based Self-Distillation, Large Scene Reconstruction, 3D Gaussian Splatting, Integration of Implicit and Explicit Features, Dynamic Block Weighting
Momentum-GS introduces a novel method leveraging momentum-based self-distillation for large-scale scene reconstruction, addressing memory consumption issues while ensuring block consistency and accuracy by dynamically adjusting weights based on reconstruction quality.
DecAlign: Hierarchical Cross-Modal Alignment for Decoupled Multimodal Representation Learning
Published: 3/15/2025
Multimodal Representation Learning, Cross-Modal Alignment Framework, Hierarchical Alignment Method, Gaussian Mixture Modeling, Multimodal Transformer
DecAlign is a hierarchical cross-modal alignment framework that decouples multimodal representations into unique and common features. It addresses heterogeneity and consistency across modalities, outperforming existing methods on benchmark datasets.
HCMA: Hierarchical Cross-model Alignment for Grounded Text-to-Image Generation
Published: 5/10/2025
Text-to-Image Generation, Hierarchical Cross-Model Alignment, Multimodal Generation, MS-COCO Dataset, Diffusion Models
The paper introduces the Hierarchical Cross-Modal Alignment (HCMA) framework, addressing the conflict between semantic fidelity and spatial control in text-to-image generation. HCMA combines global and local alignment modules to achieve high-quality results in complex scenes, sur
ATOMAS: Hierarchical Adaptive Alignment on Molecule-Text for Unified Molecule Understanding and Generation
Molecule-Text Cross-Modal Representation Learning, Hierarchical Adaptive Alignment Model, SMILES String Representation Learning, Molecule Understanding and Generation, Cross-Modal Fragment Correspondence Learning
Atomas is a hierarchical framework for joint learning of SMILES and text representations, using a Hierarchical Adaptive Alignment model to capture fine-grained fragment correspondences. It demonstrates superior performance in various tasks, showcasing its effectiveness and versat