Papers
GeoLRM: Geometry-Aware Large Reconstruction Model for High-Quality 3D Gaussian Generation
Published: 6/22/2024
Geometry-Aware Large Reconstruction Model, 3D Gaussian Generation, 3D-aware Transformer Structure, Sparse 3D Structure Optimization, Deformable Cross-Attention Mechanism
This study introduces GeoLRM, a geometry-aware large reconstruction model that efficiently generates high-quality 3D assets with 512k Gaussians from 21 input images using only 11 GB of GPU memory. It addresses limitations of existing methods by utilizing a novel 3D-aware transfor
Wide-FOV 3D Pancake VR Enabled by a Light Field Display Engine
Light Field Display Engine, Wide-FOV 3D Virtual Reality, Computational Focus Cues, Micro-LCD, Optical Pancake Design
This paper presents a novel true-3D Pancake VR system using a light field display engine and computational focus cues, achieving high-resolution images. It addresses FOV reduction due to aberrations with a telecentric path, experimentally confirming clear 3D images with a 68.6de
The differential influence of Achievement Motivation on Subjective Well-being and the moderating role of Self-control
Published: 9/27/2024
Achievement Motivation and Subjective Well-Being, Moderating Role of Self-Control, Psychological Health Studies in College Students, Self-Management and Well-Being
This study surveyed 1,017 Chinese college students to explore the relationship between achievement motivation and subjective well-being, revealing that self-control significantly moderates this relationship. High self-control enhances positive impacts of success motivation and mi
Deciphering the impact of machine learning on education: Insights from a bibliometric analysis using bibliometrix R-package
Published: 5/6/2024
Impact Analysis of Machine Learning in Education, Bibliometrix Statistical Analysis Method, Interdisciplinary Research on Machine Learning and Education, Trends and Patterns in Educational Research, Challenges and Ethical Considerations in ML Education Applicatio
This study uses bibliometric analysis to explore machine learning's impact on education, revealing its transformative potential for teaching methods. Analyzing 970 articles from 2000 to 2023 identifies growth patterns and key contributors, providing a comprehensive roadmap for in
A multifactorial model of intrinsic / environmental motivators, personal traits and their combined influences on math performance in elementary school
Achievement Goals Model for Math Performance, Influence of Self-Efficacy and Interest, Role of Environmental Factors in Learning Motivation, Holistic Multifactorial Path Analysis, Elementary School Math Learning Research
This study develops a comprehensive multifactorial path analysis model to explore the influences of intrinsic and environmental motivators and personality traits on math performance among elementary students. Results from 762 Cypriot students highlight self-efficacy and interest
Modality Alignment with Multi-scale Bilateral Attention for Multimodal Recommendation
Published: 9/11/2025
Multimodal Recommendation Systems, Fine-Grained Cross-Modal Association Modeling, Bidirectional Attention Mechanism, Global Distribution Consistency Regularization, Dilated Refinement Attention Module
This study presents MambaRec, an innovative multimodal recommendation framework addressing fine-grained cross-modal association modeling and global consistency issues through attention-guided learning. Its core contribution is the Dilated Refinement Attention Module, enhancing fu
Multimodal Generative Recommendation for Fusing Semantic and Collaborative Signals
Published: 10/8/2025
Multimodal Generative Recommendation System, Fusion of Collaborative and Semantic Signals, Self-Supervised Quantization Learning, Sequential Recommender Systems, DINO Framework
The paper introduces MSCGRec, a generative recommendation system that addresses limitations in current sequential recommenders by integrating multiple semantic modalities and collaborative features. Empirical results show superior performance on three real-world datasets, validat
MorphQPV: Exploiting Isomorphism in Quantum Programs to Facilitate Confident Verification
Published: 4/24/2024
Quantum Program Verification, Isomorphism Method, Confident Assertion-Based Verification Method, Constraint Optimization Problem, Quantum Algorithm Debugging
MorphQPV is a confident assertion-based verification method for quantum programs, leveraging isomorphism to establish structural preservation relations among runtime states. It transforms verification into a constraint optimization problem, significantly improving efficiency and
ModRWKV: Transformer Multimodality in Linear Time
Published: 11/1/2025
ModRWKV Multimodal Framework, RWKV Architecture, Linear-Time Transformer, Multimodal Large Language Models, Dynamically Adaptable Heterogeneous Modality Encoders
This study introduces ModRWKV, a framework based on the RWKV architecture that achieves multimodal processing with linear time complexity, outperforming traditional quadratic-complexity Transformer models. It balances performance and computational efficiency for multi-source informat
Learning Multi-Aspect Item Palette: A Semantic Tokenization Framework for Generative Recommendation
Published: 9/11/2024
Generative Recommendation Systems, Multi-Aspect Semantic Tokenization, Text-Based Reconstruction Tasks, Long-Tailed Recommendation Issues, Cold-Start Recommendation
This paper introduces LAMIA, a novel multi-aspect semantic tokenization framework that enhances generative recommendation systems. Unlike traditional methods, it learns independent embeddings capturing multiple facets of items, significantly improving recommendation accuracy for
Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions
Published: 12/16/2017
Tacotron 2 Speech Synthesis, WaveNet Vocoder, Mel Spectrogram Prediction, Sequence-to-Sequence Feature Prediction, Neural Network Speech Synthesis Architecture
This paper presents Tacotron 2, a neural network architecture for direct text-to-speech synthesis. It features a sequence-to-sequence network predicting mel spectrograms and a modified WaveNet as a vocoder. The model achieves a mean opinion score of 4.53, comparable to profession
Tacotron: Towards End-to-End Speech Synthesis
Published: 3/30/2017
End-to-End Speech Synthesis Model, Tacotron Model, Sequence-to-Sequence Learning, Text-to-Speech Synthesis, Generative Models in NLP
Tacotron is an end-to-end text-to-speech model that synthesizes speech directly from characters, simplifying complex traditional TTS systems. Trained from scratch, it achieves a mean opinion score of 3.82, outperforming existing systems in naturalness and offering faster generation speeds.
WaveNet: A Generative Model for Raw Audio
Published: 9/13/2016
Audio Generation Model, WaveNet Architecture, Text-to-Speech Synthesis, Autoregressive Modeling, Music Generation
WaveNet is introduced as a deep neural network for raw audio generation, featuring probabilistic and autoregressive properties. It excels in text-to-speech tasks, surpassing existing systems in naturalness, and shows high realism in music generation while also achieving promising
End-to-End Speech Recognition Contextualization with Large Language Models
Published: 9/20/2023
LLM-based Speech Recognition, Text Context Augmented Speech Recognition, Mixed-Modal Language Modeling, Decoder-Only Speech Recognition, Low-Parameter Adapter Method
The paper introduces a novel speech recognition contextualization method using Large Language Models, reframing ASR as mixed-modal language modeling. Adding textual context reduces word error rate by 6%, achieving a 7.5% improvement over the baseline system.
End-to-End Speech Recognition: A Survey
Published: 3/3/2023
End-to-End Speech Recognition Architectures, Deep Learning in Speech Recognition, All-Neural ASR Models, Taxonomy of Automatic Speech Recognition Models, Training and Decoding of Speech Recognition Models
This survey reviews advancements in end-to-end automatic speech recognition (ASR) models, highlighting deep learning's impact on reducing word error rates. It provides a taxonomy of E2E models, discusses their properties, relates them to traditional hidden Markov models, and cove
Fun-ASR Technical Report
Published: 9/16/2025
LLM-based Automatic Speech Recognition System, Optimizations for Practical Deployment in Speech Recognition, Reinforcement Learning Applications in Speech Recognition, Data-Driven Large-Scale Speech Recognition, Streaming Capability in Speech Recognition
Fun-ASR combines large-scale data, large model capacity, and deep LLM integration, and is optimized through reinforcement learning for real-world challenges. It achieves state-of-the-art performance on industrial datasets, demonstrating effectiveness in streaming, noise robustness, and codeswit
FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs
Published: 7/5/2024
Multilingual Speech Recognition and Generation, Emotion Recognition in Speech, Speech-to-Speech Translation, Zero-Shot Voice Cloning, Natural Voice Interaction between Humans and LLMs
This report presents FunAudioLLM, a model family enhancing natural voice interaction between humans and LLMs. It features SenseVoice for multilingual speech and emotion recognition, and CosyVoice for natural voice generation, supporting applications like speech translation and em
A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition
Published: 1/1/1990
Hidden Markov Models, Statistical Modeling in Speech Recognition, Applications of Markov Source Models, Review of Statistical Methods
This tutorial reviews the theory of Hidden Markov Models (HMMs) and their applications in speech recognition, highlighting their rich mathematical structure and effectiveness in practical applications.
Jenga: Enhancing LLM Long-Context Fine-tuning with Contextual Token Sparsity
Large Language Model Fine-Tuning, Long-Context Modeling, Sparse Attention Mechanism
Jenga is a novel LLM fine-tuning system that optimizes activation memory usage in long-context applications using Contextual Token Sparsity. It employs token elimination, pattern prediction, and kernel optimization, achieving up to 1.93x memory reduction and 1.36x acceleration ov
Objaverse-XL: A Universe of 10M+ 3D Objects
Published: 7/11/2023
Objaverse-XL Dataset, 3D Vision Tasks, Multi-View Rendered Images, Zero-Shot Generalization Capability, Acquisition of High-Quality 3D Data
The paper introduces Objaverse-XL, a dataset of over 10 million 3D objects, addressing the scarcity of high-quality data in 3D vision tasks. Training on 100 million multi-view images achieved significant zero-shot generalization, promoting innovation in 3D vision.
…