1.MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasonin...
CV - Computer Vision | ML - Machine Learning | RL - Reinforcement Learning | NLP - Natural Language Processing
1.Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangl...
1.Automated Generation of Challenging Multiple-Choice Questions for Vision Langu...
1.VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction
Authors: Ferran Alet, Clement Gehring, Tomás Lozano-Pérez, Kenji Kawaguchi, Joshua B. ...
Authors: Shaoteng Liu, Tianyu Wang, Jui-Hsien Wang, Qing Liu, Zhifei Zhang, Joon-Young...
1.On Calibration in Multi-Distribution Learning
1.Thinking in Space: How Multimodal Large Language Models See, Remember, and Rec...
1.REDUCIO! Generating 1024×1024 Video within 16 Seconds using Extremely Compress...
1.DiffusionDrive: Truncated Diffusion Model for End-to-End Autonomous Driving
1.Adaptive Deviation Learning for Visual Anomaly Detection with Data Contaminati...
1.Advancing Fine-Grained Visual Understanding with Multi-Scale Alignment in Mult...