CV - Computer Vision | ML - Machine Learning | RL - Reinforcement Learning | NLP - Natural Language Processing
1.Re-Align: Aligning Vision Language Models via Retrieval-Augmented Direct Prefe...
1.MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasonin...
1.Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangl...
1.Automated Generation of Challenging Multiple-Choice Questions for Vision Langu...
1.VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction
Authors: Ferran Alet, Clement Gehring, Tomás Lozano-Pérez, Kenji Kawaguchi, Joshua B. ...
Authors: Shaoteng Liu, Tianyu Wang, Jui-Hsien Wang, Qing Liu, Zhifei Zhang, Joon-Young...
1.On Calibration in Multi-Distribution Learning
1.Thinking in Space: How Multimodal Large Language Models See, Remember, and Rec...