An Introduction to Open-Source Visual Question Answering Projects

CreateAMind

Published 2018-07-20 16:45:48

This article is part of the column: CreateAMind

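Part three of the Keras Chinese documentation series ended with a very simple demo program for visual question answering (VQA). Today we look at a more complex TensorFlow version of VQA.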

https://github.com/JamesChuanggg/VQA-tensorflow

Tensorflow Implementation of Deeper LSTM+ normalized CNN for Visual Question Answering

This TensorFlow version of VQA matches the accuracy of the original Torch implementation:

This current code can get 58.16 on Open-Ended and 63.09 on Multiple-Choice on test-standard split.

Example results (figure):

The code, however, is only a little over 400 lines, so interested readers may want to read through it.
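
To make the phrase "Deeper LSTM + normalized CNN" more concrete, here is a minimal sketch of the fusion step as the model is commonly described: the CNN image feature is L2-normalized, the image and question vectors are each projected into a common space with a tanh layer, multiplied element-wise, and classified over a fixed answer set with a softmax. This is an illustration under assumptions, not the repository's code. The function names, the toy dimensions, and the random weights are all hypothetical, and `question_feat` simply stands in for the output of the LSTM question encoder.

```python
import math
import random


def l2_normalize(vec, eps=1e-12):
    """Scale a vector to unit L2 norm (the "normalized CNN" part)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / (norm + eps) for x in vec]


def dense_tanh(weights, vec):
    """Fully connected layer (no bias, for brevity) followed by tanh."""
    return [math.tanh(sum(w * x for w, x in zip(row, vec))) for row in weights]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    """Hypothetical stand-in for trained weights."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def answer_distribution(image_feat, question_feat, w_img, w_q, w_out):
    """Fuse image and question features and score the candidate answers."""
    img = dense_tanh(w_img, l2_normalize(image_feat))  # image branch
    qst = dense_tanh(w_q, question_feat)               # question branch
    fused = [i * q for i, q in zip(img, qst)]          # element-wise product
    logits = [sum(w * f for w, f in zip(row, fused)) for row in w_out]
    return softmax(logits)


# Toy sizes chosen for illustration only; real models use far larger ones.
IMG_DIM, Q_DIM, COMMON_DIM, NUM_ANSWERS = 8, 6, 4, 3
rng = random.Random(0)
image_feat = [rng.random() for _ in range(IMG_DIM)]
question_feat = [rng.uniform(-1, 1) for _ in range(Q_DIM)]
probs = answer_distribution(
    image_feat,
    question_feat,
    random_matrix(COMMON_DIM, IMG_DIM, rng),
    random_matrix(COMMON_DIM, Q_DIM, rng),
    random_matrix(NUM_ANSWERS, COMMON_DIM, rng),
)
print(probs)
```

The answer with the highest probability in `probs` would be the prediction. Treating VQA as classification over a fixed answer vocabulary is what keeps a model of this kind small.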

Compared with this version, an improved VQA model adds attention and a hierarchical structure:

https://github.com/jiasenlu/HieCoAttenVQA

Hierarchical Question-Image Co-Attention for Visual Question Answering
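
As a rough intuition for what "attention" means here, the sketch below uses plain dot-product soft attention: a question vector scores each image region, the scores are turned into weights with a softmax, and the regions are averaged using those weights. This is a deliberately simplified stand-in, not the hierarchical co-attention mechanism of the linked project, which also attends over the question. The `attend` function and the toy vectors are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(regions, query):
    """Return (weights, context) for dot-product soft attention."""
    # Relevance of each region to the query.
    scores = [sum(r * q for r, q in zip(region, query)) for region in regions]
    weights = softmax(scores)
    # Weighted average of the region vectors.
    dim = len(regions[0])
    context = [
        sum(w * region[d] for w, region in zip(weights, regions))
        for d in range(dim)
    ]
    return weights, context


# Three toy "image regions" and one toy "question" vector.
regions = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
query = [2.0, 0.0]
weights, context = attend(regions, query)
print(weights, context)
```

Regions that align with the query receive larger weights, and it is weights of this kind that attention visualizations typically overlay on the image.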

Attention results (figure):

For attention applied to video, see:

https://github.com/tsenghungchen/SA-tensorflow

See the original post for the complete code.

Originally published: 2016-09-30. To report infringement, contact cloudcommunity@tencent.com for removal.

This article is shared from the CreateAMind WeChat official account.
