How Good is Google Bard's Visual Understanding? An Empirical Study on Open Challenges
Author: Haotong Qin (affiliation 3)
Journal: Machine Intelligence Research
Year: 2023
Volume: 20, Issue: 5, Pages: 605-613
Keywords: Google Bard, multi-modal understanding, visual comprehension, large language models, conversational AI, chatbot
ISSN: 2731-538X
DOI: 10.1007/s11633-023-1469-x
Abstract: Google's Bard has emerged as a formidable competitor to OpenAI's ChatGPT in the field of conversational AI. Notably, Bard has recently been updated to handle visual inputs alongside text prompts during conversations. Given Bard's impressive track record in handling textual inputs, we explore its capabilities in understanding and interpreting visual data (images) conditioned on text questions. This exploration holds the potential to unveil new insights and challenges for Bard and other forthcoming multi-modal generative models, especially in addressing complex computer vision problems that demand accurate visual and language understanding. Specifically, in this study, we focus on 15 diverse task scenarios encompassing regular, camouflaged, medical, underwater and remote sensing data to comprehensively evaluate Bard's performance. Our primary finding indicates that Bard still struggles in these vision scenarios, highlighting the significant gap in vision-based understanding that needs to be bridged in future developments. We expect that this empirical study will prove valuable in advancing future models, leading to enhanced capabilities in comprehending and interpreting fine-grained visual data. Our project is released at https://github.com/htqin/GoogleBard-VisUnderstand.
Content type: Journal article
Source URL: http://ir.ia.ac.cn/handle/173211/52439
Collection: Institute of Automation_Academic Journals_International Journal of Automation and Computing
Author affiliations:
1. Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi 999041, UAE
2. College of Engineering, Computing & Cybernetics, Australian National University, Canberra 8105, Australia
3. Computer Vision Lab (CVL), ETH Zürich, Zürich 8001, Switzerland
Recommended citation
GB/T 7714: Haotong Qin. How Good is Google Bard's Visual Understanding? An Empirical Study on Open Challenges[J]. Machine Intelligence Research, 2023, 20(5): 605-613.
APA: Haotong Qin. (2023). How Good is Google Bard's Visual Understanding? An Empirical Study on Open Challenges. Machine Intelligence Research, 20(5), 605-613.
MLA: Haotong Qin. "How Good is Google Bard's Visual Understanding? An Empirical Study on Open Challenges." Machine Intelligence Research 20.5 (2023): 605-613.