TY - GEN
T1 - Cloud-IoT Application for Scene Understanding in Assisted Living
T2 - Unleashing the Potential of Image Captioning and Large Language Model (ChatGPT)
AU - Abdal Hafeth, Deema
AU - Lal, Gokul
AU - Al-Khafajiy, Mohammed
AU - Baker, Thar
AU - Kollias, Stefanos
PY - 2024/3/21
Y1 - 2024/3/21
N2 - Vision is a vital sense that plays a pivotal role in our understanding of the world. The majority of our external information is acquired through our visual system, which significantly impacts various aspects of our lives, including mobility, cognitive abilities, access to information, and how we interact with both our surroundings and other individuals. Hence, individuals who need assisted living due to visual challenges are left behind and rely on human-driven image captioning services to make sense of their surroundings. In response to this challenge, we have developed a proof-of-concept system that integrates a large language model like ChatGPT to provide assistance to individuals with visual impairments in their daily lives through the utilisation of image captioning techniques. Our proposed model leverages the image captioning technique to describe the user’s environment. It is a fusion of concepts from Deep Learning and the Internet of Things, enabling it to provide more informative and enriched image captions. In this process, ChatGPT is stimulated to generate increasingly detailed and informative descriptions of images, allowing users to gain a deeper understanding of their surroundings. Our findings show that the proposed system generates captions that are contextually relevant to the visual content. These captions can assist individuals in various day-to-day activities, contributing to an improved quality of life.
AB - Vision is a vital sense that plays a pivotal role in our understanding of the world. The majority of our external information is acquired through our visual system, which significantly impacts various aspects of our lives, including mobility, cognitive abilities, access to information, and how we interact with both our surroundings and other individuals. Hence, individuals who need assisted living due to visual challenges are left behind and rely on human-driven image captioning services to make sense of their surroundings. In response to this challenge, we have developed a proof-of-concept system that integrates a large language model like ChatGPT to provide assistance to individuals with visual impairments in their daily lives through the utilisation of image captioning techniques. Our proposed model leverages the image captioning technique to describe the user’s environment. It is a fusion of concepts from Deep Learning and the Internet of Things, enabling it to provide more informative and enriched image captions. In this process, ChatGPT is stimulated to generate increasingly detailed and informative descriptions of images, allowing users to gain a deeper understanding of their surroundings. Our findings show that the proposed system generates captions that are contextually relevant to the visual content. These captions can assist individuals in various day-to-day activities, contributing to an improved quality of life.
KW - Assisted Living
KW - ChatGPT
KW - Image Captioning
KW - Internet of Things
KW - NLP
UR - http://www.scopus.com/inward/record.url?scp=85189303591&partnerID=8YFLogxK
U2 - 10.1109/DeSE60595.2023.10468995
DO - 10.1109/DeSE60595.2023.10468995
M3 - Conference contribution with ISSN or ISBN
SN - 9798350381351
T3 - 16th International Conference on Developments in eSystems Engineering (DeSE)
SP - 150
EP - 155
BT - DeSE 2023 - Proceedings
A2 - Al-Jumeily, Dhiya
A2 - Assi, Sulaf
A2 - Jayabalan, Manoj
A2 - Hind, Jade
A2 - Hussain, Abir
A2 - Tawfik, Hissam
A2 - Rowe, Neil
A2 - Mustafina, Jamila
PB - IEEE
ER -