TY - JOUR
T1 - A Review of Advances in Large Language and Vision Models for Robotic Manipulation
T2 - Techniques, Integrations, and Challenges
AU - Hussain, Sajjad
AU - Biswas, Shwetangshu
AU - Dutta, Amandip
AU - Saad, Md
AU - Baimagambetov, Almas
AU - Saeed, Khizer
AU - Polatidis, Nikolaos
PY - 2025/06/25
Y1 - 2025/06/25
N2 - Recent advancements in transformer-based systems, including Large Language Models and Large Vision Models, have significantly transformed robotic manipulation by enabling enhanced task planning, real-time decision-making, and adaptive behaviour in complex environments. This review synthesises current research on integrating these models with robotic control systems, highlighting innovative strategies that merge linguistic and visual processing to improve precision and efficiency. It also critically examines challenges such as scalability, robustness, interpretability, and real-world applicability while identifying research gaps and future directions. This paper provides a concise yet comprehensive overview of the transformative impact of transformer-based systems on robotics, offering valuable insights for developing more sophisticated and versatile robotic systems.
AB - Recent advancements in transformer-based systems, including Large Language Models and Large Vision Models, have significantly transformed robotic manipulation by enabling enhanced task planning, real-time decision-making, and adaptive behaviour in complex environments. This review synthesises current research on integrating these models with robotic control systems, highlighting innovative strategies that merge linguistic and visual processing to improve precision and efficiency. It also critically examines challenges such as scalability, robustness, interpretability, and real-world applicability while identifying research gaps and future directions. This paper provides a concise yet comprehensive overview of the transformative impact of transformer-based systems on robotics, offering valuable insights for developing more sophisticated and versatile robotic systems.
KW - Large language models (LLMs)
KW - Large vision models (LVMs)
KW - Robotic manipulation
KW - Deep learning
KW - Reinforcement learning
KW - Task execution
U2 - 10.1007/s42979-025-04119-6
DO - 10.1007/s42979-025-04119-6
M3 - Article
SN - 2661-8907
VL - 6
JO - SN Computer Science
JF - SN Computer Science
M1 - 588
ER -