A Review of Advances in Large Language and Vision Models for Robotic Manipulation: Techniques, Integrations, and Challenges

Sajjad Hussain, Shwetangshu Biswas, Amandip Dutta, Md Saad, Almas Baimagambetov, Khizer Saeed, Nikolaos Polatidis

Research output: Contribution to journal › Article › peer-review

Abstract

Recent advancements in transformer-based systems, including Large Language Models and Large Vision Models, have significantly transformed robotic manipulation by enabling enhanced task planning, real-time decision-making, and adaptive behaviour in complex environments. This review synthesises current research on integrating these models with robotic control systems, highlighting innovative strategies that merge linguistic and visual processing to improve precision and efficiency. It also critically examines challenges such as scalability, robustness, interpretability, and real-world applicability while identifying research gaps and future directions. This paper provides a concise yet comprehensive overview of the transformative impact of transformer-based systems on robotics, offering valuable insights for developing more sophisticated and versatile robotic systems.
Original language: English
Article number: 588
Journal: SN Computer Science
Volume: 6
Publication status: Published, 25 Jun 2025

Keywords

  • Large language models (LLMs)
  • Large vision models (LVMs)
  • Robotic manipulation
  • Deep learning
  • Reinforcement learning
  • Task execution
