{ "id": "2209.00465", "version": "v1", "published": "2022-08-29T16:37:18.000Z", "updated": "2022-08-29T16:37:18.000Z", "title": "On Grounded Planning for Embodied Tasks with Language Models", "authors": [ "Bill Yuchen Lin", "Chengsong Huang", "Qian Liu", "Wenda Gu", "Sam Sommerer", "Xiang Ren" ], "categories": [ "cs.AI", "cs.CL", "cs.LG", "cs.RO" ], "abstract": "Language models (LMs) are shown to have commonsense knowledge of the physical world, which is fundamental for completing tasks in everyday situations. However, it is still an open question whether LMs have the ability to generate grounded, executable plans for embodied tasks. It is very challenging because LMs do not have an \"eye\" or \"hand\" to perceive the realistic environment. In this work, we show the first study on this important research question. We first present a novel problem formulation named G-PlanET, which takes as input a high-level goal and a table of objects in a specific environment. The expected output is a plan consisting of step-by-step instructions for agents to execute. To enable the study of this problem, we establish an evaluation protocol and devise a dedicated metric for assessing the quality of plans. In our extensive experiments, we show that adding flattened tables for encoding environments and using an iterative decoding strategy can both improve the LMs' ability for grounded planning. Our analysis of the results also leads to interesting non-trivial findings.", "revisions": [ { "version": "v1", "updated": "2022-08-29T16:37:18.000Z" } ], "analyses": { "keywords": [ "language models", "embodied tasks", "grounded planning", "novel problem formulation named g-planet", "environment" ], "note": { "typesetting": "TeX", "pages": 0, "language": "en", "license": "arXiv", "status": "editable" } } }