{ "id": "2505.16928", "version": "v2", "published": "2025-05-22T17:20:38.000Z", "updated": "2025-10-01T17:51:44.000Z", "title": "Beyond Needle(s) in the Embodied Haystack: Environment, Architecture, and Training Considerations for Long Context Reasoning", "authors": [ "Bosung Kim", "Prithviraj Ammanabrolu" ], "categories": [ "cs.AI", "cs.LG", "cs.RO" ], "abstract": "We introduce $\\infty$-THOR, a new framework for long-horizon embodied tasks that advances long-context understanding in embodied AI. $\\infty$-THOR provides: (1) a generation framework for synthesizing scalable, reproducible, and unlimited long-horizon trajectories; (2) a novel embodied QA task, Needle(s) in the Embodied Haystack, where multiple scattered clues across extended trajectories test agents' long-context reasoning ability; and (3) a long-horizon dataset and benchmark suite featuring complex tasks that span hundreds of environment steps, each paired with ground-truth action sequences. To enable this capability, we explore architectural adaptations, including interleaved Goal-State-Action modeling, context extension techniques, and Context Parallelism, to equip LLM-based agents for extreme long-context reasoning and interaction. Experimental results and analyses highlight the challenges posed by our benchmark and provide insights into training strategies and model behaviors under long-horizon conditions. Our work provides a foundation for the next generation of embodied AI systems capable of robust, long-term reasoning and planning.", "revisions": [ { "version": "v2", "updated": "2025-10-01T17:51:44.000Z" } ], "analyses": { "keywords": [ "long context reasoning", "embodied haystack", "training considerations", "environment", "long-horizon" ], "note": { "typesetting": "TeX", "pages": 0, "language": "en", "license": "arXiv", "status": "editable" } } }