cs.CV

Teaching household robots to find relevant details in cluttered homes

ZhiYuan Feng, Yu Deng, Ruichuan An, Zhenhua Liu, Qixiu Li, Keming Wu, Zhiying Du, Weijie Wang, Haoxiao Wang, Shuang Chen, Sicheng Xu, Yaobo Liang, Jiaolong Yang, Baining Guo

May 18, 2026

Household robots operating in real homes must pick out task-relevant information from cluttered scenes full of irrelevant detail, a harder problem than responding to clean task specifications. TaskGround addresses this in three stages: it first grounds complete household scenes into compact task-relevant slices, then infers an executable task structure, and finally compiles actions. The method is training-free and works with any language model. The authors introduce FullHome, a human-validated evaluation suite of 400 tasks spanning diverse home environments and constraint types. TaskGround significantly improves task success rates across proprietary and open-weight models: it makes a 9B-parameter open model competitive with GPT-5 on the benchmark while cutting input tokens by up to 18×, which matters for privacy-constrained local deployment.
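To make the ground, infer, compile ordering concrete, here is a minimal toy sketch in Python. It is not the authors' implementation: the names `SceneObject`, `ground`, `infer_structure`, `compile_actions`, and `run_pipeline` are hypothetical, and each stage uses a simple keyword rule as a stand-in for what would be a language-model call in the real method. It only illustrates how shrinking the scene first leaves the later stages with far less input to process.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SceneObject:
    # Hypothetical, heavily simplified scene record (assumption for illustration).
    name: str
    room: str


def ground(scene, task):
    # Stage 1 (toy stand-in): keep only objects whose name appears in the task text.
    words = set(task.lower().replace(",", " ").replace(".", " ").split())
    return [obj for obj in scene if obj.name in words]


def infer_structure(task_slice, task):
    # Stage 2 (toy stand-in): order grounded objects by first mention in the task.
    text = task.lower()
    return sorted(task_slice, key=lambda obj: text.index(obj.name))


def compile_actions(ordered):
    # Stage 3 (toy stand-in): emit a go-to / interact pair for each object.
    actions = []
    for obj in ordered:
        actions.append(f"goto({obj.room})")
        actions.append(f"interact({obj.name})")
    return actions


def run_pipeline(scene, task):
    # Chain the three stages: full scene -> slice -> structure -> actions.
    task_slice = ground(scene, task)
    return compile_actions(infer_structure(task_slice, task))


scene = [
    SceneObject("mug", "kitchen"),
    SceneObject("sofa", "living_room"),
    SceneObject("lamp", "bedroom"),
    SceneObject("sink", "kitchen"),
    SceneObject("rug", "hallway"),
]
task = "Put the mug in the sink."
plan = run_pipeline(scene, task)
print(plan)
```

In this toy scene only two of the five objects survive grounding, so the later stages never see the sofa, lamp, or rug. The reported token savings come from the same idea applied to full household scenes, though the actual grounding mechanism is described in the paper, not here.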
Published as "TaskGround: Structured Executable Task Inference for Full-Scene Household Reasoning" (arXiv:2605.18109)