EAI Challenge @ CVPR 2026

Embodied Agent Interface: Evaluating LLMs for Embodied Decision Making

VirtualHome Track

Part of Foundation Models Meet Embodied Agents (FMEA) Workshop @ CVPR 2026
GitHub · Paper · Submit · Project Page

Challenge Overview

The Embodied Agent Interface (EAI) Challenge invites participants to develop and evaluate Large Language Models (LLMs) for embodied reasoning through our standardized evaluation protocol. This challenge is part of the FMEA Workshop at CVPR 2026.

Unlike typical evaluations that report only success rates, our framework provides fine-grained metrics that examine both whether proposed actions are actually executable in the environment and whether they truly accomplish the intended goals. The challenge uses the VirtualHome simulator with ground-truth annotations, including Linear Temporal Logic (LTL) goal specifications, and provides comprehensive error analysis.

What's New in 2026

  • Focused exclusively on VirtualHome environment for deeper, more targeted evaluation
  • Part of the FMEA Workshop alongside the ENAcT and EmbodiedBench challenges
  • Updated evaluation suite with enhanced metrics

Evaluation Tasks

The challenge assesses four critical capabilities for embodied reasoning:

  • 🎯 Goal Interpretation: Understanding and interpreting high-level task objectives in embodied environments
  • 🧩 Subgoal Decomposition: Breaking down complex goals into manageable intermediate subgoals
  • 📝 Action Sequencing: Planning and ordering executable actions to accomplish each subgoal
  • 🌐 Transition Modeling: Understanding how actions change the world state in the environment
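
To make the four tasks concrete, the sketch below walks a toy household task through each stage. It is illustrative only: the task, predicate names, and action format are assumptions for exposition, not the official EAI annotation schema.

    # Toy walkthrough of the four evaluation tasks (illustrative only;
    # predicate names and action formats are assumptions, not the
    # official EAI annotation schema).
    task = "watch TV"

    # Goal Interpretation: natural-language goal -> symbolic goal conditions
    goal_conditions = ["ON(tv)", "FACING(character, tv)"]

    # Subgoal Decomposition: intermediate states leading to the goal
    subgoals = ["NEXT_TO(character, tv)", "ON(tv)"]

    # Action Sequencing: an executable, VirtualHome-style action plan
    action_sequence = ["[WALK] <tv>", "[SWITCHON] <tv>"]

    # Transition Modeling: predicted preconditions and effects per action
    transitions = {
        "[SWITCHON] <tv>": {
            "preconditions": ["NEXT_TO(character, tv)", "OFF(tv)"],
            "effects": ["ON(tv)"],
        },
    }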

Data

All data and evaluation are conducted within the VirtualHome simulator. We provide datasets and starter code through our GitHub repository.

Split         Description                                    Status
Training      Training data with ground-truth annotations    Coming Soon
Validation    Validation set for development and tuning      Coming Soon
Test          Held-out test set for final evaluation         Coming Soon

Evaluation

Our evaluation framework goes beyond simple success rates. We employ fine-grained metrics across all four task dimensions to measure both the executability and correctness of agent outputs.

Evaluation Dimensions

  • Goal Interpretation: Accuracy of interpreting task goals against ground-truth LTL specifications (see the example after this list)
  • Subgoal Decomposition: Quality of intermediate subgoal generation
  • Action Sequencing: Executability and goal-completion of action plans in VirtualHome
  • Transition Modeling: Accuracy of predicted state changes after actions
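
For a rough sense of what such a specification looks like (an illustrative guess, not the official annotation format), a goal such as "put the apple inside the fridge and close the door" could be written in LTL as

    F( inside(apple, fridge) ∧ closed(fridge) )

where F ("eventually") requires the condition to hold at some point during execution.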

Teams are ranked by a weighted overall score combining metrics across all four tasks. Detailed evaluation criteria and scoring rubrics will be released with the data.
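
Since the official weights have not yet been published, here is a minimal sketch of how such a ranking score could be computed, assuming equal weights and hypothetical metric names:

    # Minimal sketch of a weighted overall score; the official weights,
    # metric names, and scoring rubric will be released with the data.
    TASK_WEIGHTS = {
        "goal_interpretation": 0.25,
        "subgoal_decomposition": 0.25,
        "action_sequencing": 0.25,
        "transition_modeling": 0.25,
    }

    def overall_score(task_scores):
        """Combine per-task scores in [0, 1] into a single ranking score."""
        return sum(TASK_WEIGHTS[task] * score for task, score in task_scores.items())

    print(overall_score({
        "goal_interpretation": 0.80,
        "subgoal_decomposition": 0.70,
        "action_sequencing": 0.65,
        "transition_modeling": 0.75,
    }))  # 0.725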

Timeline

  • March 2026: Challenge announced & platform setup
  • April 2026: Training & validation data released
  • April – May 2026: Development phase; leaderboard open for submissions
  • June 2026: Test phase & final evaluation
  • June 2026 (CVPR): Winners announced & presentations at the FMEA Workshop

* Exact dates will be announced soon. Stay tuned!

Prizes

  • 🥇 1st Place: TBD
  • 🥈 2nd Place: TBD
  • 🥉 3rd Place: TBD
  • 💡 Most Innovative: TBD

Honorable mentions will be awarded for top performance in individual tasks: Goal Interpretation, Subgoal Decomposition, Action Sequencing, and Transition Modeling.

Submission

Submission Format

Participants should submit their model outputs following the format specified in our evaluation toolkit. Detailed submission instructions will be provided when the challenge officially opens.
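
As a purely hypothetical sketch of what a submission file could look like (every field name below is an assumption; the evaluation toolkit defines the authoritative format):

    # Hypothetical submission writer; all field names here are
    # assumptions -- follow the evaluation toolkit's official format.
    import json

    submission = {
        "team_name": "example-team",
        "model": "my-llm-v1",
        "predictions": [
            {
                "task": "action_sequencing",
                "task_id": "virtualhome_0001",
                "output": ["[WALK] <tv>", "[SWITCHON] <tv>"],
            },
        ],
    }

    with open("submission.json", "w") as f:
        json.dump(submission, f, indent=2)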

Rules

  • Use of external resources (pretrained models, additional data) is allowed but must be disclosed
  • Manual labeling or annotation of test data is strictly prohibited
  • Top-performing teams may be asked to share code for verification
  • Each team may make up to 5 submissions during the evaluation phase
  • No restriction on team size, but each team must submit under a single team name
  • By participating, teams agree to have their results published on the leaderboard

Submission Portal

The submission portal will be available soon.

Leaderboard

The leaderboard will be updated once the challenge officially opens and submissions begin.

Rank   Team / Method   Overall   Goal Interp.   Subgoal Decomp.   Action Seq.   Transition
Coming soon — challenge has not yet started.

Past Challenge

EAI Challenge @ NeurIPS 2025

Our previous challenge was held at NeurIPS 2025 (December 7, 2025, San Diego) and evaluated both the VirtualHome and BEHAVIOR environments. See the results and details here.

Winners:

  • 1st Place: AxisTilted2
  • 2nd Place: SingaX
  • 3rd Place: CtrlAct
  • Most Innovative: nju-lamda12

Contact

For questions about the challenge, please reach out to us: