Embodied Agent Interface: Evaluating LLMs for Embodied Decision Making
VirtualHome Track
The Embodied Agent Interface (EAI) Challenge invites participants to develop and evaluate Large Language Models (LLMs) for embodied reasoning through our standardized evaluation protocol. This challenge is part of the FMEA Workshop at CVPR 2026.
Unlike typical evaluations that only report end-to-end success rates, our framework provides fine-grained metrics that examine both whether proposed actions are actually executable in the environment and whether they truly accomplish the intended goals. The challenge uses the VirtualHome simulator with annotations that include Linear Temporal Logic (LTL) goal specifications, and the evaluation pipeline supports comprehensive error analysis.
The challenge assesses four critical capabilities for embodied reasoning (illustrated by the sketch after this list):

- **Goal Interpretation:** understanding and interpreting high-level task objectives in embodied environments
- **Subgoal Decomposition:** breaking down complex goals into manageable intermediate subgoals
- **Action Sequencing:** planning and ordering executable actions to accomplish each subgoal
- **Transition Modeling:** understanding how actions change the world state in the environment
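To make the four tasks concrete, the sketch below shows the kind of input/output pairing each one involves. The field names, the LTL goal string, and the action format are illustrative assumptions for exposition, not the dataset's confirmed schema.

```python
# Illustrative (hypothetical) input/output shapes for the four EAI tasks.
# Names and formats are assumptions for exposition, not the real schema.

task_instruction = "Put the book on the living room table."

# 1. Goal Interpretation: natural language -> formal goal conditions,
#    e.g. an LTL-style specification (F = "eventually").
interpreted_goal = "F (ontop(book, table))"

# 2. Subgoal Decomposition: goal -> ordered intermediate world states.
subgoals = [
    "holding(agent, book)",
    "next_to(agent, table)",
    "ontop(book, table)",
]

# 3. Action Sequencing: subgoals -> executable simulator actions.
action_sequence = [
    "[WALK] <book>",
    "[GRAB] <book>",
    "[WALK] <table>",
    "[PUTBACK] <book> <table>",
]

# 4. Transition Modeling: action -> preconditions and effects on the state.
transition_model = {
    "[GRAB] <book>": {
        "preconditions": ["next_to(agent, book)", "free_hand(agent)"],
        "effects": ["holding(agent, book)", "not free_hand(agent)"],
    },
}
```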
All data and evaluation are conducted within the VirtualHome simulator. We provide datasets and starter code through our GitHub repository.
| Split | Description | Status |
|---|---|---|
| Training | Training data with ground-truth annotations | Coming Soon |
| Validation | Validation set for development and tuning | Coming Soon |
| Test | Held-out test set for final evaluation | Coming Soon |
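Once the splits are released, loading an episode file could look like the sketch below. The file name and field names are placeholders until the official schema ships with the data.

```python
import json

# Hypothetical loading sketch: assumes each split ships as a JSON file with
# per-episode annotations. Path and keys are placeholders, not the real schema.
with open("virtualhome_train.json") as f:
    episodes = json.load(f)

for ep in episodes[:3]:
    print(ep["task_name"], "->", ep["natural_language_goal"])
```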
```bash
pip install embodied-agent-interface
```

Our evaluation framework goes beyond simple success rates. We employ fine-grained metrics across all four task dimensions, measuring both the executability and the correctness of agent outputs.
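As a rough illustration of the executability/correctness distinction, the toy sketch below (not the toolkit's actual evaluator) walks a predicted plan through a symbolic state, checking preconditions at every step and goal satisfaction at the end.

```python
# Toy illustration of executability vs. correctness. This is NOT the
# official evaluator; the transition model here is deliberately simplified.

def evaluate(actions, transition_model, initial_state, goal_conditions):
    """Return (executability, goal_satisfaction) for one predicted plan."""
    state = set(initial_state)
    executed = 0
    for action in actions:
        spec = transition_model.get(action)
        # An action is executable only if all of its preconditions hold.
        if spec is None or not set(spec["preconditions"]) <= state:
            break
        state -= set(spec.get("removes", []))
        state |= set(spec.get("adds", []))
        executed += 1
    executability = executed / len(actions) if actions else 0.0
    # Correctness: does the final state satisfy every goal condition?
    satisfied = sum(c in state for c in goal_conditions)
    correctness = satisfied / len(goal_conditions) if goal_conditions else 0.0
    return executability, correctness

model = {
    "[GRAB] <book>": {
        "preconditions": ["next_to(agent, book)"],
        "adds": ["holding(agent, book)"],
        "removes": [],
    },
    "[PUTBACK] <book> <table>": {
        "preconditions": ["holding(agent, book)", "next_to(agent, table)"],
        "adds": ["ontop(book, table)"],
        "removes": ["holding(agent, book)"],
    },
}
plan = ["[GRAB] <book>", "[PUTBACK] <book> <table>"]
init = ["next_to(agent, book)", "next_to(agent, table)"]
print(evaluate(plan, model, init, ["ontop(book, table)"]))  # (1.0, 1.0)
```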
Teams are ranked by a weighted overall score combining metrics across all four tasks. Detailed evaluation criteria and scoring rubrics will be released with the data.
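For intuition only, an overall score could be aggregated as below. The weights are placeholders (the official rubric has not been published); per-task scores are assumed to lie in [0, 1].

```python
# Hypothetical aggregation with placeholder (uniform) weights.
# The official weights and rubric will be released with the data.
WEIGHTS = {
    "goal_interpretation": 0.25,
    "subgoal_decomposition": 0.25,
    "action_sequencing": 0.25,
    "transition_modeling": 0.25,
}

def overall_score(task_scores: dict) -> float:
    return sum(WEIGHTS[t] * task_scores[t] for t in WEIGHTS)

print(overall_score({
    "goal_interpretation": 0.82,
    "subgoal_decomposition": 0.74,
    "action_sequencing": 0.61,
    "transition_modeling": 0.58,
}))  # 0.6875
```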
* Exact dates will be announced soon. Stay tuned!
Honorable mentions will be awarded for top performance in individual tasks: Goal Interpretation, Subgoal Decomposition, Action Sequencing, and Transition Modeling.
Participants should submit their model outputs following the format specified in our evaluation toolkit. Detailed submission instructions will be provided when the challenge officially opens.
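While the authoritative format specification is pending, a submission will presumably map episode IDs to per-task predictions. The layout below is purely a placeholder illustration, not the official spec.

```python
import json

# Placeholder submission layout (NOT the official spec): one JSON file
# mapping episode ids to the model's predictions for each task.
submission = {
    "episode_0001": {
        "goal_interpretation": "F (ontop(book, table))",
        "subgoal_decomposition": ["holding(agent, book)", "ontop(book, table)"],
        "action_sequencing": ["[GRAB] <book>", "[PUTBACK] <book> <table>"],
        "transition_modeling": {
            "[GRAB] <book>": {"preconditions": ["next_to(agent, book)"],
                              "effects": ["holding(agent, book)"]},
        },
    },
}
with open("submission.json", "w") as f:
    json.dump(submission, f, indent=2)
```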
The submission portal will be available soon.
The leaderboard will be updated once the challenge officially opens and submissions begin.
| Rank | Team / Method | Overall | Goal Interp. | Subgoal Decomp. | Action Seq. | Transition |
|---|---|---|---|---|---|---|

*Coming soon: the challenge has not yet started.*
Our previous challenge was held at NeurIPS 2025 (December 7, 2025, San Diego) and covered both the VirtualHome and BEHAVIOR environments. See the results and details here.
Winners:
For questions about the challenge, please reach out to us: