This track challenges agents to complete a full Pokémon role-playing game (Pokémon Emerald) as quickly and efficiently as possible, navigating a massive, partially observable world with hundreds of NPCs and thousands of possible actions.

Long-horizon planning, efficient exploration, and strategic resource management are critical to succeeding in this track. Agents must learn to balance immediate objectives with long-term strategic goals, making decisions that span thousands of timesteps while adapting to the unpredictable nature of RPG gameplay.

The speedrunning challenge pushes AI systems to their limits in sequential decision-making, requiring sophisticated planning algorithms and efficient resource management to achieve optimal completion times in complex, open-world environments.

Track 2 Submissions end November 15th, 2025

Starter Kits

Our starter kit provides a real-time agent loop with modular components for perception (game frame recognition), planning and memory (long-term vs. short-term goals, knowledge storage), and control (action execution on the Game Boy Advance emulator).
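
As a rough illustration of how such a loop fits together, here is a minimal sketch of the perception, planning and memory, and control stages. It is not the starter kit's code: `Memory`, `perceive`, `plan`, `run_agent`, the `BUTTONS` tuple, and the `get_frame` / `press_button` callables are all hypothetical names chosen for this example, and the planner is a placeholder where a real agent would call a VLM or policy network.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Hypothetical button names; the real emulator API may use different ones.
BUTTONS = ("A", "B", "UP", "DOWN", "LEFT", "RIGHT", "START", "SELECT")


@dataclass
class Memory:
    """Planning state: one long-term goal, short-term goals, stored notes."""
    long_term_goal: str = "placeholder long-term goal"
    short_term_goals: List[str] = field(default_factory=list)
    notes: List[str] = field(default_factory=list)


def perceive(frame: bytes) -> str:
    """Perception stub: turn a raw frame into a text observation."""
    return f"frame of {len(frame)} bytes"


def plan(observation: str, memory: Memory) -> str:
    """Planning stub: store the observation, then pick a button.

    A real agent would query a VLM or policy network here; this
    placeholder just cycles through the button list.
    """
    memory.notes.append(observation)
    return BUTTONS[len(memory.notes) % len(BUTTONS)]


def run_agent(
    get_frame: Callable[[], bytes],
    press_button: Callable[[str], None],
    steps: int,
    memory: Optional[Memory] = None,
) -> Memory:
    """Run perceive -> plan -> act for a fixed number of steps."""
    if memory is None:
        memory = Memory()
    for _ in range(steps):
        observation = perceive(get_frame())  # perception
        action = plan(observation, memory)   # planning and memory
        press_button(action)                 # control
    return memory
```

For a dry run without an emulator, `get_frame` can return dummy bytes and `press_button` can be the `append` method of a list, which makes it easy to inspect the action sequence the loop produced.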


What's Included

  • Agent Scaffolding: Modular framework for building Pokémon Emerald speedrunning agents
  • Pokémon Emerald Wrapper: Custom emulator API for real-time game interaction
  • Baseline Implementation: Reference agent with VLM setup and basic planning
  • Evaluation Tools: Automated testing and performance measurement

Compute Credits

Applications are closed, and all compute credits have been distributed to approved teams.

Submission Guidelines

Prize Eligibility Notice

Exact clones of organizer-hosted baselines are not eligible for prizes. Submissions must demonstrate novel approaches, meaningful modifications, or original implementations. Simple repackaging or minimal changes to existing baseline code will be disqualified from prize consideration.

How to Submit for Track 2

Submissions for this track focus on achieving maximum game completion under time constraints. Your agent must interact exclusively through our custom Pokémon Emerald emulator API. Use any method, as long as the final action comes from a neural network.

Important: All submissions will undergo anti-cheat verification to ensure fair competition. This includes validation of agent behavior, action logs, and verification that submissions follow the competition rules.

Submission Requirements

  • Code Archive: Your agent implementation as a ZIP or TAR.GZ file including all dependencies and README
  • Action & State Logs (Anti-Cheat): The submission.log and detailed logs generated by the starter kit during your agent's run. These logs validate that your agent followed competition rules and provide action/state information for evaluation.
  • Methodology Description: A brief document (1-2 paragraphs, submitted via the Google Form) describing your approach and detailing your scaffolding components across the five dimensions (State Information, Tools, Memory, Feedback, Fine-tuning). This is required for calculating your Adjusted Performance score.
  • Video Evidence: YouTube link to a screen recording showing the complete speedrun
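
Before uploading, it can help to check the code archive locally. The sketch below packs an agent directory into a TAR.GZ file and refuses to do so if a README or `submission.log` is missing. The function names, the `README.md` filename, and the assumption that `submission.log` sits at the top level of the agent directory are illustrative only; follow the starter kit's own layout where it differs.

```python
import tarfile
from pathlib import Path
from typing import List

# Assumed filenames and locations; adjust to match your own project layout.
REQUIRED_FILES = ["README.md", "submission.log"]


def missing_files(agent_dir: Path) -> List[str]:
    """Return the required files that are absent from agent_dir."""
    return [name for name in REQUIRED_FILES if not (agent_dir / name).is_file()]


def pack_submission(agent_dir: Path, archive_path: Path) -> Path:
    """Create a .tar.gz of agent_dir after checking the required files.

    archive_path should live outside agent_dir so the archive does not
    try to include itself.
    """
    missing = missing_files(agent_dir)
    if missing:
        raise FileNotFoundError(f"missing required files: {missing}")
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(agent_dir, arcname=agent_dir.name)
    return archive_path
```

A typical call would be `pack_submission(Path("my_agent"), Path("submission.tar.gz"))`, where both paths are placeholders.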

Code Modification Policy: You are encouraged to modify, extend, or completely rewrite the starter kit code to implement your approach. The only requirement is that your submission includes the valid logs (including submission.log) generated by the starter kit's logging system, which verifies your agent interacted with the game through the official API and followed competition rules.

Final Ranking Criteria

Final rankings are determined by raw performance metrics only (number of actions and time). Based on community feedback, we have simplified the main ranking to focus purely on objective performance measures.

Primary Ranking Components
  • Milestone Completion: Percentage of game milestones accomplished (e.g., gym badges, story progression)
  • Completion Efficiency: Time and action count to achieve milestones
  • Reproducibility: Clear documentation and verifiable results
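
The exact ordering rule is not spelled out here, so the following is only one plausible reading of the components above: rank by milestone completion percentage first, then break ties by fewer actions, then by less time. The `Run` class, its fields, and the tie-break order are assumptions made for illustration, not the organizers' scoring code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Run:
    """One submitted run (hypothetical record layout)."""
    team: str
    milestones_done: int
    milestones_total: int
    actions: int
    seconds: float

    @property
    def completion(self) -> float:
        """Milestone completion as a percentage."""
        return 100.0 * self.milestones_done / self.milestones_total


def rank(runs: List[Run]) -> List[Run]:
    """Assumed ordering: higher completion, then fewer actions, then less time."""
    return sorted(runs, key=lambda r: (-r.completion, r.actions, r.seconds))
```

Under this assumed rule, a run that completes more milestones always outranks a faster run that completes fewer.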

Novel Methods Welcome: While we provide a starter kit with an LLM-scaffolded approach, we encourage submissions using a wide variety of methods including tool-augmented systems, reinforcement learning, purely text-based reasoning, hybrid architectures, and other innovative techniques. The competition is designed to be open to diverse methodologies—whether you're building a complex multi-agent system or a streamlined end-to-end model, your approach is welcome!

Methodology Documentation for Judges' Choice Awards

While scaffolding complexity does not affect the main rankings, teams must still document their methodology across five dimensions to be considered for the separate Judges' Choice and innovation awards:

  • State Information (S): What information your agent receives (raw pixels vs. parsed game state vs. privileged information)
  • Tools (T): External tools available during gameplay (web search, calculators, planning utilities, etc.)
  • Memory (M): Memory mechanisms beyond immediate context (vector databases, knowledge graphs, external storage)
  • Feedback (F): Human or automated feedback during runs (human-in-the-loop, checkpointing guidance, etc.)
  • Fine-tuning (Φ): Specialized training on Pokémon data (domain-specific datasets, game-specific RL, etc.)
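
One way to keep this documentation organized while drafting is a small record with one field per dimension. This is a hypothetical helper, not a required format: the `ScaffoldingReport` class and its field names are invented for this sketch, and the actual write-up is submitted in whatever form the organizers request.

```python
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class ScaffoldingReport:
    """One free-text entry per documented dimension (S, T, M, F, Φ)."""
    state_information: str
    tools: str
    memory: str
    feedback: str
    fine_tuning: str

    def empty_fields(self) -> List[str]:
        """Return the names of dimensions that were left blank."""
        return [name for name, text in asdict(self).items() if not text.strip()]
```

Calling `empty_fields()` before submitting gives a quick reminder of any dimension that still needs a description, even if the answer is simply "none used".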

Judges' Choice Awards: Separate awards will recognize innovative approaches, including those with minimal scaffolding (for example, limiting the number of prompts sent to the LLM), creative tool use, and novel architectural designs. These awards encourage diverse methodologies while keeping the main competition ranking simple and objective.

Teams must document their scaffolding components in detail during submission for eligibility for Judges' Choice awards. The organizing committee will review submissions for these special recognitions.

Timeline

June 11th, 2025

Competition Website Launch

Official competition website goes live with preliminary documentation.

June 25th, 2025

Formal Competition Announcement

Full rules and track timeline announced. Starter code with a baseline RPG agent (scaffolding and VLM setup) and emulator API available for beta testers.

July 7th, 2025

Competition Begins

Track 2 Competition Begins. Submit runs of your Pokémon Emerald agent to the leaderboard.

November 15th, 2025

Submission Deadline

Final submission deadline.

December 2025

NeurIPS 2025 Presentation

Winners announced at NeurIPS 2025.

Prizes

"100 GCP" refers to "$100 worth of GCP credits."

Speedrun Rankings

Top performing agents in the RPG speedrunning challenge will be awarded $4,500 and 1000 GCP total:

  • 1st Place: $1,500 + 700 GCP
  • 2nd Place: $1,000 + 300 GCP
  • 3rd-4th Place: $500

Judges' Choice Award

Senior organizers will award at most four projects with $400 or 500 GCP to help continue their work after the competition ends.

A winning project does not have to place highly in the speedrun rankings, but it should propose a novel approach or demonstrate interesting capabilities in long-horizon planning or RPG navigation.