Pokémon Showdown is an open-source simulator that transforms Pokémon's turn-based battles into a competitive strategy game enjoyed by thousands of daily players. Competitive Pokémon battles are two-player stochastic games with imperfect information. Players build teams of Pokémon and navigate complex battles by mastering nuanced gameplay mechanics and making decisions under uncertainty. Key details about the opponent’s team remain hidden until they impact the battle, prompting players to infer missing information and anticipate future moves. Top human players excel by accurately predicting their opponent's strategies and leveraging their own team's strengths. The randomness, partial observability, and vast team diversity in Pokémon battles challenge AI's ability to plan and generalize.
Though Pokémon Showdown battle bots have existed for many years, advances in language models, large-scale reinforcement learning datasets, and accessible open-source tools have sparked renewed interest within the machine learning research community. Recent methods have achieved human-level gameplay in popular singles rulesets, prompting an exciting question: How much further can we push the capabilities of Competitive Pokémon AI? Join Track 1 of the PokéAgent Challenge and help us find out!
To avoid disrupting human players, PokéAgent participants will compete on a custom AI-focused Showdown server hosted by the competition. Participants will battle against organizer-hosted baselines and each other on ranked ladders to qualify for tournament brackets.
Track 1 currently has a prize pool of over $6,000 plus $3,000 worth of Google Cloud Platform (GCP) credits. We are still accepting new sponsors, so these totals are subject to change.
The competition is divided into three phases:
The PokéAgent Showdown server supports the following rulesets ("formats") where ML baselines and datasets are readily available: Gen1OU, Gen2OU, Gen3OU, Gen4OU, Gen9OU, and Gen9 VGC Regulation I.
See this FAQ for more information on format rules.
Organizer baselines will keep the ladders active by serving opponents at various skill levels for testing and development.
This phase features an introduction to Pokémon, lightning-round talks from Pokémon AI projects, and organizer office hours. The top-ranked participants on the Gen1OU and Gen9OU ladders receive $1,000 worth of GCP credits to help fund their research.
Practice tournaments replicating the final bracket format will be announced on Discord and filled first-come, first-served. Starter kits will be updated with instructions for joining tournaments.
Teams earn a spot in the final tournament bracket by climbing the ranked ladder.
Showdown server exits free-play mode. Only registered usernames can participate moving forward. Register here.
Begins: Monday, October 13th at 12:01 AM
Ends: Sunday, October 19th at 11:00 PM
Begins: Monday, October 20th at 12:01 AM
Ends: Sunday, October 26th at 11:00 PM
Qualifying teams compete in bracket tournaments for cash prizes and GCP credits.
The Gen1OU and Gen9OU brackets each award $3,000 plus $1,000 in GCP credits (prizes are separate for each bracket):
Senior organizers will award two projects $100 and $400 in GCP credits each to help continue their work after the competition ends.
These projects do not necessarily have to qualify for the tournament stage but should propose a novel method or highlight an interesting direction for future research.
Competitive Pokémon is extremely complex and makes an interesting game-playing benchmark all by itself. However, we'd add that Pokémon's popularity and Showdown's replay dataset are key features that give this domain a unique place in current AI research.
Taken together, Pokémon creates a fun opportunity for areas like LLM-Agents and RL to compete and collaborate on a level playing field. In that spirit, the PokéAgent Challenge was organized by the teams behind PokéChamp and Metamon, which are recent papers that demonstrate strong human-level play in singles formats. Both projects are open-source and have recently been updated to create a more helpful starting point for this competition. We have also been joined by VGC-Bench to extend support to VGC (doubles)!
These projects have their own repositories and publications where you can find more detailed information.
The "Resources" section below highlights datasets and baselines from these efforts that may be more broadly useful in the development of new methods. Participants looking for more of a blank slate to get started are encouraged to check out poke-env
— the Python interface to Showdown used by most recent academic work.
Recommended for students: Apply to receive GCP credits for cloud compute and Gemini API access. Roughly $100+ per team (subject to application approval).
Apply for Compute Credits: applications will be reviewed and distributed on a rolling basis until funds are depleted.
Interested in ML but new to Pokémon? Read an intro guide here.
Evaluation for Track 1 will take place on the PokéAgent Showdown server. Test your method by searching for battles on the ladder!
1. Create Account: Register a Showdown username and password by clicking the gear icon in the top-right corner. Bot usernames should begin with "PAC" (PokéAgent Challenge).
2. Deploy: Starter kits have specific instructions and quick setups to deploy agents on the server. For more general cases, the poke-env server configuration is:
```python
from poke_env import ServerConfiguration

# Points poke-env at the PokéAgent server's websocket while reusing the
# official Showdown authentication endpoint.
PokeAgentServerConfiguration = ServerConfiguration(
    "wss://pokeagentshowdown.com/showdown/websocket",
    "https://play.pokemonshowdown.com/action.php?",
)
```
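As a minimal (hypothetical) sketch of how this configuration gets used, it can be passed to any poke-env agent. The username, password, and team file below are placeholders, and `RandomPlayer` is simply poke-env's most basic built-in agent:

```python
import asyncio

from poke_env import AccountConfiguration
from poke_env.player import RandomPlayer


async def main():
    # Placeholder credentials: register your own "PAC"-prefixed username first.
    player = RandomPlayer(
        account_configuration=AccountConfiguration("PACExampleBot", "hunter2"),
        server_configuration=PokeAgentServerConfiguration,
        battle_format="gen1ou",
        # OU formats require a team: a standard Showdown team export
        # (placeholder file path).
        team=open("gen1ou_team.txt").read(),
    )
    await player.ladder(5)  # queue for five ranked ladder games


asyncio.run(main())
```

Swap `RandomPlayer` for your own `Player` subclass when deploying a real method.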
3. Battle: Watch live battles and climb to the top of the ranked ladder! You'll be matched against other participants and a set of organizer-hosted baselines that keep the ladder active.
4. Team Registration (Required): On Monday, October 13th, the practice ladder will be taken offline and only registered usernames may play in the qualifiers. Each team may register exactly one Showdown username for the remainder of the competition.
Showdown makes battle replays accessible via a public API. The competition organizers maintain curated datasets for convenience and (a little) extra privacy, hosting them on Hugging Face so that this competition's download traffic stays off Showdown's servers. We encourage you to use them unless you have a good reason not to. Collectively, they cover the entire range of supported rulesets for the competition:
| Dataset | Formats | Time Period | Battles |
|---|---|---|---|
| `pokechamp` | Many (39+) | 2024-2025 | 2M |
| `metamon-raw-replays` | All PokéAgent formats except VGC | 2014-2025 | 1.8M |
| `vgc-battle-logs` | Gen 9 VGC (OTS) | 2023-2025 | 330k |
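If you prefer to script the download, `huggingface_hub` can fetch a full snapshot of any of these. The repo id below is an assumption for illustration; check each dataset's Hugging Face page for the canonical name:

```python
from huggingface_hub import snapshot_download

# Fetch a full dataset snapshot to the local cache. The repo id is
# illustrative (an assumption); substitute the dataset's actual name.
local_dir = snapshot_download(
    repo_id="jakegrigsby/metamon-raw-replays",
    repo_type="dataset",
)
print(f"Replays downloaded to {local_dir}")
```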
Showdown replays are saved from the point of view of a spectator rather than the point of view of a player, which can make it difficult to use them directly in an imperfect information game like Pokémon. We need to reconstruct (and often predict) the perspective of each player to create a more typical offline RL or imitation learning dataset.
`metamon` converts its replay dataset into a flexible format that allows customization of observations, actions, and rewards. More than 3.5M full battle trajectories are stored on Hugging Face at `metamon-parsed-replays` and can be accessed through the metamon repo. Utilities in `pokechamp` can recreate its LLM-Agent prompts and decisions from replays in a similar fashion.
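To make the spectator-versus-player gap concrete, here is a toy sketch (in no way metamon's actual pipeline) that scans raw Showdown protocol messages and recovers only what player 1 has seen of player 2's team:

```python
# Toy illustration of the perspective problem: a spectator log names every
# Pokémon as it appears, but a player only "knows" the opponent's Pokémon
# that have actually entered the field. Real pipelines track far more state.
def revealed_to_p1(protocol_lines):
    revealed = []
    for line in protocol_lines:
        parts = line.split("|")
        # Showdown switch messages look like:
        # |switch|p2a: Nickname|Species, L50|100/100
        if len(parts) > 3 and parts[1] in ("switch", "drag") and parts[2].startswith("p2"):
            species = parts[3].split(",")[0]
            if species not in revealed:
                revealed.append(species)
    return revealed


log = [
    "|switch|p1a: Skarmory|Skarmory, F|100/100",
    "|switch|p2a: Chomp|Garchomp, M|100/100",
    "|move|p2a: Chomp|Earthquake|p1a: Skarmory",
]
print(revealed_to_p1(log))  # ['Garchomp']
```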
`vgc-bench` covers the Gen 9 VGC ruleset and does not have to resort to predicting unobserved information, because VGC replays can have "Open Team Sheets" where ground-truth team info is public.
- `teams`: All of the Showdown rulesets in the competition require players to pick their own teams. This dataset provides sets of teams gathered from forums, predicted from replays, and/or procedurally generated from Showdown trends. It creates a starting point for anyone less familiar with Competitive Pokémon and establishes diverse team sets for self-play.
- `usage-stats`: A convenient way to access the Showdown team usage stats for the formats covered by the competition and the timeframe covered by the replay datasets. This dataset also includes a log of all the partially revealed teams in replays. Team prediction is an interesting subproblem that your method may want to address!
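To give a flavor of that subproblem, here is a deliberately simple sketch that completes a partially revealed team by greedily adding the most common teammates of the Pokémon seen so far. The co-occurrence counts below are made up, but Showdown usage stats publish exactly this kind of teammate data:

```python
# Toy team predictor with fabricated counts; real Showdown usage stats
# provide per-Pokémon teammate frequencies to plug in here.
TEAMMATE_COUNTS = {
    "Tauros": {"Chansey": 900, "Snorlax": 850, "Exeggutor": 700},
    "Chansey": {"Tauros": 900, "Snorlax": 800, "Starmie": 650},
    "Snorlax": {"Tauros": 850, "Chansey": 800, "Alakazam": 600},
}


def predict_team(revealed, team_size=6):
    team = list(revealed)
    while len(team) < team_size:
        # Score every unseen Pokémon by how often it partners with the
        # current team, then add the best-scoring candidate.
        scores = {}
        for member in team:
            for mate, count in TEAMMATE_COUNTS.get(member, {}).items():
                if mate not in team:
                    scores[mate] = scores.get(mate, 0) + count
        if not scores:
            break  # no data left to extend with
        team.append(max(scores, key=scores.get))
    return team


print(predict_team(["Tauros", "Chansey"]))
# ['Tauros', 'Chansey', 'Snorlax', 'Exeggutor', 'Starmie', 'Alakazam']
```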
Organizers will inflate the player pool on the PokéAgent ladder with a rotating cast of existing baselines covering a wide range of skill levels and team choices. At launch, these will include:
- `metamon`'s basic evaluation opponents.
- `metamon` and `pokechamp` agents playing with competitive team choices. These baselines have already demonstrated performance comparable to strong human players. If all goes well, they will be at the bottom of the leaderboard by the end of the competition!
- `metamon` and `pokechamp` agents aimed at increasing variety and forcing participants to battle teams that resemble the real Showdown ladder. For example: varied LLM backends with several prompting and search strategies, and hundreds of checkpoints from `metamon` policies at various stages of training, sampling from thousands of unique teams.

Launch baselines are already open-source, so you are free to skip the ladder queue by hosting them on your local server with help from their home repo. Any additional (stronger) agents in development by the organizers will be added to the ladder rotation as the competition progresses.
We hope the ladder will also be full of participants trying new ideas; your agents' competition will always be improving!
Competition staff will be active on the community Discord and are committed to answering questions (and fixing any issues that may arise) related to the starter datasets, baselines, competition logistics, and Showdown/Pokémon more broadly. However, due to limited bandwidth, we caution that the technical details involved in improving upon provided methods may be deemed out-of-scope (e.g., RL training details beyond the provided documentation). This is mainly because, given the datasets and baselines, you would have many other viable options that are maintained by larger teams and better suited to a broad audience. Still, please feel free to reach out, and we will help in any way we can.