Track 1

Pokémon Showdown is an open-source simulator that transforms Pokémon's turn-based battles into a competitive strategy game enjoyed by thousands of daily players. Competitive Pokémon battles are two-player stochastic games with imperfect information. Players build teams of Pokémon and navigate complex battles by mastering nuanced gameplay mechanics and making decisions under uncertainty. Key details about the opponent’s team remain hidden until they impact the battle, prompting players to infer missing information and anticipate future moves. Top human players excel by accurately predicting their opponent's strategies and leveraging their own team's strengths. The randomness, partial observability, and vast team diversity in Pokémon battles challenge AI's ability to plan and generalize.


Though Pokémon Showdown battle bots have existed for many years, advances in language models, large-scale reinforcement learning datasets, and accessible open-source tools have sparked renewed interest within the machine learning research community. Recent methods have achieved human-level gameplay in popular singles rulesets, prompting an exciting question: How much further can we push the capabilities of Competitive Pokémon AI? Join Track 1 of the PokéAgent Challenge and help us find out!



PokéAgent Showdown

To avoid disrupting human players, PokéAgent participants will compete on an AI-focused Showdown server hosted by the competition. Participants will battle against organizer-hosted baselines (and each other!) on a ranked ladder to qualify for a tournament held at the end of the competition window. The PokéAgent Showdown server will restrict matchmaking to the following rulesets ("formats") where ML baselines and datasets are readily available:

Format Rules

  • Gen 1 OverUsed (OU): Current official rules.
  • Gen 2 OU: Current official rules.
  • Gen 3 OU: Current official rules.
  • Gen 4 OU: Current official rules.
  • Gen 9 OU: Current official rules, plus the AI-friendly exceptions that Zoroark and Revival Blessing are banned.
  • Gen 9 VGC: Official Regulation I rules, plus the AI-friendly exceptions that Zoroark and Dondozo are banned.
(Note that all battles will be played with Showdown's standard time controls enabled.)

The competition will be divided into three stages:

  • 1. Research Stage (July 11th - September 28th): The first stage is a "free play" period. The PokéAgent server will be open to all participants, with player pools supplemented by organizer baselines across a wide range of skill levels. Get started, develop new ideas, and see how your method compares to the competition!
  • 2. Qualifying Stage (September 29th - October 12th): On September 29th, the competition formally begins. All ratings will be reset, and teams will climb the leaderboards by competing on the ranked ladder. At the end of this stage, the top-ranked usernames (excluding organizer baselines) will qualify for the next stage.
  • 3. Tournament Stage (October 15th - October 31st): Qualifying participants will compete in a bracket tournament featuring best-of-k-battle matches.

For those familiar, this structure follows the general format of a Showdown "ladder tournament"—but for bots!

Because this is the first time a competition like this has been run, some specifics are flexible and will be finalized during the Research Stage in conversation with the community (via Discord):

Showdown Rulesets: All of the formats listed above will be available throughout the Research Stage. However, the formal Tournament Stage will limit competition to a subset of this list. Currently, Gen 1 OU and Gen 9 OU are guaranteed to have brackets, with additional options to be determined based on the number of participants, ladder activity, community feedback, and available prizes.

Bracket Size and Match Length: Pokémon has high variance. Human tournaments often use a best two-out-of-three battle structure. We would prefer to go (significantly) higher, but will work with qualifying participants to determine whether this creates a financial burden and whether credits provided by our sponsors can help. The total bracket size (e.g., 8, 16, 32 teams) will be determined based on the number of active entrants and announced well in advance of the Qualifying Stage.
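To see why match length matters under high variance, note that the probability of the per-battle favorite winning a best-of-k series follows directly from the binomial distribution. A quick sketch (the win rates are made-up numbers for illustration):

```python
from math import comb

def series_win_prob(p: float, k: int) -> float:
    """Probability of winning a best-of-k series (first to a majority of
    battles) when each battle is won independently with probability p."""
    needed = k // 2 + 1
    # Sum over series that end on the final win after j losses (j < needed).
    return sum(
        comb(needed - 1 + j, j) * (p ** needed) * ((1 - p) ** j)
        for j in range(needed)
    )

# A 55% per-battle favorite wins a best-of-3 only ~57.5% of the time...
print(round(series_win_prob(0.55, 3), 3))  # → 0.575
# ...but longer series let skill dominate luck.
print(series_win_prob(0.55, 15) > series_win_prob(0.55, 3))  # → True
```

This is why organizers would prefer a k significantly higher than 3, budget permitting.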

Starting Resources

Competitive Pokémon is extremely complex and creates an interesting game-playing benchmark all by itself. However, we'd add that Pokémon's popularity and Showdown's replay dataset are key features that give this domain a unique place in current AI research:

  • Pokémon's widespread internet presence equips LLMs with extensive knowledge of its mechanics and strategies.
  • Pokémon Showdown makes millions of competitive battles publicly available, enabling the creation of large, high-quality datasets.

Taken together, Pokémon creates a fun opportunity for areas like LLM-Agents and RL to compete and collaborate on a level playing field. In that spirit, the PokéAgent Challenge was organized by the teams behind PokéChamp and Metamon, which are recent papers that demonstrate strong human-level play in singles formats. Both projects are open-source and have recently been updated to create a more helpful starting point for this competition. We have also been joined by VGC-Bench to extend support to VGC (doubles)!

These projects have their own repositories and publications where you can find more detailed information. The "Resources" section below highlights datasets and baselines from these efforts that may be more broadly useful in the development of new methods. Participants looking for more of a blank slate are encouraged to check out poke-env, the Python interface to Showdown used by most recent academic work.

Compute Credits Available

Recommended for students: Apply to receive GCP credits for cloud compute and Gemini API access. Roughly $100+ per team (pending submissions).

Apply for Compute Credits

Applications will be reviewed and distributed on a rolling basis until funds are depleted.

Interested in ML but new to Pokémon? Read an intro guide here

Submission Guidelines

How to Submit for Track 1

Evaluation for Track 1 will take place on the PokéAgent Showdown server. Test your method by searching for battles on the ladder!

1. Register: Create a Showdown username and password by clicking the gear icon in the top right corner. Bot usernames should begin with "PAC" (PokéAgent Challenge).

2. Deploy: Starter kits include specific instructions and quick-setups for deploying agents on the server. For more general cases, the poke-env server configuration would be:

from poke_env import ServerConfiguration

PokeAgentServerConfiguration = ServerConfiguration(
    "wss://pokeagentshowdown.com/showdown/websocket",
    "https://play.pokemonshowdown.com/action.php?",
)

3. Battle!: Watch live battles and climb to the top of the ranked ladder! You'll be matched against other participants and a set of organizer-hosted baselines that keep the ladder active.

4. Verify: To have your entry count towards the official ladder, please declare your team members, provide a contact email, and designate one Showdown username for your team. By registering, you agree to provide the organizers with source code and written technical details of your method upon request.

Track 1 Timeline

July 11th, 2025

Practice Window Begins

The PokéAgent Showdown server launches for free-play AI battles.

September 29th, 2025

Qualification Window Begins

Ladder ratings are reset and participants battle for leaderboard positions.

October 12th, 2025

Qualification Window Ends

Ladder ratings are finalized and top-ranked participants qualify for the tournament.

October 15th, 2025

Bracket Tournament Begins

Qualifying participants battle in a best-of-k single-elimination tournament.

October 31st, 2025

Bracket Tournament Ends

Tournament concludes and final results are announced.

Resources and Support

Datasets

Showdown Replay Logs

Showdown makes battle replays accessible via a public API. The competition organizers maintain curated datasets for convenience and (a little) extra privacy, and host them on Hugging Face to spare Showdown download requests from this competition. We'd encourage you to use them unless you have a good reason not to. Collectively, they cover the entire range of supported rulesets for the competition:

Dataset               Formats                           Time Period  Battles
pokechamp             Many (39+)                        2024-2025    2M
metamon-raw-replays   All PokéAgent formats except VGC  2014-2025    1.8M
vgc-battle-logs       Gen 9 VGC (OTS)                   2023-2025    330k
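If you do need to hit Showdown's public replay API directly, it can be queried with only the standard library. The endpoint shape below matches the public replay-search site but should be treated as an assumption; the curated Hugging Face datasets remain the recommended source:

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

def replay_search_url(format_id: str, page: int = 1) -> str:
    """Build a query URL for Showdown's public replay-search endpoint."""
    query = urlencode({"format": format_id, "page": page})
    return f"https://replay.pokemonshowdown.com/search.json?{query}"

def fetch_replay_index(format_id: str, page: int = 1) -> list:
    """Download one page of replay metadata (ids, players, upload times)."""
    with urlopen(replay_search_url(format_id, page)) as resp:
        return json.load(resp)

print(replay_search_url("gen1ou"))
# https://replay.pokemonshowdown.com/search.json?format=gen1ou&page=1
```

Please rate-limit any direct scraping; sparing Showdown these requests is exactly why the curated datasets exist.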

Replays as Agent Training Data

Showdown replays are saved from the point of view of a spectator rather than the point of view of a player, which can make it difficult to use them directly in an imperfect information game like Pokémon. We need to reconstruct (and often predict) the perspective of each player to create a more typical offline RL or imitation learning dataset.

metamon converts its replay dataset into a flexible format that allows customization of observations, actions, and rewards. More than 3.5M full battle trajectories are stored on Hugging Face at metamon-parsed-replays and can be accessed through the metamon repo. Utilities in pokechamp can recreate its LLM-Agent prompts and decisions from replays in a similar fashion. vgc-bench covers the Gen 9 VGC ruleset and does not have to resort to predicting unobserved information (because VGC replays can have "Open Team Sheets" where ground-truth team info is public).
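As a simplified illustration of the reconstruction problem, the sketch below (a hypothetical helper, not part of metamon or pokechamp) scans a spectator log for the Pokémon each player has revealed so far:

```python
def revealed_teams(replay_log: str) -> dict:
    """Collect each player's Pokémon as they are revealed by |switch| and
    |drag| messages in a Showdown spectator log. A real player-perspective
    reconstruction must also track revealed moves, items, ability triggers,
    and stat changes, but the idea is the same."""
    revealed = {"p1": set(), "p2": set()}
    for line in replay_log.splitlines():
        parts = line.split("|")
        if len(parts) > 3 and parts[1] in ("switch", "drag"):
            # parts[2] is e.g. "p1a: Pikachu"; parts[3] is "Pikachu, L50, M"
            player = parts[2][:2]
            species = parts[3].split(",")[0]
            revealed[player].add(species)
    return revealed

log = """|switch|p1a: Pikachu|Pikachu, L50, M|100/100
|switch|p2a: Snorlax|Snorlax|100/100
|move|p1a: Pikachu|Thunderbolt|p2a: Snorlax"""
print(revealed_teams(log))  # p1 has revealed Pikachu; p2 has revealed Snorlax
```

Everything not yet revealed is exactly what an agent (or a dataset builder) must predict.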

Miscellaneous

teams: All of the Showdown rulesets in the competition require players to pick their own teams. This dataset provides sets of teams gathered from forums, predicted from replays, and/or procedurally generated from Showdown trends. This creates a starting point for anyone less familiar with Competitive Pokémon, and establishes diverse team sets for self-play.
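Teams are typically stored in Showdown's plain-text "export" format. A minimal parser for that format might look like the following hypothetical helper (it ignores nicknames, EVs, natures, and abilities for brevity):

```python
def parse_team_export(text: str) -> list:
    """Split a Showdown 'export' team paste into per-Pokémon entries,
    keeping just the species, held item, and move list."""
    team = []
    for block in text.strip().split("\n\n"):
        lines = block.strip().splitlines()
        header = lines[0]
        species = header.split("@")[0].strip()
        item = header.split("@")[1].strip() if "@" in header else None
        moves = [l[2:].strip() for l in lines[1:] if l.startswith("- ")]
        team.append({"species": species, "item": item, "moves": moves})
    return team

paste = """Tauros
- Body Slam
- Hyper Beam

Snorlax @ Leftovers
- Rest
- Body Slam"""
print(parse_team_export(paste))
```

poke-env and the starter kits ship full-featured team parsers, so treat this only as a sketch of the format's structure.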

usage-stats: A convenient way to access the Showdown team usage stats for the formats covered by the competition and the timeframe covered by the replay datasets. This dataset also includes a log of all the partially revealed teams in replays. Team prediction is an interesting subproblem that your method may want to address!
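As a toy example of that subproblem, one simple baseline ranks unrevealed Pokémon by their teammate co-occurrence with the Pokémon already seen. The helper below is hypothetical and the co-occurrence numbers are made up; real values come from the usage stats:

```python
def predict_teammates(revealed: set, teammate_stats: dict, k: int = 1) -> list:
    """Rank unrevealed Pokémon by summed teammate co-occurrence with the
    Pokémon already revealed. `teammate_stats` maps a species to a
    {teammate: probability} dict, the shape of Showdown's usage stats."""
    scores = {}
    for seen in revealed:
        for mate, p in teammate_stats.get(seen, {}).items():
            if mate not in revealed:
                scores[mate] = scores.get(mate, 0.0) + p
    return sorted(scores, key=scores.get, reverse=True)[:k]

stats = {
    "Tauros": {"Snorlax": 0.9, "Chansey": 0.8, "Starmie": 0.7},
    "Snorlax": {"Tauros": 0.9, "Chansey": 0.85},
}
print(predict_teammates({"Tauros", "Snorlax"}, stats, k=2))  # → ['Chansey', 'Starmie']
```

Stronger methods condition on more context (leads, items, the exact format), but even this kind of frequency prior narrows the search considerably.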

Baselines

Organizers will inflate the player pool on the PokéAgent ladder with a rotating cast of existing baselines covering a wide range of skill levels and team choices. At launch, these will include:

  1. Simple Heuristics to simulate the low-ELO ladder and let new methods get started. These are mainly sourced from metamon's basic evaluation opponents.
  2. The best metamon and pokechamp agents playing with competitive team choices. These baselines have already demonstrated performance comparable to strong human players. If all goes well, they will be at the bottom of the leaderboard by the end of the competition!
  3. Mixed metamon and pokechamp agents aimed at increasing variety and forcing participants to battle teams that resemble the real Showdown ladder: for example, LLM agents with varied backends and several prompting and search strategies, and hundreds of checkpoints from metamon policies at various stages of training, sampling from thousands of unique teams.

Launch baselines are already open-source, so you are free to skip the ladder queue by hosting them on your local server with help from their home repo. Any additional (stronger) agents in development by the organizers will be added to the ladder rotation as the competition progresses.

We hope the ladder will also be full of participants trying new ideas; your agents' competition will always be improving!

Support

Competition staff will be active on the community Discord and are committed to answering questions (and fixing any issues that may arise) related to the starter datasets, baselines, competition logistics, and Showdown/Pokémon more broadly. However, due to limited bandwidth, we caution that the technical details involved in improving upon provided methods may be deemed out-of-scope (e.g., RL training details beyond the provided documentation). This is mainly because, given the datasets and baselines, you would have many other viable options that are maintained by larger teams and better suited to a broad audience. Still, please feel free to reach out, and we will help in any way we can.