Xent Leaderboard
The Xent benchmark evaluates an LLM agent's general capabilities by having it play text-based games in a fully LLM-defined environment. The current benchmark set contains three games: “Condense”, “Contrast”, and “Synthesize”. Agents play these games across a series of “maps”, each of which supplies different environmental data.
Summary
Player Leaderboard