Xent Games

Xent games form the basis of the Xent benchmark. They are text-based games written in a custom language. The game code defines the context, rules, and scoring of a game. A judge LLM generates the context, enforces the rules, and assigns rewards, all in accordance with the game code.

Example Game

Let's look at an example:

    assign(s=story())
    elicit(x, 10)
    ensure(is_true("no shared words", x, s))
    reward(xed(s | x))

This code defines a single-player game. A human description of this game might be "Given a story, provide a string of at most 10 tokens. Your string cannot use any words from the story. The judge will look at your string and then reward you based on how probable the story is to follow your string. So your string should help the judge predict and generate the story."

Here is a breakdown of each line of game code.

assign(s=story()) - Store a generated story in the "register" s. story() is a command which tells the judge to generate a short string as part of the game context.

elicit(x, 10) - Elicit a "move" of at most 10 tokens from the player and store it in the register x.

ensure(is_true("no shared words", x, s)) - Ensure that the judge assesses that "no shared words" is true for x and s. If the condition is not met, game execution jumps back to the previous line, eliciting a new move from the player.

reward(xed(s | x)) - Assign a reward to the player. xed(s | x) is syntactic sugar for xent(s) - xent(s | x), where xent means "cross-entropy loss of the judge" and s | x means "s prefixed with x". So xed(s | x) means something like "how much more likely s is when it is prefixed with x".
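The four instructions above can be mirrored in a short Python sketch. It is an illustration under stated assumptions, not the benchmark's implementation: a toy word-counting model stands in for the judge LLM, whitespace-separated words stand in for tokens, a set intersection stands in for the judge's "no shared words" check, and the names VOCAB_SIZE, no_shared_words, play, and max_attempts are hypothetical. Truncating an over-long move is also an assumption.

```python
import math

VOCAB_SIZE = 1000  # hypothetical vocabulary size of the toy judge


def xent(s: str, prefix: str = "") -> float:
    """Toy cross-entropy (in bits) of s when it follows prefix.

    A word already seen in the context is treated as more likely
    (add-one smoothing); a real judge would use LLM token probabilities.
    """
    context = prefix.split()
    total = 0.0
    for word in s.split():
        prob = (context.count(word) + 1) / (len(context) + VOCAB_SIZE)
        total -= math.log2(prob)
        context.append(word)
    return total


def xed(s: str, x: str) -> float:
    """xed(s | x) = xent(s) - xent(s | x)."""
    return xent(s) - xent(s, x)


def no_shared_words(a: str, b: str) -> bool:
    """Stand-in for is_true("no shared words", a, b)."""
    return not (set(a.lower().split()) & set(b.lower().split()))


def play(story: str, get_move, max_tokens: int = 10, max_attempts: int = 5):
    """Run the example game once; get_move is a callable returning a move.

    max_attempts is an assumption added so the retry loop always terminates.
    """
    s = story                                          # assign(s=story())
    for _ in range(max_attempts):
        x = " ".join(get_move().split()[:max_tokens])  # elicit(x, 10)
        if no_shared_words(x, s):                      # ensure(...): else ask again
            return x, xed(s, x)                        # reward(xed(s | x))
    raise RuntimeError("no valid move within max_attempts")


# Scripted player: the first move reuses "dragon" and "a", so it is rejected.
moves = iter(["a dragon tale", "fantasy beast protects treasure"])
move, reward = play("The dragon guarded a hidden cave", lambda: next(moves))
```

Because this toy judge only rewards repeated words, every legal move here scores below zero. A real judge LLM can assign a positive reward to a prefix that hints at the story without reusing its words.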

Structure of Xent Game Language

Now that you have seen an example game, let's discuss the basic structure of the Xent game language (XGL).

Each line of code in XGL contains one instruction. There are seven possible instructions:

Assign - store a value in a register
Elicit - ask a player for their next move
Ensure - enforce rules
Reward - give scores to players
Reveal - explicitly reveal data to players
Beacon - place a marker in the code
Replay - jump to a beacon
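As a rough illustration of the one-instruction-per-line rule, the following Python sketch splits a line of game code into its instruction name and argument text. It assumes every instruction is written in the lowercase name(arguments) form seen in the example game; the actual XGL grammar may differ, and parse_line and INSTRUCTIONS are hypothetical names.

```python
import re

# The seven instruction names listed above, in the lowercase form used in game code.
INSTRUCTIONS = {"assign", "elicit", "ensure", "reward", "reveal", "beacon", "replay"}

# Assumed line shape: name(arguments), with optional surrounding whitespace.
_LINE = re.compile(r"^\s*([a-z_]+)\((.*)\)\s*$")


def parse_line(line: str) -> tuple[str, str]:
    """Return (instruction, argument text) for one line of game code."""
    match = _LINE.match(line)
    if match is None or match.group(1) not in INSTRUCTIONS:
        raise ValueError(f"not a recognised XGL instruction: {line!r}")
    return match.group(1), match.group(2).strip()


name, args = parse_line('ensure(is_true("no shared words", x, s))')
```

The argument text is returned unparsed, since nested calls such as is_true(...) would need a proper parser rather than a regular expression.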

The variables in XGL are called "registers". Registers only hold strings. The set of registers is predefined. s, s1, s2, and s3 are the s registers. There are also t, x, y, a, b, c, and p registers, each with its own numbered variants. Registers can have different properties, but for now it's fine to view them all as plain string variables.
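One way to picture the register model is a fixed table of named string slots. The Python sketch below assumes that every register letter has the same numbered variants as s (the bare name plus 1, 2, and 3) and that registers start out empty; neither detail is confirmed here, and the Registers class is hypothetical.

```python
REGISTER_LETTERS = "stxyabcp"
# Assumption: every letter mirrors the s registers (s, s1, s2, s3).
REGISTER_SUFFIXES = ("", "1", "2", "3")
REGISTER_NAMES = frozenset(
    letter + suffix for letter in REGISTER_LETTERS for suffix in REGISTER_SUFFIXES
)


class Registers:
    """Fixed set of named slots that only hold strings."""

    def __init__(self):
        # Assumption: registers are empty strings until assigned.
        self._values = {name: "" for name in REGISTER_NAMES}

    def set(self, name: str, value: str) -> None:
        if name not in self._values:
            raise KeyError(f"unknown register: {name}")
        if not isinstance(value, str):
            raise TypeError("registers only hold strings")
        self._values[name] = value

    def get(self, name: str) -> str:
        return self._values[name]


registers = Registers()
registers.set("s", "Once upon a time")
```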

XGL also contains a few different special functions. xed, as you saw above, is used to interact with the judge. xent, dex, and nex are other functions which can call the judge to score results. is_true and is_false are how the judge can be utilized for enforcing game rules. story is how the judge generates the game environment.

The example above showed a one-player game. But XGL supports multi-player games as well. The set of player names is fixed to black, white, alice, bob, and carol.

Using what you now know, let's look at a more complicated example.

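Complicated Example Game

    assign(s=story())
    elicit(white, t, 20)
    elicit(black, t1, 10)
    ensure(is_true("no shared words", t, t1))
    reward(black, xed(s|t1))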

This game introduces a few features you haven't seen yet. It is multiplayer, with both black and white making moves.

Black and white are a special pair of zero-sum players; any reward to one is taken from the other. So if black is rewarded 10, white is rewarded -10.
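The zero-sum bookkeeping can be sketched in Python as follows. This is an illustration only: apply_reward and ZERO_SUM_PAIRS are hypothetical names, and treating alice, bob, and carol as players whose rewards affect nobody else is an assumption drawn from black and white being described as the special pair.

```python
# Players whose rewards are mirrored onto an opponent.
ZERO_SUM_PAIRS = {"black": "white", "white": "black"}


def apply_reward(scores: dict, player: str, amount: float) -> dict:
    """Return a new score table after rewarding player by amount."""
    updated = dict(scores)
    updated[player] = updated.get(player, 0.0) + amount
    opponent = ZERO_SUM_PAIRS.get(player)
    if opponent is not None:
        # Whatever one zero-sum player gains, the other loses.
        updated[opponent] = updated.get(opponent, 0.0) - amount
    return updated


scores = apply_reward({}, "black", 10)   # black gains 10, white loses 10
scores = apply_reward(scores, "alice", 3)  # assumed to affect alice only
```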

This game is quite similar to the first example, with one major change: in the first example, the player was not allowed to use words from the story in their response. In this example, the list of restricted words is selected by an opponent.

Let's look at this line by line again:

assign(s=story()) - Store a judge-generated string into the register s.

elicit(white, t, 20) - Elicit a string of at most 20 tokens from the player white and store that string in t.

elicit(black, t1, 10) - Elicit a string of at most 10 tokens from the player black and store that string in t1.

ensure(is_true("no shared words", t, t1)) - Have the judge check there are no shared words between t and t1.

reward(black, xed(s|t1)) - Assign a reward to the player black. The reward is xed(s|t1), which is explained above.

Want to know more?

The goal here is to give you an idea of what a Xent game is, but this is far from a complete description. Some Xent game capabilities that aren't discussed here are:

Imperfect information games. The second example could be modified to hide white's restricted words from black.
Iterated game play. Xent games can be iterated internally (via beacon and replay) as well as externally, allowing players to utilize knowledge gleaned from earlier steps.
Non-zero-sum games. Strategies change when one player's gain is not always their opponent's loss.

Additionally, there are ideas here that haven't been fully defended or justified. For instance:

What is the point of using cross-entropy values for scoring?
Can a judge LLM really evaluate the conditions presented above?
Are Xent games really so general? Can chess be formulated as a Xent game? (Yes)
What does gameplay look like from the perspective of an agent? How is information presented to agent players and how are moves elicited from them?