Xent Benchmark: Theory

The Xent Benchmark is built on a theoretical foundation that provides three key qualities:

Generality - Xent games can represent an extremely broad set of capabilities, including those no other benchmark can measure
Connectedness - Xent games are fully connected. You can "move" between any two games using a finite series of changes
Transferability - Getting better at one Xent game will improve performance on other related games

When you combine these qualities, you can see that Xent games provide a clear path towards a concretely defined form of AGI.

Xent games have other useful properties beyond these, but it's important to understand these three points above all others.

Let's analyze each of these qualities and then combine them to understand their utility fully.

1. Generality

Showing that Xent games are highly general isn't difficult. For a short demonstration, it suffices to list some tasks that can be implemented as Xent games:

Chess
LLM jailbreak prompt construction
LLM jailbreak prompt defense
Math theorem proving
Writing great endings to a story
Removal of unused elements from a story
Identification of emotional tone of a text
Identification of psychological motivations of a character in a story

At first glance, these may appear ridiculous to you. How can a game possibly represent "theorem proving"? Let me demonstrate.

Before reading this, you probably want to read about Xent games, so that you can understand the code.

Suffice it to say, the space of Xent games is extremely broad and covers just about any subject or quality you can imagine.

2. Connectedness

The Xent game space is connected, meaning that you can move from one game to any other game with a finite series of edits. This is quite easy to show once you examine the structure of Xent games.

Xent games have a very simple structure. Each line of a Xent game contains an instruction. The space of variables (registers) is fixed and none of the registers can contain null values. The set of possible operations is fixed and it's not possible to define new operations. While looping is possible, all loops have a predefined, constant number of iterations, making it impossible to enter an infinite loop.

Given this structure, it's easy to see that if you take a line of code from a valid game and place it in another valid game, the new game will also be valid. You can "blend" games together in any way and never construct an invalid game.

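Showing that this space of games is connected is very easy. Add the lines of the new game one at a time, then remove the lines of the old game one at a time. Both games have a finite number of lines and every intermediate game is valid, so a finite series of edits can take you from any game to any other.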

Of course, there are often "smoother" paths between games, and the process I just described certainly isn't optimal. But the fact remains that it's always possible to find a path between any two games.
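The naive add-then-remove process described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a game is modeled as a plain list of instruction strings, the strings in the usage example are placeholders rather than real Xent instructions, and `edit_path` is a hypothetical helper name. The sketch does not check validity; it relies on the claim above that any blend of valid lines is itself a valid game.

```python
from typing import List

# Assumption: a game is represented as a list of instruction lines.
Game = List[str]


def edit_path(old: Game, new: Game) -> List[Game]:
    """Return every game on the naive path from `old` to `new`.

    Each step is a single-line edit: first the lines of `new` are
    appended one at a time, then the lines of `old` are removed one
    at a time.
    """
    current = list(old)
    path = [current]
    # Phase 1: append each line of the new game after the old lines.
    for line in new:
        current = current + [line]
        path.append(current)
    # Phase 2: drop the old game's lines from the front, one per step.
    for _ in old:
        current = current[1:]
        path.append(current)
    return path


# Placeholder instruction names, not real Xent syntax.
old_game = ["a1", "a2"]
new_game = ["b1", "b2", "b3"]
for game in edit_path(old_game, new_game):
    print(game)
```

The path always has `len(old) + len(new)` edits, which is the finiteness the connectedness argument needs. It is rarely the shortest path, since lines shared by both games are added and removed anyway.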

3. Transferability

Xent games exhibit transferability. This means that getting better at one Xent game will improve performance on related Xent games.

Conceptually, this seems obvious. If you take a Xent game and apply a small edit to it (such as adding or removing a line), the original and edited games are very similar. It would be surprising if there were no transference of skills between them.

Empirically, we have shown transfer learning between Xent games with some simple examples. This provides a baseline validation of the idea, but it certainly isn't total proof.

Transfer learning is an essential part of the Xent game thesis. But it isn't something that is possible to prove using theory alone. That being said, we are highly confident that transfer learning will continue to work as we scale up and test against broader, more complex sets of games.

A Path to AGI

To recap: we have a space of games that is highly general, completely connected, and allows for transfer learning between related games.

Consider this sequence of steps:

1. First, start with a small set of games that test LLM qualities you think are important and interesting and use this as an evaluation set.
2. Train an LLM on the evaluation set.
3. Next, we want to find more interesting games to add to our evaluation set. Use the connectedness property and walk outwards from the current evaluation set.
4. Keep walking outwards until you find a game that has low transference value with the current evaluation set. The low transference value indicates that this game requires skills that are not utilized in the existing evaluation set.
5. Add this new game to the evaluation set.
6. GOTO 2
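The loop above can be sketched as follows. This is a minimal illustration, not the algorithm from the paper: `expand_benchmark`, `train`, `walk_outwards`, `transfer_value`, `threshold`, and `max_rounds` are all hypothetical names introduced here, and how transference is actually measured is left to the caller. The toy usage at the bottom uses integers as stand-in games purely to exercise the control flow.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

G = TypeVar("G")  # a game, in whatever representation you use
M = TypeVar("M")  # a model


def expand_benchmark(
    seed_games: List[G],
    model: M,
    train: Callable[[M, List[G]], M],
    walk_outwards: Callable[[List[G]], Iterable[G]],
    transfer_value: Callable[[M, List[G], G], float],
    threshold: float,
    max_rounds: int,
) -> Tuple[List[G], M]:
    """Grow an evaluation set by repeatedly adding low-transfer games."""
    eval_set = list(seed_games)              # step 1: seed evaluation set
    for _ in range(max_rounds):
        model = train(model, eval_set)       # step 2: train on current set
        # Steps 3 and 4: walk outwards and stop at the first game whose
        # transfer value with the current set falls below the threshold.
        novel = next(
            (g for g in walk_outwards(eval_set)
             if transfer_value(model, eval_set, g) < threshold),
            None,
        )
        if novel is None:
            break                            # no low-transfer game is left
        eval_set.append(novel)               # step 5, then loop (step 6)
    return eval_set, model


# Toy stand-ins (assumptions): games are the integers 0..9, two games
# share a "skill" when they fall in the same block of three, and the
# model is just a counter of training rounds.
def toy_train(rounds: int, games: List[int]) -> int:
    return rounds + 1


def toy_walk(games: List[int]) -> Iterable[int]:
    return (g for g in range(10) if g not in games)


def toy_transfer(rounds: int, games: List[int], g: int) -> float:
    return 1.0 if any(e // 3 == g // 3 for e in games) else 0.0


final_set, final_rounds = expand_benchmark(
    [0], 0, toy_train, toy_walk, toy_transfer, threshold=0.5, max_rounds=100
)
print(final_set, final_rounds)
```

In the toy run the evaluation set grows to one game per skill block and then stops, because every remaining game transfers from something already in the set. The `max_rounds` cap is an addition for safety in the sketch; the steps above have no such bound.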

This algorithm, if the properties above hold true, has two mind-blowingly significant qualities:

This algorithm leads to a very concrete definition of AGI. When you can no longer find new games in step 4, then you have completely covered the entire space of Xent games. An agent that can perform well in all of the games in your evaluation set can perform well on every possible Xent game. Combine this fact with the generality property of Xent games and you can now say that such an agent has achieved AGI.
If transference holds as expected, then this algorithm will work. This path might not be optimal. It might not even be computationally feasible. But the algorithm does define a clear and concrete path to AGI.

The algorithm presented here is a highly simplified summary of a much more tightly defined algorithm in our paper.

Our vision is this: an ever-expanding benchmark, a staircase that spirals up endlessly.