Taking a swing at it. This is a working repo; all material should be considered incomplete and for demonstration purposes only.
Clone the dataset into this repo's directory from: https://github.com/arcprize/ARC-AGI-2
Here's the idea - we use transformers. Boom! Big reveal!
But we're not throwing a pre-trainer at it, we're training from scratch.
We might be able to get away with this by augmenting the dataset heavily with affine transforms and color permutations. Estimates suggest the 1,000-example ARC-AGI-2 dataset gets yeeted all the way up to something like 1.2B examples that way. It's a lot. Maybe enough.
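A minimal sketch of that augmentation in plain Python (helper names are ours, and note the same color permutation has to be applied to a pair's input and output together):

```python
import random

def dihedral_transforms(grid):
    """Yield the 8 symmetries of a grid (4 rotations x 2 reflections).

    `grid` is a list of rows, each a list of ints 0-9.
    """
    g = grid
    for _ in range(4):
        yield g
        yield [list(row) for row in reversed(g)]    # vertical flip
        g = [list(row) for row in zip(*g[::-1])]    # rotate 90 degrees clockwise

def permute_colors(grid, perm):
    """Relabel cell values through a permutation of 0-9."""
    return [[perm[c] for c in row] for row in grid]

def random_augmentation(grid, rng=random):
    """One random symmetry + color relabeling (hypothetical helper)."""
    g = rng.choice(list(dihedral_transforms(grid)))
    perm = list(range(10))
    rng.shuffle(perm)
    return permute_colors(g, perm)
```

Per task that's 8 symmetries times up to 10! color relabelings, which is where the huge multiplier comes from.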
This is also pretty much necessary, because we want the model to focus entirely on in-example patterns rather than learning anything special about the color blue (or the number 9, or whatever).
We're also using something a bit fancy for the positional encoding:
https://arxiv.org/pdf/2406.10322
Using this we can build a 4D positional encoding (y, x, input_output, example_index) for the example tokens and a 3D positional encoding (y, x, input_output) for the target, which will hopefully preserve all the info the model needs to learn to be generally smört.
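Roughly, LieRE learns one skew-symmetric generator per coordinate axis and maps a position to a rotation matrix via the matrix exponential. A PyTorch sketch of that idea (class and parameter names are ours, and the init scale is a guess):

```python
import torch
import torch.nn as nn

class LieRE(nn.Module):
    """LieRE-style learned rotations: R(p) = exp(sum_i p_i * A_i),
    with each A_i a learned skew-symmetric generator. Applied to
    queries and keys before attention (not shown here)."""

    def __init__(self, head_dim, num_coords):
        super().__init__()
        # Unconstrained parameters; skew-symmetry is enforced in forward().
        self.gen = nn.Parameter(torch.randn(num_coords, head_dim, head_dim) * 0.02)

    def forward(self, x, coords):
        # x: (..., seq, head_dim); coords: (..., seq, num_coords)
        skew = self.gen - self.gen.transpose(-1, -2)        # A_i = -A_i^T
        gsum = torch.einsum('...sc,cij->...sij', coords, skew)
        rot = torch.matrix_exp(gsum)                        # orthogonal rotations
        return torch.einsum('...sij,...sj->...si', rot, x)
```

Since the generators are learned, setting them all to zero recovers the identity rotation, which is the NoPE connection mentioned below. Target tokens (3 coordinates) could share the module by padding their example_index coordinate with a constant, though that's a design choice.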
Also an interesting finding - the NoPE architecture (used in some of Llama 4's layers) can be recovered by LieRE (I think), since the learned generators can collapse to zero, i.e. apply no rotation at all.
https://arxiv.org/pdf/2305.19466
The transformer is going to be a fully tricked out encoder-decoder architecture. The encoder will be non-causal and the decoder will be causal (kinda has to be for sequential prediction).
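A minimal wiring of that in PyTorch, with a causal mask on the decoder side (sizes are placeholders, not tuned; positional info would come from the LieRE rotations inside attention, which is conveniently consistent with `nn.Transformer` having no built-in positional encoding):

```python
import torch
import torch.nn as nn

VOCAB = "0123456789,<|>*"

class ArcTransformer(nn.Module):
    """Encoder-decoder sketch: non-causal encoder, causal decoder."""

    def __init__(self, d_model=128, nhead=4, layers=2):
        super().__init__()
        self.embed = nn.Embedding(len(VOCAB), d_model)
        self.core = nn.Transformer(d_model, nhead, layers, layers,
                                   batch_first=True)
        self.head = nn.Linear(d_model, len(VOCAB))

    def forward(self, src_ids, tgt_ids):
        # Causal mask so each decoder position only attends to its past.
        mask = nn.Transformer.generate_square_subsequent_mask(tgt_ids.size(1))
        h = self.core(self.embed(src_ids), self.embed(tgt_ids), tgt_mask=mask)
        return self.head(h)   # (batch, tgt_len, vocab) next-token logits
```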
Then we'll include a clever delimiting scheme for grid row breaks and example start/end markers:
# VOCAB = "0123456789,<|>*"
# "," = row separator
# "<" = example begins
# "|" = input/output separator
# ">" = example ends
# "*" = padding
The inputs to the encoder will look as follows:
"<{example_0_input}|{example_0_output}>...<{example_N_input}|{example_N_output}>"
Corresponding inputs to the decoder will look as follows:
"<{target_input}|{target_output}>"
A relatively small input / output example (training sample 5) would look like this:
encoder input: <010,101,010,101,010,101|080,808,080,808,080,808,080,808,080><010,011,010,010,011,010|080,088,080,080,088,080,080,088,080><010,011,010,110,010,011|080,088,080,880,080,088,080,880,080>
decoder input: <111,010,010,111,010,010|888,080,080,888,080,080,888,080,080>
These will be accompanied by tensors encoding the positional information for each token in Cartesian coordinates, which will be used to compute the learned rotation at that position.
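One way to build those coordinate lists for a single serialized pair (giving delimiter tokens (-1, -1) cell coordinates is our choice here, not something the scheme above pins down):

```python
def pair_coords(inp, out, example_index):
    """Per-token (y, x, io, ex) coordinates for one '<input|output>' pair.

    Output aligns one-to-one with the serialized string: delimiters
    ('<', ',', '|', '>') get sentinel cell coordinates (-1, -1).
    """
    coords = [(-1, -1, 0, example_index)]                    # '<'
    for io, grid in ((0, inp), (1, out)):
        for y, row in enumerate(grid):
            if y > 0:
                coords.append((-1, -1, io, example_index))   # ',' row break
            coords.extend((y, x, io, example_index) for x in range(len(row)))
        if io == 0:
            coords.append((-1, -1, 1, example_index))        # '|'
    coords.append((-1, -1, 1, example_index))                # '>'
    return coords
```

For the target pair the example_index column would be dropped (or held at a constant) to get the 3D version.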
At inference we will prompt the model with "<{target_input}|" and continue to generate until it returns the ">" symbol.
Hopefully it returns some sequence that can be sensibly reconstructed into a tensor/array.
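A defensive reconstruction of that generated sequence might look like this (the function name and the choice to return None on garbage are ours):

```python
def parse_output(generated):
    """Rebuild a grid from the text the model emits after '<target_input|'.

    Expects digit runs separated by ',' and terminated by '>'; returns
    None when the sequence doesn't form a rectangular grid.
    """
    body = generated.split(">", 1)[0]       # drop '>' and anything after it
    rows = body.split(",")
    if not all(row and row.isdigit() for row in rows):
        return None
    grid = [[int(c) for c in row] for row in rows]
    if len({len(row) for row in grid}) != 1:
        return None                         # ragged rows: not a valid grid
    return grid
```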