Yiping Wang, Shao-Rong Su, Zhiyuan Zeng, Eva Xu, Liliang Ren, Xinyu Yang, Zeyi Huang, Xuehai He, Luyao Ma, Baolin Peng, Hao Cheng, Pengcheng He, Weizhu Chen, Shuohang Wang, Simon Shaolei Du*, Yelong Shen*
We introduce ThetaEvolve, an open-source pipeline that simplifies (e.g., to a single LLM) and extends AlphaEvolve to efficiently scale both ❄️in-context learning and 🔥RL training at test time.
With ThetaEvolve, an 8B model can outperform AlphaEvolve on open optimization problems by scaling compute for inference or test-time RL🚀:
⭕Circle packing:
- AlphaEvolve (Gemini-2.0-Flash/Pro): 2.63586276
- Ours (R1-Qwen3-8B): 2.63598308
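The circle-packing objective is the sum of radii of circles packed without overlap inside the unit square. A minimal verifier sketch for a candidate packing (the function name and tolerance handling here are illustrative, not the repository's actual evaluator):

```python
import math
from itertools import combinations

def packing_score(circles, tol=0.0):
    """circles: list of (x, y, r) tuples. Returns the sum of radii if the
    packing fits in the unit square without overlaps; raises otherwise."""
    for x, y, r in circles:
        # every circle must stay inside [0, 1] x [0, 1]
        if x - r < -tol or y - r < -tol or x + r > 1 + tol or y + r > 1 + tol:
            raise ValueError("circle leaves the unit square")
    for (x1, y1, r1), (x2, y2, r2) in combinations(circles, 2):
        # non-overlap: center distance must be at least the sum of radii
        if math.hypot(x1 - x2, y1 - y2) < r1 + r2 - tol:
            raise ValueError("overlapping circles")
    return sum(r for _, _, r in circles)
```

Setting `tol=1e-6` corresponds to the OpenEvolve-style verification mentioned below, while `tol=0.0` matches the strict (formal) check.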
Our RL environment follows the same setup as slime and OpenEvolve. We use Docker (run from the ThetaEvolve folder):
```bash
# fixed image; not yet checked against the latest image
docker pull slimerl/slime:v0.5.0rc0-cu126

# Start the container
docker run --rm --name slime-evolve \
  --gpus all --ipc=host --shm-size=16g \
  --ulimit memlock=-1 --ulimit stack=67108864 \
  -v "$PWD":/workspace -w /workspace \
  -v /path/to/disk:/path/to/disk \
  -it slimerl/slime:v0.5.0rc0-cu126 /bin/bash
```
Alternatively, start the container in detached mode and exec into it:

```bash
sudo docker run -d --name slime-evolve --gpus all --ipc=host --shm-size=16g \
  --ulimit memlock=-1 --ulimit stack=67108864 \
  -v "$PWD":/workspace -w /workspace \
  slimerl/slime:v0.5.0rc0-cu126 sleep infinity
sudo docker exec -it slime-evolve /bin/bash
apt update
apt install -y screen
bash ./run.sh 2>&1 | tee run.log
```

After entering the Docker container, run the installation commands:
```bash
cd /workspace
pip install -e .
cd openevolve_adapted
pip install --ignore-installed blinker
rm -rf openevolve.egg-info && pip install -e .
cd ..
```

You can check our tasks in openevolve_adapted/examples. It is easy to extend to more tasks with continuous objective values.
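To extend to a new task, the evaluator only needs to return a continuous score for a candidate program. A hypothetical sketch of that shape (the `evaluate(program_path)`-returns-a-metrics-dict interface and the `solve()` entry point are our assumptions here; check an existing task under openevolve_adapted/examples for the exact contract):

```python
import importlib.util

def evaluate(program_path):
    """Load the candidate program produced by the LLM and score it."""
    spec = importlib.util.spec_from_file_location("candidate", program_path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    try:
        value = module.solve()  # hypothetical per-task entry point
    except Exception:
        # a crashing candidate gets the worst score instead of killing the run
        return {"combined_score": 0.0}
    # any continuous objective works; higher should mean better
    return {"combined_score": float(value)}
```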
To run the experiments, change the parameters in run.sh and then directly run bash run.sh. (Note: for the 8B model, we need at least 8×80GB GPUs, e.g., A100s.)
First, remember to set `SAVE_PATH` to store checkpoints:

```bash
export SAVE_PATH=/path/to/disk/save
```
Then, for example, to run prorl-v2-1.5B on circle packing with RL training and the original score as reward, set:
```bash
#### Model selection ####
SMALL_MODEL_NAME="dpsk_prorl_v2_1.5b"

#### Task configuration ####
TASK="circle_packing_modular"

#### CONFIG_POSTFIX options ####
CONFIG_POSTFIX="it_XL"

#### Training mode: True for training, False for inference-only ####
IS_TRAINING=True

#### Training parameters ####
# Options: "original_reward", "rl_normalized_reward"
REWARD_PROCESS_TYPE="original_reward"

#### Lazy output penalty ####
# 1 -> child = parent
# 2 -> child = any program in database
LAZY_OUTPUT_PENALTY=1
```

Finally, set the wandb configuration:
```bash
WANDB_API_KEY=aaa
WANDB_ENTITY=bbb
WANDB_PROJECT=ccc
```

Then you can directly run:
```bash
bash run.sh
```

You can also adjust more parameters in scripts_evolve/Nemotron-Research-Reasoning-Qwen-1.5B/general.sh, such as the checkpoint-saving frequency (default 10), the number of evaluation threads (default 16), the number of GPUs (default 8), etc.
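The lazy-output penalty options above can be read as a duplicate check on the generated child program. A sketch of that logic, with an assumed penalty of zero reward (function names and the penalty value are illustrative, not the repository's actual code):

```python
def is_lazy_output(child_code, parent_code, database_codes, mode=1):
    """mode 1 flags a child identical to its parent;
    mode 2 flags a child identical to any program in the database."""
    if mode == 1:
        return child_code.strip() == parent_code.strip()
    return any(child_code.strip() == p.strip() for p in database_codes)

def process_reward(score, lazy, reward_type="original_reward"):
    if lazy:
        return 0.0  # assumed penalty; the actual penalty may differ
    # "original_reward" uses the raw objective value as the reward;
    # "rl_normalized_reward" would rescale it before training
    return score
```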
Some results we obtained are available in Results. You can run python vis.py in each sub-task directory to see the verification results.
For example, we have our best-known solution for circle packing (with zero tolerance) in Results/CirclePacking/figs/8B-w_RL@65-Formal.png and AlphaEvolve's solution in Results/CirclePacking/figs/AlphaEvolve.png:
We point out that our solution is better than AlphaEvolve's, and that our configuration is asymmetric, whereas AlphaEvolve's solution is symmetric.
The program for finding it (with 1e-6 tolerance, as in OpenEvolve verification; detailed in the paper) is shown in Results/CirclePacking/programs/8B-w_RL@65.py. For the formal one (without tolerance, as in AlphaEvolve), the program is shown in Results/CirclePacking/programs/8B-w_RL@65-Formal.py. The latter has a specific function for determining the amount by which to shrink the radii, but in general you can get close results by shrinking radii by a small value like 1e-9.
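The fixed-epsilon variant of that shrinking step can be sketched as follows (the formal program computes the shrink size adaptively; this uniform version is only an approximation, and the function name is ours):

```python
def shrink_radii(circles, eps=1e-9):
    """circles: list of (x, y, r). Shrink every radius by eps so that a
    packing valid under a small tolerance becomes valid with zero tolerance,
    at the cost of slightly reducing the sum-of-radii objective."""
    return [(x, y, max(r - eps, 0.0)) for x, y, r in circles]
```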
We also provide results from other tasks for visualization.
If you want to run these programs or the initial program, you can set the parameters from the config file:

```bash
TASK="circle_packing_modular"
CONFIG_POSTFIX="it_XL"

# test command with verifier
OPENEVOLVE_CONFIG_PATH=$PWD/examples/${TASK}/configs/config_${TASK}_${CONFIG_POSTFIX}.yaml \
PYTHONPATH=$PWD \
python $PWD/examples/${TASK}/evaluators/evaluator_modular.py \
  $PWD/examples/${TASK}/initial_programs/initial_program.py
```
Or you can simply replace the parameters to rerun directly.
If you find our work useful, please consider citing:
```bibtex
@article{wang2025thetaevolve,
  title={ThetaEvolve: Test-time Learning on Open Problems},
  author={Wang, Yiping and Su, Shao-Rong and Zeng, Zhiyuan and Xu, Eva and Ren, Liliang and Yang, Xinyu and Huang, Zeyi and He, Xuehai and Ma, Luyao and Peng, Baolin and Cheng, Hao and He, Pengcheng and Chen, Weizhu and Wang, Shuohang and Du, Simon Shaolei and Shen, Yelong},
  journal={arXiv preprint arXiv:2511.23473},
  year={2025}
}
```


