Yiping Wang, Shao-Rong Su, Zhiyuan Zeng, Eva Xu, Liliang Ren, Xinyu Yang, Zeyi Huang, Xuehai He, Luyao Ma, Baolin Peng, Hao Cheng, Pengcheng He, Weizhu Chen, Shuohang Wang, Simon Shaolei Du*, Yelong Shen*
We introduce ThetaEvolve, an open-source pipeline that simplifies (e.g., to a single LLM) and extends AlphaEvolve to efficiently scale both ❄️in-context learning and 🔥RL training at test time.
With ThetaEvolve, an 8B model can outperform AlphaEvolve on open optimization problems by scaling compute for inference or test-time RL🚀:
⭕Circle packing:
- AlphaEvolve (Gemini-2.0-Flash/Pro): 2.63586276
- Ours (R1-Qwen3-8B): 2.63598308
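The circle-packing objective is the sum of radii of circles packed without overlap inside the unit square. A minimal verifier sketch for a candidate packing (the function name and tolerance handling here are illustrative, not the repository's actual evaluator):

```python
import math
from itertools import combinations

def packing_score(circles, tol=0.0):
    """circles: list of (x, y, r) tuples. Returns the sum of radii if the
    packing fits in the unit square without overlaps; raises otherwise."""
    for x, y, r in circles:
        # every circle must stay inside [0, 1] x [0, 1]
        if x - r < -tol or y - r < -tol or x + r > 1 + tol or y + r > 1 + tol:
            raise ValueError("circle leaves the unit square")
    for (x1, y1, r1), (x2, y2, r2) in combinations(circles, 2):
        # non-overlap: center distance must be at least the sum of radii
        if math.hypot(x1 - x2, y1 - y2) < r1 + r2 - tol:
            raise ValueError("overlapping circles")
    return sum(r for _, _, r in circles)
```

Setting `tol=1e-6` corresponds to the OpenEvolve-style verification mentioned below, while `tol=0.0` matches the strict (formal) check.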
Our RL environment follows the same setup as slime and OpenEvolve. We use Docker (run from the ThetaEvolve folder):
```bash
# fixed image; not yet checked against the latest image
docker pull slimerl/slime:v0.5.0rc0-cu126

# Start the container
docker run --rm --name slime-evolve \
  --gpus all --ipc=host --shm-size=16g \
  --ulimit memlock=-1 --ulimit stack=67108864 \
  -v "$PWD":/workspace -w /workspace \
  -v /path/to/disk:/path/to/disk \
  -it slimerl/slime:v0.5.0rc0-cu126 /bin/bash
```
Alternatively, start the container in detached mode and exec into it:

```bash
sudo docker run -d --name slime-evolve --gpus all --ipc=host --shm-size=16g \
  --ulimit memlock=-1 --ulimit stack=67108864 \
  -v "$PWD":/workspace -w /workspace \
  slimerl/slime:v0.5.0rc0-cu126 sleep infinity
sudo docker exec -it slime-evolve /bin/bash
apt update
apt install -y screen
bash ./run.sh 2>&1 | tee run.log
```

After entering the Docker container, run the installation commands:
```bash
cd /workspace
pip install -e .
cd openevolve_adapted
pip install --ignore-installed blinker
rm -rf openevolve.egg-info && pip install -e .
cd ..
```

You can check our tasks in openevolve_adapted/examples. It is easy to extend to more tasks with continuous objective values.
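To extend to a new task, the evaluator only needs to return a continuous score for a candidate program. A hypothetical sketch of that shape (the `evaluate(program_path)`-returns-a-metrics-dict interface and the `solve()` entry point are our assumptions here; check an existing task under openevolve_adapted/examples for the exact contract):

```python
import importlib.util

def evaluate(program_path):
    """Load the candidate program produced by the LLM and score it."""
    spec = importlib.util.spec_from_file_location("candidate", program_path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    try:
        value = module.solve()  # hypothetical per-task entry point
    except Exception:
        # a crashing candidate gets the worst score instead of killing the run
        return {"combined_score": 0.0}
    # any continuous objective works; higher should mean better
    return {"combined_score": float(value)}
```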
To run the experiments, change the parameters in run.sh and then directly run bash run.sh. (Note: for the 8B model, we need at least 8×80GB GPUs, e.g., A100s.)
First, remember to set `SAVE_PATH` to store checkpoints:

```bash
export SAVE_PATH=/path/to/disk/save
```
Then, for example, to run prorl-v2-1.5B on circle packing with RL training and the original score as reward, set:
```bash
#### Model selection ####
SMALL_MODEL_NAME="dpsk_prorl_v2_1.5b"

#### Task configuration ####
TASK="circle_packing_modular"

#### CONFIG_POSTFIX options ####
CONFIG_POSTFIX="it_XL"

#### Training mode: True for training, False for inference-only ####
IS_TRAINING=True

#### Training parameters ####
# Options: "original_reward", "rl_normalized_reward"
REWARD_PROCESS_TYPE="original_reward"

#### Lazy output penalty ####
# 1 -> child = parent
# 2 -> child = any program in database
LAZY_OUTPUT_PENALTY=1
```

Finally, set the wandb configuration:
```bash
WANDB_API_KEY=aaa
WANDB_ENTITY=bbb
WANDB_PROJECT=ccc
```

Then you can directly run:
```bash
bash run.sh
```

You can also adjust more parameters in scripts_evolve/Nemotron-Research-Reasoning-Qwen-1.5B/general.sh, such as the checkpoint-saving frequency (default 10), the number of evaluation threads (default 16), the number of GPUs (default 8), etc.
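The lazy-output penalty options above can be read as a duplicate check on the generated child program. A sketch of that logic, with an assumed penalty of zero reward (function names and the penalty value are illustrative, not the repository's actual code):

```python
def is_lazy_output(child_code, parent_code, database_codes, mode=1):
    """mode 1 flags a child identical to its parent;
    mode 2 flags a child identical to any program in the database."""
    if mode == 1:
        return child_code.strip() == parent_code.strip()
    return any(child_code.strip() == p.strip() for p in database_codes)

def process_reward(score, lazy, reward_type="original_reward"):
    if lazy:
        return 0.0  # assumed penalty; the actual penalty may differ
    # "original_reward" uses the raw objective value as the reward;
    # "rl_normalized_reward" would rescale it before training
    return score
```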
Some results we obtained are available in Results. You can run python vis.py in each sub-task directory to see the verification results.
For example, we have our best-known solution for circle packing (with zero tolerance) in Results/CirclePacking/figs/8B-w_RL@65-Formal.png and AlphaEvolve's solution in Results/CirclePacking/figs/AlphaEvolve.png:
We point out that our solution is better than AlphaEvolve's, and that our configuration is asymmetric, whereas AlphaEvolve's solution is symmetric.
The program for finding it (with 1e-6 tolerance, as in OpenEvolve verification; detailed in the paper) is shown in Results/CirclePacking/programs/8B-w_RL@65.py. For the formal one (without tolerance, as in AlphaEvolve), the program is shown in Results/CirclePacking/programs/8B-w_RL@65-Formal.py. The latter has a specific function for determining the amount by which to shrink the radii, but in general you can get close results by shrinking radii by a small value like 1e-9.
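The fixed-epsilon variant of that shrinking step can be sketched as follows (the formal program computes the shrink size adaptively; this uniform version is only an approximation, and the function name is ours):

```python
def shrink_radii(circles, eps=1e-9):
    """circles: list of (x, y, r). Shrink every radius by eps so that a
    packing valid under a small tolerance becomes valid with zero tolerance,
    at the cost of slightly reducing the sum-of-radii objective."""
    return [(x, y, max(r - eps, 0.0)) for x, y, r in circles]
```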
We also provide results from other tasks for visualization.
If you want to run these programs or the initial program, you can set the parameters from the config file:

```bash
TASK="circle_packing_modular"
CONFIG_POSTFIX="it_XL"

# test command with verifier
OPENEVOLVE_CONFIG_PATH=$PWD/examples/${TASK}/configs/config_${TASK}_${CONFIG_POSTFIX}.yaml \
PYTHONPATH=$PWD \
python $PWD/examples/${TASK}/evaluators/evaluator_modular.py \
  $PWD/examples/${TASK}/initial_programs/initial_program.py
```
Or you can simply replace the parameters to rerun directly.
If you find our work useful, please consider citing:
```bibtex
@article{wang2025thetaevolve,
  title={ThetaEvolve: Test-time Learning on Open Problems},
  author={Wang, Yiping and Su, Shao-Rong and Zeng, Zhiyuan and Xu, Eva and Ren, Liliang and Yang, Xinyu and Huang, Zeyi and He, Xuehai and Ma, Luyao and Peng, Baolin and Cheng, Hao and He, Pengcheng and Chen, Weizhu and Wang, Shuohang and Du, Simon Shaolei and Shen, Yelong},
  journal={arXiv preprint arXiv:2511.23473},
  year={2025}
}
```


