Adaptive Prompt Structure Factorization: A Framework for Self-Discovering and Optimizing Compositional Prompt Programs
Automatic prompt optimization through structured factor-level refinement
aPSF (Adaptive Prompt Structure Factorization) is an automatic prompt optimization framework that discovers and refines prompt structures through error-guided factor selection and interventional single-factor optimization. Given a task description and a small set of examples, aPSF automatically:
- Discovers the latent factor structure of a prompt (e.g., tone, format, perspective) via an Architect LLM.
- Selects the most impactful factor to refine at each step using error-guided factor selection.
- Optimizes the selected factor with error-driven feedback, producing improved prompt candidates.
- Evaluates candidates with a unified scoring pipeline and accepts improvements.
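The four steps above can be sketched as a simple accept-if-better loop. This is an illustrative sketch only: the method and class names (`discover_factors`, `select_factor`, `optimize_factor`, and the `architect`/`worker`/`scorer` interfaces) are assumptions for exposition, not the repository's actual API.

```python
# Hypothetical sketch of the aPSF optimization loop. All interfaces
# (architect, worker, scorer) are illustrative assumptions.

def apsf_optimize(prompt, examples, architect, worker, scorer, steps=5):
    """Refine `prompt` by repeatedly optimizing its weakest factor."""
    # 1. Structure discovery: the Architect LLM factorizes the prompt
    #    into named components (e.g. tone, format, perspective).
    factors = architect.discover_factors(prompt)
    best_prompt, best_score = prompt, scorer(best_prompt := prompt, examples, worker)
    for _ in range(steps):
        # 2. Error-guided factor selection: pick the factor most
        #    implicated in the current failures.
        errors = [ex for ex in examples if not worker.solves(best_prompt, ex)]
        factor = architect.select_factor(factors, errors)
        # 3. Single-factor optimization: rewrite only that factor,
        #    conditioned on the observed errors.
        candidate = architect.optimize_factor(best_prompt, factor, errors)
        # 4. Evaluate the candidate; accept only if it improves.
        score = scorer(candidate, examples, worker)
        if score > best_score:
            best_prompt, best_score = candidate, score
    return best_prompt, best_score
```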
| Category | Datasets |
|---|---|
| Math | gsm8k, multiarith, gsm_hard, aime2025, competition_math |
| Logic | aqua, bbh_all (27 tasks), bbh_<task_name> |
| Knowledge | mmlu, mmlu_<subject> (57 subjects), gpqa, gpqa_<domain> |
- Python >= 3.9 (recommended 3.10 / 3.11)
```bash
pip install -r requirements.txt
```

The core dependencies include openai, torch, transformers, accelerate, datasets, numpy, tqdm, and tabulate. See requirements.txt for the full list.
Set your API keys as environment variables (recommended) or edit config.py directly:
```bash
export OPENAI_API_KEY="sk-..."
export SILICONFLOW_API_KEY="..."
export GROQ_API_KEY="..."
export DASHSCOPE_API_KEY="..."
```

aPSF uses two LLM roles, configured in config.py under MODELS:
| Role | Purpose | Example |
|---|---|---|
| architect | Structure discovery & factor optimization | Qwen3-8B, gpt-oss-120b |
| worker | Task execution (answer generation) | Llama-3.1-8B, Qwen2.5-7B |
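Based on the per-role fields mentioned in this README (provider, api_base_id, model_name, api_key), the MODELS section of config.py plausibly looks something like the sketch below. This is an assumed shape for illustration only; check config.py in your checkout for the actual schema.

```python
# Illustrative shape of MODELS in config.py (assumed, not verbatim).
# Each role names a provider, an endpoint alias, a model, and a key.
import os

MODELS = {
    "architect": {
        "provider": "openai",
        "api_base_id": "openai",
        "model_name": "gpt-oss-120b",
        "api_key": os.environ.get("OPENAI_API_KEY", ""),
    },
    "worker": {
        "provider": "openai",
        "api_base_id": "ollama",          # any OpenAI-compatible endpoint
        "model_name": "qwen2.5:7b",
        "api_key": "none",                 # local endpoints ignore the key
    },
}
```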
aPSF is compatible with any OpenAI-compatible endpoint. For local deployment:
```bash
# Ollama
ollama run qwen2.5:7b
# Set api_base_id to "ollama" in config.py

# vLLM
python -m vllm.entrypoints.openai.api_server --model Qwen/Qwen2.5-7B-Instruct --port 8000
# Set api_base_id to "qwen_vllm" or "local_llm" in config.py
```

Dataset paths are defined in config.py under DATA_PATHS. Organize your data as follows:
```
data/
├── gsm_data/                 # GSM8K
├── BIG-Bench-Hard-data/      # BBH (27 tasks)
├── AQuA-data/                # AQuA
├── MultiArith-data/          # MultiArith
├── MMLU-data/                # MMLU (57 subjects)
├── GSM-hard/                 # GSM-Hard
├── AIME2025/                 # AIME 2025
├── competition_math/         # Competition Math
├── gpqa/                     # GPQA
└── human_eval/               # HumanEval
```
Update the paths in DATA_PATHS to point to your local data directory.
Verify your setup with the built-in sanity check:
```bash
python main.py
```

This tests the Architect's structure discovery and the Worker LLM's generation capability.
```bash
python run_experiments.py --dataset <DATASET> --method <METHOD> [OPTIONS]
```

Required arguments:
| Argument | Description |
|---|---|
| --dataset | Dataset name (e.g., gsm8k, bbh_all, mmlu, gpqa) |
| --method | Method to run: apsf or an ablation variant (see below) |
Optional arguments:
| Argument | Description |
|---|---|
| --feedback | Enable reflection-based optimization using error feedback |
| --resume | Resume from the last checkpoint |
| --step N | Override the number of optimization steps |
| --initial-prompt TEXT | Start optimization from a given prompt (presets: cot, analytical, expert) |
```bash
# aPSF on GSM8K
python run_experiments.py --dataset gsm8k --method apsf

# With Chain-of-Thought initial prompt
python run_experiments.py --dataset gsm8k --method apsf --initial-prompt cot

# Enable reflection optimization
python run_experiments.py --dataset gsm8k --method apsf --feedback

# Full BBH benchmark (27 tasks) with checkpoint resume
python run_experiments.py --dataset bbh_all --method apsf --resume

# Single BBH task
python run_experiments.py --dataset bbh_web_of_lies --method apsf

# MMLU single subject
python run_experiments.py --dataset mmlu_abstract_algebra --method apsf

# GPQA
python run_experiments.py --dataset gpqa --method apsf
```

Q: How do I use a different LLM as the architect or worker?
A: Edit the MODELS section in config.py. Set provider, api_base_id, model_name, and api_key for each role. Any OpenAI-compatible endpoint works.
Q: GPU acceleration?
A: Install torch with CUDA support, along with accelerate. Local models served through llama_api.py will automatically use the GPU when one is available.
Q: Missing dependencies?
A: Run pip install -r requirements.txt. For tokenization issues, ensure sentencepiece is installed.
Q: How to add a new dataset?
A: 1) Create a loader in data_loader/ inheriting from BaseLoader. 2) Create an evaluator in evaluation/ if needed. 3) Add the dataset config to DATASET_CONFIG in config.py.
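Step 1 above might look like the skeleton below. This is a sketch under assumptions: the method name `load`, the JSONL format, and the `(question, answer)` tuple shape are illustrative; mirror whatever interface BaseLoader in data_loader/ actually defines.

```python
# Hypothetical dataset loader skeleton; in the repository this would
# inherit from data_loader.BaseLoader, whose interface is assumed here.
import json

class MyDatasetLoader:
    def __init__(self, path):
        self.path = path

    def load(self):
        """Yield (question, answer) pairs from a JSONL file,
        one JSON object per line."""
        with open(self.path) as f:
            for line in f:
                item = json.loads(line)
                yield item["question"], item["answer"]
```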
MIT License