SynCraft: Guiding Large Language Models to Predict Edit Sequences for Molecular Synthesizability Optimization

SynCraft is a reasoning-based framework that treats synthesizability optimization as a precise structural editing problem rather than a sequence translation task. Leveraging the emergent reasoning capabilities of Large Language Models (LLMs), SynCraft navigates the "synthesis cliff", the regime where minimal structural modifications yield significant gains in synthetic feasibility.

By predicting executable sequences of atom-level edits rather than generating SMILES strings directly, SynCraft circumvents the syntactic fragility of LLMs while harnessing their chemical intuition.

Key Features

Generative Editing via In-Context Reasoning: Decouples strategic planning from chemical execution. The LLM acts as a chemical strategist, reasoning about synthetic liabilities before prescribing edits.
Discrete Editing Action Space: Uses a precise JSON-based command set (DEL_ATOM, ADD_BOND, MUTATE_ATOM, etc.) to modify molecular graphs deterministically, ensuring validity.
Interaction-Aware Optimization: Incorporates 3D protein-ligand interaction data (via AutoDock Vina and PLIP) into the prompting strategy to preserve critical pharmacophores during optimization.
Synthesis Cliff Navigation: Focuses on minimal, high-impact edits to transform "unsynthesizable" molecules into accessible analogs without destroying the original scaffold.

Structure

SynCraft-Core/
├── assets/                 # Data assets
│   ├── reasoning.json      # Golden examples with reasoning traces
│   ├── RIPK1.txt           # Example molecule lists
│   └── unsolved.json       # Input datasets
├── config/                 # Configuration files
├── notebooks/              # Jupyter notebooks for analysis
├── scripts/                # Shell scripts for running experiments
│   ├── inference_enhanced.sh
│   └── inference_bioactivity_constrain.sh
├── src/                    # Source code
│   ├── inference_enhanced.py             # Standard inference script
│   ├── inference_bioactivity_constrain.py # Interaction-aware inference
│   ├── utils.py                          # Core editing & reconstruction logic
│   ├── extract_interaction.py            # PLIP interaction analysis
│   └── docking_utils.py                  # Vina docking wrappers
└── vina/                   # Vina executables and receptor files

Installation

Prerequisites

Python 3.8+
AutoDock Vina (for interaction-aware mode)
PLIP (for interaction-aware mode)
OpenBabel

Python Dependencies

Install the required Python packages:

pip install rdkit litellm loguru meeko openbabel tqdm numpy syntheseus

Environment Setup

SynCraft uses litellm to interface with LLMs (e.g., Gemini, DeepSeek). Set the API key for the provider you plan to use as an environment variable:

export GEMINI_API_KEY='your-gemini-api-key'
# or
export DEEPSEEK_API_KEY='your-deepseek-api-key'
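
As a quick pre-flight check, the sketch below reports which of the two keys named above are set before any request is sent. The helper name `find_api_keys` is hypothetical and not part of SynCraft; other providers supported by litellm use different variable names.

```python
import os

# Environment variables named in this README.
PROVIDER_KEYS = ("GEMINI_API_KEY", "DEEPSEEK_API_KEY")


def find_api_keys(env=None):
    """Return the provider key names that are set to a non-empty value."""
    env = os.environ if env is None else env
    return [name for name in PROVIDER_KEYS if env.get(name, "").strip()]


if __name__ == "__main__":
    found = find_api_keys()
    if found:
        print("API keys found:", ", ".join(found))
    else:
        print("No API key set; export one of:", ", ".join(PROVIDER_KEYS))
```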

Usage

1. Standard Synthesizability Optimization

To run the standard optimization pipeline, which restores synthesizability using chemical reasoning:

cd scripts
bash inference_enhanced.sh

Under the hood (src/inference_enhanced.py):

Loads unsynthesizable molecules.
Retrieves similar "golden examples" (pairs of unsynthesizable $\to$ synthesizable molecules) for few-shot prompting.
Prompts the LLM to reason about synthetic liabilities and generate a JSON edit sequence.
Applies the edits deterministically to produce the result.
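
The retrieval step above can be sketched as a nearest-neighbour lookup. This README does not say which similarity measure SynCraft uses, so the sketch substitutes a Tanimoto score over SMILES character bigrams as a dependency-free stand-in for a molecular fingerprint similarity. The function names and the `source`/`target` record keys are hypothetical, and the SMILES strings are placeholders rather than real synthesis-cliff pairs.

```python
def bigrams(smiles):
    """Set of overlapping character pairs; a crude stand-in for a fingerprint."""
    return {smiles[i:i + 2] for i in range(len(smiles) - 1)}


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def retrieve_examples(query, examples, k=5):
    """Return the k examples whose 'source' molecule is most similar to query."""
    query_fp = bigrams(query)
    ranked = sorted(
        examples,
        key=lambda ex: tanimoto(query_fp, bigrams(ex["source"])),
        reverse=True,
    )
    return ranked[:k]


# Toy records standing in for golden examples (placeholder SMILES).
golden = [
    {"source": "CCOC(=O)C1CC1", "target": "CCOC(=O)C1CCC1"},
    {"source": "c1ccccc1N", "target": "c1ccccc1O"},
    {"source": "CCOC(=O)C", "target": "CCOC(C)=O"},
]
top = retrieve_examples("CCOC(=O)C1CC1N", golden, k=2)
print([ex["source"] for ex in top])
```

The default `k=5` mirrors the documented default of `--few-shot-k`; a real pipeline would more likely compare fingerprints computed with a cheminformatics toolkit.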

Key Arguments:

--dataset: The dataset key in the input JSON.
--model: The LLM to use, given as a litellm model identifier (e.g., gemini/gemini-2.5-pro).
--few-shot-k: Number of few-shot examples to use (default: 5).
--pass-k: Number of parallel inference passes per molecule.
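
For orientation, the flags above could be declared with `argparse` roughly as follows. Only the flag names and the `--few-shot-k` default of 5 come from this README; the types, the other defaults, the `build_parser` name, and the `unsolved` dataset key in the usage line are assumptions, and the actual parser in src/inference_enhanced.py may differ.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented flags."""
    parser = argparse.ArgumentParser(description="SynCraft inference (illustrative sketch)")
    parser.add_argument("--dataset", required=True,
                        help="Dataset key in the input JSON")
    parser.add_argument("--model", default="gemini/gemini-2.5-pro",
                        help="LLM identifier (this default is an assumption)")
    parser.add_argument("--few-shot-k", type=int, default=5,
                        help="Number of few-shot examples")
    parser.add_argument("--pass-k", type=int, default=1,
                        help="Inference passes per molecule (this default is an assumption)")
    return parser


# argparse maps dashes to underscores: --few-shot-k becomes args.few_shot_k.
args = build_parser().parse_args(["--dataset", "unsolved", "--pass-k", "4"])
print(args.dataset, args.model, args.few_shot_k, args.pass_k)
```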

2. Interaction-Aware Optimization

To optimize molecules while preserving binding interactions (requires Vina and receptor files):

cd scripts
bash inference_bioactivity_constrain.sh

Under the hood (src/inference_bioactivity_constrain.py):

Docks the input molecule into the target receptor.
Analyzes interactions (H-bonds, $\pi$-stacking, etc.) using PLIP.
Injects these constraints into the LLM prompt (e.g., "Atom 5 forms a critical Hydrogen Bond...").
Prompts the LLM to generate edits that respect these interaction constraints.
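
The constraint-injection step can be pictured as turning each detected interaction into one prompt sentence. The record fields (`atom`, `type`, `residue`), the residue labels, the exact wording, and the `format_constraints` helper are assumptions for illustration; the real prompt text lives in src/inference_bioactivity_constrain.py.

```python
def format_constraints(interactions):
    """Render interaction records as prompt lines for the LLM."""
    lines = ["Preserve the following protein-ligand interactions:"]
    for hit in interactions:
        lines.append(
            f"- Atom {hit['atom']} forms a critical {hit['type']} "
            f"with residue {hit['residue']}; do not delete or mutate it."
        )
    return "\n".join(lines)


# Hypothetical records, shaped like a simplified interaction summary.
interactions = [
    {"atom": 5, "type": "Hydrogen Bond", "residue": "ASP100"},
    {"atom": 12, "type": "pi-stacking", "residue": "PHE200"},
]
print(format_constraints(interactions))
```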

Configuration: Ensure your receptor files (.pdbqt, .pdb, config.txt) are correctly placed in the vina/ directory and referenced in the script.

Methodology

The Edit Action Space

SynCraft defines a compact action space $\mathcal{A}$ where operations are referenced via unique atom-map numbers:

DEL_ATOM: Removes a specific atom.
MUTATE_ATOM: Changes the atomic element.
ADD_ATOM: Introduces a new atom.
ADD_BOND / DEL_BOND: Creates or removes bonds.
CHANGE_BOND: Modifies bond order/aromaticity.
SET_CHIRAL / SET_BOND_STEREO: Defines stereochemistry.
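
To make the action space concrete, the sketch below applies a short JSON edit sequence to a toy molecular graph stored as plain dictionaries keyed by atom-map number. The JSON field names (`op`, `atom`, `element`, `atoms`, `order`) and the `apply_edits` function are assumptions: this README lists only the operation names, and the real executor in src/utils.py is not reproduced here. Only four operations are covered, and no valence or sanitization checks are performed.

```python
import json


def apply_edits(atoms, bonds, edits):
    """Apply edits in order to copies of the atom and bond tables.

    atoms: {map_num: element}; bonds: {frozenset({a, b}): bond_order}.
    """
    atoms, bonds = dict(atoms), dict(bonds)
    for edit in edits:
        op = edit["op"]
        if op == "DEL_ATOM":
            del atoms[edit["atom"]]
            # Drop every bond that touched the deleted atom.
            bonds = {pair: order for pair, order in bonds.items()
                     if edit["atom"] not in pair}
        elif op == "MUTATE_ATOM":
            if edit["atom"] not in atoms:
                raise KeyError(f"Unknown atom-map number: {edit['atom']}")
            atoms[edit["atom"]] = edit["element"]
        elif op == "ADD_ATOM":
            atoms[edit["atom"]] = edit["element"]
        elif op == "ADD_BOND":
            a, b = edit["atoms"]
            if a not in atoms or b not in atoms:
                raise KeyError(f"Bond refers to a missing atom: {a}, {b}")
            bonds[frozenset((a, b))] = edit.get("order", 1)
        else:
            raise ValueError(f"Unsupported op in this sketch: {op}")
    return atoms, bonds


# Toy three-atom chain C1-C2-O3, keyed by atom-map number.
atoms = {1: "C", 2: "C", 3: "O"}
bonds = {frozenset((1, 2)): 1, frozenset((2, 3)): 1}

edits = json.loads("""[
  {"op": "MUTATE_ATOM", "atom": 3, "element": "N"},
  {"op": "ADD_ATOM", "atom": 4, "element": "C"},
  {"op": "ADD_BOND", "atoms": [3, 4], "order": 1},
  {"op": "DEL_ATOM", "atom": 1}
]""")

new_atoms, new_bonds = apply_edits(atoms, bonds, edits)
print(new_atoms)
```

Because every edit names its target by atom-map number, the same sequence always produces the same graph, which is what lets the execution step stay deterministic.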

Workflow

Retrieval: Finds similar "Synthesis Cliff" examples.
Reasoning: The LLM analyzes the molecule and articulates a plan.
Execution: The plan (JSON) is executed by the deterministic toolkit (src/utils.py).
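
Between the reasoning and execution steps, the JSON plan has to be pulled out of free-form model output. The following is a minimal sketch, assuming the model is asked to end its answer with a JSON array of edits. The reply format, the `op` field, and the `extract_edits` helper are hypothetical; the operation names are the ones listed in the action space above.

```python
import json

ALLOWED_OPS = {
    "DEL_ATOM", "MUTATE_ATOM", "ADD_ATOM", "ADD_BOND", "DEL_BOND",
    "CHANGE_BOND", "SET_CHIRAL", "SET_BOND_STEREO",
}


def extract_edits(reply):
    """Parse the first valid JSON array in a reply and check its op names."""
    decoder = json.JSONDecoder()
    start = reply.find("[")
    while start != -1:
        try:
            edits, _end = decoder.raw_decode(reply, start)
        except json.JSONDecodeError:
            # Not JSON at this bracket (e.g. a SMILES atom); try the next one.
            start = reply.find("[", start + 1)
            continue
        for edit in edits:
            if not isinstance(edit, dict) or edit.get("op") not in ALLOWED_OPS:
                raise ValueError(f"Invalid edit: {edit!r}")
        return edits
    raise ValueError("No JSON array found in reply")


reply = (
    "The stereocentre in C[C@H] is hard to set, so I remove it.\n"
    'Edits: [{"op": "DEL_ATOM", "atom": 7}, '
    '{"op": "MUTATE_ATOM", "atom": 3, "element": "N"}]'
)
edits = extract_edits(reply)
print([edit["op"] for edit in edits])
```

Rejecting unknown operation names before execution keeps a malformed plan from reaching the graph-editing code.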
