DeepXube (pronounced "Deep Cube") aims to solve pathfinding using a combination of deep reinforcement learning and heuristic search.
- Learn a heuristic function that maps states and goals to an estimate of the cost-to-go from the given state to the given goal.
- Use the learned heuristic function with heuristic search algorithms, such as batch weighted A* search or batch weighted Q* search, to solve problem instances.
DeepXube is a generalization of DeepCubeA (code, paper).
For any issues, you can create a GitHub issue or contact Forest Agostinelli (foresta@cse.sc.edu).
Outline:
pip install deepxube
See INSTALL.md for more information.
Documentation of all modules is here.
deepxube can be run from the command line via the deepxube command (you may have to use python -m deepxube on Windows).
Run deepxube --help for detailed information. --help can also be run on
positional arguments (e.g. deepxube train --help).
There is a simple example domain located in deepxube/domains/grid.py and example heuristic
function located in deepxube/heuristics/grid_heur.py.
After installing deepxube run:
- Information
- Domain information:
deepxube domain_info - Heuristic information:
deepxube heuristic_info - Pathfinding information:
deepxube pathfinding_info
- Domain information:
- Generate problem instances for solving:
deepxube problem_inst --domain grid.7 --step_max 1000 --num 100 --file valid.pkl --redo - Visualization
- Visualize random problem instance:
deepxube viz --domain grid.7 --steps 10 - Visualize a problem instance from a file:
deepxube viz --domain grid.7 --file valid.pkl --idx 13
- Visualize random problem instance:
- Time to ensure basic functionality (Can use breakpoints in your code to debug):
- With deepxube residual neural network:
deepxube time --domain grid.7 --heur resnet_fc.100H_2B_bn --heur_type V - With deepxube residual neural network and deep Q-network:
deepxube time --domain grid.7 --heur resnet_fc.100H_2B_bn --heur_type QFix - With custom neural network:
deepxube time --domain grid.7 --heur gridnet.8CH_200FC --heur_type V
- With deepxube residual neural network:
- Training
- Train heuristic function:
deepxube train --domain grid.7 --heur resnet_fc.100H_2B_bn --heur_type V --pathfind bwas --step_max 100 --up_itrs 100 --search_itrs 20 --backup -1 --procs 1 --batch_size 50 --max_itrs 5000 --dir dummy/ - Use tensorboard to see training progress:
tensorboard --logdir=dummy/ - Plot more detailed training information with interactive slider for training iteration:
deepxube train_summary --dir dummy
- Train heuristic function:
- Solving
- Solve problem instances with all-zero heuristic:
deepxube solve --domain grid.7 --heur_type V --pathfind bwas.1_1.0_0.0 --file valid.pkl --results results_zeros_ex/ --redo - Solve problem instances with trained heuristic:
deepxube solve --domain grid.7 --heur resnet_fc.100H_2B_bn --heur_file dummy/heur.pt --heur_type V --pathfind bwas.1_1.0_0.0 --file valid.pkl --results results_trained_ex/ --redo - Solving with the trained heuristic should have a significantly lower number of nodes generated and number of iterations.
- You can get the actions taken for each instance with the "actions" key in the dictionary saved in the
results.pklfile in the--resultsdirectory. - Visualize solutions with
deepxube viz --domain grid.7 --file results_trained_ex/results.pkl --idx 3 --soln(This is not yet pip installable). This prints actions taken. You can implement__repr__for the action class to determine what gets printed for each action.
- Solve problem instances with all-zero heuristic:
User-defined domains should go in the ./domains/ folder.
deepxube will recursively search this directory and import all modules so that domains are registered.
Grid inherits from Mixin classes from deepxube.base.domain, which give it additional functionality (see the domain documentation).
ActsEnumFixed:Gridimplementsget_actions_fixed- Methods obtained:
sample_action,get_state_actions,expand,get_num_acts
- Methods obtained:
StartGoalWalkable:Gridimplementssample_goal_from_stateandsample_start_states- Methods obtained:
sample_start_goal_pairs
- Methods obtained:
By using registers from deepxube.factories.domain_factory, Grid can be obtained from its name.
Furthermore, a parser can be implemetned and registered to allow one to specify arguments for the constructor via the command line.
By convention, everything after '.' are considered arguments.
Running deepxube domain_info in a directory with domains/grid.py should produce information about grid (amongst other available domains):
Visualization of states/goals and the domain transition function can be useful to validating it.
To accomplish this, a domain can inherit from StateGoalVizable to convert state/goal pairs to figures
and inherit from StringToAct to be able to type actions into the command line and see how it changes the state.
By running deepxube viz --domain grid.7 --steps 10 will create a start/goal pair by taking a random walk of length 10 and visualize it.
One can vary the grid size by simply changing the number (e.g. deepxube viz --domain grid.10 --steps 10).
After applying an action, the transition cost and whether or not the goal is reached will be printed.
deepxube trains heuristic functions represented as neural networks. Different kinds of neural networks expect different kinds of inputs.
By inheriting from Mixins from deepxube.base.nnet_input a NNetInput class can be dynamically created for a domain.
Grid inherits from HasFlatSGActsEnumFixedIn and implements get_input_info_flat_sg, to_np_flat_sg, and actions_to_indices.
From this, if a neural network expects a flat (1D) input from a state/goal pair, then a NNetInput class that tells the neural network the
dimension of the input, the number of inputs, and that converts state/goal pairs to numpy arrays is dynamically created.
Furthermore, if deep Q-network is used with an output for each action, then the heuristic function is automatically modified to have the correct output dimension.
Hence, the flat_sg and flat_sg_actfix in the domain information.
A NNetInput object has a to_np function which converts a problem instance, or a problem instance along with actions in the case of a deep Q-network that
takes an action as input, to a list of numpy arrays. Each row of each array in the list of numpy arrays represents a different problem instance. Therefore,
multiple arrays of different sizes can be used to represent a single problem instance, if needed.
With this dynamic NNetInput creation, Grid can be used with deepxube's built in resnet_fc heuristic function, which expects a flat input.
Custom neural network input types can also be created and registered. Given a heuristic function, deepxube searches for a registered NNetInput class that
matches its expected input. If multiple exist, it uses the first one it finds.
User-defined neural networks for heuristic functions should go in the ./heuristics/ folder.
deepxube will recursively search this directory and import all modules so that domains are registered.
A heuristic function is constructed given a neural network input of a pre-determined type, the dimensionality of the output, and a boolean indicating whether
or not the neural network represents a deep Q-network with a fixed output size. The forward portion of the neural network expects a list of Tensors
(this corresponds to the list of numpy arrays from the NNetInput) and returns a Tensor representing heuristic values. If the output is a fixed set of actions,
the heuristic neural network automatically obtains the corresponding indices from the output.
An example of a custom neural network along with its parser is shown in GridNet and GridNetParser in deepxube/heuristics/grid_heur.py.
The current pathfinding algorithms implemented include the batched and weighted versions of A* and Q* search, a greedy policy with a standard heuristic function (V) and a deep Q-network (Q). There are other pathfinding algorithms specifically used for training a heuristic function via supervised learning that sets the target cost-to-go to be that of the path cost obtained from a random walk (see References for more information).
Training is given the update frequency
When doing reinforcement learning based training, search is performed with the heuristic function with the given pathfinding algorithm for
DeepCubeAI: Learning world models for training and search (code, paper).
DeepCubeAg: Specifying goals with answer set programming (code, paper).
CDGR: Conflict-driven goal reaching (code, paper).
DeepCubeA (Learning heuristic functions with reinforcement learning):
@article{agostinelli2019solving,
title={Solving the Rubik’s cube with deep reinforcement learning and search},
author={Agostinelli, Forest and McAleer, Stephen and Shmakov, Alexander and Baldi, Pierre},
journal={Nature Machine Intelligence},
volume={1},
number={8},
pages={356--363},
year={2019},
publisher={Nature Publishing Group UK London}
}
DeepCubeAg (Learning heuristic functions that generalize across goals):
@inproceedings{agostinelli2024specifying,
title={Specifying goals to deep neural networks with answer set programming},
author={Agostinelli, Forest and Panta, Rojina and Khandelwal, Vedant},
booktitle={Proceedings of the International Conference on Automated Planning and Scheduling},
volume={34},
pages={2--10},
year={2024}
}
Q* Search (Learning DQN heuristic functions and exploiting structure for faster search with Q* search):
@inproceedings{agostinelli2024q,
title={Q* search: Heuristic search with deep q-networks},
author={Agostinelli, Forest and Shperberg, Shahaf S and Shmakov, Alexander and McAleer, Stephen and Fox, Roy and Baldi, Pierre},
booktitle={ICAPS Workshop on Bridging the Gap between AI Planning and Reinforcement Learning},
year={2024}
}
Limited-horizon Bellman-based learning (LHBL):
@inproceedings{hadar2025beyond,
title={Beyond Single-Step Updates: Reinforcement Learning of Heuristics with Limited-Horizon Search},
author={Hadar, Gal and Agostinelli, Forest and Shperberg, Shahaf S},
booktitle = {AAAI},
year={2026}
}
CayleyPy (Training heuristic functions using supervised learning on the length of random walks):
@inproceedings{Chervov2025NeurIPSRubiks,
author = {Alexander Chervov and Kirill Khoruzhii and Nikita Bukhal and Jalal Naghiyev and Vladislav Zamkovoy and Ivan Koltsov and Lyudmila Cheldieva and Arsenii Sychev and Arsenii Lenin and Mark Obozov and Egor Urvanov and Alexey M. Romanov},
title = {A machine learning approach that beats Rubik’s cubes},
booktitle = {Proceedings of the 39th Conference on Neural Information Processing Systems (NeurIPS 2025)},
year = {2025},
note = {Spotlight},
url = {https://neurips.cc/}
}

