This repo is my implementation of the mode-connectivity result in [1]. The paper posits that any two local minima in the loss landscape of a neural network can be connected by a curve that runs through a valley of low loss. Apart from the curiosity that this is possible at all, the result can also be used for quick ensemble sampling for uncertainty quantification.
Assume that we have a class of models $f_w$ parameterized by a weight vector $w$.
In the paper, the path between two endpoint weight vectors $w_1$ and $w_2$ is parametrized as a chain of two straight line segments that both connect to a third parameter set $\theta$:

$$\phi_\theta(t) = \begin{cases} 2\left(t\theta + (0.5 - t)w_1\right), & 0 \le t \le 0.5 \\ 2\left((t - 0.5)w_2 + (1 - t)\theta\right), & 0.5 \le t \le 1 \end{cases}$$
Or, as I have done in this implementation, the path is parameterized as a quadratic Bézier curve connecting the start and end points through the control point $\theta$:

$$\phi_\theta(t) = (1 - t)^2 w_1 + 2t(1 - t)\theta + t^2 w_2, \quad t \in [0, 1].$$
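As a minimal sketch (names are mine, not the repo's API), the Bézier interpolation of two flattened weight vectors with a control point can be written as:

```python
import torch

def bezier_point(w1: torch.Tensor, theta: torch.Tensor, w2: torch.Tensor,
                 t: float) -> torch.Tensor:
    """Quadratic Bezier curve through w1 and w2 with control point theta.

    Evaluates (1 - t)^2 * w1 + 2 t (1 - t) * theta + t^2 * w2, so the
    endpoints are recovered at t = 0 and t = 1 for any choice of theta.
    """
    return (1 - t) ** 2 * w1 + 2 * t * (1 - t) * theta + t ** 2 * w2

# The curve always passes through the endpoints, regardless of theta:
w1, theta, w2 = torch.zeros(3), torch.ones(3), 2 * torch.ones(3)
assert torch.allclose(bezier_point(w1, theta, w2, 0.0), w1)
assert torch.allclose(bezier_point(w1, theta, w2, 1.0), w2)
```

Because the endpoint constraint holds for every $\theta$, only the control point needs to be trained.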
Given that the two end-point models parametrized by $w_1$ and $w_2$ have already been trained, the parameters $\theta$ of the connecting curve $\phi_\theta(t)$ are found by minimizing the expected loss along the curve,

$$\ell(\theta) = \mathbb{E}_{t \sim U(0, 1)}\left[\mathcal{L}\left(\phi_\theta(t)\right)\right],$$

where $\mathcal{L}$ is the ordinary training loss and $U(0, 1)$ is the uniform distribution on $[0, 1]$. The loss is minimized by first sampling $t \sim U(0, 1)$ for each minibatch and then taking a stochastic gradient step on $\mathcal{L}(\phi_\theta(t))$ with respect to $\theta$.
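A single training step of this procedure might look as follows. This is a sketch, not the repo's actual code: `model_fn`, which evaluates the network for a given flat weight vector, is a hypothetical helper, and the endpoint weights are assumed to be frozen tensors without gradients.

```python
import torch

def train_curve_step(theta, w1, w2, model_fn, loss_fn, batch, opt):
    """One optimization step for the curve control point theta.

    Samples t ~ U(0, 1), builds the Bezier point phi_theta(t), and
    backpropagates the batch loss through theta only; the endpoint
    weights w1 and w2 stay fixed.
    """
    t = torch.rand(()).item()
    phi = (1 - t) ** 2 * w1 + 2 * t * (1 - t) * theta + t ** 2 * w2
    x, y = batch
    loss = loss_fn(model_fn(phi, x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```

Since the curve passes through $w_1$ and $w_2$ for any $\theta$, no constraint handling is needed; plain SGD or Adam on $\theta$ suffices.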
The following results were created using the standard settings for modeconnectivity.py as described below. The CIFAR-10 dataset was used for this particular experiment. The model is a fairly standard convolutional neural network with three convolutional layers and two linear layers, ReLU activations, and 50% dropout. See models.py for further details.
The code first trains the start and end models, and then trains the curve model, i.e. the control point $\theta$ of the connecting curve.
The loss landscape, projected onto the plane spanned by the Bézier curve, is plotted below. Note that the landscape is squeezed, so distances in the plot do not directly correspond to distances in weight space.
It can indeed be seen that the curve lies in a valley as posited in the paper.
Given that the loss is low along the entire curve, every parameter set $\phi_\theta(t)$ for $t \in [0, 1]$ is itself a well-performing model.
Finally, the question is: what if we use parameter sets sampled along the curve as an ensemble? In the normal setup for a classification task, the model would predict logits $z = f_w(x)$ from a single parameter set $w$ and obtain class probabilities with a softmax.
Instead, in ensemble prediction we draw $t_1, \dots, t_N \sim U(0, 1)$, use the sampled parameter sets $\phi_\theta(t_i)$, and average over the outputs:

$$\hat{p}(y \mid x) = \frac{1}{N} \sum_{i=1}^{N} \mathrm{softmax}\left(f_{\phi_\theta(t_i)}(x)\right).$$
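A minimal sketch of this averaging, under the same assumptions as above (flat weight vectors and a hypothetical `model_fn` that evaluates the network for a given weight vector):

```python
import torch
import torch.nn.functional as F

def ensemble_predict(theta, w1, w2, model_fn, x, n_samples=10):
    """Average softmax outputs of models sampled along the curve.

    Draws t_1..t_N ~ U(0, 1), evaluates the network at each Bezier
    point phi_theta(t_i), and returns the mean class probabilities.
    """
    probs = []
    with torch.no_grad():
        for _ in range(n_samples):
            t = torch.rand(()).item()
            phi = (1 - t) ** 2 * w1 + 2 * t * (1 - t) * theta + t ** 2 * w2
            probs.append(F.softmax(model_fn(phi, x), dim=-1))
    return torch.stack(probs).mean(dim=0)
```

Note that the $N$ forward passes share a single set of curve parameters, so the ensemble costs no extra training compared to the curve model itself.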
The idea is that the parameter samples $\phi_\theta(t_i)$ correspond to diverse but individually accurate models, so averaging their predictions smooths out individual errors and captures predictive uncertainty.
This should in turn result in a more robust model.
Create a virtual environment and install the package:

```shell
python -m pip install -e .
```

Then the code can be run with the standard settings with:

```shell
cd modeconnectivity
python3 modeconnectivity.py
```
```
mode-connectivity/
├── LICENSE
├── README.md
├── requirements.txt
├── data/
│   ├── train/
│   └── test/
├── experiments/
│   └── curve_experiment_CIFAR10_CIFAR10ConvNet/
│       ├── curve_model/
│       ├── end_model/
│       ├── figures/
│       ├── logs/
│       ├── models/
│       └── start_model/
├── notebooks/
├── scripts/
└── src/
    ├── curve_eval.py
    ├── curve_plots.py
    ├── modeconnectivity.py
    ├── models.py
    ├── scheduler.py
    └── train.py
```
- `LICENSE`: License for the project.
- `README.md`: Project overview, results, and usage instructions.
- `requirements.txt`: Python dependencies.
- `data/`: Local training and test datasets (MNIST, FashionMNIST, CIFAR-10).
- `experiments/`: Saved outputs from runs (models, logs, plots, and artifacts).
- `notebooks/`: Interactive notebooks for exploration and analysis.
- `scripts/`: Utility scripts for running or automating experiments.
- `src/`: Core source code for training, curve optimization, evaluation, and plotting.
  - `train.py`: Standard model training routines.
  - `modeconnectivity.py`: Main script to train endpoint models and fit the curve model.
  - `models.py`: Model architectures used in experiments.
  - `scheduler.py`: Learning-rate scheduling logic.
  - `curve_eval.py`: Evaluation utilities for models along the curve.
  - `curve_plots.py`: Plotting utilities for landscapes and curve metrics.
1. Timur Garipov, Pavel Izmailov, Dmitrii Podoprikhin, Dmitry Vetrov, and Andrew Gordon Wilson. 2018. Loss surfaces, mode connectivity, and fast ensembling of DNNs. In Proceedings of the 32nd International Conference on Neural Information Processing Systems (NIPS'18). Curran Associates Inc., Red Hook, NY, USA, 8803–8812. https://arxiv.org/pdf/1802.10026
