Very cool project! While reading about the clever self-embeddings approach, it occurred to me that some other recent work [1] might be complementary. Rather than only scoring quality after a sequence has been generated, "reasoning with sampling" tracks the probability of the output while it's being generated, and backtracks and regenerates blocks on the fly when it sees confidence falling. That should let you filter out some bad candidates without having to complete them and then score them through the geometric lens, (a) improving end-to-end efficiency and (b) improving overall quality: evaluating fewer bad candidates leaves room to evaluate more potentially good ones!
If I understand correctly, you've already modified llama.cpp to extract the self-embeddings, so adding MCMC over the logits seems like a natural evolution.
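For what it's worth, the core idea can be caricatured in a few lines. This is only a toy sketch, not code from the paper or from llama.cpp: the names `mh_block_resample` and `toy_propose` are my own, and I'm assuming the common simplification that proposals come from the base distribution p itself, in which case a Metropolis-style independence sampler targeting the sharpened distribution p^alpha accepts a fresh block with probability min(1, exp((alpha - 1) * Δlogp)):

```python
import math
import random

def mh_block_resample(propose, alpha, steps, rng):
    """Toy independence-sampler MCMC targeting p(x)^alpha over blocks.

    `propose(rng)` draws a fresh block and returns (block, logp), its
    log-probability under the base distribution p. If proposals are drawn
    from p itself, the Metropolis acceptance ratio for the sharpened
    target p^alpha reduces to
        min(1, exp((alpha - 1) * (logp_new - logp_old))),
    so with alpha > 1 the chain drifts toward high-probability blocks.
    """
    block, logp = propose(rng)
    for _ in range(steps):
        cand, logp_cand = propose(rng)
        log_accept = min(0.0, (alpha - 1.0) * (logp_cand - logp))
        if rng.random() < math.exp(log_accept):
            block, logp = cand, logp_cand
    return block, logp

# Hypothetical stand-in for "resample a block from the model": blocks are
# just the ints 0..9, with lower ints labeled as higher log-probability.
# (A real proposal would regenerate a token block from the model itself.)
def toy_propose(rng):
    b = rng.randrange(10)
    return b, -float(b)

best_block, best_logp = mh_block_resample(toy_propose, alpha=4.0,
                                          steps=200, rng=random.Random(0))
```

In your setting, `propose` would wrap a llama.cpp block-generation call that returns the block's token log-probs, i.e. the same logits you're already pulling out for the self-embeddings.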
[1] Reasoning with Sampling: Your Base Model is Smarter Than You Think, https://arxiv.org/abs/2510.14901