Very cool project! While reading about the clever self-embeddings approach, it occurred to me that some other recent work [1] might be complementary. Rather than only scoring quality after a sequence has been generated, "reasoning with sampling" tracks the probability of the output while it's being generated, and backtracks and regenerates blocks on the fly when it sees confidence falling. That should let you filter out some bad candidates without having to complete them and then score them through the geometric lens, (a) improving end-to-end efficiency and (b) improving overall quality: evaluating fewer bad candidates leaves room to evaluate more potentially good ones!
If I understand correctly, you've already modified llama.cpp to extract the self-embeddings, so adding MCMC over the logits seems like a natural evolution.
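For what it's worth, the core idea can be caricatured in a few lines. This is only a toy sketch, not code from the paper or from llama.cpp: the names `mh_block_resample` and `toy_propose` are my own, and I'm assuming the common simplification that proposals come from the base distribution p itself, in which case a Metropolis-style independence sampler targeting the sharpened distribution p^alpha accepts a fresh block with probability min(1, exp((alpha - 1) * Δlogp)):

```python
import math
import random

def mh_block_resample(propose, alpha, steps, rng):
    """Toy independence-sampler MCMC targeting p(x)^alpha over blocks.

    `propose(rng)` draws a fresh block and returns (block, logp), its
    log-probability under the base distribution p. If proposals are drawn
    from p itself, the Metropolis acceptance ratio for the sharpened
    target p^alpha reduces to
        min(1, exp((alpha - 1) * (logp_new - logp_old))),
    so with alpha > 1 the chain drifts toward high-probability blocks.
    """
    block, logp = propose(rng)
    for _ in range(steps):
        cand, logp_cand = propose(rng)
        log_accept = min(0.0, (alpha - 1.0) * (logp_cand - logp))
        if rng.random() < math.exp(log_accept):
            block, logp = cand, logp_cand
    return block, logp

# Hypothetical stand-in for "resample a block from the model": blocks are
# just the ints 0..9, with lower ints labeled as higher log-probability.
# (A real proposal would regenerate a token block from the model itself.)
def toy_propose(rng):
    b = rng.randrange(10)
    return b, -float(b)

best_block, best_logp = mh_block_resample(toy_propose, alpha=4.0,
                                          steps=200, rng=random.Random(0))
```

In your setting, `propose` would wrap a llama.cpp block-generation call that returns the block's token log-probs, i.e. the same logits you're already pulling out for the self-embeddings.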
[1] Reasoning with Sampling: Your Base Model is Smarter Than You Think, https://arxiv.org/abs/2510.14901