"Reasoning with sampling" to improve efficiency and quality #9

@naasking

Description

Very cool project! While reading about the clever self-embeddings approach, it occurred to me that some other recent work [1] is complementary. Rather than only scoring quality after a sequence has been generated, "reasoning with sampling" tracks the model's probability distribution over the output while it is generating, and backtracks and regenerates blocks on the fly when it sees confidence falling. This should let you filter out some bad candidates without having to complete and then score them via the geometric lens, which would a) improve end-to-end efficiency and b) improve overall quality: evaluating fewer bad candidates leaves budget to evaluate more potentially good ones!

If I understand correctly, you've already modified llama.cpp to extract the self-embeddings, so adding MCMC over the logits seems like a natural next step.
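To make the idea concrete, here is a minimal sketch of block-level generation with confidence-based backtracking. This is not the paper's exact MCMC procedure; it just resamples a block when its mean token log-probability (a cheap confidence proxy) falls below a threshold. The model interface (`logprob_fn`), the threshold, the block size, and the retry budget are all illustrative assumptions.

```python
import math
import random

random.seed(0)

def sample_block(logprob_fn, prefix, block_len):
    """Sample one block of tokens; return (tokens, mean log-probability)."""
    tokens, total_lp = [], 0.0
    for _ in range(block_len):
        tok, lp = logprob_fn(prefix + tokens)  # model call (assumed interface)
        tokens.append(tok)
        total_lp += lp
    return tokens, total_lp / block_len

def generate_with_backtracking(logprob_fn, n_blocks, block_len,
                               conf_threshold=-2.0, max_retries=4):
    """Generate block by block; when a block's mean log-probability falls
    below conf_threshold, discard it and resample, keeping the most
    confident candidate seen within the retry budget."""
    out = []
    for _ in range(n_blocks):
        best, best_conf = None, -math.inf
        for _ in range(max_retries):
            block, conf = sample_block(logprob_fn, out, block_len)
            if conf > best_conf:
                best, best_conf = block, conf
            if conf >= conf_threshold:
                break  # confident enough; accept without further retries
        out.extend(best)
    return out

# Toy "model": a random token id with a synthetic log-probability,
# standing in for a real decoder such as llama.cpp's logits.
def toy_logprob_fn(context):
    return random.randrange(100), -random.uniform(0.1, 4.0)

print(generate_with_backtracking(toy_logprob_fn, n_blocks=3, block_len=4))
```

The point is that a low-confidence block is rejected before any downstream scoring ever sees it, which is where the efficiency gain would come from.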

[1] Reasoning with Sampling: Your Base Model is Smarter Than You Think, https://arxiv.org/abs/2510.14901

Metadata

Labels
enhancement (New feature or request)
