Conversation
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the settings.
📝 Walkthrough

Adds a Unix-socket "runtime keeper" process and a hidden RuntimeKeeper CLI; moves per-buffer PTY ownership into keeper processes accessed via per-buffer Unix sockets; updates server runtime APIs, state, persistence, the framing protocol, and serde derives, and serializes affected multi-threaded tests via a cross-process test lock.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant CLI as "embers-cli (RuntimeKeeper)"
    participant Keeper as "Runtime Keeper (PTY)"
    participant Socket as "Unix Socket"
    participant Server as "embers-server"
    CLI->>Keeper: run_runtime_keeper(RuntimeKeeperCli)
    Keeper->>Keeper: spawn PTY, terminal backend, snapshotter
    Keeper->>Socket: bind & listen (keeper socket)
    Server->>Socket: connect / attach
    Server->>Socket: send RPC (write/resize/status/capture/kill)
    Socket->>Keeper: deliver RPC request
    Keeper->>Keeper: perform PTY action, update sequence/activity/title
    Keeper->>Socket: send length-prefixed JSON response/update
    Socket->>Server: receive response/update
    Server->>Server: apply_runtime_status / record_buffer_update
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
Force-pushed b86c41d to 8dbc2b0
Force-pushed 991111a to 7e4827f
Force-pushed 34f1237 to 6826cc1
Force-pushed 85a63c3 to 8762d2d
@coderabbitai review
✅ Actions performed: Review triggered.
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
crates/embers-server/src/server.rs (2)
1715-1738: ⚠️ Potential issue | 🔴 Critical — Don't drop the keeper handle on normal exit.

After line 1716 removes `buffer_id` from `buffer_runtimes`, every `capture_*` and visible-snapshot path falls back to `buffer_runtime()`'s `Exited` error. The keeper itself stays alive and can still serve the final screen in `crates/embers-server/src/buffer_runtime.rs`, so this strips scrollback/final-render access from exited panes until the next restart.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/embers-server/src/server.rs` around lines 1715 - 1738, The code removes the runtime handle unconditionally in record_buffer_exit by calling self.buffer_runtimes.lock().await.remove(&buffer_id), which drops the keeper and prevents capture_*/visible-snapshot paths from serving final render; change the logic so the keeper handle is not removed on a normal exit: fetch (e.g., get/clone or peek) the runtime handle instead of removing it when should_interrupt is false so the keeper remains available for buffer_runtime() to serve Exited state, and only remove the entry when you truly intend to discard the runtime (for example when should_interrupt is true or after final cleanup). Ensure you update record_buffer_exit to conditionally remove from buffer_runtimes (referencing buffer_runtimes and record_buffer_exit) and preserve the keeper used by buffer_runtime()/Exited in buffer_runtime.rs.
253-257: 🧹 Nitpick | 🔵 Trivial — Remove the unused server-side surface cache.

CI is already warning that `buffer_surfaces` and `BufferSurface` are dead after this refactor. Keeping this path around obscures the new keeper-backed rendering flow and leaves stale code to maintain.

Also applies to: 269-398
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/embers-server/src/server.rs` around lines 253 - 257, Remove the unused server-side surface cache by deleting the Server struct field buffer_surfaces and the BufferSurface type and any associated imports/usages; search for symbols buffer_surfaces and BufferSurface (and matches in impls/constructors, e.g., where Server is created or mutated) and remove or refactor those code paths to rely on the keeper-backed rendering flow instead (cleanup any Mutex/BTreeMap/BufferId references tied only to this cache and remove tests or dead code referencing these symbols).

crates/embers-server/src/buffer_runtime.rs (1)
311-325: ⚠️ Potential issue | 🟠 Major — Join the poller instead of detaching it.

Dropping the `JoinHandle` on line 324 does not wait for the thread to finish; it just detaches it. This means shutdown can continue while the poller thread still owns the keeper connection and may still fire callbacks. Additionally, since `Drop` calls `join_threads_blocking()`, there's a risk of deadlock if the drop happens on the poller thread itself. Join it explicitly with a self-join guard to prevent deadlock:

Fix
```diff
 impl BufferRuntimeInner {
     fn join_threads_blocking(&self) {
         self.stop.store(true, Ordering::Relaxed);
+        let current_thread = thread::current().id();
         let mut threads = match self.threads.lock() {
             Ok(threads) => threads,
             Err(poisoned) => {
                 error!(
                     %self.buffer_id,
                     "buffer runtime thread registry lock poisoned during shutdown"
                 );
                 poisoned.into_inner()
             }
         };
-        let _ = threads.poller.take();
+        if let Some(poller) = threads.poller.take() {
+            if poller.thread().id() != current_thread {
+                let _ = poller.join();
+            }
+        }
     }
 }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/embers-server/src/buffer_runtime.rs` around lines 311 - 325, In join_threads_blocking (impl BufferRuntimeInner) don’t drop threads.poller (which detaches); instead take the JoinHandle from threads.poller, detect if it is the current thread (use a self-join guard / thread::current().id()) and if not call join() on the handle to wait for the poller to finish, otherwise detach/skip to avoid deadlock; ensure stop.store(true, ...) happens before joining so the poller can exit and handle poisoning the threads mutex as currently done when retrieving threads.poller.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/embers-server/src/buffer_runtime.rs`:
- Around line 482-535: The keeper socket must be bound before spawning the PTY
child and reader/wait threads to avoid orphaned children on bind failure: move
the UnixListener::bind(&cli.socket_path) and creation of SocketCleanup before
calling pair.slave.spawn_command(...) and before spawning keeper_read_loop and
keeper_wait_loop threads; create the KeeperRuntime after successful bind (or at
least ensure bind errors are checked before spawn_command/threads run), then
spawn the child (child.clone_killer()/child.process_id()), start the reader/wait
threads (keeper_read_loop, keeper_wait_loop) using the runtime, and only then
proceed — update any uses of cli.socket_path in thread names accordingly so the
listener is reserved first and failures cleanly return an error without leaving
the child running.
- Around line 224-262: The four query methods (status, capture_snapshot,
capture_visible_snapshot, capture_scrollback_slice) perform blocking UnixStream
I/O and must be moved onto a blocking thread; change each to async (e.g., pub
async fn status(&self) -> Result<BufferRuntimeStatus>) and call
tokio::task::spawn_blocking to run the existing synchronous logic inside the
closure, returning the join handle's result (and map/join errors to
MuxError/Result as appropriate). Inside the spawn_blocking closure, acquire the
inner.connection lock and call
connection.status()/snapshot()/visible_snapshot()/scrollback_slice() just like
before; ensure you clone any Arcs needed so the closure owns them. Update any
callsites in server.rs to await these methods. This mirrors the existing pattern
used by write/resize/kill to avoid blocking Tokio workers.
In `@crates/embers-server/src/config.rs`:
- Around line 23-27: The derived runtime_dir created from socket_path
(runtime_dir = socket_path.with_extension("runtimes")) can produce keeper socket
paths that exceed OS domain-socket length limits when you later build keeper
sockets (see server code that does runtime_dir.join(format!("buffer-{}.sock",
buffer_id.0))). Update the code to validate the full path length before
creating/using keeper sockets: compute the final candidate path (runtime_dir
joined with the buffer file name), check it against the platform socket path
limit (e.g., 104 bytes on macOS, 108 on Linux), and if it would exceed the limit
either shorten the on-disk name (e.g., hash/truncate buffer_id or use a stable
shortened prefix) or return a clear configuration error when creating
runtime_dir/socket_path so callers reserve enough space; implement this
validation where runtime_dir is used to construct keeper sockets (reference
runtime_dir, socket_path, and the keeper socket construction in server.rs).
In `@crates/embers-server/src/server.rs`:
- Around line 455-487: The loop restores each buffer by calling
attach_buffer_runtime(buffer.id, socket_path) which can hang because
connect_to_keeper() retries for the full budget; to fix, short-circuit
non-existent sockets or use a shorter restore timeout: before calling
attach_buffer_runtime check the socket file exists (e.g., tokio::fs::metadata or
Path::exists on buffer.runtime_socket_path) and if missing call
state.mark_buffer_interrupted(buffer.id, buffer_pid_hint(&buffer.state)) and
continue, or add an optional timeout/fast-path parameter to
attach_buffer_runtime (and then to connect_to_keeper in buffer_runtime.rs) so
the restore path uses a small timeout instead of the full retry budget.
In `@crates/embers-server/src/state.rs`:
- Around line 400-405: mark_buffer_interrupted currently unconditionally
overwrites a buffer's state to BufferState::Interrupted, which loses existing
BufferState::Exited metadata used by restore_buffer_runtimes(); change
mark_buffer_interrupted to first inspect the current buffer.state and if it is
BufferState::Exited (preserving its exit metadata) do nothing and return Ok(()),
otherwise set buffer.state = BufferState::Interrupted(InterruptedBuffer {
last_known_pid: pid }) as before; reference the mark_buffer_interrupted function
and the BufferState::Exited / BufferState::Interrupted variants to locate where
to add the guard.
In `@crates/embers-server/tests/persistence.rs`:
- Around line 219-253: Replace the single tokio::time::sleep + one-shot calls to
request_snapshot with a retry/poll loop that repeatedly sends
BufferRequest::Capture (using RequestId(9)/RequestId(10) and the
attached_id/detached_id) and checks the returned snapshot.lines for the expected
"restarted-attached" / "restarted-detached" strings until either the line
appears or a deadline expires; remove the fixed 200ms sleep, backoff briefly
between attempts, and fail only after the timeout so transient non-snapshot
responses from request_snapshot don't cause intermittent panics.
---
Outside diff comments:
In `@crates/embers-server/src/buffer_runtime.rs`:
- Around line 311-325: In join_threads_blocking (impl BufferRuntimeInner) don’t
drop threads.poller (which detaches); instead take the JoinHandle from
threads.poller, detect if it is the current thread (use a self-join guard /
thread::current().id()) and if not call join() on the handle to wait for the
poller to finish, otherwise detach/skip to avoid deadlock; ensure
stop.store(true, ...) happens before joining so the poller can exit and handle
poisoning the threads mutex as currently done when retrieving threads.poller.
In `@crates/embers-server/src/server.rs`:
- Around line 1715-1738: The code removes the runtime handle unconditionally in
record_buffer_exit by calling
self.buffer_runtimes.lock().await.remove(&buffer_id), which drops the keeper and
prevents capture_*/visible-snapshot paths from serving final render; change the
logic so the keeper handle is not removed on a normal exit: fetch (e.g.,
get/clone or peek) the runtime handle instead of removing it when
should_interrupt is false so the keeper remains available for buffer_runtime()
to serve Exited state, and only remove the entry when you truly intend to
discard the runtime (for example when should_interrupt is true or after final
cleanup). Ensure you update record_buffer_exit to conditionally remove from
buffer_runtimes (referencing buffer_runtimes and record_buffer_exit) and
preserve the keeper used by buffer_runtime()/Exited in buffer_runtime.rs.
- Around line 253-257: Remove the unused server-side surface cache by deleting
the Server struct field buffer_surfaces and the BufferSurface type and any
associated imports/usages; search for symbols buffer_surfaces and BufferSurface
(and matches in impls/constructors, e.g., where Server is created or mutated)
and remove or refactor those code paths to rely on the keeper-backed rendering
flow instead (cleanup any Mutex/BTreeMap/BufferId references tied only to this
cache and remove tests or dead code referencing these symbols).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 577c7183-c1a6-44fc-b0c4-5a2f9b38d910
📒 Files selected for processing (14)
- crates/embers-cli/src/lib.rs
- crates/embers-cli/tests/interactive.rs
- crates/embers-cli/tests/panes.rs
- crates/embers-core/src/metadata.rs
- crates/embers-core/src/snapshot.rs
- crates/embers-server/src/buffer_runtime.rs
- crates/embers-server/src/config.rs
- crates/embers-server/src/lib.rs
- crates/embers-server/src/model.rs
- crates/embers-server/src/persist.rs
- crates/embers-server/src/server.rs
- crates/embers-server/src/state.rs
- crates/embers-server/tests/persistence.rs
- crates/embers-test-support/tests/buffer_runtime.rs
@coderabbitai review
✅ Actions performed: Review triggered.
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
crates/embers-server/src/server.rs (1)
1460-1499: ⚠️ Potential issue | 🔴 Critical — Don't force `Running` before applying the sampled keeper status.

This races with `on_exit` for short-lived commands. If the callback marks the buffer exited first, `mark_buffer_running()` fails here and `BufferRequest::Create` deletes the buffer as if startup failed.

Suggested fix
```diff
         {
             let mut state = self.state.lock().await;
-            if let Err(error) = state.mark_buffer_running(buffer_id, status.pid) {
-                let _ = runtime.kill().await;
-                let _ = runtime.join_threads().await;
-                return Err(error);
-            }
             state.set_buffer_runtime_socket_path(
                 buffer_id,
                 Some(runtime.socket_path().to_path_buf()),
             )?;
             apply_runtime_status(&mut state, buffer_id, &status);
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/embers-server/src/server.rs` around lines 1460 - 1499, The code currently calls state.mark_buffer_running(...) before applying the sampled runtime status, which races with the on_exit callback; fix by applying the sampled status first via apply_runtime_status(&mut state, buffer_id, &status), then only call mark_buffer_running(buffer_id, status.pid) if the buffer state is still non-terminal (i.e. not already marked exited by apply_runtime_status); if apply_runtime_status indicates the buffer is already exited, perform the same cleanup (runtime.kill().await and runtime.join_threads().await) and return the error/early-exit instead of forcing Running. Reference BufferRuntimeHandle::spawn, status, apply_runtime_status, and mark_buffer_running to locate and reorder/add the check.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/embers-cli/src/lib.rs`:
- Around line 73-87: RuntimeKeeper currently redefines the global --socket
option causing a clap clash; remove the duplicate socket field and its
#[arg(long)] from the RuntimeKeeper variant and update any locations that
referenced RuntimeKeeper.socket to use the root Cli.socket (or accept the socket
via function parameters from the parsed Cli) instead; also remove/rename any
other duplicate socket definitions (the other RuntimeKeeper-like variant noted
in the review) so only the Cli struct defines #[arg(long, global = true)]
socket.
In `@crates/embers-server/src/buffer_runtime.rs`:
- Around line 765-770: When connection.status() returns Err in the poll loop,
don't just log and sleep/continue; treat it as terminal and invoke the on_exit
callback with None (call on_exit(None)) for the associated buffer
(inner.buffer_id) and then break/return out of the poll loop so the thread stops
instead of spinning with STATUS_POLL_INTERVAL; update the block handling
Err(error) in the poller where debug!(%error, %inner.buffer_id, "status poll
failed") is logged to call on_exit(None) and stop the loop so the buffer can
exit Running state.
- Around line 899-910: The read_message function currently allocates payload
based on the peer-supplied len (u32), which can cause OOM; add a bounds check on
len before creating payload (e.g., compare len against a defined MAX_FRAME_SIZE
constant like MAX_IPC_FRAME_SIZE or a sensible hard cap such as a few MBs), and
if len exceeds the limit return an appropriate Err (wrap as MuxError::internal
or a new MuxError variant) instead of allocating; update the code around the len
variable and the payload allocation in read_message to validate and reject
oversized frames early.
- Around line 822-858: resolve_runtime_keeper_executable() currently only finds
a Cargo-built binary (or uses EMBERS_RUNTIME_KEEPER_BIN) which causes
spawn_runtime_keeper() to fall back to the in-process thread for installed
deployments; update resolve_runtime_keeper_executable() to also search for an
installed runtime keeper binary by (1) checking
std::env::var("EMBERS_RUNTIME_KEEPER_BIN") first (keep existing behavior), (2)
looking in PATH for a well-known binary name such as "embers-runtime-keeper"
(use which-style lookup via env::var_os("PATH") and iterate dirs), and (3) if
that fails, checking next to the current executable
(std::env::current_exe().parent().join("embers-runtime-keeper")) before falling
back to the Cargo-specific logic; ensure the function returns an executable
PathBuf only if it is exists() and is_file() (and on Unix is executable), so
spawn_runtime_keeper() can launch the external process for installed embers too.
- Around line 557-565: The accept loop in run_runtime_keeper never observes the
buffer/PTY exit so terminate never becomes true (handle_keeper_request always
returns false), causing accept() to block forever; modify the loop in
run_runtime_keeper that currently calls listener.accept() and
handle_keeper_client(runtime.clone(), &mut stream) to also check the
KeeperRuntime::exit_code (or a provided is_exited() method) before/after accept
and break when exit_code.is_some(), or use a timed accept/select with a timeout
to re-check exit_code periodically; ensure keeper_wait_loop updates the same
exit flag (exit_code) and that breaking the loop allows reader_join and
wait_join to join and triggers SocketCleanup.
In `@crates/embers-server/src/server.rs`:
- Around line 1506-1534: attach_buffer_runtime's BufferRuntimeCallbacks
(on_output/on_exit) spawn tasks directly and thus bypass the quiesce_state_tasks
TaskCounter; update attach_buffer_runtime to mirror spawn_buffer_runtime by
entering the shared TaskCounter before spawning: call the TaskCounter::enter
(via self.state_tasks.enter() or equivalent) at the start of each callback,
capture the returned guard and move it into the spawned async closure so the
guard is held for the duration of the spawned task, then drop the guard after
record_buffer_update/record_buffer_exit complete; ensure you reference
BufferRuntimeCallbacks, on_output, on_exit, attach_buffer_runtime,
record_buffer_update, record_buffer_exit, and quiesce_state_tasks when making
the change so restored-runtime callbacks are counted.
---
Outside diff comments:
In `@crates/embers-server/src/server.rs`:
- Around line 1460-1499: The code currently calls state.mark_buffer_running(...)
before applying the sampled runtime status, which races with the on_exit
callback; fix by applying the sampled status first via apply_runtime_status(&mut
state, buffer_id, &status), then only call mark_buffer_running(buffer_id,
status.pid) if the buffer state is still non-terminal (i.e. not already marked
exited by apply_runtime_status); if apply_runtime_status indicates the buffer is
already exited, perform the same cleanup (runtime.kill().await and
runtime.join_threads().await) and return the error/early-exit instead of forcing
Running. Reference BufferRuntimeHandle::spawn, status, apply_runtime_status, and
mark_buffer_running to locate and reorder/add the check.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 5062b2c9-e792-4728-ac37-09955943099c
📒 Files selected for processing (14)
- crates/embers-cli/src/lib.rs
- crates/embers-cli/tests/interactive.rs
- crates/embers-cli/tests/panes.rs
- crates/embers-core/src/metadata.rs
- crates/embers-core/src/snapshot.rs
- crates/embers-server/src/buffer_runtime.rs
- crates/embers-server/src/config.rs
- crates/embers-server/src/lib.rs
- crates/embers-server/src/model.rs
- crates/embers-server/src/persist.rs
- crates/embers-server/src/server.rs
- crates/embers-server/src/state.rs
- crates/embers-server/tests/persistence.rs
- crates/embers-test-support/tests/buffer_runtime.rs
Force-pushed 35a08ac to 5ab4920
Force-pushed 237f71a to f311c96
Force-pushed 5aace75 to 9c359b4
Actionable comments posted: 2
♻️ Duplicate comments (6)
crates/embers-cli/src/lib.rs (1)
74-88: 🧹 Nitpick | 🔵 Trivial — Consider renaming `RuntimeKeeper.socket` to `keeper_socket` for clarity.

The `RuntimeKeeper` subcommand redefines a `--socket` argument that is already defined globally on the root `Cli` struct. While clap 4.5 allows this and the code works correctly, the shared name obscures that they serve different purposes: the global socket targets the server connection, while RuntimeKeeper's socket is a subprocess-specific argument. Renaming to `--keeper-socket` improves clarity and prevents accidental confusion:

Suggested changes
```diff
 #[command(name = "__runtime-keeper", hide = true)]
 RuntimeKeeper {
-    #[arg(long)]
-    socket: PathBuf,
+    #[arg(long = "keeper-socket")]
+    keeper_socket: PathBuf,
```

Then update the `crates/embers-cli/src/lib.rs` handler:

```diff
 Some(Command::RuntimeKeeper {
-    socket,
+    keeper_socket,
     cols,
     rows,
     cwd,
     env,
     command,
 }) => embers_server::run_runtime_keeper(embers_server::RuntimeKeeperCli {
-    socket_path: socket,
+    socket_path: keeper_socket,
```

And update `crates/embers-server/src/buffer_runtime.rs`:

```diff
-    .arg("--socket")
+    .arg("--keeper-socket")
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/embers-cli/src/lib.rs` around lines 74 - 88, Rename the RuntimeKeeper subcommand's socket field to keeper_socket (and its CLI long flag to --keeper-socket) to avoid collision with the global socket option; update the RuntimeKeeper variant in the Cli enum in crates/embers-cli/src/lib.rs (the RuntimeKeeper struct fields) and adjust any places that construct or destructure RuntimeKeeper (handlers that read RuntimeKeeper.socket) to use RuntimeKeeper.keeper_socket, and update crates/embers-server/src/buffer_runtime.rs where the RuntimeKeeper socket value is consumed (references to RuntimeKeeper.socket) to reference the new keeper_socket name so all callsites and parsing remain consistent.

crates/embers-server/src/buffer_runtime.rs (4)
765-770: ⚠️ Potential issue | 🟠 Major — Treat a broken keeper connection as terminal.

Once `connection.status()` starts failing, this loop just retries the same dead stream forever. `on_exit` never fires, so the buffer can remain stuck in `Running` and the poller burns a thread until shutdown.

Suggested fix
```diff
 match connection.status() {
     Ok(status) => status,
     Err(error) => {
         debug!(%error, %inner.buffer_id, "status poll failed");
-        thread::sleep(STATUS_POLL_INTERVAL);
-        continue;
+        if !saw_exit {
+            saw_exit = true;
+            (callbacks.on_exit)(inner.buffer_id, None);
+        }
+        break;
     }
 }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/embers-server/src/buffer_runtime.rs` around lines 765 - 770, The status polling loop currently retries forever when connection.status() returns Err, causing a stuck Running state; change the Err branch (the match on connection.status()) to treat the keeper connection as terminal: log the error with %error and %inner.buffer_id, call the buffer cleanup/exit path (invoke on_exit or otherwise transition the buffer to a terminal state), stop sleeping/retrying and break/return from the loop so the poller thread exits; ensure STATUS_POLL_INTERVAL is not used to retry in this error path and that any needed cleanup for the connection is performed before returning.
899-910: ⚠️ Potential issue | 🟠 Major — Bound keeper frame sizes before allocating.

`len` is peer-controlled. A bogus local client can force arbitrarily large allocations here and OOM the server or keeper.

Suggested fix
```diff
 fn read_message<T: for<'de> Deserialize<'de>>(stream: &mut UnixStream) -> Result<Option<T>> {
     let mut len_bytes = [0_u8; 4];
     match stream.read_exact(&mut len_bytes) {
         Ok(()) => {}
         Err(error) if error.kind() == std::io::ErrorKind::UnexpectedEof => return Ok(None),
         Err(error) => return Err(error.into()),
     }
     let len = u32::from_le_bytes(len_bytes) as usize;
+    if len > 16 * 1024 * 1024 {
+        return Err(MuxError::protocol(format!(
+            "runtime keeper payload too large: {len} bytes"
+        )));
+    }
     let mut payload = vec![0_u8; len];
     stream.read_exact(&mut payload)?;
     let value =
         serde_json::from_slice(&payload).map_err(|error| MuxError::internal(error.to_string()))?;
     Ok(Some(value))
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/embers-server/src/buffer_runtime.rs` around lines 899 - 910, In read_message, validate the peer-controlled len before allocating payload to avoid OOM: introduce a MAX_FRAME_SIZE constant and check the parsed len (u32::from_le_bytes(len_bytes) as usize) against it and also ensure it’s non-zero/within usize bounds; if len is out of range, return an appropriate MuxError instead of allocating; only after this guarded check create the payload Vec and proceed with stream.read_exact and serde_json::from_slice. Ensure the check references the len variable and the payload allocation site so the allocation is never attempted for an oversized frame.
803-819: ⚠️ Potential issue | 🟠 Major — Restore can still spend 30s per stale socket inode.

`server.rs` now skips missing paths, but `connect_to_keeper()` still retries `ConnectionRefused` for the full 30s budget. If a crashed keeper leaves its socket file behind, restart stalls buffer-by-buffer anyway.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/embers-server/src/buffer_runtime.rs` around lines 803 - 819, connect_to_keeper currently sleeps and retries on std::io::ErrorKind::ConnectionRefused which causes long stalls when a stale socket inode exists; change the match arm for ErrorKind::ConnectionRefused in connect_to_keeper to return the error immediately (Err(error.into())) instead of sleeping and retrying (leave the NotFound branch to keep retry behavior). Keep the existing retry loop and timeout behavior (CONNECT_RETRY_ATTEMPTS, CONNECT_RETRY_DELAY, and the final MuxError::timeout) but fail fast on ConnectionRefused to avoid per-buffer 30s stalls.
822-858: ⚠️ Potential issue | 🔴 Critical — Installed builds still miss the external keeper path.

If `current_exe()` is a normal installed `embers` binary, this returns `None` and `spawn_runtime_keeper()` falls back to the in-process thread. PTYs then die with the server process, so restart survival only works under Cargo-ish launches or with `EMBERS_RUNTIME_KEEPER_BIN`.

Suggested fix
```diff
 fn resolve_runtime_keeper_executable() -> Option<PathBuf> {
     if let Some(path) = std::env::var_os("EMBERS_RUNTIME_KEEPER_BIN").map(PathBuf::from)
         && path.is_file()
     {
         return Some(path);
@@
     }
     let current_exe = std::env::current_exe().ok()?;
+    let expected_name = if cfg!(windows) { "embers.exe" } else { "embers" };
+    if current_exe.file_name().is_some_and(|name| name == expected_name) {
+        return Some(current_exe);
+    }
     if let Some(parent) = current_exe.parent()
         && parent.file_name().is_some_and(|name| name == "deps")
     {
         let candidate = parent.parent()?.join(if cfg!(windows) {
             "embers.exe"
```

Also applies to: 861-886
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/embers-server/src/buffer_runtime.rs` around lines 822 - 858, The current spawn_runtime_keeper falls back to an in-process thread when resolve_runtime_keeper_executable() returns None, causing installed builds to never launch the external keeper; change the logic to attempt alternative resolution before falling back: call env::var("EMBERS_RUNTIME_KEEPER_BIN") and, if unset, attempt to locate a sibling/packaged binary name (e.g., "embers-runtime-keeper") on PATH (use which::which or equivalent) inside spawn_runtime_keeper (or enhance resolve_runtime_keeper_executable to perform these checks), and only when all external-lookups fail then spawn the in-process thread via run_runtime_keeper with RuntimeKeeperCli; keep the existing arg/env handling and error mapping unchanged.crates/embers-server/src/server.rs (1)
1604-1632: ⚠️ Potential issue | 🟠 Major — Track restored-runtime callbacks in `state_tasks`.

`attach_buffer_runtime()` spawns `record_buffer_update()`/`record_buffer_exit()` work without entering `self.state_tasks`. `quiesce_state_tasks()` can therefore return before restored-runtime state mutations finish, and shutdown may persist stale buffer state.

Suggested fix
```diff
 let output_handle = tokio::runtime::Handle::current();
 let exit_handle = output_handle.clone();
 let output_runtime = self.clone();
 let exit_runtime = self.clone();
+let output_tasks = self.state_tasks.clone();
+let exit_tasks = self.state_tasks.clone();
 let runtime = BufferRuntimeHandle::attach(
     buffer_id,
     socket_path,
     BufferRuntimeCallbacks {
         on_output: Arc::new(move |buffer_id, update| {
             let runtime = output_runtime.clone();
+            let _task = output_tasks.enter();
             std::mem::drop(output_handle.spawn(async move {
+                let _task = _task;
                 runtime.record_buffer_update(buffer_id, update).await;
             }));
         }),
         on_exit: Arc::new(move |buffer_id, exit_code| {
             let runtime = exit_runtime.clone();
+            let _task = exit_tasks.enter();
             std::mem::drop(exit_handle.spawn(async move {
+                let _task = _task;
                 runtime.record_buffer_exit(buffer_id, exit_code).await;
             }));
         }),
     },
```
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/embers-server/src/buffer_runtime.rs`:
- Around line 568-575: The loop in handle_keeper_client uses the ? operator on
read_message and handle_keeper_request which causes run_runtime_keeper to exit
on ordinary request errors; instead, catch errors from
read_message::<KeeperRequest> and handle_keeper_request, convert them into a
KeeperResponse::Error (including error details), and send that response via
write_message rather than returning Err; keep using the terminate boolean only
for fatal control-flow from successful handle_keeper_request results, and if
write_message itself fails treat it as a socket/connection termination (return
Ok(false)). Ensure you update the error handling around read_message,
handle_keeper_request, and write_message in handle_keeper_client to build and
write a KeeperResponse::Error (referencing KeeperResponse::Error, KeeperRequest,
read_message, handle_keeper_request, write_message, and handle_keeper_client).
In `@crates/embers-server/src/server.rs`:
- Around line 1739-1760: The record_buffer_update method currently applies
runtime updates unconditionally which can allow out-of-order async work to roll
back buffer.last_snapshot_seq and re-broadcast stale title/activity; change
record_buffer_update to compare update.sequence against the existing
buffer.last_snapshot_seq (and only apply and set
title/activity/last_snapshot_seq when update.sequence is greater than the stored
sequence), and skip the broadcast when the incoming update is stale (sequence <=
last_snapshot_seq); reference the BufferRuntimeUpdate.sequence,
buffer.last_snapshot_seq, and the record_buffer_update function to locate the
change.
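The guard described in the comment above is a plain monotonic-sequence check. A minimal sketch of the idea, using simplified stand-ins for the real server types (`BufferState` and `Update` here are hypothetical; only the sequence comparison mirrors the suggested fix):

```rust
// Simplified stand-ins for the server's buffer state and runtime update.
struct BufferState {
    last_snapshot_seq: u64,
    title: Option<String>,
}

struct Update {
    sequence: u64,
    title: Option<String>,
}

/// Apply `update` only if it is newer than what the buffer already holds.
/// Returns true when the update was applied (and should be broadcast).
fn record_buffer_update(state: &mut BufferState, update: Update) -> bool {
    if update.sequence <= state.last_snapshot_seq {
        // Out-of-order async work delivered a stale update: drop it.
        return false;
    }
    state.last_snapshot_seq = update.sequence;
    state.title = update.title;
    true
}

fn main() {
    let mut state = BufferState { last_snapshot_seq: 5, title: None };
    assert!(record_buffer_update(&mut state, Update { sequence: 6, title: Some("a".into()) }));
    // A late-arriving older update must not roll the state back.
    assert!(!record_buffer_update(&mut state, Update { sequence: 4, title: None }));
    assert_eq!(state.last_snapshot_seq, 6);
}
```

The same comparison also decides the broadcast: a stale sequence is dropped silently rather than re-announced to attached clients.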
---
Duplicate comments:
In `@crates/embers-cli/src/lib.rs`:
- Around line 74-88: Rename the RuntimeKeeper subcommand's socket field to
keeper_socket (and its CLI long flag to --keeper-socket) to avoid collision with
the global socket option; update the RuntimeKeeper variant in the Cli enum in
crates/embers-cli/src/lib.rs (the RuntimeKeeper struct fields) and adjust any
places that construct or destructure RuntimeKeeper (handlers that read
RuntimeKeeper.socket) to use RuntimeKeeper.keeper_socket, and update
crates/embers-server/src/buffer_runtime.rs where the RuntimeKeeper socket value
is consumed (references to RuntimeKeeper.socket) to reference the new
keeper_socket name so all callsites and parsing remain consistent.
In `@crates/embers-server/src/buffer_runtime.rs`:
- Around line 765-770: The status polling loop currently retries forever when
connection.status() returns Err, causing a stuck Running state; change the Err
branch (the match on connection.status()) to treat the keeper connection as
terminal: log the error with %error and %inner.buffer_id, call the buffer
cleanup/exit path (invoke on_exit or otherwise transition the buffer to a
terminal state), stop sleeping/retrying and break/return from the loop so the
poller thread exits; ensure STATUS_POLL_INTERVAL is not used to retry in this
error path and that any needed cleanup for the connection is performed before
returning.
- Around line 899-910: In read_message, validate the peer-controlled len before
allocating payload to avoid OOM: introduce a MAX_FRAME_SIZE constant and check
the parsed len (u32::from_le_bytes(len_bytes) as usize) against it and also
ensure it’s non-zero/within usize bounds; if len is out of range, return an
appropriate MuxError instead of allocating; only after this guarded check create
the payload Vec and proceed with stream.read_exact and serde_json::from_slice.
Ensure the check references the len variable and the payload allocation site so
the allocation is never attempted for an oversized frame.
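The length guard the prompt asks for can be isolated to a tiny check on the 4-byte little-endian prefix before anything is allocated. A sketch under assumed values (`MAX_FRAME_SIZE` here is illustrative, not the real constant):

```rust
// Assumed upper bound on a single frame; the real value may differ.
const MAX_FRAME_SIZE: usize = 16 * 1024 * 1024; // 16 MiB

/// Validate a peer-supplied little-endian length prefix before allocating.
fn validate_frame_len(len_bytes: [u8; 4]) -> Result<usize, String> {
    let len = u32::from_le_bytes(len_bytes) as usize;
    if len == 0 {
        return Err("zero-length frame".into());
    }
    if len > MAX_FRAME_SIZE {
        return Err(format!("frame of {len} bytes exceeds limit"));
    }
    Ok(len)
}

fn main() {
    // Only after the guard passes would read_message allocate the payload Vec.
    assert_eq!(validate_frame_len(8u32.to_le_bytes()), Ok(8));
    assert!(validate_frame_len(u32::MAX.to_le_bytes()).is_err());
    assert!(validate_frame_len(0u32.to_le_bytes()).is_err());
}
```

The point is ordering: the check references `len` at the allocation site, so an oversized or zero frame returns an error before `Vec::with_capacity`/`read_exact` ever runs.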
- Around line 803-819: connect_to_keeper currently sleeps and retries on
std::io::ErrorKind::ConnectionRefused which causes long stalls when a stale
socket inode exists; change the match arm for ErrorKind::ConnectionRefused in
connect_to_keeper to return the error immediately (Err(error.into())) instead of
sleeping and retrying (leave the NotFound branch to keep retry behavior). Keep
the existing retry loop and timeout behavior (CONNECT_RETRY_ATTEMPTS,
CONNECT_RETRY_DELAY, and the final MuxError::timeout) but fail fast on
ConnectionRefused to avoid per-buffer 30s stalls.
- Around line 822-858: The current spawn_runtime_keeper falls back to an
in-process thread when resolve_runtime_keeper_executable() returns None, causing
installed builds to never launch the external keeper; change the logic to
attempt alternative resolution before falling back: call
env::var("EMBERS_RUNTIME_KEEPER_BIN") and, if unset, attempt to locate a
sibling/packaged binary name (e.g., "embers-runtime-keeper") on PATH (use
which::which or equivalent) inside spawn_runtime_keeper (or enhance
resolve_runtime_keeper_executable to perform these checks), and only when all
external-lookups fail then spawn the in-process thread via run_runtime_keeper
with RuntimeKeeperCli; keep the existing arg/env handling and error mapping
unchanged.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: a411efbc-93e0-40f5-8cfb-f0848d0c1b4c
📒 Files selected for processing (14)
- crates/embers-cli/src/lib.rs
- crates/embers-cli/tests/interactive.rs
- crates/embers-cli/tests/panes.rs
- crates/embers-core/src/metadata.rs
- crates/embers-core/src/snapshot.rs
- crates/embers-server/src/buffer_runtime.rs
- crates/embers-server/src/config.rs
- crates/embers-server/src/lib.rs
- crates/embers-server/src/model.rs
- crates/embers-server/src/persist.rs
- crates/embers-server/src/server.rs
- crates/embers-server/src/state.rs
- crates/embers-server/tests/persistence.rs
- crates/embers-test-support/tests/buffer_runtime.rs
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
crates/embers-server/src/model.rs (1)
21-33: 🧹 Nitpick | 🔵 Trivial: Avoid leaking the keeper socket path into the public `Buffer` surface.

`Buffer` is re-exported from the crate root, so `runtime_socket_path` turns a backend filesystem detail into public API and into the struct's derived `Debug`/`Eq` behavior. If outside callers do not need this, keep it in server-internal state/persistence types instead of the public model.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/embers-server/src/model.rs` around lines 21-33: the public Buffer struct currently exposes runtime_socket_path, which leaks a backend filesystem detail; remove the pub runtime_socket_path: Option<PathBuf> field from the public Buffer definition and move that data into an internal/server-only type used for persistence (e.g., a separate struct like InternalBufferState or PersistedBuffer), or make the field private within the module if it must remain on the same type; update all builders/constructors and uses that previously read/write Buffer::runtime_socket_path to use the new internal type or accessor functions, and adjust any Debug/Eq/Serialize derivations accordingly so external callers no longer see the keeper socket path.

crates/embers-server/src/server.rs (1)

54-65: ⚠️ Potential issue | 🟠 Major: Bind the public socket after runtime restore completes.

`UnixListener::bind()` happens before `Runtime::restore_buffer_runtimes()`, while crates/embers-cli/src/lib.rs treats a successful connect as readiness. With restore now doing real work, clients can connect and send the first request while no accept loop is running yet, so startup commands can stall behind restore. Move the bind/permission setup after restore, or make the readiness probe perform a real request.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/embers-server/src/server.rs` around lines 54 - 65, The UnixListener is currently bound before runtime.restore_buffer_runtimes() which allows clients to connect before the accept loop is ready; move the bind and permission steps after the runtime is constructed and after runtime.restore_buffer_runtimes(). Specifically: keep load_workspace(...) and Runtime::new(...) as-is, call runtime.restore_buffer_runtimes().await? first, then call UnixListener::bind(&self.config.socket_path)? and set_socket_permissions(&self.config.socket_path)?; this ensures the socket is created only after restore completes (alternatively, if you prefer readiness semantics, implement a real readiness probe that performs a real request instead of relying on successful bind).
♻️ Duplicate comments (3)
crates/embers-server/src/server.rs (1)
1604-1629: ⚠️ Potential issue | 🟠 Major: Count restored-runtime callbacks in `state_tasks`.

`attach_buffer_runtime()` spawns `record_buffer_update()` and `record_buffer_exit()` without taking a `TaskCounter` ticket. `quiesce_state_tasks()` can therefore persist the workspace before callbacks from restored keepers finish during shutdown. Mirror the guard logic from `spawn_buffer_runtime()`.
```diff
  async fn attach_buffer_runtime(
      self: &Arc<Self>,
      buffer_id: BufferId,
      socket_path: PathBuf,
  ) -> Result<(BufferRuntimeHandle, BufferRuntimeStatus)> {
      let output_handle = tokio::runtime::Handle::current();
      let exit_handle = output_handle.clone();
      let output_runtime = self.clone();
      let exit_runtime = self.clone();
+     let output_tasks = self.state_tasks.clone();
+     let exit_tasks = self.state_tasks.clone();
      let runtime = BufferRuntimeHandle::attach(
          buffer_id,
          socket_path,
          BufferRuntimeCallbacks {
              on_output: Arc::new(move |buffer_id, update| {
                  let runtime = output_runtime.clone();
+                 let _task = output_tasks.enter();
                  std::mem::drop(output_handle.spawn(async move {
+                     let _task = _task;
                      runtime.record_buffer_update(buffer_id, update).await;
                  }));
              }),
              on_exit: Arc::new(move |buffer_id, exit_code| {
                  let runtime = exit_runtime.clone();
+                 let _task = exit_tasks.enter();
                  std::mem::drop(exit_handle.spawn(async move {
+                     let _task = _task;
                      runtime.record_buffer_exit(buffer_id, exit_code).await;
                  }));
              }),
          },
      )?;
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/embers-server/src/server.rs` around lines 1604 - 1629, attach_buffer_runtime currently spawns the on_output and on_exit async tasks without acquiring a TaskCounter ticket, so restored-runtime callbacks can run after quiesce_state_tasks persists state; modify attach_buffer_runtime to mirror spawn_buffer_runtime by obtaining a TaskCounter guard (or calling the same helper) before spawning each task inside the on_output and on_exit closures, hold the guard for the lifetime of the spawned task, and pass/clone the guard into the async move block so record_buffer_update and record_buffer_exit run under the TaskCounter ticket.crates/embers-server/src/buffer_runtime.rs (2)
190-200: ⚠️ Potential issue | 🟠 Major: Move the keeper attach handshake off the async worker.

`attach()` still does blocking `connect_to_keeper()` retries plus a synchronous `connection.status()` round-trip. crates/embers-server/src/server.rs:1540-1590 and crates/embers-server/src/server.rs:1601-1632 call this path directly from async code, so a slow or missing keeper can pin a Tokio worker for ~30s during spawn/restore even though the later RPCs already use `spawn_blocking`.

To confirm the async/blocking call chain:

```bash
#!/bin/bash
set -euo pipefail
echo "Async call sites:"
rg -n -C3 'BufferRuntimeHandle::(spawn|attach)\(' crates/embers-server/src/server.rs
echo
echo "Synchronous attach handshake:"
rg -n -C3 'connect_to_keeper\(|connection\.status\(\)' crates/embers-server/src/buffer_runtime.rs
echo
echo "Blocking retry loop:"
rg -n -C3 'CONNECT_RETRY_(DELAY|ATTEMPTS)|thread::sleep\(CONNECT_RETRY_DELAY\)' crates/embers-server/src/buffer_runtime.rs
```

Also applies to: 829-845
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/embers-server/src/buffer_runtime.rs` around lines 190 - 200, The attach() function currently performs blocking connect_to_keeper() and synchronous KeeperConnection::status() calls and is invoked from async contexts; move that blocking handshake off the async worker by performing the entire connect+status retry loop inside a blocking task (e.g., tokio::task::spawn_blocking) or expose an async attach_async wrapper that calls spawn_blocking for the existing connect_to_keeper/KeeperConnection::status logic; update call sites that call BufferRuntime::attach (or BufferRuntimeHandle::attach) from async code to await the new async wrapper or to call the nonblocking API so no Tokio worker is blocked by the retry loop. Ensure unique symbols to change: attach, connect_to_keeper, KeeperConnection::status, and callers in server.rs that call BufferRuntimeHandle::attach/spawn/restore.
561-565: ⚠️ Potential issue | 🟠 Major: Exited PTYs still leave a keeper behind.

`keeper_wait_loop()` records the exit code, but the accept loop never observes it and `accept()` blocks forever. After the shell exits, the helper process/thread and socket file stay around indefinitely until something else kills the keeper.

Also applies to: 765-769
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/embers-server/src/buffer_runtime.rs` around lines 561 - 565, The accept loop blocks on listener.accept() so it never sees the exit recorded by keeper_wait_loop(); make the loop nonblocking and poll the termination signal instead: set the listener to nonblocking (or use a timeouted accept) and change the termination flag to a shared AtomicBool or receive on a channel that keeper_wait_loop sets/sends to; then in the while !terminate loop call listener.accept(), handle WouldBlock by sleeping/yielding briefly, and still call handle_keeper_client(...) when accept returns a stream so the loop can exit promptly when keeper_wait_loop signals termination. Ensure you update references to terminate, listener.accept, handle_keeper_client and keeper_wait_loop accordingly so the accept loop observes the exit.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/embers-cli/src/lib.rs`:
- Around line 657-673: The Command::RuntimeKeeper branch is being forced through
run()'s global socket resolution/validation
(resolve_socket_path()/validate_runtime_socket_parent()) even though it uses its
own keeper_socket; move the match arm that handles Command::RuntimeKeeper (the
branch that calls embers_server::run_runtime_keeper with RuntimeKeeperCli) to
execute before calling resolve_socket_path()/validate_runtime_socket_parent(),
or alternatively add a conditional to skip those validations when the matched
variant is Command::RuntimeKeeper so that keeper startup uses its provided
keeper_socket without being blocked by the global socket checks.
In `@crates/embers-cli/tests/interactive.rs`:
- Around line 21-24: The test_lock() function currently returns a process-local
static LOCK (OnceLock<Mutex<()>>) which does not serialize across the separate
test binaries; replace this per-process static with an interprocess coordination
mechanism (e.g., a file-based lock using a crate like fs2/file-lock or explicit
flock on a well-known temp file) and update test_lock (and any callers in
interactive.rs, panes.rs, and buffer_runtime.rs) to acquire the interprocess
lock before proceeding and release it when done so tests across binaries
serialize; ensure the new implementation uses a unique, shared lock name/path
and returns an appropriate guard or RAII object so tests still block safely
until the lock is available.
In `@crates/embers-server/src/buffer_runtime.rs`:
- Around line 31-35: The BufferRuntimeUpdate struct's title field currently uses
Option<String>, which conflates "no update" with "explicitly cleared" and causes
stale titles; change BufferRuntimeUpdate::title to Option<Option<String>> so
outer None = no change, Some(None) = cleared, Some(Some(s)) = set. Update any
producers (e.g., spawn_status_poller()) to emit Some(None) when clearing the
title and update the consumer logic in record_buffer_update() to handle the
three cases (outer None => do nothing, Some(None) => clear stored title,
Some(Some(s)) => set title to s); reference BufferRuntimeUpdate,
spawn_status_poller, and record_buffer_update when making the changes.
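The tri-state encoding the prompt describes can be reduced to a single match. A sketch of the consumer side (`apply_title` is a hypothetical helper, not the real `record_buffer_update` code):

```rust
// Option<Option<String>>: outer None = no change,
// Some(None) = explicitly cleared, Some(Some(s)) = set to s.
fn apply_title(current: &mut Option<String>, change: Option<Option<String>>) {
    match change {
        None => {}                        // no title update in this snapshot
        Some(None) => *current = None,    // title explicitly cleared
        Some(Some(title)) => *current = Some(title),
    }
}

fn main() {
    let mut title = Some("vim".to_string());
    apply_title(&mut title, None);
    assert_eq!(title.as_deref(), Some("vim")); // untouched, not clobbered
    apply_title(&mut title, Some(None));
    assert_eq!(title, None);                   // cleared, not kept stale
    apply_title(&mut title, Some(Some("htop".to_string())));
    assert_eq!(title.as_deref(), Some("htop"));
}
```

With plain `Option<String>` the first two cases collapse into one, which is exactly how a cleared title ends up sticking around.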
- Around line 862-865: The code in buffer_runtime.rs currently passes
environment values to the external keeper using value.to_string_lossy(), losing
non-UTF-8 bytes; update the CLI encoding to use a lossless encoding (e.g.,
base64) for --env so the external keeper and the in-process fallback use the
same representation. Change the loop that builds keeper.args (the cli.env
iteration and keeper.arg(...) that uses to_string_lossy) to encode
value.as_bytes() with base64 (or JSON-escape) and format the argument as
"KEY=ENCODED" plus a marker like "base64:" so the keeper can detect and decode
it; also update the keeper's argument-parsing code and the in-process fallback
deserialization to decode the same marker back into the original bytes and
reconstruct the same env variable (preserving non-UTF-8) instead of relying on
to_string_lossy().
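A round-trippable "KEY=marker:ENCODED" scheme like the one the prompt sketches is straightforward. This version uses hex rather than base64 so the sketch stays std-only; the `hex:` marker and the function names are illustrative, not the crate's real API:

```rust
/// Encode raw env-value bytes losslessly, with a detectable marker.
fn encode_env_value(value: &[u8]) -> String {
    let hex: String = value.iter().map(|b| format!("{b:02x}")).collect();
    format!("hex:{hex}")
}

/// Decode a marked value back to the original bytes; None if malformed.
fn decode_env_value(encoded: &str) -> Option<Vec<u8>> {
    let hex = encoded.strip_prefix("hex:")?;
    if hex.len() % 2 != 0 {
        return None;
    }
    (0..hex.len())
        .step_by(2)
        .map(|i| u8::from_str_radix(&hex[i..i + 2], 16).ok())
        .collect()
}

fn main() {
    // Non-UTF-8 bytes survive the round trip, unlike to_string_lossy().
    let raw = vec![0x66, 0x6f, 0xff, 0x6f];
    let encoded = encode_env_value(&raw);
    assert_eq!(decode_env_value(&encoded), Some(raw));
}
```

Whatever encoding is chosen, the important property is that the keeper's argument parser and the in-process fallback decode the same representation, so both paths reconstruct identical environment bytes.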
In `@crates/embers-server/src/server.rs`:
- Around line 424-456: When reattachment fails (both the "socket missing" branch
and the Err branch from attach_buffer_runtime), the code currently marks the
buffer interrupted but leaves the stale runtime path in state; update both
failure paths to clear the stored runtime path by calling the state mutation
that removes the socket (use the same function used to set it, e.g.,
set_buffer_runtime_socket_path(buffer.id, None) or equivalent) before continuing
so the dead path is not re-persisted; locate these changes around
attach_buffer_runtime, buffer_pid_hint, and state.mark_buffer_interrupted to
insert the clear operation.
---
Outside diff comments:
In `@crates/embers-server/src/model.rs`:
- Around line 21-33: The public Buffer struct currently exposes
runtime_socket_path which leaks a backend filesystem detail; remove the pub
runtime_socket_path: Option<PathBuf> field from the public Buffer definition and
move that data into an internal/server-only type used for persistence (e.g., a
separate struct like InternalBufferState or PersistedBuffer) or make the field
private within the module if it must remain on the same type; update all
builders/constructors and uses that previously read/write
Buffer::runtime_socket_path to use the new internal type or accessor functions
and adjust any Debug/Eq/Serialize derivations accordingly so external callers no
longer see the keeper socket path.
In `@crates/embers-server/src/server.rs`:
- Around line 54-65: The UnixListener is currently bound before
runtime.restore_buffer_runtimes() which allows clients to connect before the
accept loop is ready; move the bind and permission steps after the runtime is
constructed and after runtime.restore_buffer_runtimes(). Specifically: keep
load_workspace(...) and Runtime::new(...) as-is, call
runtime.restore_buffer_runtimes().await? first, then call
UnixListener::bind(&self.config.socket_path)? and
set_socket_permissions(&self.config.socket_path)?; this ensures the socket is
created only after restore completes (alternatively, if you prefer readiness
semantics, implement a real readiness probe that performs a real request instead
of relying on successful bind).
---
Duplicate comments:
In `@crates/embers-server/src/buffer_runtime.rs`:
- Around line 190-200: The attach() function currently performs blocking
connect_to_keeper() and synchronous KeeperConnection::status() calls and is
invoked from async contexts; move that blocking handshake off the async worker
by performing the entire connect+status retry loop inside a blocking task (e.g.,
tokio::task::spawn_blocking) or expose an async attach_async wrapper that calls
spawn_blocking for the existing connect_to_keeper/KeeperConnection::status
logic; update call sites that call BufferRuntime::attach (or
BufferRuntimeHandle::attach) from async code to await the new async wrapper or
to call the nonblocking API so no Tokio worker is blocked by the retry loop.
Ensure unique symbols to change: attach, connect_to_keeper,
KeeperConnection::status, and callers in server.rs that call
BufferRuntimeHandle::attach/spawn/restore.
- Around line 561-565: The accept loop blocks on listener.accept() so it never
sees the exit recorded by keeper_wait_loop(); make the loop nonblocking and poll
the termination signal instead: set the listener to nonblocking (or use a
timeouted accept) and change the termination flag to a shared AtomicBool or
receive on a channel that keeper_wait_loop sets/sends to; then in the while
!terminate loop call listener.accept(), handle WouldBlock by sleeping/yielding
briefly, and still call handle_keeper_client(...) when accept returns a stream
so the loop can exit promptly when keeper_wait_loop signals termination. Ensure
you update references to terminate, listener.accept, handle_keeper_client and
keeper_wait_loop accordingly so the accept loop observes the exit.
In `@crates/embers-server/src/server.rs`:
- Around line 1604-1629: attach_buffer_runtime currently spawns the on_output
and on_exit async tasks without acquiring a TaskCounter ticket, so
restored-runtime callbacks can run after quiesce_state_tasks persists state;
modify attach_buffer_runtime to mirror spawn_buffer_runtime by obtaining a
TaskCounter guard (or calling the same helper) before spawning each task inside
the on_output and on_exit closures, hold the guard for the lifetime of the
spawned task, and pass/clone the guard into the async move block so
record_buffer_update and record_buffer_exit run under the TaskCounter ticket.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 6c99f284-24a0-40c8-95f3-818539c21bd1
📒 Files selected for processing (14)
- crates/embers-cli/src/lib.rs
- crates/embers-cli/tests/interactive.rs
- crates/embers-cli/tests/panes.rs
- crates/embers-core/src/metadata.rs
- crates/embers-core/src/snapshot.rs
- crates/embers-server/src/buffer_runtime.rs
- crates/embers-server/src/config.rs
- crates/embers-server/src/lib.rs
- crates/embers-server/src/model.rs
- crates/embers-server/src/persist.rs
- crates/embers-server/src/server.rs
- crates/embers-server/src/state.rs
- crates/embers-server/tests/persistence.rs
- crates/embers-test-support/tests/buffer_runtime.rs
@coderabbitai review

✅ Actions performed: Review triggered.
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/embers-server/src/buffer_runtime.rs`:
- Around line 595-603: The accept loop in the keeper thread never exits because
handle_keeper_client/handle_keeper_request always returns false (so while
!terminate never breaks); update the logic so that a Kill request causes
termination (make handle_keeper_request return true for the Kill branch and
propagate that through handle_keeper_client), or add an exit check inside the
accept loop (inspect the shared exit_code/Runtime state each iteration and break
if Some) and/or use a timeout on listener.accept to re-check exit conditions;
ensure reader_join and wait_join still join after the loop ends.
- Around line 1006-1015: The Unix branch already encodes raw bytes losslessly,
but the Windows branch uses to_string_lossy() which can drop invalid UTF-16;
update encode_runtime_keeper_env_value to produce a lossless encoding on Windows
by converting the OsStr to a raw, reversible byte sequence (for example use the
wtf-8 crate to encode the OsStr to WTF-8 bytes or use
std::os::windows::ffi::OsStrExt to get the &[u16] and serialize as little-endian
bytes) and then base64-encode those bytes; keep the function name
encode_runtime_keeper_env_value and ensure whatever lossless scheme you choose
has a matching decoding change in the keeper CLI.
In `@crates/embers-server/src/server.rs`:
- Around line 2174-2192: Add a brief doc comment above the platform-specific
constants to explain their origin: note that UNIX_SOCKET_PATH_LIMIT corresponds
to the size of sockaddr_un.sun_path on each platform (macOS 104 per unix(4),
other Unix/Linux 108 per unix(7)), so future maintainers understand why these
values are chosen; update the comment near the constants
(UNIX_SOCKET_PATH_LIMIT) and mention validate_keeper_socket_path references them
to validate keeper socket lengths derived from server socket paths.
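The constants in question come straight from the kernel headers: `sockaddr_un.sun_path` holds 104 bytes on macOS (unix(4)) and 108 on Linux and most other Unixes (unix(7)). A sketch of such a check (the function name mirrors `validate_keeper_socket_path` but is a simplified stand-in):

```rust
// sizeof(sockaddr_un.sun_path): 104 on macOS (unix(4)), 108 elsewhere (unix(7)).
#[cfg(target_os = "macos")]
const UNIX_SOCKET_PATH_LIMIT: usize = 104;
#[cfg(not(target_os = "macos"))]
const UNIX_SOCKET_PATH_LIMIT: usize = 108;

fn validate_socket_path(path: &std::path::Path) -> Result<(), String> {
    let len = path.as_os_str().len();
    // Keep one byte free for the trailing NUL in sun_path.
    if len >= UNIX_SOCKET_PATH_LIMIT {
        return Err(format!("socket path is {len} bytes, limit {UNIX_SOCKET_PATH_LIMIT}"));
    }
    Ok(())
}

fn main() {
    assert!(validate_socket_path(std::path::Path::new("/tmp/embers.sock")).is_ok());
    let long = "/tmp/".to_string() + &"x".repeat(200);
    assert!(validate_socket_path(std::path::Path::new(&long)).is_err());
}
```

Since the per-buffer keeper sockets are derived from the server socket path, validating the longer derived path up front fails early instead of producing a confusing `bind()` error later.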
In `@crates/embers-server/tests/persistence.rs`:
- Around line 238-263: Remove the redundant post-check assertions after calling
wait_for_snapshot_line: the helper wait_for_snapshot_line already ensures the
expected line exists so you should delete the assert! blocks that inspect
attached_capture.lines and detached_capture.lines (references:
wait_for_snapshot_line, attached_capture, detached_capture). Keep the calls that
assign attached_capture and detached_capture and simply rely on the helper's
verification, removing the two assert!(...) blocks and their failure messages.
In `@crates/embers-test-support/src/test_lock.rs`:
- Around line 69-88: The acquire_file_lock function currently busy-waits
indefinitely on non-Unix platforms; modify it to use a timeout: record
Instant::now at function start, loop as before but after each sleep check
elapsed and if it exceeds a configurable timeout (e.g. Duration::from_secs(10))
return an Err or panic with a clear diagnostic instructing the user to remove
the orphaned lock file, including the lock path in the message; reference the
acquire_file_lock function and use std::time::Instant/Duration and the existing
thread::sleep/OpenOptions logic so the only behavioral change is the timeout and
error message.
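The bounded version of that spin loop is small. A sketch under the prompt's assumptions (the path handling, 25 ms poll interval, and error wording are illustrative; the real `acquire_file_lock` may differ):

```rust
use std::fs::OpenOptions;
use std::io::ErrorKind;
use std::path::Path;
use std::time::{Duration, Instant};

/// Spin on create_new until the lock file can be created, but give up
/// with a diagnostic after `timeout` instead of busy-waiting forever.
fn acquire_file_lock(path: &Path, timeout: Duration) -> Result<std::fs::File, String> {
    let start = Instant::now();
    loop {
        match OpenOptions::new().write(true).create_new(true).open(path) {
            Ok(file) => return Ok(file),
            Err(e) if e.kind() == ErrorKind::AlreadyExists => {
                if start.elapsed() > timeout {
                    return Err(format!(
                        "timed out acquiring lock {path:?}; remove it if orphaned"
                    ));
                }
                std::thread::sleep(Duration::from_millis(25));
            }
            Err(e) => return Err(e.to_string()),
        }
    }
}

fn main() {
    let path = std::env::temp_dir().join(format!("lock-sketch-{}", std::process::id()));
    let _guard = acquire_file_lock(&path, Duration::from_secs(1)).unwrap();
    // A second acquisition times out while the lock file still exists.
    assert!(acquire_file_lock(&path, Duration::from_millis(100)).is_err());
    std::fs::remove_file(&path).unwrap();
}
```

Including the lock path in the error message matters: an orphaned lock from a crashed test run is otherwise invisible, and the message tells the user exactly which file to delete.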
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 682d9b14-cf54-41d7-972b-614b0a1d8503
⛔ Files ignored due to path filters (1)
- `Cargo.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (19)
- crates/embers-cli/Cargo.toml
- crates/embers-cli/src/lib.rs
- crates/embers-cli/tests/interactive.rs
- crates/embers-cli/tests/panes.rs
- crates/embers-core/src/metadata.rs
- crates/embers-core/src/snapshot.rs
- crates/embers-server/Cargo.toml
- crates/embers-server/src/buffer_runtime.rs
- crates/embers-server/src/config.rs
- crates/embers-server/src/lib.rs
- crates/embers-server/src/model.rs
- crates/embers-server/src/persist.rs
- crates/embers-server/src/server.rs
- crates/embers-server/src/state.rs
- crates/embers-server/tests/persistence.rs
- crates/embers-test-support/Cargo.toml
- crates/embers-test-support/src/lib.rs
- crates/embers-test-support/src/test_lock.rs
- crates/embers-test-support/tests/buffer_runtime.rs
```rust
let mut terminate = false;
while !terminate {
    let (mut stream, _) = listener.accept()?;
    terminate = handle_keeper_client(runtime.clone(), &mut stream)?;
}

let _ = reader_join.join();
let _ = wait_join.join();
Ok(())
```
Keeper accept loop never terminates after buffer exits.
The `while !terminate` loop at line 596 only exits when `handle_keeper_request` returns true, but all request handlers return false (including `Kill` at line 672). After the buffer process exits:
- The reader/wait threads complete
- The accept loop blocks indefinitely waiting for new connections
- The keeper process/thread never terminates
For external keeper processes, this creates orphaned processes. For in-process threads, resources remain held until the server process exits.
Consider returning true from Kill or adding a timeout/exit-check to the accept loop when exit_code becomes Some.
🔧 Proposed fix: terminate on Kill request
```diff
  KeeperRequest::Kill => {
      runtime.kill()?;
-     Ok((KeeperResponse::Ok, false))
+     Ok((KeeperResponse::Ok, true))
  }
```

Also applies to: 670-673
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/embers-server/src/buffer_runtime.rs` around lines 595 - 603, The
accept loop in the keeper thread never exits because
handle_keeper_client/handle_keeper_request always returns false (so while
!terminate never breaks); update the logic so that a Kill request causes
termination (make handle_keeper_request return true for the Kill branch and
propagate that through handle_keeper_client), or add an exit check inside the
accept loop (inspect the shared exit_code/Runtime state each iteration and break
if Some) and/or use a timeout on listener.accept to re-check exit conditions;
ensure reader_join and wait_join still join after the loop ends.
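The timeout-accept alternative described in the prompt above looks roughly like this. A Unix-only sketch with illustrative names (`accept_until_exit` and the 10 ms poll interval are assumptions, not the crate's real code):

```rust
use std::io::ErrorKind;
use std::os::unix::net::UnixListener;
use std::sync::atomic::{AtomicBool, Ordering};
use std::time::Duration;

/// Accept clients on a nonblocking listener until `terminate` is set,
/// re-checking the flag between accepts instead of blocking forever.
fn accept_until_exit(listener: &UnixListener, terminate: &AtomicBool) -> std::io::Result<u32> {
    listener.set_nonblocking(true)?;
    let mut accepted = 0;
    while !terminate.load(Ordering::SeqCst) {
        match listener.accept() {
            // In the real keeper this is where handle_keeper_client runs.
            Ok((_stream, _addr)) => accepted += 1,
            Err(e) if e.kind() == ErrorKind::WouldBlock => {
                // No pending client: yield briefly, then re-check the flag.
                std::thread::sleep(Duration::from_millis(10));
            }
            Err(e) => return Err(e),
        }
    }
    Ok(accepted)
}

fn main() -> std::io::Result<()> {
    let path = std::env::temp_dir().join(format!("keeper-sketch-{}.sock", std::process::id()));
    let listener = UnixListener::bind(&path)?;
    let terminate = AtomicBool::new(true); // already signalled: loop exits at once
    assert_eq!(accept_until_exit(&listener, &terminate)?, 0);
    std::fs::remove_file(&path)?;
    Ok(())
}
```

With this shape, `keeper_wait_loop` only has to store the exit into the shared flag; the accept loop observes it within one poll interval, then the reader and wait threads can be joined as before.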
Summary by CodeRabbit
New Features
Improvements
Tests
Stack
- api-docs: Add api docs for rhai config #2 (merged)
- persistance: Persist workspace state across clean restarts #3 (merged)
- buffer-cli: Add native detached buffer CLI workflows #9 (merged)
- session-rename: Add native session rename support #5 (merged)