feat: Add request timeout for rpc#399
Conversation
There was a problem hiding this comment.
Pull request overview
Adds a configurable per-RPC request timeout to prevent client calls from hanging indefinitely when the server stops responding (closes #398), and threads the new setting through Rust + Python/C++ bindings.
Changes:
- Introduce
request_timeout_msconfiguration with a 30s default. - Apply request-level timeout in the RPC client and add
RpcError::RequestTimeout. - Expose the new config in Python and C++ bindings.
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
crates/fluss/src/rpc/server_connection.rs |
Adds request-timeout handling around response await; includes new unit tests. |
crates/fluss/src/rpc/error.rs |
Adds RequestTimeout RPC error variant. |
crates/fluss/src/config.rs |
Adds request_timeout_ms config with default and debug/default wiring. |
crates/fluss/src/client/connection.rs |
Plumbs request_timeout_ms into RpcClient construction. |
bindings/python/src/config.rs |
Parses and exposes request_timeout_ms in Python binding config. |
bindings/python/fluss/__init__.pyi |
Adds Config.request_timeout_ms typing to Python stubs. |
bindings/cpp/src/lib.rs |
Extends C++ FFI config to include request_timeout_ms. |
bindings/cpp/src/ffi_converter.hpp |
Converts C++ Configuration.request_timeout_ms into FFI config. |
bindings/cpp/include/fluss.hpp |
Exposes request_timeout_ms on the public C++ Configuration struct. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
fresh-borzoni
left a comment
There was a problem hiding this comment.
@charlesdong1991 Ty for the PR, left comments, PTAL
|
I addressed your comments, thanks again. @fresh-borzoni PTAL |
fresh-borzoni
left a comment
There was a problem hiding this comment.
@charlesdong1991 Ty, left small comment and also rust docs are missing
fresh-borzoni
left a comment
There was a problem hiding this comment.
@charlesdong1991 TY, LGTM
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 12 out of 12 changed files in this pull request and generated 6 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
can you give another round of reviews? @luoyuxia @fresh-borzoni |
fresh-borzoni
left a comment
There was a problem hiding this comment.
@charlesdong1991 TY, LGTM
luoyuxia
left a comment
There was a problem hiding this comment.
@charlesdong1991 Thanks for the pr. Only one minor question. PTAL. Otherwise, LGTM
| Some(active_request) => active_request, | ||
| _ => { | ||
| log::warn!( | ||
| log::debug!( |
There was a problem hiding this comment.
nit: why change to debug?
There was a problem hiding this comment.
i think debug fits better here: because this is a normal late-response race after timeout/cancel, other than an operational warning IMHO
so in production environment, we probably will use metric/counter which probably better than warn spamming. WDYT? @luoyuxia
There was a problem hiding this comment.
Let me know what you think of this, in the meantime, i rebased and resolved conflicts, thanks a lot! @luoyuxia 🙏
There was a problem hiding this comment.
I agree this is a normal late-response race.
However, a request timeout still suggests some underlying issue or risk that users should be aware of. Metrics would help, but logs are often more immediately useful in practice.
Fluss's Java client logs this as warn. I'm not sure whether Kafka client does the same for a similar case, but Kafka client may be a better reference if we want to follow common client behavior.
So I’m slightly leaning toward keeping warn for now.
There was a problem hiding this comment.
thanks for response @luoyuxia
i checked Kafka client, it seems Kafka closes connection on timeout, so IIUC late responses can never arrive, aka, Kafka ignores response for request after timeout.
But it seems it treats as info level:
log.info("Disconnecting from node {} due to request timeout.", nodeId);
so i think here i will switch to warn level then! 🙏
|
@charlesdong1991 Hi, hope it won't bother you. I just found there is already a pr apache/fluss#300 in java side for add request timeout. i'll talk with the author to understand the deisgn in java side to avoid the inconsistent before we merge it. |
|
Of course, no rush! I will keep an eye on that PR too! @luoyuxia |
Purpose
Linked issue: close #398
Brief change log
Tests
Tests all pass locally.
API and Format
Documentation