-
Notifications
You must be signed in to change notification settings - Fork 7
Improve logging in Arrow Flight SQL code paths #270
Description
Summary
The Flight SQL data plane has essentially no server-side logging. Infrastructure logging (worker lifecycle, shutdown) is solid, but queries, errors, auth events, and session lifecycle in the Flight path leave no trace.
Critical Gaps
No query execution logging (High)
Neither duckdbservice/flight_handler.go, server/flight_executor.go, nor server/flightsqlingress/ingress.go logs any query at any level. A query that fails, times out, or produces unexpected results leaves no server-side trace — only the gRPC error code is visible on the client.
Affected locations:
flight_handler.go:226-232(DoGetStatement)flight_handler.go:273-287(DoPutCommandStatementUpdate)flight_executor.go:163(QueryContext)flight_executor.go:212(ExecContext)ingress.go:398-430(DoGetStatement)ingress.go:448(DoPutCommandStatementUpdate)
RowsAffected error silently converted to 0 (High — correctness bug)
// flight_handler.go:283-285
affected, err := result.RowsAffected()
if err != nil {
return 0, nil // silently reports 0 rows affected
}A DML statement could mutate rows, fail to report the count, and the client sees UPDATE 0 with no error.
DestroySession RPC failure silently dropped (High)
session_mgr.go:128 — _ = worker.DestroySession(ctx, session.SessionToken) — an RPC call on a 5-second timeout that can fail; failure is never logged, making session cleanup leaks impossible to diagnose.
Medium Priority Gaps
Flight auth success/failure never logged
ingress.go:290-295 — authenticateBasicCredentials returns a gRPC error on failure but never calls slog. Auth success is also silent. Compare: the PG path logs both at control.go:510,527.
Dead worker discovery bypasses crash logging
worker_mgr.go:687-695 — cleanDeadWorkersLocked discovers dead workers during AcquireWorker and calls cleanupDeadWorker in a goroutine, bypassing the slog.Warn("Worker crashed.") path at line 925.
No per-query debug logging (text, latency, row count)
Prometheus counters exist but there's no per-query observability at any log level across the entire Flight SQL data plane.
Low Priority Gaps
- Flight ingress session create/destroy/expiry never logged (
ingress.go) - ~15+
_ = resource.Close()silent drops acrossservice.go,flight_handler.go,session_mgr.go,worker_mgr.go— resource close failures silently swallowed
What's Already Well-Covered
- Worker management (spawn, health check, retire, kill, crash)
- Control plane PG connections (auth, disconnect)
- Shutdown sequences (graceful drain, handover, signals)