Adds a top-level `databricks doctor` command that validates CLI setup by running sequential diagnostic checks: CLI version, config file readability, active profile, authentication, user identity, and network connectivity. Auth failures are reported as check results, not command errors. Supports both text output (colored status icons) and JSON output (`--output json`).
Co-authored-by: Isaac
Commit: f406908
16 interesting tests: 7 SKIP, 7 RECOVERED, 2 flaky
Co-authored-by: Isaac
The test used env.Set(ctx, ...) to set DATABRICKS_HOST and
DATABRICKS_TOKEN, but checkAuth creates a bare config.Config{}
that reads from real environment variables via os.Getenv, not
the context-based env layer. Use t.Setenv instead so the SDK
can see the values.
Co-authored-by: Isaac
shreyas-goenka left a comment
Note: This review was posted by Claude (AI assistant). Shreyas will do a separate, more thorough review pass.
Priority: HIGH — Config resolution diverges from real CLI auth path
MAJOR: resolveConfig diverges from real CLI auth
The resolveConfig function in databricks doctor constructs its own config resolution path instead of going through the standard SDK/CLI authentication flow. This means the doctor command could report "config is fine" while the real CLI fails (or vice versa). If the goal is to diagnose auth issues, it should use the same code path the CLI uses.
MEDIUM: Network check bypasses SDK HTTP client
The connectivity check uses http.DefaultClient directly instead of going through the SDK's configured HTTP client. In enterprise environments with proxies or custom TLS, this will give misleading results — the check might fail even though the SDK would succeed (or vice versa).
Other Observations
- Good idea for a diagnostic command overall
- The step-by-step output format is user-friendly
- Missing test coverage for the core diagnostic logic
When the workspace client is unavailable but config is resolved, the network check was falling back to http.DefaultClient. This ignores proxy and custom TLS settings from the SDK config, giving misleading results in enterprise environments. Use configuredNetworkHTTPClient(cfg) instead, which respects HTTPTransport and InsecureSkipVerify from the config.
…allback, skip status
- Detect account-level configs (AccountID + account host) and use NewAccountClient instead of always using NewWorkspaceClient
- Add 15s per-check deadline for auth and identity checks to prevent hangs on unresponsive IdP
- Network check now tries even when config resolution fails, as long as a host URL is available from partial config resolution
- Identity marked as 'skip' (not 'fail') when auth failed or when using account-level profile, avoiding double failures from one root cause
- Add skip status rendering in text output
Why
Users debugging CLI setup issues (auth failures, config problems, network issues) have no single command to diagnose their environment. They must manually run separate commands to check auth, config, and connectivity.
Changes
Before: Users had to manually run separate commands to check auth, config, and connectivity.
Now: A new `databricks doctor` command runs all diagnostic checks and reports results as a checklist.
Text output uses colored status icons ([ok], [FAIL], etc.) to stdout. JSON output (`--output json`) returns a structured array. Auth failures are reported as check results, not command errors.
Open item
`doctor` command name should be added to a deny list for new API names in the universe API linters, so future auto-generated API commands don't collide with it. Tracked separately.
Test plan
- `make lintfull` passes
- `make checks` passes