Add databricks doctor command by simonfaltum · Pull Request #4730 · databricks/cli

simonfaltum · 2026-03-12T23:40:16Z

Why

Users debugging CLI setup issues (auth failures, config problems, network issues) have no single command to diagnose their environment. They must manually run separate commands to check auth, config, and connectivity.

Changes

Before: Users had to manually run separate commands to check auth, config, and connectivity.
Now: A new databricks doctor command runs all diagnostic checks and reports results as a checklist:

CLI version (info)
Config file readability and profile count (pass/fail)
Active profile (info)
Authentication validity and auth type (pass/fail)
User identity via CurrentUser.Me (pass/fail)
Network connectivity to workspace host (pass/fail)

Text output prints each result to stdout with a colored status icon ([ok], [FAIL], etc.). JSON output (--output json) returns a structured array. Auth failures are reported as check results, not command errors.
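A minimal sketch of how a checklist like this could be modeled and rendered in both formats. The `CheckResult` type, its JSON field names, and the `[skip]` and `[info]` markers are assumptions for illustration; only `[ok]` and `[FAIL]` are named in the description, and colors are left out.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// CheckResult is a hypothetical shape for one diagnostic check.
// The field names are assumptions, not the PR's actual schema.
type CheckResult struct {
	Name   string `json:"name"`
	Status string `json:"status"` // "pass", "fail", "info", or "skip"
	Detail string `json:"detail,omitempty"`
}

// icon maps a status to the bracketed marker used in text output.
func icon(status string) string {
	switch status {
	case "pass":
		return "[ok]"
	case "fail":
		return "[FAIL]"
	case "skip":
		return "[skip]" // assumed marker
	default:
		return "[info]" // assumed marker
	}
}

// renderText writes one line per check, in order.
func renderText(results []CheckResult) string {
	var b strings.Builder
	for _, r := range results {
		fmt.Fprintf(&b, "%s %s: %s\n", icon(r.Status), r.Name, r.Detail)
	}
	return b.String()
}

// renderJSON returns the same results as a JSON array.
func renderJSON(results []CheckResult) (string, error) {
	out, err := json.Marshal(results)
	return string(out), err
}

func main() {
	// A failed check is just another entry in the list; it does not
	// abort the run or turn into a command error here.
	results := []CheckResult{
		{Name: "CLI Version", Status: "info", Detail: "0.0.0-dev"},
		{Name: "Authentication", Status: "fail", Detail: "token expired"},
	}
	fmt.Print(renderText(results))
	js, err := renderJSON(results)
	if err != nil {
		fmt.Println("render error:", err)
		return
	}
	fmt.Println(js)
}
```

Keeping a failure as data rather than as a returned error is what lets the remaining checks still run and be reported.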

Open item

Top-level command deny list: Like the global flags, the doctor command name should be added to a deny list for new API names in the universe API linters, so future auto-generated API commands don't collide with it. Tracked separately.

Test plan

Unit tests for each check function
Unit tests for both text and JSON rendering
Tests for graceful error handling (auth failure, missing config)
make lintfull passes
make checks passes

Adds a top-level `databricks doctor` command that validates CLI setup by running sequential diagnostic checks: CLI version, config file readability, active profile, authentication, user identity, and network connectivity. Auth failures are reported as check results, not command errors. Supports both text output (colored status icons) and JSON output (`--output json`).

Co-authored-by: Isaac

eng-dev-ecosystem-bot · 2026-03-12T23:50:26Z

Commit: f406908

Run: 23059775876

	Env	🔄flaky	💚RECOVERED	🙈SKIP	✅pass	🙈skip	Time
💚	aws linux		8	7	268	787	6:58
💚	aws windows		8	7	270	785	5:12
🔄	aws-ucws linux	2	7	7	364	702	8:42
🔄	aws-ucws windows	2	7	7	366	700	6:55
💚	azure linux		2	9	271	785	5:34
💚	azure windows		2	9	273	783	4:37
🔄	azure-ucws linux	2	1	9	369	698	8:55
🔄	azure-ucws windows	2	1	9	371	696	7:09
💚	gcp linux		2	9	267	788	6:26
💚	gcp windows		2	9	269	786	5:52

16 interesting tests: 7 SKIP, 7 RECOVERED, 2 flaky

	Test Name	aws linux	aws windows	aws-ucws linux	aws-ucws windows	azure linux	azure windows	azure-ucws linux	azure-ucws windows	gcp linux	gcp windows
🔄	TestAccept	💚R	💚R	🔄f	🔄f	💚R	💚R	🔄f	🔄f	💚R	💚R
🙈	TestAccept/bundle/resources/permissions	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S
💚	TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions	💚R	💚R	💚R	💚R	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S
💚	TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions/DATABRICKS_BUNDLE_ENGINE=direct	💚R	💚R	💚R	💚R
💚	TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions/DATABRICKS_BUNDLE_ENGINE=terraform	💚R	💚R	💚R	💚R
💚	TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions	💚R	💚R	💚R	💚R	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S
💚	TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions/DATABRICKS_BUNDLE_ENGINE=direct	💚R	💚R	💚R	💚R
💚	TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions/DATABRICKS_BUNDLE_ENGINE=terraform	💚R	💚R	💚R	💚R
🙈	TestAccept/bundle/resources/postgres_branches/basic	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S
🙈	TestAccept/bundle/resources/postgres_branches/recreate	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S
🙈	TestAccept/bundle/resources/postgres_branches/update_protected	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S
🙈	TestAccept/bundle/resources/postgres_branches/without_branch_id	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S
🙈	TestAccept/bundle/resources/postgres_endpoints/recreate	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S
🙈	TestAccept/bundle/resources/synced_database_tables/basic	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S
🔄	TestAccept/ssh/connect-serverless-gpu	🙈s	🙈s	🔄f	🔄f	🙈s	🙈s	🔄f	🔄f	🙈s	🙈s
💚	TestAccept/ssh/connection	💚R	💚R	💚R	💚R	💚R	💚R	💚R	💚R	💚R	💚R

Top 20 slowest tests (at least 2 minutes):

duration	env	testname
4:20	gcp windows	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
4:10	aws linux	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
3:39	gcp linux	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
3:37	gcp windows	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
3:19	aws-ucws windows	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
3:18	gcp linux	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
3:14	azure-ucws windows	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
3:11	aws windows	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
3:09	azure-ucws linux	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
3:07	aws-ucws windows	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
2:52	aws linux	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
2:49	aws-ucws linux	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
2:49	aws-ucws linux	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
2:44	aws windows	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
2:42	azure windows	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
2:39	azure windows	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
2:37	azure linux	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
2:17	azure-ucws windows	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
2:11	azure linux	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
2:09	azure-ucws linux	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform

Co-authored-by: Isaac

The test used env.Set(ctx, ...) to set DATABRICKS_HOST and DATABRICKS_TOKEN, but checkAuth creates a bare config.Config{} that reads from real environment variables via os.Getenv, not the context-based env layer. Use t.Setenv instead so the SDK can see the values.

Co-authored-by: Isaac

shreyas-goenka

Note: This review was posted by Claude (AI assistant). Shreyas will do a separate, more thorough review pass.

Priority: HIGH — Config resolution diverges from real CLI auth path

MAJOR: `resolveConfig` diverges from real CLI auth

The resolveConfig function in databricks doctor constructs its own config resolution path instead of going through the standard SDK/CLI authentication flow. This means the doctor command could report "config is fine" while the real CLI fails (or vice versa). If the goal is to diagnose auth issues, it should use the same code path the CLI uses.

MEDIUM: Network check bypasses SDK HTTP client

The connectivity check uses http.DefaultClient directly instead of going through the SDK's configured HTTP client. In enterprise environments with proxies or custom TLS, this will give misleading results — the check might fail even though the SDK would succeed (or vice versa).

Other Observations

Good idea for a diagnostic command overall
The step-by-step output format is user-friendly
Missing test coverage for the core diagnostic logic

When the workspace client is unavailable but config is resolved, the network check was falling back to http.DefaultClient. This ignores proxy and custom TLS settings from the SDK config, giving misleading results in enterprise environments. Use configuredNetworkHTTPClient(cfg) instead, which respects HTTPTransport and InsecureSkipVerify from the config.

…allback, skip status

- Detect account-level configs (AccountID + account host) and use NewAccountClient instead of always using NewWorkspaceClient
- Add 15s per-check deadline for auth and identity checks to prevent hangs on unresponsive IdP
- Network check now tries even when config resolution fails, as long as a host URL is available from partial config resolution
- Identity marked as 'skip' (not 'fail') when auth failed or when using account-level profile, avoiding double failures from one root cause
- Add skip status rendering in text output

simonfaltum temporarily deployed to test-trigger-is March 12, 2026 23:40 — with GitHub Actions Inactive

Fix review findings: config resolution, timeouts, output routing, tests

1f6e987

Co-authored-by: Isaac

simonfaltum temporarily deployed to test-trigger-is March 13, 2026 03:00 — with GitHub Actions Inactive

simonfaltum temporarily deployed to test-trigger-is March 13, 2026 05:50 — with GitHub Actions Inactive

Fix config resolution, error handling, and test isolation

64638e8

simonfaltum temporarily deployed to test-trigger-is March 13, 2026 06:27 — with GitHub Actions Inactive

Fix errcheck lint and add doctor to help golden file

10d3ba9

simonfaltum temporarily deployed to test-trigger-is March 13, 2026 06:59 — with GitHub Actions Inactive

simonfaltum marked this pull request as ready for review March 13, 2026 11:34

simonfaltum requested review from andrewnester, anton-107, denik, pietern and shreyas-goenka as code owners March 13, 2026 11:34

shreyas-goenka reviewed Mar 13, 2026

View reviewed changes

Use SDK HTTP client for network checks, return error on check failure

039245a

simonfaltum temporarily deployed to test-trigger-is March 13, 2026 12:39 — with GitHub Actions Inactive

simonfaltum temporarily deployed to test-trigger-is March 13, 2026 14:47 — with GitHub Actions Inactive

simonfaltum temporarily deployed to test-trigger-is March 13, 2026 15:07 — with GitHub Actions Inactive

Fix lint: use errors.New per perfsprint linter rule

f406908

simonfaltum temporarily deployed to test-trigger-is March 13, 2026 16:11 — with GitHub Actions Inactive


Add databricks doctor command #4730
simonfaltum wants to merge 9 commits into main from simonfaltum/doctor-command


3 participants
