feat(BA-5570): Add configure_only() mode to pyinfra deploy scripts#10750
feat(BA-5570): Add configure_only() mode to pyinfra deploy scripts#10750
Conversation
- Add TRAEFIK to FrontendMode enum - Configure coordinator with enable_traefik and traefik.etcd settings - Configure worker with traefik section (api_port, etcd, port_proxy) when frontend_mode is traefik Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Migrate group_data/all.py from backend.ai-installer with enterprise-only items (graylog, zabbix, license hwinfo) removed. Provides host.data defaults for SSH, OS, Docker, Python, and container registry config. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…p 2) - Migrate InventoryBuilder from backend.ai-installer (1848 → 1524 lines) - Update imports: ai.backend.pyinfra → ai.backend.install.pyinfra - Remove enterprise-only logic (license, control_panel, fasttrack, harbor, rtun) - Keep enterprise config stubs with enabled=False in return dicts - Fix type inconsistency in group_data: ssh_port, bai_user_id, bai_user_group_id now use int() wrapper for os.getenv() calls - Add missing type annotations for lint compliance Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…Step 3) Create DevInventoryBuilder that targets @Local with Docker Compose halfstack ports matching DevContext.hydrate_install_info(). Provides all host.data attributes and services dict that deploy scripts expect. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add inventory_local.py as pyinfra CLI entry point for local dev - Fix bai_version default to use actual version instead of "dev" - Verified: deploy scripts successfully read all host.data attributes from DevInventoryBuilder (fails only on macOS subprocess, not inventory) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add shared_defaults.py with centralized port/version/credential constants - Refactor DevInventoryBuilder to use shared_defaults instead of hardcoded values - Constants are shared between DevContext (TUI) and DevInventoryBuilder (pyinfra) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add 18 unit tests for DevInventoryBuilder and shared_defaults - Remove follow_imports="skip" from mypy overrides (pyinfra is now a dependency) - Remove import-not-found from disabled error codes - Fix stale import path in package_manager.py - Refactor APPPROXY_PORTS to typed constants for mypy compatibility Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- EtcdConfig: hostname/port → advertised_client_ip/advertised_client_port - ManagerConfig, WebserverConfig, HiveGatewayConfig: remove non-existent hostname field - StorageProxyConfig: hostname → removed, client_port → port - Update test to use correct field (advertised_client_port) - Add changelog for PR #10738 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…n macOS Workaround for gevent/gevent#2169: monkey.patch_all() sets subprocess._fork_exec to None on macOS + Python 3.13. This wrapper saves and restores _fork_exec before running pyinfra CLI. Can be removed once gevent fixes the issue upstream. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…tations - Add pyinfra~=3.7 and passlib~=1.7.4 to requirements.txt (was missing from commit) - Regenerate python.lock with new dependencies - Fix dict → dict[str, object] type annotations in tests for mypy Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Create config_gen/appproxy.py with CoordinatorParams, WorkerParams, build_coordinator_config(), build_worker_config() - Refactor context.py:configure_appproxy() to use shared module (~150 lines of tomlkit manipulation → ~30 lines of param construction) - Frontend mode logic (port/wildcard/traefik) now lives in one place - Add 16 unit tests covering all frontend modes and edge cases - Import FrontendMode from types.py (single definition) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Change from build_*_config() (returns new dict) to apply_*_config() (modifies existing tomlkit document in-place). This preserves comments and structure from sample.toml files, which are auto-generated by `backend.ai mgr config generate-sample`. - apply_coordinator_config(doc, params) modifies coordinator toml doc - apply_worker_config(doc, params) modifies worker toml doc - context.py loads sample, applies params, writes back - Tests use sample TOML strings to verify comment preservation Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Remove type: ignore comments from config_gen/appproxy.py (handled by mypy overrides) - Add index/operator to mypy disable_error_code for ai.backend.install.* - Add mypy override for tests.unit.install.* (tomlkit dict-like access) - Fix dict[str, object] → dict[str, Any] in test_dev_inventory.py Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Remove all unused type: ignore comments (now handled by mypy overrides) - Fix str | None → str arg-type for appproxy secrets (add `or ""` fallback) - Fix dict → dict[str, object] type param in config_gen/agent.py - Fix ConsoleRenderable → str in context.py:462 - Auto-format with ruff Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add CONFIGURE_ONLY to DeployMode enum and implement configure_only() in all core deploy scripts. This mode generates config files only, skipping package installation, venv creation, and service management. Modified deploy scripts: - runner.py: DeployMode.CONFIGURE_ONLY + base configure_only() - appproxy coordinator, worker_base (shared by interactive/tcp/inference) - appproxy traefik - manager, agent, storage_proxy, webserver All worker deploy main() refactored to use .run(deploy_mode) pattern. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
This PR extends the Backend.AI installer’s pyinfra tooling with a new deploy mode (configure_only) intended to generate configuration artifacts without performing package installs, venv creation, or service management, and introduces dev-focused inventory/config-generation helpers with accompanying unit tests.
Changes:
- Add
DeployMode.CONFIGURE_ONLYto the pyinfra runner and wire deploy scripts to dispatch via.run(deploy_mode). - Introduce dev inventory building + shared default constants for local
@localpyinfra runs (plus inventory/group_data helpers). - Add shared
config_genmodules (AppProxy + Agent) usingtomlkit, with new unit tests and Pants BUILD targets.
Reviewed changes
Copilot reviewed 30 out of 34 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/unit/install/pyinfra/inventory/test_dev_inventory.py | Adds unit tests for DevInventoryBuilder and shared defaults. |
| tests/unit/install/pyinfra/inventory/BUILD | Adds Pants test target for pyinfra inventory tests. |
| tests/unit/install/config_gen/test_appproxy.py | Adds unit tests for shared AppProxy TOML generation via tomlkit. |
| tests/unit/install/config_gen/BUILD | Adds Pants test target for config_gen tests. |
| src/ai/backend/install/types.py | Extends installer FrontendMode enum with TRAEFIK. |
| src/ai/backend/install/pyinfra/runner.py | Adds CONFIGURE_ONLY mode dispatch and default configure_only() implementation. |
| src/ai/backend/install/pyinfra/os_packages/package_manager.py | Fixes import path to install.pyinfra.platform_utils. |
| src/ai/backend/install/pyinfra/inventory/shared_defaults.py | Introduces centralized constants (ports, versions, dev defaults). |
| src/ai/backend/install/pyinfra/inventory/run_local.py | Adds local pyinfra wrapper with gevent monkey-patch workaround. |
| src/ai/backend/install/pyinfra/inventory/inventory_local.py | Adds local inventory entrypoint using DevInventoryBuilder. |
| src/ai/backend/install/pyinfra/inventory/group_data/all.py | Adds global pyinfra host.data defaults driven by env vars. |
| src/ai/backend/install/pyinfra/inventory/dev_inventory.py | Implements DevInventoryBuilder for local @local deployments. |
| src/ai/backend/install/pyinfra/inventory/builder.py | Adds large unified inventory builder (single + HA modes). |
| src/ai/backend/install/pyinfra/deploy/cores/webserver/deploy.py | Adds configure_only() implementation. |
| src/ai/backend/install/pyinfra/deploy/cores/storage_proxy/deploy.py | Adds configure_only() implementation. |
| src/ai/backend/install/pyinfra/deploy/cores/manager/deploy.py | Adds configure_only() implementation. |
| src/ai/backend/install/pyinfra/deploy/cores/appproxy/worker_tcp/deploy.py | Refactors dispatcher to .run(deploy_mode). |
| src/ai/backend/install/pyinfra/deploy/cores/appproxy/worker_interactive/deploy.py | Refactors dispatcher to .run(deploy_mode). |
| src/ai/backend/install/pyinfra/deploy/cores/appproxy/worker_inference/deploy.py | Refactors dispatcher to .run(deploy_mode). |
| src/ai/backend/install/pyinfra/deploy/cores/appproxy/worker_base.py | Adds configure_only() for shared worker deployment base. |
| src/ai/backend/install/pyinfra/deploy/cores/appproxy/traefik/deploy.py | Adds config-only Traefik generation and refactors dispatcher. |
| src/ai/backend/install/pyinfra/deploy/cores/appproxy/coordinator/deploy.py | Adds configure_only() for coordinator artifacts. |
| src/ai/backend/install/pyinfra/deploy/cores/agent/deploy.py | Adds configure_only() and refactors dispatcher. |
| src/ai/backend/install/context.py | Refactors Agent/AppProxy config writes to use new config_gen modules. |
| src/ai/backend/install/config_gen/appproxy.py | Adds shared TOML mutation helpers for AppProxy coordinator/worker. |
| src/ai/backend/install/config_gen/agent.py | Adds shared TOML mutation helper for Agent config. |
| requirements.txt | Adds dependencies (pyinfra, passlib). |
| python.lock | Updates lockfile for new dependencies. |
| pyproject.toml | Adjusts mypy overrides for installer/pyinfra/tests. |
| changes/10738.feature.md | Adds Towncrier fragment for the inventory migration/dev support. |
| app-proxy-worker.toml.bak | Adds a backup worker TOML file (contains populated secrets). |
| app-proxy-coordinator.toml.bak | Adds a backup coordinator TOML file (contains populated secrets). |
Comments suppressed due to low confidence (2)
src/ai/backend/install/pyinfra/inventory/group_data/all.py:99
- This file includes hard-coded credential-like values (
registry_passworddefault). Even if intended as a placeholder, committing a non-empty password risks accidental reuse and can trigger secret scanners. Prefer an empty default and require the value to be provided via environment variables or per-inventory host data.
# -- Container registry configuration
registry_type = os.getenv("PYINFRA_CONTAINER_REGISTRY_TYPE", "harbor2")
registry_scheme = os.getenv("PYINFRA_CONTAINER_REGISTRY_SCHEME", "http")
registry_name = os.getenv("PYINFRA_CONTAINER_REGISTRY_NAME", "bai-repo")
registry_port = os.getenv("PYINFRA_CONTAINER_REGISTRY_PORT", "7080")
registry_username = os.getenv("PYINFRA_CONTAINER_REGISTRY_USERNAME", "bai")
registry_projects = os.getenv("PYINFRA_CONTAINER_REGISTRY_PROJECTS", "bai,bai-user")
registry_password = os.getenv("PYINFRA_CONTAINER_REGISTRY_PASSWORD", "lY0B=op3")
changes/10738.feature.md:2
- Towncrier news fragments are expected to be a single-line sentence. This fragment has a trailing blank line (and no terminating period), which can cause style/lint checks to fail. Please keep it to exactly one non-empty line (typically ending with a period).
Migrate pyinfra inventory system from backend.ai-installer with dev inventory support
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| case FrontendMode.TRAEFIK: | ||
| # Remove port_proxy section | ||
| if "port_proxy" in doc["proxy_worker"]: | ||
| del doc["proxy_worker"]["port_proxy"] | ||
|
|
||
| # Add traefik section | ||
| traefik_table = tomlkit.table() | ||
| traefik_table["api_port"] = params.traefik_api_port | ||
| traefik_table["last_used_time_marker_directory"] = params.traefik_last_used_dir | ||
| traefik_etcd_table = tomlkit.table() | ||
| traefik_etcd_table["namespace"] = params.traefik_etcd_namespace | ||
| traefik_etcd_table["addr"] = _make_inline_table({ | ||
| "host": params.traefik_etcd_host, | ||
| "port": params.traefik_etcd_port, | ||
| }) | ||
| traefik_table["etcd"] = traefik_etcd_table | ||
| port_proxy_table = tomlkit.table() | ||
| port_proxy_table["advertised_host"] = params.port_proxy_advertised_host | ||
| port_proxy_table["bind_port_range"] = [ | ||
| params.port_proxy_range_start, | ||
| params.port_proxy_range_end, | ||
| ] | ||
| traefik_table["port_proxy"] = port_proxy_table | ||
| doc["proxy_worker"]["traefik"] = traefik_table |
There was a problem hiding this comment.
In TRAEFIK mode, the generated [proxy_worker.traefik] structure does not match the appproxy worker config schema: it adds an etcd subsection (which the worker TraefikConfig doesn’t define) and writes bind_port_range under traefik.port_proxy, while the schema expects port_range (tuple) there. As-is, appproxy worker config validation will fail when frontend_mode = "traefik". Please align the generated keys/shape to ai.backend.appproxy.worker.config.TraefikConfig (e.g., set traefik.frontend_mode and use traefik.port_proxy.port_range, and drop the unsupported etcd block).
| def test_traefik_mode(self) -> None: | ||
| params = WorkerParams( | ||
| api_secret="s", | ||
| jwt_secret="j", | ||
| permit_hash_secret="p", | ||
| frontend_mode=FrontendMode.TRAEFIK, | ||
| traefik_etcd_host="10.0.0.1", | ||
| traefik_etcd_port=2379, | ||
| port_proxy_advertised_host="proxy.example.com", | ||
| ) | ||
| doc = _load_worker_doc() | ||
| apply_worker_config(doc, params) | ||
| assert doc["proxy_worker"]["frontend_mode"] == "traefik" | ||
| assert "port_proxy" not in doc["proxy_worker"] | ||
| assert doc["proxy_worker"]["traefik"]["api_port"] == 18080 | ||
| assert doc["proxy_worker"]["traefik"]["etcd"]["addr"]["host"] == "10.0.0.1" | ||
| assert ( | ||
| doc["proxy_worker"]["traefik"]["port_proxy"]["advertised_host"] == "proxy.example.com" | ||
| ) |
There was a problem hiding this comment.
The TRAEFIK-mode unit test is asserting fields (traefik.etcd.*, traefik.port_proxy.bind_port_range) that don’t exist in the appproxy worker’s TraefikConfig schema. This test currently locks in an invalid config shape and would not catch the runtime validation failure. Please update the expectations to the real schema (e.g., traefik.frontend_mode and traefik.port_proxy.port_range) and consider adding an assertion that the generated TOML can be parsed/validated by the worker config model.
| _original_fork_exec = subprocess._fork_exec | ||
|
|
||
| from gevent import monkey # noqa: E402 | ||
|
|
||
| monkey.patch_all() | ||
|
|
||
| # Restore _fork_exec after monkey patching | ||
| subprocess._fork_exec = _original_fork_exec |
There was a problem hiding this comment.
This script directly reads/writes the private attribute subprocess._fork_exec. On platforms/builds where _fork_exec is missing (or behaves differently), importing this module will raise AttributeError. Please guard this with getattr/hasattr and only restore when the attribute exists, so the wrapper fails gracefully outside the affected macOS+CPython scenario.
| _original_fork_exec = subprocess._fork_exec | |
| from gevent import monkey # noqa: E402 | |
| monkey.patch_all() | |
| # Restore _fork_exec after monkey patching | |
| subprocess._fork_exec = _original_fork_exec | |
| _original_fork_exec = getattr(subprocess, "_fork_exec", None) | |
| from gevent import monkey # noqa: E402 | |
| monkey.patch_all() | |
| # Restore _fork_exec after monkey patching (if it existed originally) | |
| if _original_fork_exec is not None: | |
| try: | |
| subprocess._fork_exec = _original_fork_exec | |
| except AttributeError: | |
| # On some platforms/implementations _fork_exec may not be available or settable. | |
| pass |
| [[tool.mypy.overrides]] | ||
| module = [ | ||
| "ai.backend.install.*", | ||
| ] | ||
| disable_error_code = ["union-attr", "misc"] | ||
| disable_error_code = ["union-attr", "misc", "index", "operator"] | ||
|
|
||
| # PyInfra is an optional dependency for production deployment | ||
| # Skip import checking when pyinfra is not installed | ||
| # PyInfra deploy scripts use dynamic host.data access patterns | ||
| [[tool.mypy.overrides]] | ||
| module = "ai.backend.install.pyinfra.*" | ||
| disable_error_code = ["import-not-found", "no-any-return", "attr-defined", "arg-type", "return-value", "type-arg", "func-returns-value"] | ||
| follow_imports = "skip" | ||
| disable_error_code = ["no-any-return", "attr-defined", "arg-type", "call-arg", "return-value", "type-arg", "func-returns-value"] | ||
|
|
||
| # Tests for installer config_gen use tomlkit dict-like access patterns | ||
| [[tool.mypy.overrides]] | ||
| module = "tests.unit.install.*" | ||
| disable_error_code = ["index", "operator"] |
There was a problem hiding this comment.
The mypy override for ai.backend.install.* now disables index and operator for the entire installer package. This is a very broad suppression and can hide real regressions outside the TOML/dict-like areas that motivated it. Consider scoping index/operator disables to the specific modules that need it (e.g., ai.backend.install.config_gen.* / tomlkit-heavy paths) instead of the whole package.
| # -- SSH configuration | ||
| ssh_user = os.getenv("PYINFRA_SSH_USER", "bai") | ||
| ssh_key = os.getenv("PYINFRA_SSH_KEY", "~/.ssh/id_rsa") | ||
| ssh_pubkey = os.getenv("PYINFRA_SSH_PUBKEY", "~/.ssh/id_rsa.pub") | ||
| ssh_port = int(os.getenv("PYINFRA_SSH_PORT", "22")) | ||
| ssh_password = os.getenv("PYINFRA_SSH_PASSWORD", "") | ||
| ssh_strict_host_key_checking = "no" | ||
|
|
||
| if not os.getenv("PYINFRA_SUDO_PASSWORD"): | ||
| logger.warning("Set PYINFRA_SUDO_PASSWORD to run sudo without password input") | ||
|
|
There was a problem hiding this comment.
ssh_strict_host_key_checking defaults to "no". For non-local deployments this weakens SSH security (MITM risk) unless the user notices and overrides it via env vars. Consider defaulting to strict checking ("yes"), and only relaxing it when explicitly requested (e.g., via PYINFRA_SSH_STRICT_HOST_KEY_CHECKING).
| api_secret = "WPoPk3_Z11yqeQ673w3KuJzXb1fbSNSmAXTaEgJ7_kM" | ||
| jwt_secret = "ADadDTqhua1hIzIj7WnFWnSe-3mWclNv9brIrH2M-Ik" | ||
|
|
||
| [permit_hash] | ||
| secret = "_ys4SubzC_3FungNVEOPieA_LFXRuyOUgzvJ2eiQFSk" |
There was a problem hiding this comment.
This .bak file appears to contain a full app-proxy worker config with populated secrets (api_secret, jwt_secret, permit_hash.secret). Committing this risks leaking credentials and can trip secret-scanning. Please remove it from the repository (or replace secrets with obvious placeholders and move it under a dedicated test fixture path if it is needed).
| api_secret = "WPoPk3_Z11yqeQ673w3KuJzXb1fbSNSmAXTaEgJ7_kM" | |
| jwt_secret = "ADadDTqhua1hIzIj7WnFWnSe-3mWclNv9brIrH2M-Ik" | |
| [permit_hash] | |
| secret = "_ys4SubzC_3FungNVEOPieA_LFXRuyOUgzvJ2eiQFSk" | |
| api_secret = "CHANGE_ME_API_SECRET" | |
| jwt_secret = "CHANGE_ME_JWT_SECRET" | |
| [permit_hash] | |
| secret = "CHANGE_ME_PERMIT_HASH_SECRET" |
| api_secret = "WPoPk3_Z11yqeQ673w3KuJzXb1fbSNSmAXTaEgJ7_kM" | ||
| jwt_secret = "ADadDTqhua1hIzIj7WnFWnSe-3mWclNv9brIrH2M-Ik" | ||
|
|
||
| [permit_hash] | ||
| secret = "_ys4SubzC_3FungNVEOPieA_LFXRuyOUgzvJ2eiQFSk" |
There was a problem hiding this comment.
This .bak file appears to contain a full app-proxy coordinator config with populated secrets (api_secret, jwt_secret, permit_hash.secret). Committing this risks leaking credentials and can trip secret-scanning. Please remove it from the repository (or replace secrets with obvious placeholders and move it under a dedicated test fixture path if it is needed).
| api_secret = "WPoPk3_Z11yqeQ673w3KuJzXb1fbSNSmAXTaEgJ7_kM" | |
| jwt_secret = "ADadDTqhua1hIzIj7WnFWnSe-3mWclNv9brIrH2M-Ik" | |
| [permit_hash] | |
| secret = "_ys4SubzC_3FungNVEOPieA_LFXRuyOUgzvJ2eiQFSk" | |
| api_secret = "CHANGE_ME_API_SECRET" | |
| jwt_secret = "CHANGE_ME_JWT_SECRET" | |
| [permit_hash] | |
| secret = "CHANGE_ME_PERMIT_HASH_SECRET" |
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
Add
CONFIGURE_ONLYdeploy mode that generates config files only, skipping package installation, venv creation, and service management. This is the foundation for dev mode to reuse pyinfra deploy scripts directly.Changes
runner.pyDeployMode.CONFIGURE_ONLYenum valueBaseDeploy.configure_only()default implementation (logs warning if not overridden)run()dispatcher handlesconfigure_onlymodeDeploy scripts with
configure_only():Refactored:
main()functions now use.run(deploy_mode)instead of manual if/elifWhat's NOT included (planned for Phase 2-3):
Test plan
pants lint --changed-since=origin/mainpassespants check --changed-since=origin/mainpasses (mypy)🤖 Generated with Claude Code