
feat(BA-5570): Add configure_only() mode to pyinfra deploy scripts #10750

Open
Yaminyam wants to merge 17 commits into main from feat/BA-5570-configure-only

@Yaminyam (Member) commented Apr 2, 2026

Summary

Add a CONFIGURE_ONLY deploy mode that generates config files only, skipping package installation, venv creation, and service management. This lays the groundwork for dev mode to reuse the pyinfra deploy scripts directly.

Changes

runner.py

  • Add DeployMode.CONFIGURE_ONLY enum value
  • Add a default BaseDeploy.configure_only() implementation (logs a warning if not overridden)
  • The run() dispatcher now handles CONFIGURE_ONLY mode

Deploy scripts with configure_only():

  • appproxy coordinator — coordinator.toml, alembic.ini, run script
  • appproxy worker_base (shared by interactive/tcp/inference) — worker.toml, run script
  • appproxy traefik — traefik config YAML, TLS config, run scripts per worker type
  • manager — manager.toml, alembic.ini, run script, DB fixtures
  • agent — agent.toml, docker container opts, run script
  • storage_proxy — storage-proxy.toml, run script
  • webserver — webserver.conf, run script

Refactored:

  • Worker deploy main() functions now use .run(deploy_mode) instead of manual if/elif
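A minimal sketch of the runner.py pattern described above, under simplifying assumptions: the INSTALL member and install() method are hypothetical stand-ins for the existing full-deploy path, WebserverDeploy is illustrative only, and real deploy scripts issue pyinfra operations rather than appending to a list.

```python
import logging
from enum import Enum

log = logging.getLogger(__name__)


class DeployMode(Enum):
    INSTALL = "install"  # hypothetical stand-in for the existing full-deploy mode
    CONFIGURE_ONLY = "configure_only"


class BaseDeploy:
    def install(self) -> None:
        raise NotImplementedError

    def configure_only(self) -> None:
        # Default: warn instead of failing, so scripts without an override still run.
        log.warning("%s does not override configure_only()", type(self).__name__)

    def run(self, mode: DeployMode) -> None:
        # One dispatcher in the base class replaces per-script if/elif chains in main().
        handlers = {
            DeployMode.INSTALL: self.install,
            DeployMode.CONFIGURE_ONLY: self.configure_only,
        }
        handlers[mode]()


class WebserverDeploy(BaseDeploy):
    """Hypothetical deploy script that records steps instead of running pyinfra ops."""

    def __init__(self) -> None:
        self.actions: list[str] = []

    def install(self) -> None:
        self.actions += ["install packages", "create venv"]
        self.configure_only()
        self.actions.append("manage service")

    def configure_only(self) -> None:
        # Config files only: no packages, no venv, no service management.
        self.actions += ["webserver.conf", "run script"]


def main(deploy_mode: str) -> WebserverDeploy:
    deploy = WebserverDeploy()
    deploy.run(DeployMode(deploy_mode))
    return deploy
```

Having install() reuse configure_only() is one way (assumed here) to keep the config-writing steps in a single place for both modes.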

What's NOT included (planned for Phases 2-3):

  • DevContext subprocess runner to invoke pyinfra from TUI
  • sudo handling for the @local connector (fact gathering triggers a sudo prompt)

Test plan

  • pants lint --changed-since=origin/main passes
  • pants check --changed-since=origin/main passes (mypy)
  • All existing tests pass

🤖 Generated with Claude Code

Yaminyam and others added 15 commits April 2, 2026 14:17
- Add TRAEFIK to FrontendMode enum
- Configure coordinator with enable_traefik and traefik.etcd settings
- Configure worker with traefik section (api_port, etcd, port_proxy)
  when frontend_mode is traefik

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Migrate group_data/all.py from backend.ai-installer with enterprise-only
items (graylog, zabbix, license hwinfo) removed. Provides host.data
defaults for SSH, OS, Docker, Python, and container registry config.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…p 2)

- Migrate InventoryBuilder from backend.ai-installer (1848 → 1524 lines)
- Update imports: ai.backend.pyinfra → ai.backend.install.pyinfra
- Remove enterprise-only logic (license, control_panel, fasttrack, harbor, rtun)
- Keep enterprise config stubs with enabled=False in return dicts
- Fix type inconsistency in group_data: ssh_port, bai_user_id, bai_user_group_id
  now use int() wrapper for os.getenv() calls
- Add missing type annotations for lint compliance

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
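The int() wrapping above can be factored into a small helper that also names the offending variable when the value is not an integer. This is a hypothetical helper (env_int is not part of the PR), and treating an empty string as unset is an assumption.

```python
import os
from collections.abc import Mapping


def env_int(name: str, default: int, environ: Mapping[str, str] = os.environ) -> int:
    """Read an integer host.data default from the environment."""
    raw = environ.get(name)
    # Assumption: an empty value falls back to the default instead of failing.
    if raw is None or raw == "":
        return default
    try:
        return int(raw)
    except ValueError:
        raise ValueError(f"{name} must be an integer, got {raw!r}") from None
```

For example, `ssh_port = env_int("PYINFRA_SSH_PORT", 22)`.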
…Step 3)

Create DevInventoryBuilder that targets @local with Docker Compose
halfstack ports matching DevContext.hydrate_install_info(). Provides
all the host.data attributes and the services dict that the deploy scripts expect.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add inventory_local.py as pyinfra CLI entry point for local dev
- Fix bai_version default to use actual version instead of "dev"
- Verified: deploy scripts successfully read all host.data attributes
  from DevInventoryBuilder (the run fails only at subprocess execution on macOS, not in the inventory)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
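A rough sketch of what an inventory entry point like inventory_local.py can look like, following pyinfra's inventory-file convention of module-level host lists. build_dev_host_data(), the host.data key names, and the port numbers are hypothetical placeholders standing in for DevInventoryBuilder and shared_defaults.py.

```python
from typing import Any

# Placeholder ports; in the PR the real values come from shared_defaults.py.
DEV_POSTGRES_PORT = 8100
DEV_REDIS_PORT = 8110
DEV_ETCD_PORT = 8120


def build_dev_host_data() -> dict[str, Any]:
    """Hypothetical stand-in for DevInventoryBuilder: host.data for the local host."""
    return {
        "postgres_port": DEV_POSTGRES_PORT,
        "redis_port": DEV_REDIS_PORT,
        "etcd_port": DEV_ETCD_PORT,
    }


# pyinfra treats module-level lists in an inventory file as host groups;
# each entry is a host name or a (host name, data) tuple.
local_dev = [("@local", build_dev_host_data())]
```

Keeping the port constants in one module is what lets the TUI's DevContext and the pyinfra inventory agree on the same halfstack ports.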
- Add shared_defaults.py with centralized port/version/credential constants
- Refactor DevInventoryBuilder to use shared_defaults instead of
  hardcoded values
- Constants are shared between DevContext (TUI) and DevInventoryBuilder (pyinfra)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add 18 unit tests for DevInventoryBuilder and shared_defaults
- Remove follow_imports="skip" from mypy overrides (pyinfra is now a dependency)
- Remove import-not-found from disabled error codes
- Fix stale import path in package_manager.py
- Refactor APPPROXY_PORTS to typed constants for mypy compatibility

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- EtcdConfig: hostname/port → advertised_client_ip/advertised_client_port
- ManagerConfig, WebserverConfig, HiveGatewayConfig: remove non-existent hostname field
- StorageProxyConfig: remove hostname; client_port → port
- Update test to use correct field (advertised_client_port)
- Add changelog for PR #10738

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…n macOS

Workaround for gevent/gevent#2169: monkey.patch_all() sets
subprocess._fork_exec to None on macOS + Python 3.13. This wrapper
saves and restores _fork_exec before running pyinfra CLI.

Can be removed once gevent fixes the issue upstream.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…tations

- Add pyinfra~=3.7 and passlib~=1.7.4 to requirements.txt (omitted from an earlier commit)
- Regenerate python.lock with new dependencies
- Fix dict → dict[str, object] type annotations in tests for mypy

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Create config_gen/appproxy.py with CoordinatorParams, WorkerParams,
  build_coordinator_config(), build_worker_config()
- Refactor context.py:configure_appproxy() to use shared module
  (~150 lines of tomlkit manipulation → ~30 lines of param construction)
- Frontend mode logic (port/wildcard/traefik) now lives in one place
- Add 16 unit tests covering all frontend modes and edge cases
- Import FrontendMode from types.py (single definition)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Change from build_*_config() (returns new dict) to apply_*_config()
(modifies existing tomlkit document in-place). This preserves comments
and structure from sample.toml files, which are auto-generated by
`backend.ai mgr config generate-sample`.

- apply_coordinator_config(doc, params) modifies coordinator toml doc
- apply_worker_config(doc, params) modifies worker toml doc
- context.py loads sample, applies params, writes back
- Tests use sample TOML strings to verify comment preservation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
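The switch from build_*_config() to apply_*_config() works because the function assigns into the document it receives instead of returning a fresh dict, so every key it does not touch (and, with tomlkit, every comment) survives. A minimal sketch limited to the frontend_mode and port_proxy keys that the PR's tests exercise; apply_frontend_mode is a hypothetical name, and a plain dict stands in for a tomlkit document, which supports the same mapping operations.

```python
from collections.abc import MutableMapping
from typing import Any


def apply_frontend_mode(doc: MutableMapping[str, Any], frontend_mode: str) -> None:
    """Mutate doc in place; keys other than the ones set here are left alone."""
    worker = doc["proxy_worker"]
    worker["frontend_mode"] = frontend_mode
    if frontend_mode == "traefik":
        # Traefik mode drops the top-level port_proxy table.
        worker.pop("port_proxy", None)
```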
- Remove type: ignore comments from config_gen/appproxy.py (handled by mypy overrides)
- Add index/operator to mypy disable_error_code for ai.backend.install.*
- Add mypy override for tests.unit.install.* (tomlkit dict-like access)
- Fix dict[str, object] → dict[str, Any] in test_dev_inventory.py

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Remove all unused type: ignore comments (now handled by mypy overrides)
- Fix str | None → str arg-type for appproxy secrets (add `or ""` fallback)
- Fix dict → dict[str, object] type param in config_gen/agent.py
- Fix ConsoleRenderable → str in context.py:462
- Auto-format with ruff

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add CONFIGURE_ONLY to DeployMode enum and implement configure_only()
in all core deploy scripts. This mode generates config files only,
skipping package installation, venv creation, and service management.

Modified deploy scripts:
- runner.py: DeployMode.CONFIGURE_ONLY + base configure_only()
- appproxy coordinator, worker_base (shared by interactive/tcp/inference)
- appproxy traefik
- manager, agent, storage_proxy, webserver

All worker deploy main() refactored to use .run(deploy_mode) pattern.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 2, 2026 09:28
@github-actions github-actions bot added the size:XL 500~ LoC label Apr 2, 2026
Contributor

Copilot AI left a comment


Pull request overview

This PR extends the Backend.AI installer’s pyinfra tooling with a new deploy mode (configure_only) intended to generate configuration artifacts without performing package installs, venv creation, or service management, and introduces dev-focused inventory/config-generation helpers with accompanying unit tests.

Changes:

  • Add DeployMode.CONFIGURE_ONLY to the pyinfra runner and wire deploy scripts to dispatch via .run(deploy_mode).
  • Introduce dev inventory building + shared default constants for local @local pyinfra runs (plus inventory/group_data helpers).
  • Add shared config_gen modules (AppProxy + Agent) using tomlkit, with new unit tests and Pants BUILD targets.

Reviewed changes

Copilot reviewed 30 out of 34 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| tests/unit/install/pyinfra/inventory/test_dev_inventory.py | Adds unit tests for DevInventoryBuilder and shared defaults. |
| tests/unit/install/pyinfra/inventory/BUILD | Adds Pants test target for pyinfra inventory tests. |
| tests/unit/install/config_gen/test_appproxy.py | Adds unit tests for shared AppProxy TOML generation via tomlkit. |
| tests/unit/install/config_gen/BUILD | Adds Pants test target for config_gen tests. |
| src/ai/backend/install/types.py | Extends installer FrontendMode enum with TRAEFIK. |
| src/ai/backend/install/pyinfra/runner.py | Adds CONFIGURE_ONLY mode dispatch and default configure_only() implementation. |
| src/ai/backend/install/pyinfra/os_packages/package_manager.py | Fixes import path to install.pyinfra.platform_utils. |
| src/ai/backend/install/pyinfra/inventory/shared_defaults.py | Introduces centralized constants (ports, versions, dev defaults). |
| src/ai/backend/install/pyinfra/inventory/run_local.py | Adds local pyinfra wrapper with gevent monkey-patch workaround. |
| src/ai/backend/install/pyinfra/inventory/inventory_local.py | Adds local inventory entrypoint using DevInventoryBuilder. |
| src/ai/backend/install/pyinfra/inventory/group_data/all.py | Adds global pyinfra host.data defaults driven by env vars. |
| src/ai/backend/install/pyinfra/inventory/dev_inventory.py | Implements DevInventoryBuilder for local @local deployments. |
| src/ai/backend/install/pyinfra/inventory/builder.py | Adds large unified inventory builder (single + HA modes). |
| src/ai/backend/install/pyinfra/deploy/cores/webserver/deploy.py | Adds configure_only() implementation. |
| src/ai/backend/install/pyinfra/deploy/cores/storage_proxy/deploy.py | Adds configure_only() implementation. |
| src/ai/backend/install/pyinfra/deploy/cores/manager/deploy.py | Adds configure_only() implementation. |
| src/ai/backend/install/pyinfra/deploy/cores/appproxy/worker_tcp/deploy.py | Refactors dispatcher to .run(deploy_mode). |
| src/ai/backend/install/pyinfra/deploy/cores/appproxy/worker_interactive/deploy.py | Refactors dispatcher to .run(deploy_mode). |
| src/ai/backend/install/pyinfra/deploy/cores/appproxy/worker_inference/deploy.py | Refactors dispatcher to .run(deploy_mode). |
| src/ai/backend/install/pyinfra/deploy/cores/appproxy/worker_base.py | Adds configure_only() for shared worker deployment base. |
| src/ai/backend/install/pyinfra/deploy/cores/appproxy/traefik/deploy.py | Adds config-only Traefik generation and refactors dispatcher. |
| src/ai/backend/install/pyinfra/deploy/cores/appproxy/coordinator/deploy.py | Adds configure_only() for coordinator artifacts. |
| src/ai/backend/install/pyinfra/deploy/cores/agent/deploy.py | Adds configure_only() and refactors dispatcher. |
| src/ai/backend/install/context.py | Refactors Agent/AppProxy config writes to use new config_gen modules. |
| src/ai/backend/install/config_gen/appproxy.py | Adds shared TOML mutation helpers for AppProxy coordinator/worker. |
| src/ai/backend/install/config_gen/agent.py | Adds shared TOML mutation helper for Agent config. |
| requirements.txt | Adds dependencies (pyinfra, passlib). |
| python.lock | Updates lockfile for new dependencies. |
| pyproject.toml | Adjusts mypy overrides for installer/pyinfra/tests. |
| changes/10738.feature.md | Adds Towncrier fragment for the inventory migration/dev support. |
| app-proxy-worker.toml.bak | Adds a backup worker TOML file (contains populated secrets). |
| app-proxy-coordinator.toml.bak | Adds a backup coordinator TOML file (contains populated secrets). |
Comments suppressed due to low confidence (2)

src/ai/backend/install/pyinfra/inventory/group_data/all.py:99

  • This file includes hard-coded credential-like values (registry_password default). Even if intended as a placeholder, committing a non-empty password risks accidental reuse and can trigger secret scanners. Prefer an empty default and require the value to be provided via environment variables or per-inventory host data.
# -- Container registry configuration
registry_type = os.getenv("PYINFRA_CONTAINER_REGISTRY_TYPE", "harbor2")
registry_scheme = os.getenv("PYINFRA_CONTAINER_REGISTRY_SCHEME", "http")
registry_name = os.getenv("PYINFRA_CONTAINER_REGISTRY_NAME", "bai-repo")
registry_port = os.getenv("PYINFRA_CONTAINER_REGISTRY_PORT", "7080")
registry_username = os.getenv("PYINFRA_CONTAINER_REGISTRY_USERNAME", "bai")
registry_projects = os.getenv("PYINFRA_CONTAINER_REGISTRY_PROJECTS", "bai,bai-user")
registry_password = os.getenv("PYINFRA_CONTAINER_REGISTRY_PASSWORD", "lY0B=op3")
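One way to apply this suggestion while keeping the module's existing style (compare the PYINFRA_SUDO_PASSWORD warning in the same file) is an empty default plus a warning. A sketch only; wrapping the lookup in the hypothetical read_registry_password() function is an assumption made so it can be exercised with a custom environment.

```python
import logging
import os
from collections.abc import Mapping

logger = logging.getLogger(__name__)


def read_registry_password(environ: Mapping[str, str] = os.environ) -> str:
    # No credential baked into the repository: default to empty and warn.
    password = environ.get("PYINFRA_CONTAINER_REGISTRY_PASSWORD", "")
    if not password:
        logger.warning(
            "Set PYINFRA_CONTAINER_REGISTRY_PASSWORD to authenticate to the container registry"
        )
    return password


registry_password = read_registry_password()
```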

changes/10738.feature.md:2

  • Towncrier news fragments are expected to be a single-line sentence. This fragment has a trailing blank line (and no terminating period), which can cause style/lint checks to fail. Please keep it to exactly one non-empty line (typically ending with a period).
Migrate pyinfra inventory system from backend.ai-installer with dev inventory support



Comment on lines +237 to +260
case FrontendMode.TRAEFIK:
    # Remove port_proxy section
    if "port_proxy" in doc["proxy_worker"]:
        del doc["proxy_worker"]["port_proxy"]

    # Add traefik section
    traefik_table = tomlkit.table()
    traefik_table["api_port"] = params.traefik_api_port
    traefik_table["last_used_time_marker_directory"] = params.traefik_last_used_dir
    traefik_etcd_table = tomlkit.table()
    traefik_etcd_table["namespace"] = params.traefik_etcd_namespace
    traefik_etcd_table["addr"] = _make_inline_table({
        "host": params.traefik_etcd_host,
        "port": params.traefik_etcd_port,
    })
    traefik_table["etcd"] = traefik_etcd_table
    port_proxy_table = tomlkit.table()
    port_proxy_table["advertised_host"] = params.port_proxy_advertised_host
    port_proxy_table["bind_port_range"] = [
        params.port_proxy_range_start,
        params.port_proxy_range_end,
    ]
    traefik_table["port_proxy"] = port_proxy_table
    doc["proxy_worker"]["traefik"] = traefik_table

Copilot AI Apr 2, 2026


In TRAEFIK mode, the generated [proxy_worker.traefik] structure does not match the appproxy worker config schema: it adds an etcd subsection (which the worker TraefikConfig doesn’t define) and writes bind_port_range under traefik.port_proxy, while the schema expects port_range (tuple) there. As-is, appproxy worker config validation will fail when frontend_mode = "traefik". Please align the generated keys/shape to ai.backend.appproxy.worker.config.TraefikConfig (e.g., set traefik.frontend_mode and use traefik.port_proxy.port_range, and drop the unsupported etcd block).

Copilot uses AI. Check for mistakes.
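A sketch of the shape the comment asks for, assuming its description of TraefikConfig is accurate (traefik.frontend_mode, traefik.port_proxy.port_range, no etcd block); it has not been checked against ai.backend.appproxy.worker.config itself. The function name is hypothetical, plain dicts stand in for tomlkit tables, and last_used_time_marker_directory is carried over from the PR's code rather than from the comment.

```python
from collections.abc import MutableMapping
from typing import Any


def apply_traefik_section(
    worker: MutableMapping[str, Any],
    *,
    api_port: int,
    traefik_frontend_mode: str,
    last_used_dir: str,
    advertised_host: str,
    port_range: tuple[int, int],
) -> None:
    worker["frontend_mode"] = "traefik"
    # The top-level port_proxy table is replaced by traefik.port_proxy.
    worker.pop("port_proxy", None)
    worker["traefik"] = {
        "api_port": api_port,
        "frontend_mode": traefik_frontend_mode,
        "last_used_time_marker_directory": last_used_dir,
        # port_range instead of bind_port_range, and no etcd subsection.
        "port_proxy": {
            "advertised_host": advertised_host,
            "port_range": list(port_range),
        },
    }
```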
Comment on lines +213 to +231
def test_traefik_mode(self) -> None:
    params = WorkerParams(
        api_secret="s",
        jwt_secret="j",
        permit_hash_secret="p",
        frontend_mode=FrontendMode.TRAEFIK,
        traefik_etcd_host="10.0.0.1",
        traefik_etcd_port=2379,
        port_proxy_advertised_host="proxy.example.com",
    )
    doc = _load_worker_doc()
    apply_worker_config(doc, params)
    assert doc["proxy_worker"]["frontend_mode"] == "traefik"
    assert "port_proxy" not in doc["proxy_worker"]
    assert doc["proxy_worker"]["traefik"]["api_port"] == 18080
    assert doc["proxy_worker"]["traefik"]["etcd"]["addr"]["host"] == "10.0.0.1"
    assert (
        doc["proxy_worker"]["traefik"]["port_proxy"]["advertised_host"] == "proxy.example.com"
    )

Copilot AI Apr 2, 2026


The TRAEFIK-mode unit test is asserting fields (traefik.etcd.*, traefik.port_proxy.bind_port_range) that don’t exist in the appproxy worker’s TraefikConfig schema. This test currently locks in an invalid config shape and would not catch the runtime validation failure. Please update the expectations to the real schema (e.g., traefik.frontend_mode and traefik.port_proxy.port_range) and consider adding an assertion that the generated TOML can be parsed/validated by the worker config model.

Comment on lines +19 to +26
_original_fork_exec = subprocess._fork_exec

from gevent import monkey # noqa: E402

monkey.patch_all()

# Restore _fork_exec after monkey patching
subprocess._fork_exec = _original_fork_exec

Copilot AI Apr 2, 2026


This script directly reads/writes the private attribute subprocess._fork_exec. On platforms/builds where _fork_exec is missing (or behaves differently), importing this module will raise AttributeError. Please guard this with getattr/hasattr and only restore when the attribute exists, so the wrapper fails gracefully outside the affected macOS+CPython scenario.

Suggested change
-_original_fork_exec = subprocess._fork_exec
-from gevent import monkey # noqa: E402
-monkey.patch_all()
-# Restore _fork_exec after monkey patching
-subprocess._fork_exec = _original_fork_exec
+_original_fork_exec = getattr(subprocess, "_fork_exec", None)
+from gevent import monkey # noqa: E402
+monkey.patch_all()
+# Restore _fork_exec after monkey patching (if it existed originally)
+if _original_fork_exec is not None:
+    try:
+        subprocess._fork_exec = _original_fork_exec
+    except AttributeError:
+        # On some platforms/implementations _fork_exec may not be available or settable.
+        pass

Comment on lines 354 to +368
 [[tool.mypy.overrides]]
 module = [
     "ai.backend.install.*",
 ]
-disable_error_code = ["union-attr", "misc"]
+disable_error_code = ["union-attr", "misc", "index", "operator"]
 
-# PyInfra is an optional dependency for production deployment
-# Skip import checking when pyinfra is not installed
+# PyInfra deploy scripts use dynamic host.data access patterns
 [[tool.mypy.overrides]]
 module = "ai.backend.install.pyinfra.*"
-disable_error_code = ["import-not-found", "no-any-return", "attr-defined", "arg-type", "return-value", "type-arg", "func-returns-value"]
-follow_imports = "skip"
+disable_error_code = ["no-any-return", "attr-defined", "arg-type", "call-arg", "return-value", "type-arg", "func-returns-value"]
+
+# Tests for installer config_gen use tomlkit dict-like access patterns
+[[tool.mypy.overrides]]
+module = "tests.unit.install.*"
+disable_error_code = ["index", "operator"]

Copilot AI Apr 2, 2026


The mypy override for ai.backend.install.* now disables index and operator for the entire installer package. This is a very broad suppression and can hide real regressions outside the TOML/dict-like areas that motivated it. Consider scoping index/operator disables to the specific modules that need it (e.g., ai.backend.install.config_gen.* / tomlkit-heavy paths) instead of the whole package.

Comment on lines +20 to +30
# -- SSH configuration
ssh_user = os.getenv("PYINFRA_SSH_USER", "bai")
ssh_key = os.getenv("PYINFRA_SSH_KEY", "~/.ssh/id_rsa")
ssh_pubkey = os.getenv("PYINFRA_SSH_PUBKEY", "~/.ssh/id_rsa.pub")
ssh_port = int(os.getenv("PYINFRA_SSH_PORT", "22"))
ssh_password = os.getenv("PYINFRA_SSH_PASSWORD", "")
ssh_strict_host_key_checking = "no"

if not os.getenv("PYINFRA_SUDO_PASSWORD"):
    logger.warning("Set PYINFRA_SUDO_PASSWORD to run sudo without password input")


Copilot AI Apr 2, 2026


ssh_strict_host_key_checking defaults to "no". For non-local deployments this weakens SSH security (MITM risk) unless the user notices and overrides it via env vars. Consider defaulting to strict checking ("yes"), and only relaxing it when explicitly requested (e.g., via PYINFRA_SSH_STRICT_HOST_KEY_CHECKING).
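A sketch of the suggested default, using the PYINFRA_SSH_STRICT_HOST_KEY_CHECKING variable named in the comment. The function name is hypothetical, and the accepted values mirror OpenSSH's StrictHostKeyChecking options on the assumption that pyinfra passes them through unchanged.

```python
import os
from collections.abc import Mapping

# Assumption: values mirror OpenSSH's StrictHostKeyChecking options.
_ALLOWED = frozenset({"yes", "no", "accept-new"})


def read_strict_host_key_checking(environ: Mapping[str, str] = os.environ) -> str:
    # Strict by default; relaxing it is an explicit opt-in.
    value = environ.get("PYINFRA_SSH_STRICT_HOST_KEY_CHECKING", "yes").strip().lower()
    if value not in _ALLOWED:
        raise ValueError(
            f"PYINFRA_SSH_STRICT_HOST_KEY_CHECKING must be one of {sorted(_ALLOWED)}, got {value!r}"
        )
    return value


ssh_strict_host_key_checking = read_strict_host_key_checking()
```

Local @local runs can still set the variable to "no" explicitly, so the relaxed behaviour becomes a visible choice rather than a silent default.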

Comment on lines +25 to +29
api_secret = "WPoPk3_Z11yqeQ673w3KuJzXb1fbSNSmAXTaEgJ7_kM"
jwt_secret = "ADadDTqhua1hIzIj7WnFWnSe-3mWclNv9brIrH2M-Ik"

[permit_hash]
secret = "_ys4SubzC_3FungNVEOPieA_LFXRuyOUgzvJ2eiQFSk"

Copilot AI Apr 2, 2026


This .bak file appears to contain a full app-proxy worker config with populated secrets (api_secret, jwt_secret, permit_hash.secret). Committing this risks leaking credentials and can trip secret-scanning. Please remove it from the repository (or replace secrets with obvious placeholders and move it under a dedicated test fixture path if it is needed).

Suggested change
-api_secret = "WPoPk3_Z11yqeQ673w3KuJzXb1fbSNSmAXTaEgJ7_kM"
-jwt_secret = "ADadDTqhua1hIzIj7WnFWnSe-3mWclNv9brIrH2M-Ik"
-[permit_hash]
-secret = "_ys4SubzC_3FungNVEOPieA_LFXRuyOUgzvJ2eiQFSk"
+api_secret = "CHANGE_ME_API_SECRET"
+jwt_secret = "CHANGE_ME_JWT_SECRET"
+[permit_hash]
+secret = "CHANGE_ME_PERMIT_HASH_SECRET"
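If placeholder files like these are kept, the config generator could refuse to ship them as-is by swapping each CHANGE_ME_* value for a freshly generated secret at write time. A sketch under that assumption: the helper and the prefix convention are hypothetical, secrets.token_urlsafe(32) is one reasonable generator (it yields 43-character URL-safe strings, the same length as the values committed here), and a plain dict stands in for the parsed TOML document.

```python
import secrets
from collections.abc import MutableMapping
from typing import Any

PLACEHOLDER_PREFIX = "CHANGE_ME_"


def fill_placeholder_secrets(doc: MutableMapping[str, Any]) -> int:
    """Replace CHANGE_ME_* string values in place; return how many were replaced."""
    replaced = 0
    for key, value in list(doc.items()):
        if isinstance(value, MutableMapping):
            # Recurse into nested tables such as [permit_hash].
            replaced += fill_placeholder_secrets(value)
        elif isinstance(value, str) and value.startswith(PLACEHOLDER_PREFIX):
            doc[key] = secrets.token_urlsafe(32)
            replaced += 1
    return replaced
```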

Comment on lines +30 to +34
api_secret = "WPoPk3_Z11yqeQ673w3KuJzXb1fbSNSmAXTaEgJ7_kM"
jwt_secret = "ADadDTqhua1hIzIj7WnFWnSe-3mWclNv9brIrH2M-Ik"

[permit_hash]
secret = "_ys4SubzC_3FungNVEOPieA_LFXRuyOUgzvJ2eiQFSk"

Copilot AI Apr 2, 2026


This .bak file appears to contain a full app-proxy coordinator config with populated secrets (api_secret, jwt_secret, permit_hash.secret). Committing this risks leaking credentials and can trip secret-scanning. Please remove it from the repository (or replace secrets with obvious placeholders and move it under a dedicated test fixture path if it is needed).

Suggested change
-api_secret = "WPoPk3_Z11yqeQ673w3KuJzXb1fbSNSmAXTaEgJ7_kM"
-jwt_secret = "ADadDTqhua1hIzIj7WnFWnSe-3mWclNv9brIrH2M-Ik"
-[permit_hash]
-secret = "_ys4SubzC_3FungNVEOPieA_LFXRuyOUgzvJ2eiQFSk"
+api_secret = "CHANGE_ME_API_SECRET"
+jwt_secret = "CHANGE_ME_JWT_SECRET"
+[permit_hash]
+secret = "CHANGE_ME_PERMIT_HASH_SECRET"


Labels

size:XL 500~ LoC

2 participants