
feat(github-workspace): add GitHub Actions workspace kind with git-backed state persistence#11457

Draft
sylvainsf wants to merge 13 commits into main from filesystem-state


@sylvainsf
Contributor

sylvainsf commented Mar 19, 2026

Description

Adds a new "github" workspace kind to Radius for use in GitHub Actions workflows. When this kind is active, Radius manages a k3d cluster for the duration of the workflow, persists database state between runs via a git orphan branch, and guards against concurrent or interrupted deploys with a deploy lock.

How it works

  1. rad init --kind github — creates a k3d cluster, installs Radius with PostgreSQL as the state backend (database.enabled=true), checks the state orphan branch for prior backup state and restores it if the previous run shut down cleanly.
  2. rad deploy — acquires a .deploy-lock on the radius-state orphan branch before deploying. A later retry of the same workflow run (higher GITHUB_RUN_ATTEMPT) automatically takes over a stale lock so the job can continue. A lock held by a completely different run returns an error immediately.
  3. rad shutdown [--cleanup] — backs up PostgreSQL to the state worktree (via kubectl exec pg_dump), commits and pushes to radius-state, writes .backup-ok, and optionally deletes the k3d cluster.

State isolation — git worktree

State files (SQL dumps, .lock, .backup-ok, .deploy-lock) live only in the orphan branch, checked out into an OS temp directory via git worktree add. They are never written to the application working tree and never appear in git status on main.

Semaphore / spot-instance safety

| .lock | .backup-ok | State | rad init behaviour |
|---|---|---|---|
| absent | absent | First run | start fresh |
| absent | present | Clean shutdown | restore from backup |
| present | (any) | Interrupted (spot eviction) | skip restore, log warning |

Deploy lock (idempotent retry)

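| .deploy-lock | Lock holder | rad deploy behaviour |
|---|---|---|
| absent | n/a | Lock acquired |
| present | Same RunID, lower RunAttempt | Take over stale lock (retry) |
| present | Same RunID, same RunAttempt | ErrDeployLockHeld |
| present | Different RunID | ErrDeployLockHeld |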

Helm PostgreSQL fixes (pre-existing gaps)

  • RP configmaps were hardcoded to provider: apiserver — now conditional on database.enabled.
  • No init-db scripts were ever run — now mounted via /docker-entrypoint-initdb.d/.
  • POSTGRES_DB secret value was the literal string "POSTGRES_DB" — fixed to "radius".

New packages

| Package | Purpose |
|---|---|
| pkg/cli/k3d | k3d cluster lifecycle management via os/exec |
| pkg/cli/pgbackup | PostgreSQL backup/restore via kubectl exec pg_dump/psql |
| pkg/cli/gitstate | Orphan branch state via git worktree; semaphore + deploy lock |
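A sketch of the kubectl argv that a pgbackup-style helper could build for each database. The namespace `radius-system`, the pod name `database-0`, and the convention that each database has a same-named user are assumptions; `ucp` and `dynamic_rp` are two of the three backed-up databases named in this PR, and the third is not listed here. The `pg_dump` and `psql` flags used are standard PostgreSQL options.

```go
// Hypothetical sketch of pg_dump / psql invocations through kubectl exec.
package main

import (
	"fmt"
	"strings"
)

// target is the PostgreSQL pod (namespace and pod name are assumptions).
type target struct {
	Namespace string
	Pod       string
}

// dumpArgs streams a plain-SQL dump of db to stdout. --clean --if-exists
// emits DROP ... IF EXISTS first, so the dump can be replayed onto a
// database that init-db scripts have already populated.
func (t target) dumpArgs(user, db string) []string {
	return []string{"kubectl", "exec", "-n", t.Namespace, t.Pod, "--",
		"pg_dump", "-U", user, "--clean", "--if-exists", db}
}

// restoreArgs replays a dump read from stdin (-i keeps stdin attached);
// ON_ERROR_STOP makes psql exit non-zero on the first failing statement.
func (t target) restoreArgs(user, db string) []string {
	return []string{"kubectl", "exec", "-i", "-n", t.Namespace, t.Pod, "--",
		"psql", "-U", user, "-d", db, "-v", "ON_ERROR_STOP=1"}
}

func main() {
	t := target{Namespace: "radius-system", Pod: "database-0"}
	for _, db := range []string{"ucp", "dynamic_rp"} {
		fmt.Println(strings.Join(t.dumpArgs(db, db), " "), ">", db+".sql")
		fmt.Println(strings.Join(t.restoreArgs(db, db), " "), "<", db+".sql")
	}
}
```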

New commands

  • rad init --kind github
  • rad deploy (extended: acquires deploy lock for GitHub workspaces)
  • rad shutdown [--cleanup]

New workflow

  • .github/workflows/functional-test-github-workspace.yaml — end-to-end integration test exercising the full init → deploy → shutdown → restore lifecycle on a GitHub Actions runner.

Outstanding TODOs (tracked in code)

  • rad resource type sync after restore (command does not yet exist)

Type of change

  • This pull request adds or changes features of Radius and has an approved issue (issue link required).

Contributor checklist

Please verify that the PR meets the following requirements, where applicable:

  • An overview of proposed schema changes is included in a linked GitHub issue.
    • Yes
    • Not applicable
  • A design document PR is created in the design-notes repository, if new APIs are being introduced.
    • Yes
    • Not applicable
  • The design document has been reviewed and approved by Radius maintainers/approvers.
    • Yes
    • Not applicable
  • A PR for the samples repository is created, if existing samples are affected by the changes in this PR.
    • Yes
    • Not applicable
  • A PR for the documentation repository is created, if the changes in this PR affect the documentation or any user facing updates are made.
    • Yes
    • Not applicable
  • A PR for the recipes repository is created, if existing recipes are affected by the changes in this PR.
    • Yes
    • Not applicable

…optionally delete the k3d cluster.

- Created `gitstate` package for managing state commits to an orphan branch.
- Added `pgbackup` package for handling PostgreSQL backups and restores.
- Introduced `k3d` package for managing k3d clusters.
- Enhanced workspace connection handling to support GitHub workspaces.
- Updated validation logic to accommodate new workspace types.
- Added tests for new functionality in shutdown, gitstate, k3d, and pgbackup packages.

Signed-off-by: Sylvain Niles <sylvainniles@microsoft.com>
…est files

Signed-off-by: Sylvain Niles <sylvainniles@microsoft.com>
sylvainsf requested a deployment to external-contributor-approval on March 19, 2026 18:47 with GitHub Actions (Waiting)
@github-actions

github-actions bot commented Mar 19, 2026

Unit Tests

    2 files  ± 0    421 suites  +5   6m 43s ⏱️ -8s
4 946 tests +68  4 944 ✅ +68  2 💤 ±0  0 ❌ ±0 
5 888 runs  +95  5 886 ✅ +95  2 💤 ±0  0 ❌ ±0 

Results for commit 6077359. ± Comparison against base commit 1ad24df.

♻️ This comment has been updated with latest results.

sylvainsf changed the title from "Add GitHub workspace kind with PostgreSQL backup/restore" to "Add filesystem based data store persistence" on Mar 19, 2026
```go
Namespace: "default",
},
Recipes: recipePackOptions{
DevRecipes: true,
```
Contributor


The functional test workflow skips dev recipes, but this always sets them to true. I might have missed it, but does this get overwritten?

Contributor Author


This is used in the new workflow, which is the end-to-end test for repo radius. I imagine we will extend that workflow with other end-to-end tests rather than shoehorn them into the existing functional tests.

Contributor Author


Correct — the functional-test-github-workspace.yaml workflow is a separate end-to-end test specific to the GitHub workspace lifecycle, not the existing functional test suite. Dev recipes are intentional here because the workflow tests a real Radius install with actual recipe execution. The flag difference is by design.

- Add LockInfo, ErrDeployLockHeld, NewLockInfoFromEnv to gitstate
- Add TryAcquireDeployLock / ReleaseDeployLock to StateWorktree
- Wire acquireDeployLock into rad deploy Runner (github kind only)
- Retry of the same workflow run (higher GITHUB_RUN_ATTEMPT) takes over a stale lock
- Unrelated workspace kinds and nil-field Runner are no-ops
- Full test coverage for all lock paths in gitstate and deploy

Signed-off-by: Sylvain Niles <sylvainniles@microsoft.com>
sylvainsf requested a deployment to external-contributor-approval on March 19, 2026 21:15 with GitHub Actions (Waiting)
sylvainsf changed the title from "Add filesystem based data store persistence" to "feat(github-workspace): add GitHub Actions workspace kind with git-backed state persistence" on Mar 19, 2026
The helm unit tests were asserting against wrong data keys:
- ucp/configmaps.yaml: 'ucp.yaml' -> 'ucp-config.yaml'
- rp/configmaps.yaml: 'applications-rp.yaml' -> 'radius-self-host.yaml'
- dynamic-rp/configmaps.yaml: 'dynamic-rp.yaml' -> 'radius-self-host.yaml'

Signed-off-by: Sylvain Niles <sylvainniles@microsoft.com>
sylvainsf requested a deployment to external-contributor-approval on March 19, 2026 21:33 with GitHub Actions (Waiting)
…map assertions

helm-unittest's 'contains' is for arrays; configmap data values are
multi-line strings. Switch to 'matchRegex' which works on strings.

Signed-off-by: Sylvain Niles <sylvainniles@microsoft.com>
sylvainsf requested a deployment to external-contributor-approval on March 19, 2026 21:41 with GitHub Actions (Waiting)
- Remove duplicate copyright/package declarations in k3d_test.go and pgbackup_test.go
- Fix workflow permissions: contents: read -> contents: write (needed to push radius-state branch)
- Add dynamic_rp database to pgbackup, init-db ConfigMap, and dynamic-rp configmap URL
  (was incorrectly using 'ucp' user/database for dynamic-rp)
- Update pgbackup_test.go to expect 3 databases
- Document all three backed-up databases in github-workspace.md

Signed-off-by: Sylvain Niles <sylvainniles@microsoft.com>
sylvainsf requested a deployment to external-contributor-approval on March 19, 2026 21:47 with GitHub Actions (Waiting)
- actions/checkout: 11bd719 (v4.2.2) -> de0fac2 (v6.0.2)
- actions/setup-go: 0aaccfd (v5.4.0, invalid SHA) -> 4b73464 (v6.3.0)
- actions/upload-artifact: ea165f8 (v4.6.2) -> bbbca2d (v7.0.0)

The invalid setup-go SHA caused 'Set up job' to fail immediately because
the runner could not download the action at that commit hash.

Signed-off-by: Sylvain Niles <sylvainniles@microsoft.com>
sylvainsf requested a deployment to external-contributor-approval on March 19, 2026 21:56 with GitHub Actions (Waiting)
…setup-kubectl SHA

Signed-off-by: Sylvain Niles <sylvainniles@microsoft.com>
sylvainsf requested a deployment to external-contributor-approval on March 19, 2026 22:03 with GitHub Actions (Waiting)
…nit call

Signed-off-by: Sylvain Niles <sylvainniles@microsoft.com>
sylvainsf requested a deployment to external-contributor-approval on March 19, 2026 22:11 with GitHub Actions (Waiting)
…ctions)

Signed-off-by: Sylvain Niles <sylvainniles@microsoft.com>
sylvainsf temporarily deployed to external-contributor-approval on March 19, 2026 22:22 with GitHub Actions (Inactive)
@radius-functional-tests

radius-functional-tests bot commented Mar 19, 2026

Radius functional test overview

🔍 Go to test action run

| Name | Value |
|---|---|
| Repository | radius-project/radius |
| Commit ref | ae3a26e |
| Unique ID | func74449e3c83 |
| Image tag | pr-func74449e3c83 |
  • gotestsum 1.13.0
  • KinD: v0.29.0
  • Dapr: 1.14.4
  • Azure KeyVault CSI driver: 1.4.2
  • Azure Workload identity webhook: 1.3.0
  • Bicep recipe location ghcr.io/radius-project/dev/test/testrecipes/test-bicep-recipes/<name>:pr-func74449e3c83
  • Terraform recipe location http://tf-module-server.radius-test-tf-module-server.svc.cluster.local/<name>.zip (in cluster)
  • applications-rp test image location: ghcr.io/radius-project/dev/applications-rp:pr-func74449e3c83
  • dynamic-rp test image location: ghcr.io/radius-project/dev/dynamic-rp:pr-func74449e3c83
  • controller test image location: ghcr.io/radius-project/dev/controller:pr-func74449e3c83
  • ucp test image location: ghcr.io/radius-project/dev/ucpd:pr-func74449e3c83
  • deployment-engine test image location: ghcr.io/radius-project/deployment-engine:latest

Test Status

⌛ Building Radius and pushing container images for functional tests...
✅ Container images build succeeded
⌛ Publishing Bicep Recipes for functional tests...
✅ Recipe publishing succeeded
⌛ Starting ucp-cloud functional tests...
⌛ Starting corerp-cloud functional tests...
✅ ucp-cloud functional tests succeeded
✅ corerp-cloud functional tests succeeded

Use tea.WithInput(strings.NewReader("")) instead of short-circuiting the
channel so that RunProgram is always called (test mocks and the goroutine
that drains the progress channel continue to function correctly).

Also tidy gitstate_test.go: drop the unused dir variable from
Test_TryAcquireDeployLock_ReadOnlyDir and use t.Cleanup for the
permission-restore.

Signed-off-by: Sylvain Niles <sylvainniles@microsoft.com>
sylvainsf requested a deployment to external-contributor-approval on March 22, 2026 08:02 with GitHub Actions (Waiting)
Signed-off-by: Sylvain Niles <sylvainniles@microsoft.com>

# Conflicts:
#	pkg/cli/cmd/radinit/init.go
sylvainsf requested a deployment to external-contributor-approval on March 22, 2026 08:15 with GitHub Actions (Waiting)
@codecov

codecov bot commented Mar 22, 2026

Codecov Report

❌ Patch coverage is 49.41634% with 260 lines in your changes missing coverage. Please review.
✅ Project coverage is 51.06%. Comparing base (1ad24df) to head (6077359).
⚠️ Report is 17 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| pkg/cli/pgbackup/pgbackup.go | 6.00% | 94 Missing ⚠️ |
| pkg/cli/k3d/k3d.go | 3.84% | 50 Missing ⚠️ |
| pkg/cli/cmd/radinit/github.go | 37.09% | 39 Missing ⚠️ |
| pkg/cli/cmd/deploy/deploy.go | 33.33% | 16 Missing ⚠️ |
| pkg/cli/workspaces/connection.go | 40.00% | 11 Missing and 4 partials ⚠️ |
| pkg/cli/cmd/radinit/init.go | 17.64% | 10 Missing and 4 partials ⚠️ |
| pkg/cli/cmd/shutdown/shutdown.go | 82.19% | 10 Missing and 3 partials ⚠️ |
| pkg/cli/cmd/radinit/pgbackup_client.go | 25.00% | 6 Missing ⚠️ |
| pkg/cli/gitstate/gitstate.go | 95.58% | 3 Missing and 3 partials ⚠️ |
| ...kg/components/database/databaseprovider/factory.go | 0.00% | 5 Missing ⚠️ |

... and 1 more
Additional details and impacted files
```diff
@@            Coverage Diff             @@
##             main   #11457      +/-   ##
==========================================
- Coverage   51.07%   51.06%   -0.01%     
==========================================
  Files         699      706       +7     
  Lines       44316    44821     +505     
==========================================
+ Hits        22634    22890     +256     
- Misses      19517    19743     +226     
- Partials     2165     2188      +23     
```
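The headline coverage figures can be recomputed from the Hits and Lines rows. The sketch assumes Codecov truncates (rounds down) to two decimal places, which is its documented default but may be overridden in this repository's codecov.yml. Under that assumption the numbers reproduce exactly, and the unrounded values (about 51.0741% on main versus 51.0698% on the PR) show the real drop is roughly 0.004 percentage points.

```go
// Hypothetical recomputation of the Codecov headline numbers, assuming
// percentages are truncated to two decimals rather than rounded to nearest.
package main

import "fmt"

// basisPoints returns hits/lines in hundredths of a percent, truncated:
// integer division of non-negative ints rounds toward zero.
func basisPoints(hits, lines int) int {
	return hits * 10000 / lines
}

// percent formats a non-negative basis-point value as "NN.NN%".
func percent(bp int) string {
	return fmt.Sprintf("%d.%02d%%", bp/100, bp%100)
}

func main() {
	// Hits + Misses + Partials should equal Lines in both columns.
	fmt.Println(22634+19517+2165 == 44316, 22890+19743+2188 == 44821)

	base := basisPoints(22634, 44316) // main
	head := basisPoints(22890, 44821) // PR head
	fmt.Println("main:", percent(base))
	fmt.Println("head:", percent(head))
	fmt.Printf("delta: %+.2f%%\n", float64(head-base)/100)

	// Unrounded head coverage: rounding to nearest would give 51.07%,
	// so only truncation matches the reported 51.06%.
	fmt.Printf("exact head: %.4f%%\n", 100*float64(22890)/44821)
}
```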


- Remove DefaultRecipePack from enterGitHubInitOptions: Radius.Core/recipePacks
  is a new API type added in this PR; the published Helm chart images used by
  the E2E test predate this addition and return 400 BadRequest on that endpoint.
  The GitHub workspace E2E only verifies cluster setup and PG backup/restore, so
  recipe packs are not needed.

- Override database.image to docker.io/library/postgres: the chart default
  'mirror/postgres' resolves to ghcr.io/radius-project/mirror/postgres
  (non-existent) when no global.imageRegistry is set in public CI.

- Fix import ordering in github_test.go (gofmt)

Signed-off-by: Sylvain Niles <sylvainniles@microsoft.com>
sylvainsf requested a deployment to external-contributor-approval on March 22, 2026 08:57 with GitHub Actions (Waiting)

Labels

None yet

Projects

None yet


2 participants