
Lease Proxy Client: Add many improvements #76538

Merged
openshift-merge-bot[bot] merged 1 commit into openshift:main from danilo-gemoli:feat/step-registry/improve-lease-proxy-client
Mar 21, 2026

@danilo-gemoli
Contributor

@danilo-gemoli danilo-gemoli commented Mar 19, 2026

This PR has been tailored specifically to handle the use case we have in #76238. The script is now more reliable, robust and secure.

I believe that providing a few examples of how to use the lease__* functions would help the reviewers more than trying to describe the technicalities that have been introduced.

Use Case 1 - Acquire and release

Script:

lease_handle="$(lease__acquire --type=openshift-org-aws --count=2)"
printf 'Acquired leases: %s\n' "$(lease__cat --handle="$lease_handle" --format=csv)"
lease__release --handle="$lease_handle"

Output:

Acquired leases: openshift-org-aws--quota-slice-00,openshift-org-aws--quota-slice-01
openshift-org-aws--quota-slice-00 openshift-org-aws--quota-slice-01 released

Description:
No --scope flag is passed here, so the default, --scope=step, applies: the leases are scoped to the current step and must be released before it completes.

lease_handle is an opaque file handle that can be used to:

  • Print the lease names that have been acquired.
  • Release them.

Under the hood it is just a temporary file, but helper functions such as lease__cat let callers work with it without knowing the implementation details.
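
To make the idea of an opaque handle concrete, here is a minimal sketch, assuming the handle is simply the path of a temporary file holding one lease name per line. The demo__acquire and demo__cat functions are hypothetical stand-ins, not the real lease__* implementation, and no lease server is contacted.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for lease__acquire: writes fake lease names
# (one per line) to a temporary file and prints the file path as the handle.
demo__acquire() {
    local type="$1" count="$2" handle i
    handle="$(mktemp)"
    for ((i = 0; i < count; i++)); do
        printf '%s--quota-slice-%02d\n' "$type" "$i" >>"$handle"
    done
    printf '%s\n' "$handle"
}

# Hypothetical stand-in for lease__cat --format=csv: joins the lines with commas.
demo__cat() {
    local handle="$1"
    paste -sd, "$handle"
}

handle="$(demo__acquire openshift-org-aws 2)"
names="$(demo__cat "$handle")"
printf 'Acquired leases: %s\n' "$names"
rm -f "$handle"
```

The caller only ever passes the handle around; whether it is a file path or something else stays an implementation detail of the helpers.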

Use Case 2 - Acquire in a step and release in a different one

Script:

# Step 1
lease_handle="$(lease__acquire --type=openshift-org-aws --count=2 --scope=test)"
printf 'Acquired leases: %s\n' "$(lease__cat --handle="$lease_handle" --format=csv)"

# Step 2
lease__release --scope=test

Output:

Acquired leases: openshift-org-aws--quota-slice-00,openshift-org-aws--quota-slice-01
openshift-org-aws--quota-slice-00 openshift-org-aws--quota-slice-01 released

Description:
--scope=test means that the leases are scoped to the entire life cycle of a test. They can therefore be acquired in one step (for example, ipi-install-install) and released later on, possibly in a post step.
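
The cross-step flow can be pictured with a small sketch. It assumes, purely for illustration, that test-scoped lease names are recorded in a file under a directory that every step can read. The DEMO_SHARED_DIR variable, the test-scope.leases file name, and both demo__* functions are hypothetical and say nothing about how the real client stores its state.

```shell
#!/usr/bin/env bash
# Hypothetical directory visible to every step of the same test.
DEMO_SHARED_DIR="$(mktemp -d)"

# Step 1 stand-in: record the acquired lease names in a well-known file.
demo__acquire_test_scope() {
    printf '%s\n' "$@" >"$DEMO_SHARED_DIR/test-scope.leases"
}

# Step 2 stand-in: no handle is passed; the names are looked up by scope.
demo__release_test_scope() {
    local file="$DEMO_SHARED_DIR/test-scope.leases"
    [[ -f "$file" ]] || return 0   # nothing to release
    local names
    names="$(paste -sd' ' "$file")"
    rm -f "$file"
    printf '%s released\n' "$names"
}

# "Step 1"
demo__acquire_test_scope openshift-org-aws--quota-slice-00 openshift-org-aws--quota-slice-01

# "Step 2" (in a real job this would run in a different step)
released="$(demo__release_test_scope)"
printf '%s\n' "$released"
rm -rf "$DEMO_SHARED_DIR"
```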

Use Case 3 - Deferred acquire and release

Script:

lease_handle="$(lease__acquire --type=openshift-org-aws --count=2 --jitter=15m)"
printf 'Acquired leases: %s\n' "$(lease__cat --handle="$lease_handle" --format=csv)"
lease__release --delay=20m --handle="$lease_handle"

Output:

# `--jitter=15m` waits up to 15m (random value in the range [0, 15m]) before acquiring the leases
Acquired leases: openshift-org-aws--quota-slice-00,openshift-org-aws--quota-slice-01

# `--delay=20m` pauses for 20m before releasing them
openshift-org-aws--quota-slice-00 openshift-org-aws--quota-slice-01 released

Description:
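--jitter and --delay pause the execution before acquiring and before releasing, respectively: --jitter waits for a random amount of time up to the specified value, while --delay waits for exactly the specified amount of time.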
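
As a rough illustration of the jitter semantics, the sketch below picks a random whole number of seconds in the inclusive range [0, max]. The demo__jitter_seconds helper is hypothetical; how the real client draws its random value is not shown in this description.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a random integer in [0, max_seconds].
# Two $RANDOM draws (15 bits each) are combined so that ranges larger than
# 32767 seconds are still covered; the modulo introduces a slight bias,
# which is acceptable for spreading out requests.
demo__jitter_seconds() {
    local max_seconds="$1"
    local draw=$(( (RANDOM << 15) | RANDOM ))
    printf '%d\n' $(( draw % (max_seconds + 1) ))
}

wait_for="$(demo__jitter_seconds 900)"   # 900s corresponds to --jitter=15m
printf 'would sleep for %ss before acquiring\n' "$wait_for"
# sleep "$wait_for"   # left commented out so the sketch returns immediately
```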

Use Case 4 - Refresh install leases

Script:

install_lease_handle=''
# Single quotes defer expansion, so the trap releases whichever handle is current when it fires.
trap 'lease__release --handle="$install_lease_handle"' EXIT TERM INT

function refresh_install_lease() {
    if ! lease__install_lease_eligible; then
        return 0
    fi

    lease__release --handle="$install_lease_handle"
    install_lease_handle=$(lease__acquire --type="openshift-org-aws" --scope=step)
    printf 'install lease acquired: %s\n' "$(lease__cat --handle="$install_lease_handle" --format=csv)"
    lease__release --delay=20m --handle="$install_lease_handle" &
}

max=3
retries=0
while [[ $retries -lt $max ]]; do
    refresh_install_lease || true

    if openshift-install create cluster; then
        echo 'The cluster has been created'
        break
    fi

    ((++retries))
done

Output:

install lease acquired: openshift-org-aws--install-quota-slice-00
# After 20m
openshift-org-aws--install-quota-slice-00 released
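
Because the handle changes on every refresh, it matters when the trap command string is expanded. The following sketch, which uses a hypothetical demo_release function in place of lease__release, shows that a single-quoted trap body reads the variable when the trap fires, whereas a double-quoted body freezes the value present when trap was called.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in that only reports which handle it was given.
demo_release() { printf 'release handle=[%s]\n' "$1"; }

# Double quotes: $handle is expanded when the trap is installed (still empty).
early="$(
    handle=''
    trap "demo_release \"$handle\"" EXIT
    handle='/tmp/lease.1'
)"

# Single quotes: $handle is expanded when the trap runs (already assigned).
late="$(
    handle=''
    trap 'demo_release "$handle"' EXIT
    handle='/tmp/lease.1'
)"

printf '%s\n%s\n' "$early" "$late"
```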

Description:
lease__install_lease_eligible checks whether a test using a CLUSTER_PROFILE_SET is eligible for acquiring an install lease. So far only the cluster profile sets that match openshift-org-* are eligible: see #76068 and #76340.
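
The eligibility rule described above amounts to a glob match on the cluster profile set name. A minimal sketch, assuming the name is available as a plain string (the demo__install_lease_eligible function, its argument, and the sample names other than openshift-org-aws are hypothetical; the real function's inputs are not shown here):

```shell
#!/usr/bin/env bash
# Hypothetical check: succeed only for names matching openshift-org-*.
demo__install_lease_eligible() {
    local profile_set="$1"
    [[ "$profile_set" == openshift-org-* ]]
}

for name in openshift-org-aws openshift-org-example some-other-profile; do
    if demo__install_lease_eligible "$name"; then
        printf '%s: eligible\n' "$name"
    else
        printf '%s: not eligible\n' "$name"
    fi
done
```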

The following lines refresh a lease:

lease__release --handle="$install_lease_handle"
install_lease_handle=$(lease__acquire --type="openshift-org-aws" --scope=step)
lease__release --delay=20m --handle="$install_lease_handle" &

lease__release is idempotent and safe to run concurrently, therefore the following facts hold:

  • Calling lease__release --handle="$install_lease_handle" several times is safe: leases are released only once.
  • Concurrent executions of lease__release --handle="$install_lease_handle" are safe.
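
One common way to obtain release-once behaviour in shell is an atomic rename: whichever caller manages to move the handle file aside owns the release, and everyone else finds nothing to do. The sketch below illustrates that technique with a hypothetical demo__release. It is an assumption about how such a guarantee could be built, not a description of how lease__release achieves it.

```shell
#!/usr/bin/env bash
# Hypothetical release: claim the handle file with an atomic rename so that
# only one caller performs the release, however many run concurrently.
demo__release() {
    local handle="$1" claimed
    [[ -n "$handle" ]] || return 0            # empty handle: nothing to do
    claimed="${handle}.releasing.$BASHPID"     # unique per process
    if mv "$handle" "$claimed" 2>/dev/null; then
        printf '%s released\n' "$(paste -sd' ' "$claimed")"
        rm -f "$claimed"
    fi
    return 0                                   # losing the race is not an error
}

handle="$(mktemp)"
printf '%s\n' openshift-org-aws--install-quota-slice-00 >"$handle"
log="$(mktemp)"

# Three concurrent callers plus one late caller; only one line is logged.
for _ in 1 2 3; do demo__release "$handle" >>"$log" & done
wait
demo__release "$handle" >>"$log"

release_count="$(wc -l <"$log" | tr -d ' ')"
printf 'release lines: %s\n' "$release_count"
rm -f "$log"
```

The rename is atomic only when source and destination are on the same filesystem, which holds here because the claimed name sits next to the handle.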

/cc @jupierce @stbenjam

@openshift-ci openshift-ci bot requested review from jupierce and stbenjam March 19, 2026 11:49
@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 19, 2026
@openshift-ci-robot openshift-ci-robot added the rehearsals-ack Signifies that rehearsal jobs have been acknowledged label Mar 19, 2026
@danilo-gemoli danilo-gemoli force-pushed the feat/step-registry/improve-lease-proxy-client branch from 8bd204d to eb18803 Compare March 19, 2026 11:56
fi

local ec=0
local response=$(curl --no-progress-meter -X POST -o "$response_body" \
Member

Should we have a retry mechanism? If Boskos has a brief outage, I wouldn't want to skip the lease or error out

Contributor Author

We now pass --retry 5 --retry-delay 10 --retry-all-errors. Transient errors such as a refused connection or HTTP 500 make curl retry, whereas errors such as HTTP 404 do not, because curl treats them as non-transient.
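
For readers unfamiliar with these flags, the sketch below assembles a curl command line that uses them. The DEMO_LEASE_PROXY_URL variable, the request path, and the use of -w to capture the status code are hypothetical placeholders. The request is only sent when that variable is set, so the snippet is safe to run anywhere. Note that --retry-all-errors requires curl 7.71.0 or newer.

```shell
#!/usr/bin/env bash
# Hypothetical endpoint; left empty so that nothing is sent by default.
url="${DEMO_LEASE_PROXY_URL:-}"
response_body="$(mktemp)"

curl_args=(
    --no-progress-meter      # suppress the progress meter, keep error messages
    --retry 5                # retry up to 5 times
    --retry-delay 10         # wait 10 seconds between attempts
    --retry-all-errors       # retry on any error curl reports
    -X POST
    -o "$response_body"      # response body goes to a file
    -w '%{http_code}'        # assumption: status code is captured from stdout
)

if [[ -n "$url" ]]; then
    status="$(curl "${curl_args[@]}" "$url/demo/acquire")" || status='000'
    printf 'HTTP status: %s\n' "$status"
else
    printf 'curl %s <url>\n' "${curl_args[*]}"
fi
rm -f "$response_body"
```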

fi
}

local response="$(curl --connect-timeout 300 --max-time 600 \
Member

Same comment here about a retry, although hopefully we're just using images with jq already

Contributor Author

Same as above.

Comment on lines +173 to +186
local jitter_value='0'
local jitter_unit=''
if [[ $jitter_defined -eq 1 ]]; then
    if [[ "$jitter" =~ ^([1-9][[:digit:]]*)(m|s)$ ]]; then
        jitter_value="${BASH_REMATCH[1]}"
        jitter_unit="${BASH_REMATCH[2]}"
        if [[ "$jitter_unit" == "m" ]]; then
            jitter_value=$(( jitter_value * 60 ))
        fi
    else
        printf "jitter parameter is invalid\n"
        return 1
    fi
fi
Member

Thank you! This is great
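
The validation quoted in this thread can be exercised on its own. The sketch below wraps the same regular expression in a hypothetical demo__to_seconds helper so that it can be tried with a few inputs; the function name and the error message wording are illustrative.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the quoted logic: accept <positive integer><m|s>
# and print the equivalent number of seconds; reject everything else.
demo__to_seconds() {
    local input="$1" value unit
    if [[ "$input" =~ ^([1-9][[:digit:]]*)(m|s)$ ]]; then
        value="${BASH_REMATCH[1]}"
        unit="${BASH_REMATCH[2]}"
        if [[ "$unit" == "m" ]]; then
            value=$(( value * 60 ))
        fi
        printf '%d\n' "$value"
    else
        printf 'invalid duration: %s\n' "$input" >&2
        return 1
    fi
}

demo__to_seconds 15m    # minutes are converted to seconds
demo__to_seconds 45s    # seconds pass through unchanged
demo__to_seconds 0m || echo 'rejected: the number must start with 1-9'
demo__to_seconds 1h || echo 'rejected: only m and s are accepted'
```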

@danilo-gemoli danilo-gemoli force-pushed the feat/step-registry/improve-lease-proxy-client branch from eb18803 to f3137a0 Compare March 20, 2026 15:11
@openshift-ci-robot
Contributor

[REHEARSALNOTIFIER]
@danilo-gemoli: no rehearsable tests are affected by this change

Note: If this PR includes changes to step registry files (ci-operator/step-registry/) and you expected jobs to be found, try rebasing your PR onto the base branch. This helps pj-rehearse accurately detect changes when the base branch has moved forward.

Interacting with pj-rehearse

Comment: /pj-rehearse to run up to 5 rehearsals
Comment: /pj-rehearse skip to opt-out of rehearsals
Comment: /pj-rehearse {test-name}, with each test separated by a space, to run one or more specific rehearsals
Comment: /pj-rehearse more to run up to 10 rehearsals
Comment: /pj-rehearse max to run up to 25 rehearsals
Comment: /pj-rehearse auto-ack to run up to 5 rehearsals, and add the rehearsals-ack label on success
Comment: /pj-rehearse list to get an up-to-date list of affected jobs
Comment: /pj-rehearse abort to abort all active rehearsals
Comment: /pj-rehearse network-access-allowed to allow rehearsals of tests that have the restrict_network_access field set to false. This must be executed by an openshift org member who is not the PR author

Once you are satisfied with the results of the rehearsals, comment: /pj-rehearse ack to unblock merge. When the rehearsals-ack label is present on your PR, merge will no longer be blocked by rehearsals.
If you would like the rehearsals-ack label removed, comment: /pj-rehearse reject to re-block merging.

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Mar 20, 2026
@openshift-ci
Contributor

openshift-ci bot commented Mar 20, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: danilo-gemoli, droslean

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [danilo-gemoli,droslean]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@danilo-gemoli
Contributor Author

/retest-required

1 similar comment
@danilo-gemoli
Contributor Author

/retest-required

@openshift-ci
Contributor

openshift-ci bot commented Mar 21, 2026

@danilo-gemoli: all tests passed!

@openshift-merge-bot openshift-merge-bot bot merged commit 5705b51 into openshift:main Mar 21, 2026
5 checks passed
@openshift-ci
Contributor

openshift-ci bot commented Mar 21, 2026

@danilo-gemoli: Updated the following 13 configmaps:

  • lease-proxy configmap in namespace ci at cluster build05 using the following files:
    • key client.sh using file ci-operator/lease/proxy-client.sh
  • lease-proxy configmap in namespace ci at cluster build06 using the following files:
    • key client.sh using file ci-operator/lease/proxy-client.sh
  • lease-proxy configmap in namespace ci at cluster build09 using the following files:
    • key client.sh using file ci-operator/lease/proxy-client.sh
  • lease-proxy configmap in namespace ci at cluster build11 using the following files:
    • key client.sh using file ci-operator/lease/proxy-client.sh
  • lease-proxy configmap in namespace ci at cluster build03 using the following files:
    • key client.sh using file ci-operator/lease/proxy-client.sh
  • lease-proxy configmap in namespace ci at cluster vsphere02 using the following files:
    • key client.sh using file ci-operator/lease/proxy-client.sh
  • lease-proxy configmap in namespace ci at cluster build04 using the following files:
    • key client.sh using file ci-operator/lease/proxy-client.sh
  • lease-proxy configmap in namespace ci at cluster build07 using the following files:
    • key client.sh using file ci-operator/lease/proxy-client.sh
  • lease-proxy configmap in namespace ci at cluster build10 using the following files:
    • key client.sh using file ci-operator/lease/proxy-client.sh
  • lease-proxy configmap in namespace ci at cluster build01 using the following files:
    • key client.sh using file ci-operator/lease/proxy-client.sh
  • lease-proxy configmap in namespace ci at cluster build02 using the following files:
    • key client.sh using file ci-operator/lease/proxy-client.sh
  • lease-proxy configmap in namespace ci at cluster build08 using the following files:
    • key client.sh using file ci-operator/lease/proxy-client.sh
  • lease-proxy configmap in namespace ci at cluster app.ci using the following files:
    • key client.sh using file ci-operator/lease/proxy-client.sh

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. rehearsals-ack Signifies that rehearsal jobs have been acknowledged

4 participants