HYPERFLEET-560 - doc: add adapter deletion flow design #110
xueli181114 wants to merge 1 commit into openshift-hyperfleet:main from
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED. The full list of commands accepted by this bot can be found here. Needs approval from an approver in each of these files; approvers can indicate their approval by writing
Walkthrough: Adds a draft design specifying a two-phase deletion workflow for Clusters and NodePools.

Sequence Diagram(s):

```mermaid
sequenceDiagram
    autonumber
    actor User
    participant API as API/Controller
    participant Sentinel
    participant Executor as Adapter Framework (Executor)
    participant Adapter as Adapter/Cloud
    participant DB as Database
    User->>API: DELETE /clusters/{id} (set deletion_timestamp, phase=Terminating) / returns 202
    API->>Sentinel: emit reconcile CloudEvent
    Sentinel->>Executor: poll & deliver reconcile
    Executor->>Adapter: evaluate top-level when_deleting CEL -> enter deletion mode
    Executor->>Adapter: compute deletion order from deleteOptions.after
    loop for each resource in topological order
        Executor->>Adapter: request delete (respect propagationPolicy)
        Adapter->>Executor: update status (Applied=False / Health=True / observed_time)
    end
    Executor->>Sentinel: publish status updates
    Sentinel->>API: reconcile observes delete signal (Applied=False && Health=True && observed_time >= deletion_timestamp)
    API->>DB: garbage-collect NodePool records when nodepool-level signals met
    API->>DB: garbage-collect Cluster record when all adapters & nodepools are deleted
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes. Pre-merge checks: ✅ 3 passed.
Actionable comments posted: 3
🧹 Nitpick comments (1)
hyperfleet/components/adapter/framework/adapter-deletion-flow-design.md (1)
241-242: Deletion ordering needs explicit handling for invalid dependency graphs. Topological sorting is mentioned, but behavior for cycles or unknown `after` references is undefined. Add explicit fail-fast semantics (status/error reason) so misconfigurations don't cause silent stuck deletions.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@hyperfleet/components/adapter/framework/adapter-deletion-flow-design.md` around lines 241 - 242, The deletion executor's topological sorting using deleteOption.after must validate the dependency graph up-front and fail fast on invalid graphs: add a validation step in the executor (the component that builds the topological order) that detects unknown `after` references and cycles before any deletions, and on detection immediately mark the affected resources' deletion status with an explicit error/reason (e.g., "invalid dependency: missing reference X" or "cycle detected in deleteOption.after"), log the failure, and abort execution rather than proceeding or hanging; ensure the validation surfaces the offending resource identifiers so operators can fix misconfigurations.
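The fail-fast validation this prompt asks for could look like the following sketch. The `after` field comes from the design under review; the function name and shapes are hypothetical. It rejects unknown references before any work begins and detects cycles via Kahn's algorithm.

```go
package main

import (
	"fmt"
	"sort"
)

// deletionOrder builds a topological order from each resource's `after`
// list (delete X only after the named resources are gone). It fails fast
// on unknown references and cycles instead of silently hanging.
func deletionOrder(after map[string][]string) ([]string, error) {
	indegree := map[string]int{}
	for name := range after {
		indegree[name] = 0
	}
	// Validate references up-front, then count in-degrees.
	for name, deps := range after {
		for _, dep := range deps {
			if _, ok := indegree[dep]; !ok {
				return nil, fmt.Errorf("invalid dependency: %s references unknown resource %q", name, dep)
			}
			indegree[name]++
		}
	}
	// Kahn's algorithm; sort the frontier for deterministic output.
	var order, frontier []string
	for name, d := range indegree {
		if d == 0 {
			frontier = append(frontier, name)
		}
	}
	for len(frontier) > 0 {
		sort.Strings(frontier)
		n := frontier[0]
		frontier = frontier[1:]
		order = append(order, n)
		for name, deps := range after {
			for _, dep := range deps {
				if dep == n {
					indegree[name]--
					if indegree[name] == 0 {
						frontier = append(frontier, name)
					}
				}
			}
		}
	}
	if len(order) != len(after) {
		return nil, fmt.Errorf("cycle detected in deleteOptions.after")
	}
	return order, nil
}

func main() {
	order, err := deletionOrder(map[string][]string{
		"clusterJob":       {},
		"clusterConfigMap": {"clusterJob"}, // delete the ConfigMap only after the Job
	})
	fmt.Println(order, err) // [clusterJob clusterConfigMap] <nil>
}
```

Surfacing the offending resource name in the error is what lets operators fix a misconfigured DSL rather than debugging a stuck deletion.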
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@hyperfleet/components/adapter/framework/adapter-deletion-flow-design.md`:
- Around line 107-109: The deletion gate is inconsistent: change the simple "all
adapters report Applied=False" rule to the canonical, stricter gate used later
so implementations match; update the DELETE flow description (the step that
currently says "when all adapters report Applied=False") to state that the API
deletes the cluster only when every adapter status for that cluster has
Applied=False AND Health=True and the adapter's observed_time is >= the
cluster's deletion_timestamp, and ensure the terms deletion_timestamp,
status.phase (Terminating), Applied, Health, and observed_time are referenced
exactly as elsewhere in the doc so both sections read the same.
- Around line 507-509: Update the deletion criteria in the "API deletes
nodepools" and "API deletes cluster" bullets so they match the API delete-signal
safety gate: require adapters to report Applied=False AND Health=True, and
verify the adapter's post-delete timestamp is fresh (i.e., within the configured
grace window) before deleting records; change the two lines that currently
reference only `Applied=False` to include `Health=True` and the post-delete
timestamp freshness check so nodepool deletion (the nodepool-level adapter
condition) and cluster deletion (all adapters including cluster-level) follow
the same safe gating logic.
- Around line 222-223: Update the documentation to correct the `Orphan`
propagation description so it matches Kubernetes `metav1.DeletionPropagation`
semantics: replace the diagram node text `K[Skip - leave resource as-is]` and
the explanatory sentence at the other occurrence with wording that `Orphan`
deletes the owner resource while preserving dependents (dependents become
orphaned), not "skip/do nothing" or "leave resource as-is"; ensure the term
`Orphan` and `metav1.DeletionPropagation` are used to make the intended
semantics clear.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: ae6754ea-c2cc-40a0-9185-b6ceca336446
📒 Files selected for processing (1)
hyperfleet/components/adapter/framework/adapter-deletion-flow-design.md
Force-pushed b78790c to c72f78b (Compare)
♻️ Duplicate comments (2)
hyperfleet/components/adapter/framework/adapter-deletion-flow-design.md (2)
124-124: ⚠️ Potential issue | 🟠 Major

Unify the final-delete gate everywhere to the same canonical criteria.

Line 124, Lines 364-373, and Lines 528-529 weaken or partially define the gate compared with the stricter sections. Please use one exact rule everywhere: `deletion_timestamp` set, `Applied=False`, `Health=True`, and `observed_time >= deletion_timestamp` (per adapter).

Suggested doc edit:
```diff
-2. **Delete**: The API monitors adapter statuses. When **all** adapters report `Applied=False` for a cluster that has `deletion_timestamp` set, the API deletes the cluster record and all associated data (nodepools, adapter statuses).
+2. **Delete**: The API monitors adapter statuses. It deletes only when, for all required adapters, `Applied=False` AND `Health=True` AND `observed_time >= deletion_timestamp` for a cluster with `deletion_timestamp` set. Then it deletes the cluster record and all associated data (nodepools, adapter statuses).

-The API determines when to delete by checking three things:
+The API determines when to delete by checking four things:
...
-3. **All adapters report `Health=True`** (confirms the adapter status is reliable)
+3. **All adapters report `Health=True`** (confirms the adapter status is reliable)
+4. **All adapters have `observed_time >= deletion_timestamp`** (confirms statuses are post-delete and not stale)

-6. API deletes nodepools when their adapters report `Applied=False`
-7. API deletes cluster when all adapters (cluster + nodepool level) report `Applied=False`
+6. API deletes nodepools when required nodepool adapters report `Applied=False AND Health=True` and `observed_time >= nodepool.deletion_timestamp`
+7. API deletes cluster when all required adapters (cluster + nodepool level) report `Applied=False AND Health=True` and `observed_time >= cluster.deletion_timestamp`
```

Also applies to: 364-373, 528-529
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@hyperfleet/components/adapter/framework/adapter-deletion-flow-design.md` at line 124, The final-delete gate is inconsistent across the document; unify all checks to a single canonical rule: require the cluster's deletion_timestamp to be set and for each adapter require Applied=False, Health=True, and observed_time >= deletion_timestamp; update every occurrence that currently weakens or partially defines the gate so it uses this exact criteria (use the terms deletion_timestamp, Applied=False, Health=True, observed_time >= deletion_timestamp verbatim to ensure consistency).
102-103: ⚠️ Potential issue | 🟠 Major

`Orphan` propagation is documented with incorrect behavior.

Lines 238/251 (and downstream examples at Line 102 and Line 294) describe `Orphan` as "skip/do nothing," which conflicts with Kubernetes `metav1.DeletionPropagation` semantics. `Orphan` deletes the owner resource and preserves dependents (orphaned).

Suggested doc edit:
```diff
- D -->|Orphan| K[Skip - leave resource as-is]
+ D -->|Orphan| K[Delete owner; preserve dependents (orphan)]

-| `Orphan` | Do nothing, leave resource as-is | Resources managed by other systems or intentionally preserved |
+| `Orphan` | Delete owner resource and preserve dependents (dependents become orphaned). | Preserve child resources while removing the owner |

- Note over Adapter: serviceAccount: Orphan, skip
+ Note over Adapter: serviceAccount: Orphan (delete owner, preserve dependents)

- D3 -->|Orphan| D6[Skip]
+ D3 -->|Orphan| D6[Delete owner; preserve dependents]
```

Also applies to: 238-253, 294-295
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@hyperfleet/components/adapter/framework/adapter-deletion-flow-design.md` around lines 102 - 103, The doc incorrectly describes the DeletionPropagation value "Orphan" as "skip/do nothing"; update all occurrences of "Orphan" in adapter-deletion-flow-design.md (including the examples mentioning serviceAccount: Orphan and the sections referencing metav1.DeletionPropagation) to state that "Orphan" deletes the owner resource while preserving dependents (i.e., orphaning them), and adjust any example text or diagrams that imply "skip" behavior so they reflect the correct semantics of deleting owner + preserving dependents.
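To make the `metav1.DeletionPropagation` semantics the review insists on concrete, here is a small sketch. The `effectOf` helper is hypothetical (a real executor would pass `PropagationPolicy` on the Kubernetes delete call), but the string values match the metav1 constants, and the key point is visible in the table: `Orphan` still deletes the owner.

```go
package main

import "fmt"

// PropagationEffect captures what a Kubernetes delete call does to the
// owner object and its dependents for each metav1.DeletionPropagation value.
type PropagationEffect struct {
	DeleteOwner      bool
	DeleteDependents bool
	Blocking         bool // Foreground waits for dependents before removing the owner
}

// effectOf is a hypothetical lookup; the string keys mirror the
// metav1.DeletionPropagation constants ("Orphan", "Background", "Foreground").
func effectOf(policy string) (PropagationEffect, error) {
	switch policy {
	case "Orphan":
		// Orphan deletes the owner but preserves dependents (they become
		// orphaned); it does NOT mean "skip the delete".
		return PropagationEffect{DeleteOwner: true, DeleteDependents: false}, nil
	case "Background":
		return PropagationEffect{DeleteOwner: true, DeleteDependents: true}, nil
	case "Foreground":
		return PropagationEffect{DeleteOwner: true, DeleteDependents: true, Blocking: true}, nil
	default:
		return PropagationEffect{}, fmt.Errorf("unknown propagation policy %q", policy)
	}
}

func main() {
	e, _ := effectOf("Orphan")
	fmt.Printf("Orphan: owner deleted=%t, dependents deleted=%t\n", e.DeleteOwner, e.DeleteDependents)
}
```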
Force-pushed 87629c6 to 1bc2c55 (Compare)
♻️ Duplicate comments (2)
hyperfleet/components/adapter/framework/adapter-deletion-flow-design.md (2)
374-375: ⚠️ Potential issue | 🟠 Major

Deletion gate criteria are inconsistent across sections.

Line 374 weakens the rule to `Applied=False` + `deletion_timestamp`, and the timeline checks at Line 114/Line 117 and Line 590/Line 593 omit `observed_time >= deletion_timestamp`. The canonical gate elsewhere is stricter (`Applied=False AND Health=True AND observed_time >= deletion_timestamp`). Keep one rule everywhere to prevent premature deletion from stale statuses.

Also applies to: 114-117, 590-594
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@hyperfleet/components/adapter/framework/adapter-deletion-flow-design.md` around lines 374 - 375, The deletion-gate is inconsistent: replace the weaker `Applied=False` + `deletion_timestamp` variant and the timeline checks that omit the timestamp with the canonical rule everywhere; require `Applied=False AND Health=True AND observed_time >= deletion_timestamp` in all places that define or evaluate deletion eligibility (the paragraph that currently uses `Applied=False` + `deletion_timestamp`, the timeline checks, and the canonical gate text) so deletion is only allowed when the resource is marked not applied, healthy, and the observed status is at-or-after the cluster deletion timestamp.
248-249: ⚠️ Potential issue | 🟠 Major

`Orphan` propagation is still defined incorrectly.

Line 248/Line 261 describe `Orphan` as skip/do-nothing, which conflicts with Kubernetes `metav1.DeletionPropagation` (owner is deleted, dependents are preserved/orphaned). This is a behavioral mismatch in the DSL contract.

Suggested doc fix:
```diff
- D -->|Orphan| K[Skip - leave resource as-is]
+ D -->|Orphan| K[Delete owner, preserve dependents (orphan)]

-| `Orphan` | Do nothing, leave resource as-is | Resources managed by other systems or intentionally preserved |
+| `Orphan` | Delete the owner resource while preserving dependents (they become orphaned). | Preserve dependent resources while removing the owner |
```

Also applies to: 261-263
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@hyperfleet/components/adapter/framework/adapter-deletion-flow-design.md` around lines 248 - 249, Update the DSL doc and diagram so the "Orphan" propagation matches Kubernetes metav1.DeletionPropagation semantics: change any description that says "skip/do-nothing" or "leave resource as-is" (e.g., the diagram node labeled K[Skip - leave resource as-is] and the prose around the 'Orphan' keyword) to state that the owner is deleted but dependents are preserved/orphaned (dependents' ownerReferences are removed/left intact per K8s behavior); ensure all mentions of Orphan (including the block currently labeled "Orphan" at the diagram and the text blocks explaining propagation) reflect "owner deleted, dependents preserved/orphaned" rather than "no action."
Actionable comments posted: 1
♻️ Duplicate comments (2)
hyperfleet/components/adapter/framework/adapter-deletion-flow-design.md (2)
114-119: ⚠️ Potential issue | 🟠 Major

Deletion gate is still weaker in diagrams than the canonical API signal.

Line 114 and Line 117 (and again Line 596 and Line 599) only gate on `Applied=False` + `Health=True`, but the doc's safety rule also requires `observed_time >= deletion_timestamp`. This inconsistency can drive premature deletion from stale statuses. Please align these diagram/flow steps with the canonical gate used elsewhere.
Also applies to: 596-600
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@hyperfleet/components/adapter/framework/adapter-deletion-flow-design.md` around lines 114 - 119, The deletion gates in the sequence diagram currently check only "Applied=False AND Health=True"; update those gate conditions (the lines showing "Applied=False AND Health=True?") to include the canonical temporal check so they read "Applied=False AND Health=True AND observed_time >= deletion_timestamp?" and apply this change to both occurrences mentioned (the cluster/nodepool gate at lines ~114-119 and the duplicate at ~596-600) so the diagram matches the API's safety rule.
248-249: ⚠️ Potential issue | 🟠 Major

`Orphan` propagation is documented with incorrect Kubernetes semantics.

Line 248 and Line 261 describe `Orphan` as "skip/do nothing," but Kubernetes `metav1.DeletionPropagation=Orphan` means delete the owner while preserving dependents (which become orphaned). This also conflicts with Line 263's "matches metav1" statement.

Also applies to: 261-263
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@hyperfleet/components/adapter/framework/adapter-deletion-flow-design.md` around lines 248 - 249, The diagram and text incorrectly state that "Orphan" means "skip/do nothing"; update all occurrences of the "Orphan" propagation description (the diagram node labeled "Orphan" / text that says "Skip - leave resource as-is" and the statement that "matches metav1") to accurately reflect Kubernetes semantics: metav1.DeletionPropagation=Orphan causes deletion of the owner while preserving dependents (dependents are left orphaned), not skipping deletion; ensure consistency across the related descriptions that currently mention lines 248, 261–263 so the diagram and the prose explicitly say "delete owner, preserve dependents (dependents become orphaned) — aligns with metav1.DeletionPropagation=Orphan."
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@hyperfleet/components/adapter/framework/adapter-deletion-flow-design.md`:
- Around line 514-516: The example Applied condition uses invoke_deletion ?
(has(resources.clusterJob) ? "True" : "False") :
(has(resources.clusterConfigMap.metadata.resourceVersion) ? "True" : "False"),
which can mark deletion complete prematurely; update the predicate so that in
deletion mode it checks that all managed resources are absent (e.g., verify
!has(resources.clusterJob) AND !has(resources.<otherManagedResource>) etc.,
excluding only explicit Orphan resources) rather than only the absence of
resources.clusterJob, and mirror the same comprehensive absence check used by
your deletion-complete definition for consistency.
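The comprehensive absence check this prompt asks for can be sketched as follows. In the real adapter DSL this would be a CEL expression over `resources`; the Go function and resource names below are hypothetical stand-ins that show the logic: deletion is complete only when every managed resource is gone, except those explicitly marked `Orphan`.

```go
package main

import "fmt"

// deletionComplete reports Applied=False only when every managed resource
// is absent, excluding resources explicitly marked Orphan (which are
// intentionally preserved). Checking a single resource such as clusterJob
// would mark deletion complete prematurely.
func deletionComplete(existing map[string]bool, orphaned map[string]bool) bool {
	for name, exists := range existing {
		if orphaned[name] {
			continue // preserved on purpose; not a deletion failure
		}
		if exists {
			return false
		}
	}
	return true
}

func main() {
	existing := map[string]bool{
		"clusterJob":       false, // already gone
		"clusterConfigMap": true,  // still present
		"serviceAccount":   true,  // Orphan: intentionally preserved
	}
	orphaned := map[string]bool{"serviceAccount": true}
	fmt.Println(deletionComplete(existing, orphaned)) // false: the ConfigMap still exists
}
```

Mirroring this same predicate in both the `Applied` condition and the garbage-collection gate keeps the two definitions of "deletion complete" from drifting apart.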
Force-pushed 1bc2c55 to 3c96851 (Compare)
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@hyperfleet/components/adapter/framework/adapter-deletion-flow-design.md`:
- Around line 263-264: The doc's "Orphan" semantics are inconsistent with
Kubernetes: decide whether the adapter's "Orphan" means skip deletion (rename
node and text to "Skip" or "Preserve") or actually perform deletion using
Kubernetes semantics (call Delete/propagationPolicy: Orphan so owner is deleted
and dependents keep cleared ownerReferences); then update the diagram node D
-->|Orphan| K[Skip - leave resource as-is], the surrounding text that references
metav1.DeletionPropagation and lines mentioning "do nothing", and any examples
to reflect the chosen behavior (if you choose to align with K8s also mention
propagationPolicy: Orphan and ownerReferences clearing; if you choose skip,
rename symbols from Orphan to Skip/Preserve and state it does not call
Kubernetes delete).
Force-pushed 04c7af2 to 85f615e (Compare)
> ### Overview
>
> The deletion flow extends the existing adapter framework with minimal DSL changes. When the HyperFleet API receives a DELETE request, it soft-deletes the cluster (sets `deletion_timestamp`), and the cluster enters the `Terminating` phase. The existing Sentinel polling and event-driven adapter workflow handles the rest.
Since we have `created_at` and `deleted_at` elsewhere, should we use `deleted_at` instead of `deletion_timestamp`?
+1, lets keep them aligned here 🙏
`deletion_timestamp` is k8s style, but if we would like `deleted_at` I am OK with that. For me, though, `deleted_at` sounds like the timestamp at which deletion finished.
As I understand it, `deleted_at` is when the user deleted the API resource, so it is in fact the deletion-finished timestamp for the user.
Once a user has deleted an API resource, they should consider it "gone", "kaput", "dead", "over".
The reality will be reconciled later, and that will come in the `status.conditions`.
Agreed. Renamed deletion_timestamp → deleted_at throughout the doc to match created_at convention.
> ## What & Why
>
> **What**: Design the deletion workflow for Clusters and NodePools, defining how the adapter framework handles resource cleanup when a cluster enters the `Terminating` phase.
We don't have phases in the API. I would simply say "when a cluster is soft deleted", or "API resource" if we want to talk in a more generic way from now on.
I agree with "API resource" here; I was going to call it out below. I think we need to speak in more generic terms and move away from cluster and node pools. We should be focused on API resources, parent and children.
This is legacy wording. Let me update it to avoid confusion.
Agreed. Removed all status.phase / "cluster phase" / "Terminating phase" references. Now uses "soft-deleted" / "cluster state" / deleted_at throughout.
> 1. **Mark for Deletion**: On `DELETE /clusters/{id}`, the API sets `deletion_timestamp` on the cluster record **and all its nodepools**. All records remain in the database and are still queryable.
>    - `deletion_timestamp` is the canonical delete intent signal for orchestration and adapter behavior.
>    - Customer-facing lifecycle state (for example, `Terminating`) is derived from `deletion_timestamp` at read time.
>    - API should set aggregate `Ready=False` (reason `Deleting`) immediately when `deletion_timestamp` is set so Sentinel's existing not-ready query picks up the resource without Sentinel code changes.
This is interesting.

Until now, `Ready` changed because of:

- spec change -> increases `generation`
- required adapters reporting `Available` false

So, the `Ready` condition was a computed value out of the generation + required adapters conditions.

Open questions:

- Does deletion affect `generation`?
- Do we have to include a new condition for computing `Ready`?
  - e.g. `deleted_at` is set
- What happens if required adapters report `True` after deletion? `Ready` cannot become `True` because of adapters not managing the deletion.
- Should there be "required adapters" for the deletion?

An idea (hack?): to have a required adapter for deletion.

- When a cluster does not have `deleted_at`, it will return `Available=True`
  - Meaning, that is ok, `Ready` can become true and this adapter has "nothing to say there"
- When a cluster has `deleted_at`:
  - It also updates `generation`, so `Ready=False`
  - A message will be triggered
  - This adapter has work to do
  - It will report `Available=False` while it is doing it
  - The resource will keep `Ready=False` and triggering
  - Until the deletion adapter finally says `Available=True`
- The confusing part...
  - Other required adapters should return `Available=True` when the cluster is deleted...
  - In order to get `Ready` for the new generation to become `True`
Some great points here @rh-amarin.

Should generation increment? I think yes: generation is capturing a new user intent on a resource, so it should increment.

My take: required adapters will go away in a future version. I think we need more than required adapters to determine top-level state; we will need to improve this aggregation.

To that end, I think we need a new condition for deletion; anything else will be a pure hack. Current conditions are directly tied to resource creation, and making changes here will get messy over time. For example, does `Applied == False` mean not applied or deleted? Let's take some time tomorrow to whiteboard out a solution here.
> To that end I think we need a new condition for deletion, I think anything else will be a pure hack

Well, if you think of a deletion as just being a desired state... the whole point of the api-sentinel-adapter is to reconcile with that desired state. For me, a deleted API resource with `Ready==True` means that the "real world" has been reconciled with the desired state. But `Ready` is really a misleading name here... `Reconciled` could be an option.
Yeap, I agree that `Ready` leads to misunderstanding.
> Does deletion affect generation?

I doubt this. If generation changes, that means we need to apply the resource, then delete the resource. What's the meaning here? From the user's perspective, generation should only mean an update happened.

> Do we have to include a new condition for computing Ready?

Actually, once the deletion is triggered, the resources shouldn't be `Available` anymore. I don't think we need to introduce `deleted_at` to calculate the `Ready` status. But I am not sure if Sentinel can send an event once `deleted_at` is set.

> What happens if required adapters report True after deletion?

I don't get it. If required adapters still report true, that is either a bug in the adapter or deletion was not invoked.

> Should there be "required adapters" for the deletion?

Actually, I thought about it and I don't think we should have this terminology for deletion. All adapters should report `Applied=False` to avoid orphaned resources. But there is one reason to introduce it: some out-of-date adapters that have been deprecated and won't report statuses anymore would keep stale `Applied=True, Available=True, Health=True` statuses.

> does Applied == False mean not applied or deleted

It means the resource does not exist, whether not applied or deleted. If `deleted_at` is set, that means deleted; otherwise it means not created.

> Until deletion adapter finally says Available=True

I am against this. `Available` means resources are ready; I don't like marking deletion completion as `Available`. I would like to follow k8s conventions.
> If generation changes that means we need to apply the resource, then delete the resource.

Is the spec being modified as part of the deletion?
> If required adapters still report true, that is either a bug in the adapter or deletion was not invoked.

Some might, for example in race conditions where tasks started running before the deletion was kicked off.
> All adapters should report applied false to avoid resources orphan

This becomes complex: does Applied false mean not applied or removed? 🤔 While similar, they are semantically different.
IMO, similar to how Ready is a bad word for the aggregated status (since it means "reconciled"), we are misusing Available in the adapter reports.
Available=True in an adapter means that "the adapter has done its work correctly", and it may or may not have a relation with k8s being ready.
An example of an adapter running a job:
- The adapter says `Available=True`, meaning the job has run correctly, but there is nothing really "available" there.
- It just means that, from the point of view of the adapter, the state of the world has been reconciled with the API resource expectation.
Does deletion affect generation?
Yesterday @ciaranRoche said to me: `generation` changes with the "intention of the user".
So, under that insight, a deletion of an API resource is an intended change to the API resource, so it should update `generation`.
> I am not sure if sentinel can sending event once deleted_at set.
idea: the API shows soft_deleted API resources as just regular API resources
- It is up to the GCP/ROSA team what to do with these
- They can expose them or not to the final users
- They can trigger a cleaning operation
- A soft_deleted API resource with `Ready==False` would mean that the world still contains real resources
- Sentinel will still be able to see these, as they are as normal as any other API resource
> The API implements a two-phase deletion pattern:
>
> 1. **Mark for Deletion**: On `DELETE /clusters/{id}`, the API sets `deletion_timestamp` on the cluster record **and all its nodepools**. All records remain in the database and are still queryable.
In our office hours meeting, Jaime mentioned that they would like us to keep soft deletion and provide a cleanup mechanism that they trigger.
This way, they have time to investigate if anything goes wrong with deleted clusters, and the records will still be there.
Can we define what is meant by soft delete? This doc's mark-for-deletion is different from a soft delete, where a resource remains in the DB but is not presented in the API.
If we offer a way to delete API resources maybe we don't need to hide soft_deleted API resources.
In fact, Sentinel needs to fetch API resources that are soft_deleted but Ready==False until they transition to True
(And then, Sentinel should not get soft_deleted API resources that are Ready==True older than 30m )
From my perspective: when delete is triggered by the customer, the API shouldn't delete anything; it just needs to set deleted_at. Once the deletion condition is met, the API sets a deleted column to 1/true so the data is kept in the DB, but list/get cluster should only target deleted=0 clusters.
I think we need to decide on this whole soft_delete; we are pretty focused on it. What are the alternatives to it?
I would question whether the juice is worth the squeeze: we had a deleted_clusters table in CS and I can't remember it ever being used. We mentioned soft_delete in office hours and Jaime was like, oh, that would be nice 😆
We definitely need to discuss this more IMO, to ensure we do not have tunnel vision on it.
> - `deletion_timestamp` is the canonical delete intent signal for orchestration and adapter behavior.
> - Customer-facing lifecycle state (for example, `Terminating`) is derived from `deletion_timestamp` at read time.
> - API should set aggregate `Ready=False` (reason `Deleting`) immediately when `deletion_timestamp` is set so Sentinel's existing not-ready query picks up the resource without Sentinel code changes.
> 2. **Delete (Hierarchical)**: The API uses layered deletion:
Why not mark the dependent resources (nodepools) also with deleted_at?
The asynchronous process can happen elsewhere, but from the API point of view, the desired state is already "deleted" for all resources.
It does indeed work this way; is the current sentence causing confusion? Let me update it.
> - Customer-facing lifecycle state (for example, `Terminating`) is derived from `deletion_timestamp` at read time.
> - API should set aggregate `Ready=False` (reason `Deleting`) immediately when `deletion_timestamp` is set so Sentinel's existing not-ready query picks up the resource without Sentinel code changes.
> 2. **Delete (Hierarchical)**: The API uses layered deletion:
> - **Nodepool level**: Each nodepool record is deleted when all its nodepool-level adapters report `Applied=False` + `Health=True` with `observed_time >= deletion_timestamp`
Until now, only Available was used to compute Ready...
and it was hard enough.
I think this will make the system very complex to reason about, and we should try to stick with using Available if possible.
I don't think so. Available=False means the CR is not ready for use. Only Applied=False means the resource is gone.
IMO Available means something beyond applying a resource.
To me it means that the work to be done by the adapter is not yet considered done.
And the adapter will keep receiving messages until it answers with Available=True.
Is "the work to be done by the adapter" creating a CR? Maybe yes, but not only that. An adapter could create a job that evaluates something and has to be re-executed until it gets a successful result, say for example a validation job.
When the job is created, Applied=True: the resource is there, but Available doesn't become True until success.
> ### Sentinel Behavior
>
> No changes are required to Sentinel **code logic or CloudEvent format**. The Sentinel:
> - Polls the API and evaluates `message_decision` CEL expressions as usual
Right now the API returns the resources with:
- `Ready==False`
- `Ready==True` && `last_updated_time` older than 30m

But for clusters that are deleted, it doesn't make sense to retrieve them 30m after they are Ready, if that means that they are effectively deleted...
unless we want to have zombie clusters, of course.
That's why, on the API side, once deleted_at is set the status needs to be set to Ready=False. But if the API doesn't make the change, then Sentinel should do it.
> - Does not distinguish between creation and deletion events
> - The adapter is responsible for determining the appropriate action based on cluster state
>
> **Operational requirement for nodepool deletion**: Each Sentinel deployment is configured with a single `resource_type` (e.g., `clusters`). For nodepool-level deletion to work, a **separate Sentinel deployment** must be configured with `resource_type: nodepools` to poll and publish nodepool events. Without this, nodepool adapters will not receive deletion events. This is an operational rollout/deployment change, not a Sentinel code change.
I don't think this note is necessary.
Sentinel instances are always required for clusters and nodepools; the deletion use case doesn't alter this IMO.
Right, this is somewhat misleading.
> Two additions to the adapter task configuration:
>
> 1. **`when_deleting`** (top-level) - A CEL expression evaluated after preconditions. When `true`, the resources phase switches from creation mode to deletion mode.
> 2. **`deleteOptions`** (per resource) - Defines the Kubernetes delete options (propagation policy and ordering) per resource, aligned with `metav1.DeleteOptions` conventions.
why are these deleteOptions camelCase?
Oh, it's k8s style; let me change it to follow our config standard.
Good catch. Renamed deleteOptions → delete_options (snake_case) throughout the doc to be consistent with the rest of the DSL naming convention.
> No changes are required to Sentinel **code logic or CloudEvent format**. The Sentinel:
> - Polls the API and evaluates `message_decision` CEL expressions as usual
> - Publishes the **same CloudEvent format** regardless of cluster phase
IMO we should avoid talking about cluster phase.
We have cluster.status or cluster.status.conditions
Right, let me remove phase-related words to avoid confusion.
Fixed — removed all cluster phase and status.phase references. Using cluster.status.conditions and deleted_at instead.
> - **Cycle detected** (e.g., A after B, B after A): Abort deletion, report `Health=False` with reason `InvalidDeletionOrder` and message describing the cycle
> - **Unknown resource reference** (e.g., `after: [nonExistent]`): Abort deletion, report `Health=False` with reason `InvalidDeletionConfig` and message identifying the unknown reference
>
> These are config errors caught at execution time. The adapter reports `Applied=True` (resources still exist) and `Health=False` (config error) so the API does not proceed with deletion.
Can this be detected at boot time?
It can't. For example, two k8s manifests are configured in the configuration, and manifestA can be applied successfully but manifestB can't because it fails k8s validation. And maybe the service offering team configured the Applied CEL expression to only check manifestA.
> ### Adapter Framework Changes
>
> #### DSL Changes
Is this example only applicable to the kubernetes transport, or also to maestro?
My doubt is whether deleteOptions for the maestro transport are already specified in the ManifestWork.
Right, it's only for the k8s transport.
> Two additions to the adapter task configuration:
>
> 1. **`when_deleting`** (top-level) - A CEL expression evaluated after preconditions. When `true`, the resources phase switches from creation mode to deletion mode.
I was under the impression that deletion would be managed by a different adapter than creation.
Having a when_deleting keyword means that we are using the same adapter for all the CRUD actions on resources.
Is this the intention?
There is an alternative of having it in a different adapter. It introduces other complexities; we can discuss it later.
> end
> ```
>
> #### Adapter Status Decision Matrix
My fear is that this can increase the complexity to understand the system greatly
Until now, we only take into account Available
Maybe there is a way to reduce the complexity and use only Available for deletion.
Let me elaborate:
- The user deletes the cluster in the API: `generation` advances, `deleted_at` is set, `Ready==false`
- All adapters will start to receive messages because of `Ready==false`
- If an adapter reads `deleted_at` set, it will always answer `Available=False` until the resource is deleted

This way, the computation of the aggregated condition is kept the same.
A deletion is just another reconciliation by adapters.
All required adapters must answer with Available=True, or they will keep receiving messages.
Available=True means resources are ready for use, not that resources exist. Applied=True means resources exist, and for deletion we are checking whether resources are cleaned up.
> body: '{{ .clusterStatusPayload }}'
> ```
>
> **Note on CEL expression complexity**: The post-processing expressions use `when_deleting` ternary branching to handle both creation and deletion states in a single config. While this keeps lifecycle logic co-located, it increases expression complexity. To manage this:
An idea to simplify this would be to define two different payloads, one for creation and one for deletion, and then conditionally use one or the other in the post_actions.
One concern is that you may want to skip building a payload entirely when it is not the correct time, e.g. when creating, you don't want to build the deletion payload.
A way to avoid this could be to add a when condition to the payloads.
I agree with this; adding a when will enable us to write straightforward expressions for each mode instead of nesting them and having them become very verbose and complex.
That is also an approach suggested by AI. But different payloads seem like a variant of the different-adapter approach. Maybe we should discuss same adapter vs. different adapter first.
> | **Framework changes** | Deletion mode switch in executor + deletion strategies | Still needs delete action in executor + resource ordering |
> | **Scaling** | Single adapter handles both | Can scale deletion independently |
> | **Operational overhead** | Low - one deployment | High - doubled deployments, monitoring, ConfigMaps |
> | **Complexity** | Logic complexity in config (CEL expressions for status) | Infrastructure complexity (more pods, subscriptions) |
There is also the logic to compute the aggregated Ready condition.
With separate adapters, deletion is an additional condition that the API computes.
With a single adapter, the complexity is now greater, since it implies taking Available, Applied, and Health into account.
They are calculated for different purposes.
ciaranRoche left a comment:
Just to clarify, are there any cases of race conditions if a 'create' or 'update' adapter is running and a 'delete' adapter reports conditions before them?
I am assuming that, with the incremented generation, the 'create'/'update' conditions would be dropped as stale, but I think it is worth calling this out in the doc to cover in-flight status reports.
> Two additions to the adapter task configuration:
>
> 1. **`when_deleting`** (top-level) - A CEL expression evaluated after preconditions. When `true`, the resources phase switches from creation mode to deletion mode.
Does it make sense to have this keyword at the resource level?
Can different resources have different triggers for deletion?
I don't think so. Since each resource kind has its own Sentinel shard, and each adapter should serve only one resource kind on the API side, they must share the same lifecycle. A per-resource level makes it complicated to manage.
> 5. If `when_deleting` is false or not set → resources phase runs in creation mode
> 6. Post-processing (always runs)
>
> #### deleteOptions
I wonder if for resource updates we will offer an option to re-create the resource, by deleting first and later creating it again.
In that case... will deleteOptions still apply? e.g. the order of deleting/creating
I am considering removing support for recreate. Maestro doesn't support it; if a service offering team wants a cron job, they should use the job TTL. Recreate may cause resource leak problems. I would hold off on that until we get any requirements. I don't see any conflicts with delete_options here.
Unofficially, I was given requirements for recreate: they want full lifecycle control of adapter task resources.
For example, if they are creating a job, they want to be able to specify when to create another one, how many jobs to retain, when to clean up, how to name them, etc.
I have yet to put them into a JIRA.
> resources:
>   - name: clusterJob
>     manifest: ...
>     deleteOptions:
idea: name this also when_deleting.
Then the relationship between the root-level when_deleting and these options is obvious.
This could also be handy in the post/build phase.
Imagine having:

```yaml
post:
  payloads:
    - name: clusterStatusPayload
      build:
        ....
      when_deleting:
        ....
```

Yes, maybe there is an ugly asymmetry with build/when_deleting, but the resulting CEL expressions will be much clearer.

Alternatively:

```yaml
post:
  payloads:
    - name: clusterStatusPayload
      build:
        ....
when_deleting:
  - name: clusterStatusPayload  # this overrides the previous payload with the same name
    build:
      ....
```
Different payloads look like a different-adapter behavior; we can discuss more later.
> ### What We Lose / What Gets Harder
> - Resources phase executor becomes more complex (creation mode + deletion mode)
> - Post-processing CEL expressions become more complex (must handle both creation and deletion states)
> - `deleteOptions.after` ordering requires building a dependency graph in the executor
idea: what if we add conditions for each resource deletion? Instead of having a root-level when_deleting, we make it per resource.

```yaml
resources:
  - name: clusterNamespace
    manifest:
      ...
    discovery:
      by_name: '{{ .clusterId | lower }}'
    when_deleting:
      expression: deleted_at != "" && !resources.clusterConfigMap && !resources.clusterJob
    propagationPolicy: Background
  - name: clusterConfigMap
    manifest:
      ...
    discovery:
      by_name: cluster-{{ .clusterId }}-config
      namespace: '{{ .clusterId | lower }}'
    when_deleting:
      expression: deleted_at != "" && !resources.clusterConfigMap
    propagationPolicy: Background
```

For this to work, there should be a discovery step at the beginning of the resources phase to initially populate the resources.xx variables.
That makes sense.
> 1. **Mark for Deletion**: On `DELETE /clusters/{id}`, the API sets `deletion_timestamp` on the cluster record **and all its nodepools**. All records remain in the database and are still queryable.
> - `deletion_timestamp` is the canonical delete intent signal for orchestration and adapter behavior.
> - Customer-facing lifecycle state (for example, `Terminating`) is derived from `deletion_timestamp` at read time.
> - API should set aggregate `Ready=False` (reason `Deleting`) immediately when `deletion_timestamp` is set so Sentinel's existing not-ready query picks up the resource without Sentinel code changes.
To that end, I think we need a new condition for deletion; I think anything else will be a pure hack.
Well, if you think of a deletion as just being a desired state... the whole point of api-sentinel-adapter is to reconcile with that desired state.
For me, a deleted API resource with Ready==True means that the "real world" has been reconciled with the desired state.
But Ready is really a misleading name here... Reconciled could be an option.
> The API implements a two-phase deletion pattern:
>
> 1. **Mark for Deletion**: On `DELETE /clusters/{id}`, the API sets `deletion_timestamp` on the cluster record **and all its nodepools**. All records remain in the database and are still queryable.
If we offer a way to delete API resources maybe we don't need to hide soft_deleted API resources.
In fact, Sentinel needs to fetch API resources that are soft_deleted but Ready==False until they transition to True
(And then, Sentinel should not get soft_deleted API resources that are Ready==True older than 30m )
> - API should set aggregate `Ready=False` (reason `Deleting`) immediately when `deletion_timestamp` is set so Sentinel's existing not-ready query picks up the resource without Sentinel code changes.
> 2. **Delete (Hierarchical)**: The API uses layered deletion:
> - **Nodepool level**: Each nodepool record is deleted when all its nodepool-level adapters report `Applied=False` + `Health=True` with `observed_time >= deletion_timestamp`
> - **Cluster level**: The cluster record is deleted when all cluster-level adapters report `Applied=False` + `Health=True` with `observed_time >= deletion_timestamp` **AND** all nodepool records have been deleted
We focus heavily on cluster/node pool relations here. I think we should be way more generic and define how we handle resources and their sub-resources, since sub-resources could have sub-resources of their own.
I think we need to discuss more around the deletion contract between parent and child resources and the possibility of future grandchildren. Right now this document is too focused on clusters/node pools, which means that as our API grows we would need similar docs for new resources.
This doc currently focuses on what we implemented in our API, but I can make it generic to avoid further changes.
> The API waits for adapters with **existing status entries** for the cluster. An adapter that never reported status has no resources to clean up.
>
> **Important**: "No status entries" does not mean "delete immediately." The API should wait for at least one Sentinel reconciliation cycle after `deletion_timestamp` is set before concluding that no adapters will report. This accounts for the case where an adapter ACKs a broker message but fails to POST its status — without a grace period, the API could delete the record before the adapter retries its status report. A configurable grace period (e.g., 30 seconds after `deletion_timestamp`) ensures adapters have time to report before the API proceeds.
How does this API wait happen? Is it a synchronous check on every poll, or some background routine?
This is something that won't even happen. Removing.
> ```yaml
> # Evaluated after preconditions, using captured variables
> when_deleting: "deletionTimestamp != ''"
> ```
Shouldn't we change this to deletion_mode?
when_deleting reads like an event hook ("when deleting happens, do X"), but it's actually a boolean predicate that switches the executor's operating mode.
deletion_mode fits better because that's exactly what the doc describes: the design already uses the term "deletion mode" repeatedly. It also pairs naturally with the CEL expression:
`deletion_mode: "deletionTimestamp != ''"`.
Good idea. I asked AI to help generate terminology that fits k8s ecosystem conventions, and it gave me when_deleting.
> ### Status Reporting During Deletion
>
> The adapter uses the existing status contract with no new **required** condition types. The three standard conditions (Applied, Available, Health) remain the canonical contract for orchestration and naturally reflect deletion state:
I think it is worth clarifying why the deletion gate uses Applied=False AND Health=True AND observed_time >= deletion_timestamp, while Available, which is explicitly set to False during deletion, is not part of the gate. The doc doesn't explain why.
Maybe add a note:

> `Available` is excluded from the deletion gate because `Available=False` is ambiguous: it occurs both during initial provisioning (resources not yet ready) and during deletion (resources being terminated). It doesn't add discriminating signal beyond what `Applied=False` already provides. The `observed_time >= deletion_timestamp` check is what disambiguates "never created" from "successfully deleted."
That's a good point. But actually, Available=False only means the manifest is not ready for use, not that it's gone. Applied=False means the resource is gone. For deletion, the intention is to check whether the resource is gone, not whether it's ready for use. I will explain it.
Create/update should be forbidden on the API side once the cluster has deleted_at set.
I think @ciaranRoche means having some adapter response delayed from an update before the deletion.
This is how the aggregation logic works today.
Consolidate the deletion-flow documentation into a single commit with Kubernetes-aligned terminology and clearer API/Sentinel/adapter lifecycle guidance, including the latest wording cleanup. Made-with: Cursor
Force-pushed 85f615e to ade0197.
Summary

- `invoke_deletion` (top-level CEL expression) and `deleteOption` (per resource, with `propagationPolicy` and `after`)
- `metav1.DeletionPropagation`: `Background`, `Foreground`, `Orphan`
- Deletion gate: `Applied=False` AND `Health=True` AND `observed_time >= deletion_timestamp`

Test plan
Relates to: HYPERFLEET-560
Summary by CodeRabbit