Skip to content

Fix a race condition when eve-k app restart app lost IP address#5706

Open
naiming-zededa wants to merge 1 commit intolf-edge:masterfrom
naiming-zededa:naiming-cni-race-fix
Open

Fix a race condition when eve-k app restart app lost IP address#5706
naiming-zededa wants to merge 1 commit intolf-edge:masterfrom
naiming-zededa:naiming-cni-race-fix

Conversation

@naiming-zededa
Copy link
Copy Markdown
Contributor

Description

  • fix an issue when the app restarting by the controller, the pillar side deleted the old replicaSet and recreated a new one. Due the the delay in removal of the old pod by kubernetes, we have a race condition, the newly created VIF was later deleted by the older pod removal.

PR dependencies

How to test and validate this PR

Run the eve-k image on the EVE device, and deploy a VM app over there.
Make sure it is running. and we can ssh into the app using the local NI IP address.
Then modify the app inbound port-map, add or delete or modify. save it, the update the
new app manifest to the device for the App. Wait until the app restarted and make sure the IP address
is assigned to the post restarted VMI.

Changelog notes

Fix a race condition when eve-k app restart app lost IP address

PR Backports

Checklist

  • I've provided a proper description
  • I've added the proper documentation
  • I've tested my PR on amd64 device
  • I've tested my PR on arm64 device
  • I've written the test verification instructions
  • I've set the proper labels to this PR

For backport PRs (remove it if it's not a backport):

  • I've added a reference link to the original PR
  • PR's title follows the template

And the last but not least:

  • [ x I've checked the boxes above, or I've provided a good reason why I didn't
    check them.

- fix an issue when the app restart by the controller, the pillar side
  deleted the old replicaSet and recreated a new one. Due the the delay
  in removal of the old pod by kubernetes, we have a race condition, the
  newly created VIF was later deleted by the older pod removal.

Signed-off-by: naiming-zededa <naiming@zededa.com>
z.log.Warnf("handleDisconnectPodRequest: ignoring stale disconnect for pod %s "+
"(app %v is already connected via pod %s)",
args.Pod.Name, appStatus.UUIDandVersion.UUID, appStatus.AppPod.Name)
return nil
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't make sense to return this message here as a new Error so the caller of this function knows about it? Otherwise there will be no way to check whether the function succeed or it faced the race...

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rene we don't need to return real error, if we do this would have passed back to the kubernetes caller side, indicate we failed to accept such a delete call (we actually already overwritten it w/ the new connection earlier).

@codecov
Copy link
Copy Markdown

codecov bot commented Mar 27, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 29.45%. Comparing base (2281599) to head (30debeb).
⚠️ Report is 369 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5706      +/-   ##
==========================================
+ Coverage   19.52%   29.45%   +9.92%     
==========================================
  Files          19       18       -1     
  Lines        3021     2417     -604     
==========================================
+ Hits          590      712     +122     
+ Misses       2310     1554     -756     
- Partials      121      151      +30     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants