Fix intermittent hot-reload test failures caused by metadata initialization race condition #3302

Open
aaronburtle wants to merge 5 commits into main from dev/aaronburtle/hot-reload-test-internalservererror

@aaronburtle
Contributor

Why make this change?

Related to #2992

Some flaky tests were being masked by faster initialization when the task they ran in was isolated. When we refactored the tests in PR #3245 to lower the overall runtime, these flaky tests were eventually revealed.

The root cause was a race condition: after a successful hot-reload, WaitForConditionAsync detects the "Validated hot-reloaded configuration file" console message and returns, but the engine's metadata providers have not fully re-initialized yet. An immediate HTTP request can arrive before the metadata is ready, causing a 500 error.

What is this change?

Added retry logic to the three non-ignored tests that make HTTP calls expecting success responses immediately after hot-reload:

  • HotReloadConfigRuntimePathsEndToEndTest: REST and GraphQL calls now retry up to 5 times with a 1-second delay between attempts
  • HotReloadConfigConnectionString: REST call now uses WaitForRestEndpointAsync helper
  • HotReloadConfigDatabaseType: REST call now uses WaitForRestEndpointAsync helper

Added a shared WaitForRestEndpointAsync helper method that polls a REST endpoint until it returns the expected status code (5 retries, with a 1-second delay between attempts).

This follows the same retry pattern already established in HotReloadConfigDataSource, which had this exact fix applied previously.
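
For readers following along, the polling pattern described above could be sketched roughly as follows. This is an illustration only, not the code in this PR: the signature, the `HotReloadTestHelpers` class name, the `maxAttempts` parameter, and the `SequenceHandler` stub are all assumptions, and "5 retries" is interpreted here as five attempts in total.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

public static class HotReloadTestHelpers
{
    // Hypothetical sketch: polls requestUri until it returns expectedStatus
    // or the attempts run out. The actual helper in the PR may differ.
    public static async Task<HttpResponseMessage> WaitForRestEndpointAsync(
        HttpClient client,
        string requestUri,
        HttpStatusCode expectedStatus,
        int maxAttempts = 5,
        int delayMilliseconds = 1000)
    {
        HttpResponseMessage response = await client.GetAsync(requestUri);

        for (int attempt = 1;
             attempt < maxAttempts && response.StatusCode != expectedStatus;
             attempt++)
        {
            // Release the failed response before trying again.
            response.Dispose();
            await Task.Delay(delayMilliseconds);
            response = await client.GetAsync(requestUri);
        }

        // The last response is returned undisposed; the caller owns it.
        return response;
    }
}

// Hypothetical stub that replays a fixed sequence of status codes and then
// repeats the last one, simulating the window where metadata is not ready.
// It expects at least one status code.
public sealed class SequenceHandler : HttpMessageHandler
{
    private readonly Queue<HttpStatusCode> _statuses;

    public int Calls { get; private set; }

    public SequenceHandler(params HttpStatusCode[] statuses)
    {
        _statuses = new Queue<HttpStatusCode>(statuses);
    }

    protected override Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        Calls++;
        HttpStatusCode status =
            _statuses.Count > 1 ? _statuses.Dequeue() : _statuses.Peek();
        return Task.FromResult(new HttpResponseMessage(status));
    }
}
```

With a `SequenceHandler` configured to return 500, 500, then 200, the helper would make three requests and hand back the 200 response. Polling for readiness rather than sleeping for a fixed interval keeps the test fast when metadata initialization finishes early.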

How was this tested?

Ran a batch of 10 pipeline runs, all of which succeeded.

image

Contributor

Copilot AI left a comment

Pull request overview

Improves reliability of MSSQL hot-reload integration tests by adding post–hot-reload readiness polling/retry logic, addressing a race where config validation completes before metadata providers are fully re-initialized.

Changes:

  • Added REST polling helper (WaitForRestEndpointAsync) and updated tests to wait for REST readiness after hot-reload.
  • Added retry loop for the GraphQL request in HotReloadConfigRuntimePathsEndToEndTest to tolerate transient post–hot-reload failures.
  • Updated connection-string and database-type hot-reload tests to use the shared REST polling helper.

@aaronburtle
Contributor Author

/azp run

@azure-pipelines

Azure Pipelines successfully started running 6 pipeline(s).

Contributor

souvikghosh04 left a comment

thanks for the fix. approved with suggestions.

@aaronburtle
Contributor Author

/azp run

@azure-pipelines

Azure Pipelines successfully started running 6 pipeline(s).

await Task.Delay(delayMilliseconds);
}

// Return the last response (undisposed) so the caller can inspect/assert on it.
Collaborator

The caller should then dispose of the last response explicitly, or wrap it in a using statement.
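
One way a caller might handle this is sketched below. It is only an illustration under stated assumptions: `PollAsync` is a hypothetical stand-in for a polling helper that returns its last response undisposed, and `TrackingResponse` exists purely so that disposal can be observed in the example.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical response subclass that records whether Dispose was called.
public sealed class TrackingResponse : HttpResponseMessage
{
    public bool WasDisposed { get; private set; }

    public TrackingResponse(HttpStatusCode status) : base(status) { }

    protected override void Dispose(bool disposing)
    {
        WasDisposed = true;
        base.Dispose(disposing);
    }
}

public static class DisposalExample
{
    // Hypothetical stand-in for a helper that returns its last response
    // undisposed so the caller can assert on it.
    public static Task<TrackingResponse> PollAsync()
    {
        return Task.FromResult(new TrackingResponse(HttpStatusCode.OK));
    }

    public static async Task<TrackingResponse> AssertAndDisposeAsync()
    {
        TrackingResponse captured;

        // The using statement disposes the response when the block exits,
        // even if the check inside throws.
        using (TrackingResponse response = await PollAsync())
        {
            captured = response;
            if (response.StatusCode != HttpStatusCode.OK)
            {
                throw new InvalidOperationException("Unexpected status code.");
            }
        }

        // Returned only so the example can show that disposal happened.
        return captured;
    }
}
```

In a real test the assertion would sit inside the using block and nothing would be returned; the response is handed back here only to make the disposal visible.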

Collaborator

Aniruddh25 left a comment

Should dispose the last response.

github-project-automation bot moved this from In Progress to Review In Progress in Data API builder on Mar 20, 2026
Labels

None yet

Projects

Status: Review In Progress

4 participants