fix: prevent WorkManager APPEND chain poisoning in background worker#404
fix: prevent WorkManager APPEND chain poisoning in background worker#404imnexerio wants to merge 2 commits intoABausG:mainfrom
Conversation
- Change ExistingWorkPolicy from APPEND to APPEND_OR_REPLACE so a failed or cancelled chain is replaced instead of permanently blocking future work. - Return Result.success() with failure output data instead of Result.failure() when engine initialization fails, preventing chain poisoning while preserving observability through output data (engine_init_failed flag + error message). - Add coroutine Mutex to synchronize engine initialization, preventing concurrent workers from racing on the init check. - Destroy FlutterEngine on the main thread via mainHandler.post() as required by the FlutterEngine API. - Preserve pending callback queue across engine resets so earlier workers' callbacks can still be dispatched after a successful re-initialization. Fixes: WorkManager chain state is persisted in its SQLite database and survives process death, force stop, and device reboots. Once any doWork() returned Result.failure(), all subsequently appended items were immediately cancelled, permanently breaking background callbacks until the user manually cleared app data.
|
To view this pull requests documentation preview, visit the following URL: docs.page/abausg/home_widget~404 Documentation is deployed and generated using docs.page. |
WalkthroughAdded thread-safe synchronization around Flutter engine initialization using a mutex guard. Enhanced error handling distinguishes between cancellation and general exceptions, with a new Changes
Sequence DiagramsequenceDiagram
participant Worker as HomeWidget<br/>BackgroundWorker
participant Mutex as engineMutex
participant Engine as Flutter<br/>Engine
participant MainThread as Main<br/>Thread
participant Queue as Callback<br/>Queue
Worker->>Mutex: Acquire lock for initialization
Mutex-->>Worker: Lock acquired
Worker->>Engine: Initialize engine
alt Initialization Success
Engine-->>Worker: Engine ready
Worker->>Queue: Dispatch pending callbacks
Queue-->>Engine: Execute callbacks
else CancellationException
Engine-->>Worker: Cancelled
MainThread->>Engine: resetEngine()
Engine->>Queue: Preserve callbacks
Worker->>Worker: Rethrow exception
else General Exception
Engine-->>Worker: Error
MainThread->>Engine: resetEngine()
Engine->>Queue: Preserve callbacks
Worker->>Queue: Queue callback with error data
Worker-->>Worker: Return success (non-fatal)
end
alt Engine Null After Init
Worker->>Queue: Queue callback data
Worker-->>Worker: Return success with error
end
Mutex->>Worker: Release lock
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes 🚥 Pre-merge checks | ✅ 1 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Pull request overview
This PR addresses a WorkManager “work chain poisoning” failure mode in HomeWidgetBackgroundWorker by preventing a single engine initialization failure from permanently cancelling all future background callbacks.
Changes:
- Switches the unique-work enqueue policy from
ExistingWorkPolicy.APPENDtoAPPEND_OR_REPLACEto allow recovery from failed/cancelled chains. - Prevents engine init failures from returning
Result.failure()(and instead returnsResult.success()while queuing the callback payload). - Introduces
engineMutexand aresetEngine()helper, plus a defensive null check aroundengineafter initialization.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| val engineToDestroy = engine | ||
| engine = null | ||
| serviceStarted.set(false) |
There was a problem hiding this comment.
resetEngine() updates shared state (serviceStarted, and indirectly affects how the shared queue is processed) without taking the same monitor lock (synchronized(serviceStarted)) that guards reads/writes elsewhere. This can lead to inconsistent interleavings (e.g., a worker observing serviceStarted=true while another thread resets it) and makes the synchronization scheme harder to reason about. Consider synchronizing on the same lock inside resetEngine() when mutating serviceStarted (and any other shared state coordinated with it).
| val engineToDestroy = engine | |
| engine = null | |
| serviceStarted.set(false) | |
| val engineToDestroy: FlutterEngine? | |
| synchronized(serviceStarted) { | |
| engineToDestroy = engine | |
| engine = null | |
| serviceStarted.set(false) | |
| } |
| // Queue this work item's data so it is not silently lost. When a | ||
| // subsequent worker succeeds in starting the engine and Dart calls | ||
| // backgroundInitialized, the queued callback will be dispatched. | ||
| val data = inputData.getString(DATA_KEY) ?: "" | ||
| val args = listOf(HomeWidgetPlugin.getHandle(context), data) | ||
| synchronized(serviceStarted) { | ||
| queue.add(args) | ||
| } |
There was a problem hiding this comment.
On engine initialization failure, this work item’s args are added to the static in-memory queue and the worker returns Result.success(). If initialization keeps failing permanently (e.g., missing/stale callback handle), every new work will keep appending to queue and nothing will ever drain it, which can grow memory unbounded over time. Consider adding a bounded queue (drop oldest/newest), deduplicating payloads, or persisting a limited backlog to disk instead of accumulating indefinitely in RAM.
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
packages/home_widget/android/src/main/kotlin/es/antonborri/home_widget/HomeWidgetBackgroundWorker.kt (1)
80-87: Return structured output in the null-engine fallback for observability parity.Line 86 returns plain success, so this abnormal state is harder to detect downstream compared with the exception path.
📊 Suggested adjustment
- return Result.success() + return Result.success( + Data.Builder() + .putBoolean("engine_null_after_init", true) + .build() + )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/home_widget/android/src/main/kotlin/es/antonborri/home_widget/HomeWidgetBackgroundWorker.kt` around lines 80 - 87, In the HomeWidgetBackgroundWorker null-engine fallback (inside the block that logs "Flutter engine is null after initialization, queuing callback data"), return a structured WorkManager output instead of plain Result.success(): build a Data payload (using the same DATA_KEY/value and HomeWidgetPlugin.getHandle(context)) that includes a status or error key (e.g., "status":"engine_null") and the queued data, and then return Result.success(theData) so downstream observability can detect this abnormal path; update the return in that block and reference variables/methods: inputData, DATA_KEY, HomeWidgetPlugin.getHandle(context), queue, and serviceStarted.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@packages/home_widget/android/src/main/kotlin/es/antonborri/home_widget/HomeWidgetBackgroundWorker.kt`:
- Around line 51-55: The pending callback queue in HomeWidgetBackgroundWorker
(the shared "queue" accessed inside synchronized(serviceStarted)) is unbounded
and can grow indefinitely; fix this by introducing a fixed maximum size (e.g.,
MAX_QUEUE_SIZE) and enforcing it wherever items are added (the enqueue sites
around DATA_KEY handling and the failure/null-engine paths referenced in the
diff and lines ~54, ~84, ~164), dropping oldest entries or rejecting new ones
deterministically when the limit is reached; also ensure the
reset/initialization code (the code that currently "keeps that queue across
resets") clears or reinitializes the queue atomically to prevent carrying over
stale items.
- Around line 184-193: Update the WorkManager dependency so
ExistingWorkPolicy.APPEND_OR_REPLACE used with WorkManager.getInstance(...) is
available: replace the dynamic dependency "androidx.work:work-runtime-ktx:2.+"
with a fixed version >= 2.4.0 (for example 2.8.1) in the module's Gradle
dependencies, then sync/rebuild so the call to
ExistingWorkPolicy.APPEND_OR_REPLACE when enqueuing UNIQUE_WORK_NAME works at
compile time.
---
Nitpick comments:
In
`@packages/home_widget/android/src/main/kotlin/es/antonborri/home_widget/HomeWidgetBackgroundWorker.kt`:
- Around line 80-87: In the HomeWidgetBackgroundWorker null-engine fallback
(inside the block that logs "Flutter engine is null after initialization,
queuing callback data"), return a structured WorkManager output instead of plain
Result.success(): build a Data payload (using the same DATA_KEY/value and
HomeWidgetPlugin.getHandle(context)) that includes a status or error key (e.g.,
"status":"engine_null") and the queued data, and then return
Result.success(theData) so downstream observability can detect this abnormal
path; update the return in that block and reference variables/methods:
inputData, DATA_KEY, HomeWidgetPlugin.getHandle(context), queue, and
serviceStarted.
ℹ️ Review info
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
packages/home_widget/android/src/main/kotlin/es/antonborri/home_widget/HomeWidgetBackgroundWorker.kt
| val data = inputData.getString(DATA_KEY) ?: "" | ||
| val args = listOf(HomeWidgetPlugin.getHandle(context), data) | ||
| synchronized(serviceStarted) { | ||
| queue.add(args) | ||
| } |
There was a problem hiding this comment.
Bound the pending callback queue to avoid unbounded memory growth.
Line 54 and Line 84 enqueue on failure/null-engine paths, and Line 164 keeps that queue across resets. If initialization remains broken, this can grow indefinitely and cause memory pressure.
♻️ Proposed fix
@@
- synchronized(serviceStarted) {
- queue.add(args)
- }
+ enqueuePending(args)
@@
- synchronized(serviceStarted) {
- queue.add(args)
- }
+ enqueuePending(args)
@@
- queue.add(args)
+ enqueuePending(args)
@@
private const val UNIQUE_WORK_NAME = "home_widget_background"
private const val CHANNEL_NAME = "home_widget/background"
+ private const val MAX_PENDING_CALLBACKS = 256
@@
private val queue = ArrayDeque<List<Any>>()
@@
+ private fun enqueuePending(args: List<Any>) {
+ synchronized(serviceStarted) {
+ if (queue.size >= MAX_PENDING_CALLBACKS) {
+ queue.removeFirst()
+ Log.w(TAG, "Pending callback queue full; dropping oldest item")
+ }
+ queue.addLast(args)
+ }
+ }Also applies to: 83-85, 164-166
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@packages/home_widget/android/src/main/kotlin/es/antonborri/home_widget/HomeWidgetBackgroundWorker.kt`
around lines 51 - 55, The pending callback queue in HomeWidgetBackgroundWorker
(the shared "queue" accessed inside synchronized(serviceStarted)) is unbounded
and can grow indefinitely; fix this by introducing a fixed maximum size (e.g.,
MAX_QUEUE_SIZE) and enforcing it wherever items are added (the enqueue sites
around DATA_KEY handling and the failure/null-engine paths referenced in the
diff and lines ~54, ~84, ~164), dropping oldest entries or rejecting new ones
deterministically when the limit is reached; also ensure the
reset/initialization code (the code that currently "keeps that queue across
resets") clears or reinitializes the queue atomically to prevent carrying over
stale items.
| // Use APPEND_OR_REPLACE instead of APPEND to prevent work chain | ||
| // poisoning. With APPEND, if any work item in the chain fails, all | ||
| // subsequently appended items are immediately cancelled by WorkManager. | ||
| // This failure state is persisted in WorkManager's SQLite database and | ||
| // survives process death, force stop, and device reboots — permanently | ||
| // breaking background callbacks until the user clears app data. | ||
| // APPEND_OR_REPLACE replaces the chain if it has failed or been | ||
| // cancelled, allowing recovery without user intervention. | ||
| WorkManager.getInstance(context) | ||
| .enqueueUniqueWork(UNIQUE_WORK_NAME, ExistingWorkPolicy.APPEND, workRequest) | ||
| .enqueueUniqueWork(UNIQUE_WORK_NAME, ExistingWorkPolicy.APPEND_OR_REPLACE, workRequest) |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== Finding build.gradle and build.gradle.kts files =="
find . -type f \( -name "build.gradle" -o -name "build.gradle.kts" -o -name "libs.versions.toml" \) | head -20
echo
echo "== Searching for WorkManager dependency declarations =="
rg -n --no-heading 'androidx\.work|work[-._]runtime|work["\s]' --type gradle --type toml 2>/dev/null || rg -n --no-heading 'androidx\.work|work[-._]runtime' 2>/dev/null || true
echo
echo "== Checking for APPEND_OR_REPLACE usage =="
rg -n --no-heading 'APPEND_OR_REPLACE|ExistingWorkPolicy' 2>/dev/null || trueRepository: ABausG/home_widget
Length of output: 1283
🌐 Web query:
androidx.work ExistingWorkPolicy APPEND_OR_REPLACE release version history
💡 Result:
ExistingWorkPolicy.APPEND_OR_REPLACE was added in AndroidX WorkManager 2.4.0 (released July 22, 2020). The WorkManager 2.4.0 release notes list it under API changes (“Add ExistingWorkPolicy.APPEND_OR_REPLACE…”). [1]
The enum value is part of androidx.work.ExistingWorkPolicy in androidx.work:work-runtime. [2]
Relevant 2.4.0 train (dates): 2.4.0-alpha01 (Mar 4, 2020), 2.4.0-alpha02 (Apr 1, 2020), 2.4.0-alpha03 (Apr 29, 2020), 2.4.0-beta01 (May 20, 2020), 2.4.0-rc01 (Jun 24, 2020), 2.4.0 (Jul 22, 2020). [1]
Sources:
[1] WorkManager release notes (Android Developers) (developer.android.com)
[2] ExistingWorkPolicy API reference (Android Developers) (developer.android.com)
Citations:
- 1: https://developer.android.com/jetpack/androidx/releases/work
- 2: https://developer.android.com/reference/androidx/work/ExistingWorkPolicy?utm_source=openai
Pin WorkManager version to 2.4.0 or higher to support APPEND_OR_REPLACE.
The main module uses androidx.work:work-runtime-ktx:2.+, a dynamic version that can resolve to 2.0–2.3.x, which do not include ExistingWorkPolicy.APPEND_OR_REPLACE (introduced in 2.4.0). Change the dependency to androidx.work:work-runtime-ktx:2.4.0 or higher (e.g., 2.8.1 like the example module) to guarantee compatibility.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@packages/home_widget/android/src/main/kotlin/es/antonborri/home_widget/HomeWidgetBackgroundWorker.kt`
around lines 184 - 193, Update the WorkManager dependency so
ExistingWorkPolicy.APPEND_OR_REPLACE used with WorkManager.getInstance(...) is
available: replace the dynamic dependency "androidx.work:work-runtime-ktx:2.+"
with a fixed version >= 2.4.0 (for example 2.8.1) in the module's Gradle
dependencies, then sync/rebuild so the call to
ExistingWorkPolicy.APPEND_OR_REPLACE when enqueuing UNIQUE_WORK_NAME works at
compile time.
|
@imnexerio can you please stop closing and reopening the same PR over and over again? Also please share what exactly the issue is you are seeing. (Not the cause as described in the PR title but how this manifests for you/users) |
Problem
HomeWidgetBackgroundWorker uses ExistingWorkPolicy.APPEND with enqueueUniqueWork(). If initializeFlutterEngine() throws (e.g., stale callback handle, lookup failure), doWork() returns Result.failure().
With APPEND, a failed work item cancels all subsequently appended work items. This failure state is persisted in WorkManager's SQLite database and survives process death, force stop, and device reboots — permanently breaking all background callbacks until the user manually clears app data.
Root Cause
The combination of ExistingWorkPolicy.APPEND + Result.failure() creates an unrecoverable state:
1=Engine init fails → Result.failure() returned
2=WorkManager marks the chain as failed
3=All future enqueueUniqueWork(APPEND) calls append to the failed chain → immediately cancelled
4=No background callbacks execute until WorkManager's DB is wiped (clear app data)
Fix
1=APPEND → APPEND_OR_REPLACE — If the chain is failed/cancelled, new work replaces it instead of being appended to a dead chain
2=Result.failure() → Result.success() on engine init failure — prevents chain poisoning in the first place
Added resetEngine() — properly destroys a partially initialized FlutterEngine and resets static state so the next work item retries from clean state
3=Added null check for engine after init block — graceful handling if engine is null