Skip to content

fix: prevent WorkManager APPEND chain poisoning in background worker#404

Open
imnexerio wants to merge 2 commits intoABausG:mainfrom
imnexerio:fix/workmanager-append-chain-poisoning
Open

fix: prevent WorkManager APPEND chain poisoning in background worker#404
imnexerio wants to merge 2 commits intoABausG:mainfrom
imnexerio:fix/workmanager-append-chain-poisoning

Conversation

@imnexerio
Copy link

Problem

HomeWidgetBackgroundWorker uses ExistingWorkPolicy.APPEND with enqueueUniqueWork(). If initializeFlutterEngine() throws (e.g., stale callback handle, lookup failure), doWork() returns Result.failure().

With APPEND, a failed work item cancels all subsequently appended work items. This failure state is persisted in WorkManager's SQLite database and survives process death, force stop, and device reboots — permanently breaking all background callbacks until the user manually clears app data.

Root Cause

The combination of ExistingWorkPolicy.APPEND + Result.failure() creates an unrecoverable state:

1=Engine init fails → Result.failure() returned
2=WorkManager marks the chain as failed
3=All future enqueueUniqueWork(APPEND) calls append to the failed chain → immediately cancelled
4=No background callbacks execute until WorkManager's DB is wiped (clear app data)

Fix
1=APPEND → APPEND_OR_REPLACE — If the chain is failed/cancelled, new work replaces it instead of being appended to a dead chain
2=Result.failure() → Result.success() on engine init failure — prevents chain poisoning in the first place
Added resetEngine() — properly destroys a partially initialized FlutterEngine and resets static state so the next work item retries from clean state
3=Added null check for engine after init block — graceful handling if engine is null

- Change ExistingWorkPolicy from APPEND to APPEND_OR_REPLACE so a failed
  or cancelled chain is replaced instead of permanently blocking future work.

- Return Result.success() with failure output data instead of Result.failure()
  when engine initialization fails, preventing chain poisoning while preserving
  observability through output data (engine_init_failed flag + error message).

- Add coroutine Mutex to synchronize engine initialization, preventing
  concurrent workers from racing on the init check.

- Destroy FlutterEngine on the main thread via mainHandler.post() as
  required by the FlutterEngine API.

- Preserve pending callback queue across engine resets so earlier workers'
  callbacks can still be dispatched after a successful re-initialization.

Fixes: WorkManager chain state is persisted in its SQLite database and
survives process death, force stop, and device reboots. Once any doWork()
returned Result.failure(), all subsequently appended items were immediately
cancelled, permanently breaking background callbacks until the user manually
cleared app data.
@docs-page
Copy link

docs-page bot commented Feb 27, 2026

To view this pull requests documentation preview, visit the following URL:

docs.page/abausg/home_widget~404

Documentation is deployed and generated using docs.page.

Copilot AI review requested due to automatic review settings February 27, 2026 10:10
@coderabbitai
Copy link

coderabbitai bot commented Feb 27, 2026

Walkthrough

Added thread-safe synchronization around Flutter engine initialization using a mutex guard. Enhanced error handling distinguishes between cancellation and general exceptions, with a new resetEngine() helper method. Modified work queueing strategy and implemented defensive callback management to preserve and dispatch pending callbacks during engine resets.

Changes

Cohort / File(s) Summary
Engine Initialization & Error Handling
packages/home_widget/android/src/main/kotlin/es/antonborri/home_widget/HomeWidgetBackgroundWorker.kt
Introduced engineMutex synchronization guard for thread-safe engine initialization. Extended error handling to distinguish cancellation exceptions (rethrown) from general exceptions (logged and queued as retryable callbacks). Added resetEngine() helper to destroy engine on main thread while preserving queued callbacks. Modified enqueueWork to use APPEND_OR_REPLACE strategy. Added defensive null checks after initialization with callback queueing fallback. Callbacks are now preserved across engine resets and dispatched when engine becomes available.

Sequence Diagram

sequenceDiagram
    participant Worker as HomeWidget<br/>BackgroundWorker
    participant Mutex as engineMutex
    participant Engine as Flutter<br/>Engine
    participant MainThread as Main<br/>Thread
    participant Queue as Callback<br/>Queue
    
    Worker->>Mutex: Acquire lock for initialization
    Mutex-->>Worker: Lock acquired
    Worker->>Engine: Initialize engine
    alt Initialization Success
        Engine-->>Worker: Engine ready
        Worker->>Queue: Dispatch pending callbacks
        Queue-->>Engine: Execute callbacks
    else CancellationException
        Engine-->>Worker: Cancelled
        MainThread->>Engine: resetEngine()
        Engine->>Queue: Preserve callbacks
        Worker->>Worker: Rethrow exception
    else General Exception
        Engine-->>Worker: Error
        MainThread->>Engine: resetEngine()
        Engine->>Queue: Preserve callbacks
        Worker->>Queue: Queue callback with error data
        Worker-->>Worker: Return success (non-fatal)
    end
    
    alt Engine Null After Init
        Worker->>Queue: Queue callback data
        Worker-->>Worker: Return success with error
    end
    
    Mutex->>Worker: Release lock
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Description check ⚠️ Warning The description comprehensively explains the problem, root cause, and fix approach. However, the PR description section is not followed by the required checklist items (tests, documentation, examples, breaking change declaration) from the template. Complete the checklist section with selections for test updates, documentation/comments, examples, and breaking change status as specified in the template.
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (1 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically identifies the main issue being fixed: preventing WorkManager chain poisoning caused by APPEND policy with failure results.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses a WorkManager “work chain poisoning” failure mode in HomeWidgetBackgroundWorker by preventing a single engine initialization failure from permanently cancelling all future background callbacks.

Changes:

  • Switches the unique-work enqueue policy from ExistingWorkPolicy.APPEND to APPEND_OR_REPLACE to allow recovery from failed/cancelled chains.
  • Prevents engine init failures from returning Result.failure() (and instead returns Result.success() while queuing the callback payload).
  • Introduces engineMutex and a resetEngine() helper, plus a defensive null check around engine after initialization.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +161 to +163
val engineToDestroy = engine
engine = null
serviceStarted.set(false)
Copy link

Copilot AI Feb 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

resetEngine() updates shared state (serviceStarted, and indirectly affects how the shared queue is processed) without taking the same monitor lock (synchronized(serviceStarted)) that guards reads/writes elsewhere. This can lead to inconsistent interleavings (e.g., a worker observing serviceStarted=true while another thread resets it) and makes the synchronization scheme harder to reason about. Consider synchronizing on the same lock inside resetEngine() when mutating serviceStarted (and any other shared state coordinated with it).

Suggested change
val engineToDestroy = engine
engine = null
serviceStarted.set(false)
val engineToDestroy: FlutterEngine?
synchronized(serviceStarted) {
engineToDestroy = engine
engine = null
serviceStarted.set(false)
}

Copilot uses AI. Check for mistakes.
Comment on lines +48 to +55
// Queue this work item's data so it is not silently lost. When a
// subsequent worker succeeds in starting the engine and Dart calls
// backgroundInitialized, the queued callback will be dispatched.
val data = inputData.getString(DATA_KEY) ?: ""
val args = listOf(HomeWidgetPlugin.getHandle(context), data)
synchronized(serviceStarted) {
queue.add(args)
}
Copy link

Copilot AI Feb 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On engine initialization failure, this work item’s args are added to the static in-memory queue and the worker returns Result.success(). If initialization keeps failing permanently (e.g., missing/stale callback handle), every new work will keep appending to queue and nothing will ever drain it, which can grow memory unbounded over time. Consider adding a bounded queue (drop oldest/newest), deduplicating payloads, or persisting a limited backlog to disk instead of accumulating indefinitely in RAM.

Copilot uses AI. Check for mistakes.
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
packages/home_widget/android/src/main/kotlin/es/antonborri/home_widget/HomeWidgetBackgroundWorker.kt (1)

80-87: Return structured output in the null-engine fallback for observability parity.

Line 86 returns plain success, so this abnormal state is harder to detect downstream compared with the exception path.

📊 Suggested adjustment
-      return Result.success()
+      return Result.success(
+          Data.Builder()
+              .putBoolean("engine_null_after_init", true)
+              .build()
+      )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/home_widget/android/src/main/kotlin/es/antonborri/home_widget/HomeWidgetBackgroundWorker.kt`
around lines 80 - 87, In the HomeWidgetBackgroundWorker null-engine fallback
(inside the block that logs "Flutter engine is null after initialization,
queuing callback data"), return a structured WorkManager output instead of plain
Result.success(): build a Data payload (using the same DATA_KEY/value and
HomeWidgetPlugin.getHandle(context)) that includes a status or error key (e.g.,
"status":"engine_null") and the queued data, and then return
Result.success(theData) so downstream observability can detect this abnormal
path; update the return in that block and reference variables/methods:
inputData, DATA_KEY, HomeWidgetPlugin.getHandle(context), queue, and
serviceStarted.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@packages/home_widget/android/src/main/kotlin/es/antonborri/home_widget/HomeWidgetBackgroundWorker.kt`:
- Around line 51-55: The pending callback queue in HomeWidgetBackgroundWorker
(the shared "queue" accessed inside synchronized(serviceStarted)) is unbounded
and can grow indefinitely; fix this by introducing a fixed maximum size (e.g.,
MAX_QUEUE_SIZE) and enforcing it wherever items are added (the enqueue sites
around DATA_KEY handling and the failure/null-engine paths referenced in the
diff and lines ~54, ~84, ~164), dropping oldest entries or rejecting new ones
deterministically when the limit is reached; also ensure the
reset/initialization code (the code that currently "keeps that queue across
resets") clears or reinitializes the queue atomically to prevent carrying over
stale items.
- Around line 184-193: Update the WorkManager dependency so
ExistingWorkPolicy.APPEND_OR_REPLACE used with WorkManager.getInstance(...) is
available: replace the dynamic dependency "androidx.work:work-runtime-ktx:2.+"
with a fixed version >= 2.4.0 (for example 2.8.1) in the module's Gradle
dependencies, then sync/rebuild so the call to
ExistingWorkPolicy.APPEND_OR_REPLACE when enqueuing UNIQUE_WORK_NAME works at
compile time.

---

Nitpick comments:
In
`@packages/home_widget/android/src/main/kotlin/es/antonborri/home_widget/HomeWidgetBackgroundWorker.kt`:
- Around line 80-87: In the HomeWidgetBackgroundWorker null-engine fallback
(inside the block that logs "Flutter engine is null after initialization,
queuing callback data"), return a structured WorkManager output instead of plain
Result.success(): build a Data payload (using the same DATA_KEY/value and
HomeWidgetPlugin.getHandle(context)) that includes a status or error key (e.g.,
"status":"engine_null") and the queued data, and then return
Result.success(theData) so downstream observability can detect this abnormal
path; update the return in that block and reference variables/methods:
inputData, DATA_KEY, HomeWidgetPlugin.getHandle(context), queue, and
serviceStarted.

ℹ️ Review info

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fa7453a and a301e4f.

📒 Files selected for processing (1)
  • packages/home_widget/android/src/main/kotlin/es/antonborri/home_widget/HomeWidgetBackgroundWorker.kt

Comment on lines +51 to +55
val data = inputData.getString(DATA_KEY) ?: ""
val args = listOf(HomeWidgetPlugin.getHandle(context), data)
synchronized(serviceStarted) {
queue.add(args)
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Bound the pending callback queue to avoid unbounded memory growth.

Line 54 and Line 84 enqueue on failure/null-engine paths, and Line 164 keeps that queue across resets. If initialization remains broken, this can grow indefinitely and cause memory pressure.

♻️ Proposed fix
@@
-          synchronized(serviceStarted) {
-            queue.add(args)
-          }
+          enqueuePending(args)
@@
-      synchronized(serviceStarted) {
-        queue.add(args)
-      }
+      enqueuePending(args)
@@
-        queue.add(args)
+        enqueuePending(args)
@@
     private const val UNIQUE_WORK_NAME = "home_widget_background"
     private const val CHANNEL_NAME = "home_widget/background"
+    private const val MAX_PENDING_CALLBACKS = 256
@@
     private val queue = ArrayDeque<List<Any>>()
@@
+    private fun enqueuePending(args: List<Any>) {
+      synchronized(serviceStarted) {
+        if (queue.size >= MAX_PENDING_CALLBACKS) {
+          queue.removeFirst()
+          Log.w(TAG, "Pending callback queue full; dropping oldest item")
+        }
+        queue.addLast(args)
+      }
+    }

Also applies to: 83-85, 164-166

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/home_widget/android/src/main/kotlin/es/antonborri/home_widget/HomeWidgetBackgroundWorker.kt`
around lines 51 - 55, The pending callback queue in HomeWidgetBackgroundWorker
(the shared "queue" accessed inside synchronized(serviceStarted)) is unbounded
and can grow indefinitely; fix this by introducing a fixed maximum size (e.g.,
MAX_QUEUE_SIZE) and enforcing it wherever items are added (the enqueue sites
around DATA_KEY handling and the failure/null-engine paths referenced in the
diff and lines ~54, ~84, ~164), dropping oldest entries or rejecting new ones
deterministically when the limit is reached; also ensure the
reset/initialization code (the code that currently "keeps that queue across
resets") clears or reinitializes the queue atomically to prevent carrying over
stale items.

Comment on lines +184 to +193
// Use APPEND_OR_REPLACE instead of APPEND to prevent work chain
// poisoning. With APPEND, if any work item in the chain fails, all
// subsequently appended items are immediately cancelled by WorkManager.
// This failure state is persisted in WorkManager's SQLite database and
// survives process death, force stop, and device reboots — permanently
// breaking background callbacks until the user clears app data.
// APPEND_OR_REPLACE replaces the chain if it has failed or been
// cancelled, allowing recovery without user intervention.
WorkManager.getInstance(context)
.enqueueUniqueWork(UNIQUE_WORK_NAME, ExistingWorkPolicy.APPEND, workRequest)
.enqueueUniqueWork(UNIQUE_WORK_NAME, ExistingWorkPolicy.APPEND_OR_REPLACE, workRequest)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== Finding build.gradle and build.gradle.kts files =="
find . -type f \( -name "build.gradle" -o -name "build.gradle.kts" -o -name "libs.versions.toml" \) | head -20

echo
echo "== Searching for WorkManager dependency declarations =="
rg -n --no-heading 'androidx\.work|work[-._]runtime|work["\s]' --type gradle --type toml 2>/dev/null || rg -n --no-heading 'androidx\.work|work[-._]runtime' 2>/dev/null || true

echo
echo "== Checking for APPEND_OR_REPLACE usage =="
rg -n --no-heading 'APPEND_OR_REPLACE|ExistingWorkPolicy' 2>/dev/null || true

Repository: ABausG/home_widget

Length of output: 1283


🌐 Web query:

androidx.work ExistingWorkPolicy APPEND_OR_REPLACE release version history

💡 Result:

ExistingWorkPolicy.APPEND_OR_REPLACE was added in AndroidX WorkManager 2.4.0 (released July 22, 2020). The WorkManager 2.4.0 release notes list it under API changes (“Add ExistingWorkPolicy.APPEND_OR_REPLACE…”). [1]

The enum value is part of androidx.work.ExistingWorkPolicy in androidx.work:work-runtime. [2]

Relevant 2.4.0 train (dates): 2.4.0-alpha01 (Mar 4, 2020), 2.4.0-alpha02 (Apr 1, 2020), 2.4.0-alpha03 (Apr 29, 2020), 2.4.0-beta01 (May 20, 2020), 2.4.0-rc01 (Jun 24, 2020), 2.4.0 (Jul 22, 2020). [1]

Sources:
[1] WorkManager release notes (Android Developers) (developer.android.com)
[2] ExistingWorkPolicy API reference (Android Developers) (developer.android.com)

Citations:


Pin WorkManager version to 2.4.0 or higher to support APPEND_OR_REPLACE.

The main module uses androidx.work:work-runtime-ktx:2.+, a dynamic version that can resolve to 2.0–2.3.x, which do not include ExistingWorkPolicy.APPEND_OR_REPLACE (introduced in 2.4.0). Change the dependency to androidx.work:work-runtime-ktx:2.4.0 or higher (e.g., 2.8.1 like the example module) to guarantee compatibility.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/home_widget/android/src/main/kotlin/es/antonborri/home_widget/HomeWidgetBackgroundWorker.kt`
around lines 184 - 193, Update the WorkManager dependency so
ExistingWorkPolicy.APPEND_OR_REPLACE used with WorkManager.getInstance(...) is
available: replace the dynamic dependency "androidx.work:work-runtime-ktx:2.+"
with a fixed version >= 2.4.0 (for example 2.8.1) in the module's Gradle
dependencies, then sync/rebuild so the call to
ExistingWorkPolicy.APPEND_OR_REPLACE when enqueuing UNIQUE_WORK_NAME works at
compile time.

@ABausG
Copy link
Owner

ABausG commented Mar 2, 2026

@imnexerio can you please stop closing and reopening the same PR over and over again?

Also please share what exactly the issue is you are seeing. (Not the cause as described in the PR title but how this manifests for you/users)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants