Adds 'redis' WORKER_TYPE#7296

Open
dkliban wants to merge 20 commits into pulp:main from dkliban:7210


@dkliban
Member

@dkliban dkliban commented Feb 8, 2026

This adds WORKER_TYPE setting. The default value is 'pulpcore'. When 'redis' is selected, the tasking system uses Redis to lock resources. Redis workers produce less load on the PostgreSQL database.

closes: #7210

Generated By: Claude Code.
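To illustrate how a WORKER_TYPE setting with a 'pulpcore' default might select between the two implementations, here is a minimal Python sketch. The RedisWorker class, the get_worker_class() helper, and the dict-based settings are hypothetical placeholders, and PulpcoreWorker here is only a stand-in for the real class; the PR itself may wire the setting up differently.

```python
# Hypothetical sketch: choose a worker implementation from a WORKER_TYPE setting.
# The classes below are placeholders, not pulpcore's actual worker classes.

VALID_WORKER_TYPES = ("pulpcore", "redis")


class PulpcoreWorker:
    """Placeholder: locks resources through PostgreSQL."""

    lock_backend = "postgresql"


class RedisWorker:
    """Placeholder: locks resources through Redis."""

    lock_backend = "redis"


def get_worker_class(settings):
    """Return the worker class for settings['WORKER_TYPE'] (default 'pulpcore')."""
    worker_type = settings.get("WORKER_TYPE", "pulpcore")
    if worker_type not in VALID_WORKER_TYPES:
        raise ValueError(
            f"WORKER_TYPE must be one of {VALID_WORKER_TYPES}, got {worker_type!r}"
        )
    return RedisWorker if worker_type == "redis" else PulpcoreWorker
```

Validating the value up front means a typo in the setting fails loudly at startup instead of silently falling back to one of the implementations.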

📜 Checklist

  • Commits are cleanly separated with meaningful messages (simple features and bug fixes should be squashed to one commit)
  • A changelog entry or entries has been added for any significant changes
  • Follows the Pulp policy on AI Usage
  • (For new features) - User documentation and test coverage has been added

See: Pull Request Walkthrough

Contributor

@gerrod3 gerrod3 left a comment

There's still a lot I haven't reviewed deeply yet, but this was getting long, and I had a big idea around dispatch that I want to discuss.

Comment on lines +183 to +186
def execute_task(task):
    """Redis-aware task execution that releases Redis locks for immediate tasks."""
    # This extra stack is needed to isolate the current_task ContextVar
    contextvars.copy_context().run(_execute_task, task)
Contributor

Reading through this version and the base version, there is not much difference between the two, besides that this one calls safe_release_task_locks. Could this be a wrapper around the original, with a try/finally to release the Redis locks?

Member Author

redis_tasks._execute_task and tasks._execute_task share the same core logic:
set_running → log_task_start → run the task function → log result → set_completed/set_failed →
send notification. The Redis version cannot directly call the PulpcoreWorker's _execute_task
because it must release Redis locks between task execution and state transition — a step that
doesn't exist in the PulpcoreWorker implementation.

except Exception:
    # Exception during execute_task()
    # Atomically release all locks as safety net
    safe_release_task_locks(task, lock_owner)
Contributor

Not needed, execute_task should already handle letting go of the locks.

Member Author

That is correct. I'll update the comment to state more accurately that the except block handles the case where using_workdir() fails before the execute function gets a chance to run and release the locks itself.

Contributor

Can you create a redis_using_workdir(task, app_lock) that handles this in a finally block?
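A minimal sketch of the suggested context manager, assuming safe_release_task_locks() is safe to call twice (execute_task may already have released the locks by the time the finally block runs). The using_workdir() and safe_release_task_locks() below are stand-ins that only mimic the helpers named in this thread so the example runs on its own; the released list is a hypothetical substitute for Redis state.

```python
import contextlib
import tempfile

# Hypothetical stand-in for Redis state: records every lock release.
released = []


def safe_release_task_locks(task, lock_owner):
    """Stand-in for the PR's helper; here it only records the call."""
    released.append((task, lock_owner))


@contextlib.contextmanager
def using_workdir():
    """Stand-in for the using_workdir() helper: yields a temporary directory."""
    with tempfile.TemporaryDirectory() as workdir:
        yield workdir


@contextlib.contextmanager
def redis_using_workdir(task, app_lock):
    """Enter the task's working directory and always release its Redis locks.

    The finally block runs if using_workdir() fails during setup, if the task
    body raises, or if everything succeeds.
    """
    try:
        with using_workdir() as workdir:
            yield workdir
    finally:
        safe_release_task_locks(task, app_lock)
```

Folding the release into the context manager removes the need for a separate except block in the caller, which is the point of the suggestion.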

Contributor

@gerrod3 gerrod3 left a comment

Lots of comment/logging statements to remove. The try/except blocks need to catch more specific exceptions. And finally, there are gaps in the task logic that need to be addressed.

local resource_name = ARGV[2 + i]

-- Remove from set
local removed = redis.call("srem", key, lock_owner)
Contributor

This call can fail if the value at key is no longer a set, i.e., if it is now a string for an exclusive lock.

Member Author

This would only happen in a situation where a worker is considered dead, another worker cleans up its locks, and then the dead worker tries to clean up its own locks.
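One way to make the release step tolerant of that race is to check the key's type before calling SREM; inside a Redis Lua script that check could use redis.call("type", key).ok. The in-memory Python model below sketches that logic only. It assumes, from this thread, that shared locks are sets of owners and exclusive locks are plain strings; release_shared_locks() is a hypothetical name, not the PR's function.

```python
# In-memory model of the lock keyspace (layout assumed from the review thread):
#   shared lock    -> key maps to a set of lock owners
#   exclusive lock -> key maps to a single owner string


def release_shared_locks(store, keys, lock_owner):
    """Remove lock_owner from each shared-lock set, skipping keys whose type changed.

    Returns the keys that were actually released. A key that now holds a string
    (an exclusive lock taken after another worker cleaned up) is left untouched
    instead of triggering the equivalent of Redis's WRONGTYPE error.
    """
    released = []
    for key in keys:
        value = store.get(key)
        if not isinstance(value, set):
            # Missing key, or an exclusive lock now held by someone else.
            continue
        if lock_owner in value:
            value.discard(lock_owner)
            released.append(key)
        if not value:
            # Mirror Redis, which deletes a set once its last member is removed.
            del store[key]
    return released
```

Skipping mismatched keys is safe under the scenario described above: if the key changed type, the stale owner's shared lock was already cleaned up by another worker.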

@dkliban dkliban force-pushed the 7210 branch 2 times, most recently from c1653b1 to ea56670, February 23, 2026 00:36
dkliban added 6 commits March 16, 2026 20:43
This adds WORKER_TYPE setting. The default value is 'pulpcore'. When 'redis' is selected,
the tasking system uses Redis to lock resources. Redis workers produce less load on the
PostgreSQL database.

closes: pulp#7210

Generated By: Claude Code.
Added redis connection checks to the worker so it shuts down if the connection is broken.
This removes the need to refresh the task from the database before executing it.

This also removes some excessive logging from the dispatch method.
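A sketch of the kind of connection check this commit describes, with the Redis client passed in so the example runs without a server. redis_connection_ok() and worker_heartbeat() are hypothetical names. ping() is a real redis-py client method, but which exception classes to catch depends on the client library, as the docstring notes.

```python
import logging

log = logging.getLogger(__name__)


def redis_connection_ok(client, errors=(ConnectionError, TimeoutError)):
    """Return True if client.ping() succeeds, False if it raises one of `errors`.

    With redis-py, pass errors=(redis.exceptions.ConnectionError,
    redis.exceptions.TimeoutError): those classes derive from
    redis.exceptions.RedisError, not from the built-in exceptions used here.
    """
    try:
        client.ping()
    except errors as exc:
        log.error("Lost connection to Redis: %s", exc)
        return False
    return True


def worker_heartbeat(client, shutdown):
    """One iteration of a hypothetical heartbeat loop: shut down when Redis is gone."""
    if not redis_connection_ok(client):
        shutdown()
        return False
    return True
```

Shutting the worker down on a lost connection means it never keeps running tasks under locks it can no longer see or release.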
… in Redis when tasks are not found for a missing worker
dkliban and others added 13 commits March 16, 2026 20:43
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Redis Lua scripts execute atomically, making deadlocks structurally
impossible regardless of key ordering.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
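The reasoning is that a script which acquires every lock a task needs, or none of them, never leaves a task holding some locks while waiting for others. Without that hold-and-wait condition, no deadlock can form. The toy model below illustrates the property in Python, using a threading.Lock as a stand-in for Redis running one script at a time; LockTable is hypothetical and models exclusive locks only.

```python
import threading


class LockTable:
    """Toy model of all-or-nothing lock acquisition (exclusive locks only)."""

    def __init__(self):
        # Stands in for Redis executing one Lua script at a time.
        self._script_lock = threading.Lock()
        self._owners = {}  # resource name -> owning task id

    def acquire_all(self, task_id, resources):
        """Take every resource for task_id, or none of them."""
        with self._script_lock:
            if any(r in self._owners for r in resources):
                return False  # acquire nothing; the caller retries later
            for r in resources:
                self._owners[r] = task_id
            return True

    def release_all(self, task_id):
        """Drop every resource held by task_id."""
        with self._script_lock:
            for r in [r for r, owner in self._owners.items() if owner == task_id]:
                del self._owners[r]
```

Note that the second task below lists its resources in the opposite order from the first; since nothing is ever partially acquired, the order does not matter.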
The only callers (are_resources_available and async_are_resources_available)
already catch any exception and return False. The ["error"] sentinel was a
code smell — a string masquerading as a resource name.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Narrowing the exception type prevents programming bugs (AttributeError,
TypeError, etc.) from being silently swallowed in Redis operation functions.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
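A minimal sketch of the resulting behavior in a caller such as are_resources_available: a Redis failure means "not available right now", while programming errors still surface. The RedisError class here is a stand-in for redis.exceptions.RedisError, and get_blocked_resources is a hypothetical callable; the real function's signature is not shown in this thread.

```python
class RedisError(Exception):
    """Stand-in for redis.exceptions.RedisError so this sketch runs without redis-py."""


def are_resources_available(get_blocked_resources, resources):
    """Return True if none of `resources` is currently locked.

    `get_blocked_resources` is a hypothetical callable that queries Redis and
    raises RedisError on failure. A Redis failure is treated as "not available"
    (the task simply waits), while bugs such as TypeError still propagate.
    """
    try:
        blocked = get_blocked_resources(resources)
    except RedisError:
        return False
    return not blocked
```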
The try blocks only wrap acquire_locks(), a pure Redis operation.
All ORM calls are outside the try block, so catching Exception was too broad.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
When a task fails to acquire locks because one of its resources is blocked,
all of its exclusive resources are now marked as blocked so that later tasks
cannot jump ahead in the queue by acquiring those resources first.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
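The effect of marking a waiting task's exclusive resources as blocked can be shown with a single scheduling pass over a FIFO queue. This is a hypothetical in-memory model (schedule() and the tuple layout are assumptions), not the PR's Redis code.

```python
def schedule(queue, locked):
    """One scheduling pass over a FIFO task queue (hypothetical model).

    queue: list of (task_id, exclusive_resources) in arrival order.
    locked: resources currently held by running tasks (not modified).
    Returns the task ids started in this pass.
    """
    held = set(locked)
    blocked = set()
    started = []
    for task_id, resources in queue:
        wanted = set(resources)
        if wanted & (held | blocked):
            # This task must wait; reserve all of its exclusive resources so
            # later tasks cannot grab them first and jump ahead of it.
            blocked |= wanted
            continue
        held |= wanted
        started.append(task_id)
    return started
```

With repo-a held by a running task, t1 cannot start and reserves both of its resources, so t2 (which only needs repo-b) waits behind it instead of starving t1; t3 is unaffected.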
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace the duplicate _execute_task/_aexecute_task implementations in
redis_tasks.py with thin wrappers that delegate to PulpcoreWorker's
implementations and release Redis locks in a finally block.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
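A minimal sketch of the wrapper pattern this commit describes, combining the copy_context() isolation from the original execute_task with a try/finally. pulpcore_execute_task(), the dict-shaped task, and the lock_owner parameter are hypothetical stand-ins. In this shape the locks are released after the base implementation has finished, including its state transition, which is the ordering question raised earlier in the review.

```python
import contextvars

# Hypothetical stand-ins: current_task mirrors the ContextVar mentioned in the
# review, and released_locks records calls instead of talking to Redis.
current_task = contextvars.ContextVar("current_task", default=None)
released_locks = []


def pulpcore_execute_task(task):
    """Stand-in for PulpcoreWorker's _execute_task: runs the task's function."""
    current_task.set(task["id"])
    return task["func"]()


def safe_release_task_locks(task, lock_owner):
    """Stand-in for the PR's helper; the real one releases the task's Redis locks."""
    released_locks.append((task["id"], lock_owner))


def _execute_task(task, lock_owner):
    """Thin wrapper: delegate to the base implementation, always release Redis locks."""
    try:
        return pulpcore_execute_task(task)
    finally:
        safe_release_task_locks(task, lock_owner)


def execute_task(task, lock_owner):
    # Run in a copied context so current_task set by the task does not leak out.
    return contextvars.copy_context().run(_execute_task, task, lock_owner)
```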


Development

Successfully merging this pull request may close these issues.

Add a Redis based worker to the tasking system

2 participants