LiteLLM exceptions cannot be pickled by cloudpickle (breaks docket task serialization) #238

@bsbodden

Summary

When a docket background task (e.g., memory extraction, NER) fails with a LiteLLM exception, the exception cannot be round-tripped through cloudpickle for the task result: serializing it succeeds, but deserializing it raises TypeError. LiteLLM exception classes have __init__ signatures requiring positional arguments (message, model, llm_provider) that the default exception reconstruction does not supply.

Root cause

LiteLLM exception classes (e.g., BadRequestError, RateLimitError, AuthenticationError, APIConnectionError) have constructors like:

class BadRequestError(openai.BadRequestError):
    def __init__(self, message, model, llm_provider, ...):
        ...

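cloudpickle, like pickle, serializes an object through its __reduce__ method. The default for exceptions, inherited from BaseException, records the class together with self.args and the instance __dict__, and unpickling rebuilds the object by calling cls(*self.args). Because self.args does not contain every required constructor argument, that call raises TypeError: __init__() missing required positional arguments.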
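The mechanism can be shown without LiteLLM installed. The sketch below uses StandInError, a hypothetical class that only mimics the constructor shape described above (it is not the real LiteLLM class), and the standard-library pickle module, which follows the same __reduce__ protocol on load:

```python
import pickle


class StandInError(Exception):
    """Hypothetical stand-in for a LiteLLM-style exception (not the real class)."""

    def __init__(self, message, model, llm_provider):
        # Only `message` reaches Exception.__init__, so self.args == (message,).
        super().__init__(message)
        self.message = message
        self.model = model
        self.llm_provider = llm_provider


exc = StandInError("Bad request", model="test", llm_provider="test")

# The inherited __reduce__ records the class, exc.args, and exc.__dict__.
reduced = exc.__reduce__()
print(reduced[0].__name__, reduced[1])

payload = pickle.dumps(exc)  # serializing succeeds

try:
    pickle.loads(payload)  # rebuilds via StandInError("Bad request")
    roundtrip_error = None
except TypeError as err:
    roundtrip_error = err

print(type(roundtrip_error).__name__, roundtrip_error)
```

The failure surfaces only at load time, which is why the task can produce a result payload that the reader of that payload cannot decode.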

Reproduction

import cloudpickle
import pickle
from litellm.exceptions import BadRequestError

exc = BadRequestError(message="Bad request", model="test", llm_provider="test")
pickled = cloudpickle.dumps(exc)
restored = pickle.loads(pickled)  # ❌ TypeError

A full reproduction test is available at tests/test_upstream_issues.py::TestLiteLLMPickleSerialization.

Impact

  • Any docket task that encounters a LiteLLM error (rate limits, auth failures, bad requests, connection errors) produces an error result that cannot be round-tripped through pickle
  • The task framework cannot report the error back to the caller
  • This affects extract_memory_structure, extract_memories_with_strategy, and other worker tasks that call LLM APIs

Fix available

Branch bsb/in-a-pickle-231 has a complete fix:

  • agent_memory_server/litellm_pickle_compat.py — patches __reduce__ on 15 LiteLLM exception classes using Exception.__new__() + __dict__ restore pattern
  • agent_memory_server/docket_tasks.py — single import line to activate the patch at worker startup
  • tests/test_docket_serialization.py — 4 test classes validating the pickle roundtrip

The fix monkey-patches __reduce__ to use a reconstruction function that bypasses __init__:

def _reconstruct_litellm_exception(cls, state):
    exc = Exception.__new__(cls)
    exc.__dict__.update(state)
    return exc
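How such a reconstruction function might be wired into __reduce__ is sketched below. This is an illustration under assumptions, not the code on the branch: StandInError, _reconstruct_exception, and _patched_reduce are hypothetical names, and the stand-in class only mimics the LiteLLM constructor shape.

```python
import pickle


def _reconstruct_exception(cls, state):
    # Same shape as the reconstruction function above: skip __init__ entirely.
    exc = Exception.__new__(cls)
    exc.__dict__.update(state)
    return exc


def _patched_reduce(self):
    # Return (callable, args) so unpickling calls the helper, not cls(*args).
    return (_reconstruct_exception, (type(self), dict(self.__dict__)))


class StandInError(Exception):
    """Hypothetical stand-in for a LiteLLM-style exception."""

    def __init__(self, message, model, llm_provider):
        super().__init__(message)
        self.message = message
        self.model = model
        self.llm_provider = llm_provider


# Monkey-patch the class after definition, as a compat module would at import time.
StandInError.__reduce__ = _patched_reduce

original = StandInError("Bad request", model="test", llm_provider="test")
restored = pickle.loads(pickle.dumps(original))

print(type(restored).__name__, restored.model, restored.llm_provider)
print(restored.args)
```

One caveat of this sketch: args is stored outside the instance __dict__, so restored.args comes back as an empty tuple. Code that reads exc.args, or relies on the default str(exc), would need the args carried in the state as well; whether the branch handles this is not stated here.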

Environment

  • AMS version: main at fd73560
  • litellm >= 1.80.11
  • Discovered during Snowflake SPCS deployment with docket task workers
