PulpApiWorker RSS grows unbounded due to glibc heap fragmentation #7482

@amasolov

Description

API workers (PulpApiWorker / gunicorn SyncWorker) exhibit continuous RSS growth (~1 kB/request) even under minimal load (health probes only). The growth is caused by glibc heap fragmentation — Django's request cycle allocates and frees many small C-level objects (ORM query compilers, SQL strings, psycopg cursor state), and glibc's malloc retains the freed pages in the process heap rather than returning them to the OS.

Evidence

Profiling on a live Ansible Automation Platform 2.6 deployment (pulpcore 3.49.49, Django 4.2.27, Python 3.12, OpenShift):

  • Python object counts are stable: the gc.get_objects() delta is ~0 after initial lazy initialization
  • gc.collect() recovers 0 bytes — no reference cycles
  • malloc_trim(0) reclaims ~2 MB immediately — confirms heap fragmentation
  • RSS grows linearly at ~1 kB/request without trimming, with no upper bound
  • Master process RSS is flat — only forked workers are affected
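
The measurements above can be approximated from inside a worker with a small probe. This is only a sketch of one way to take them, not the profiling code used for this report: the helper names (`rss_kb`, `malloc_trim`, `probe_heap`) are hypothetical, and it assumes Linux with glibc available as `libc.so.6`. Elsewhere the trim step is skipped.

```python
import ctypes
import gc
import sys


def rss_kb():
    """Return this process's resident set size in kB, or None if /proc is unavailable."""
    try:
        with open("/proc/self/status") as f:
            for line in f:
                if line.startswith("VmRSS:"):
                    return int(line.split()[1])
    except OSError:
        pass
    return None


def malloc_trim():
    """Call glibc malloc_trim(0). True if memory was released, False otherwise."""
    if not sys.platform.startswith("linux"):
        return False
    try:
        # Assumption: glibc is loadable under this name (not true on musl).
        libc = ctypes.CDLL("libc.so.6")
        return bool(libc.malloc_trim(0))
    except (OSError, AttributeError):
        return False


def probe_heap():
    """Snapshot object count and RSS, then run gc.collect() and malloc_trim(0)."""
    tracked_objects = len(gc.get_objects())
    rss_before = rss_kb()
    unreachable_found = gc.collect()   # number of unreachable objects found
    trim_released = malloc_trim()
    rss_after = rss_kb()
    return {
        "tracked_objects": tracked_objects,
        "unreachable_found": unreachable_found,
        "trim_released": trim_released,
        "rss_before_kb": rss_before,
        "rss_after_kb": rss_after,
    }


if __name__ == "__main__":
    print(probe_heap())
```

If `tracked_objects` stays flat between calls and `unreachable_found` stays near zero while `rss_after_kb` drops after the trim, the growth is sitting in the C heap rather than in Python objects.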

Impact

Over hours, worker RSS climbs from ~150 MB to multiple GB, leading to:

  • Gunicorn worker timeout (SIGKILL)
  • Health probe failures
  • Pod OOM kills and restarts

This is observed even with zero user activity — Kubernetes liveness/readiness probes alone drive the growth.

Proposed fix

PR #7481 adds a periodic gc.collect() + malloc_trim(0) call in PulpApiWorker.handle_request(), configurable via the PULP_MEMORY_TRIM_INTERVAL env var (default: every 1024 requests; set to 0 to disable). The trim is Linux-only and is a graceful no-op on other platforms.
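
For readers who want the shape of the idea without opening the PR, here is a minimal sketch of a request-counting trimmer. It is not the code from PR #7481. The class name `PeriodicTrimmer`, its `after_request()` method and the injectable `trim` argument are hypothetical, and loading glibc as `libc.so.6` is an assumption. Only the env var name, the 1024 default and the "0 disables" behaviour come from the description above.

```python
import ctypes
import gc
import os
import sys


def _load_malloc_trim():
    """Return a zero-argument callable for glibc malloc_trim(0), or None if unavailable."""
    if not sys.platform.startswith("linux"):
        return None
    try:
        # Assumption: glibc is loadable under this name (not true on musl).
        fn = ctypes.CDLL("libc.so.6").malloc_trim
    except (OSError, AttributeError):
        return None
    fn.argtypes = [ctypes.c_size_t]
    fn.restype = ctypes.c_int
    return lambda: fn(0)


class PeriodicTrimmer:
    """Hypothetical helper: run gc.collect() + malloc_trim(0) every N requests."""

    def __init__(self, interval=None, trim=None):
        if interval is None:
            interval = int(os.environ.get("PULP_MEMORY_TRIM_INTERVAL", "1024"))
        self.interval = max(interval, 0)   # 0 disables trimming
        self._trim = trim if trim is not None else _load_malloc_trim()
        self._count = 0

    def after_request(self):
        """Count one request; return True if a trim ran on this call."""
        if self.interval == 0 or self._trim is None:
            return False                   # disabled, or no glibc: no-op
        self._count += 1
        if self._count < self.interval:
            return False
        self._count = 0
        gc.collect()                       # free Python-level garbage first
        self._trim()                       # then hand free heap pages back to the OS
        return True
```

A worker would call `after_request()` once per handled request. Running the trim only every N requests keeps its cost off the hot path, at the price of letting up to N requests' worth of fragmentation accumulate between trims.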
