
MCP Resource Limits Configuration Proposal #2073

@aiob3

Description


Problem Statement

Docker MCP Gateway / cagent currently has no built-in mechanism to limit:

  • Maximum concurrent MCP instances running in parallel
  • Memory per instance
  • Total memory consumption
  • CPU allocation per tool
  • Instance lifecycle (timeout, cleanup)

This leads to resource exhaustion on local development machines (especially WSL 2), causing:

  • 90%+ CPU usage from uncontrolled docker-mcp.exe spawning
  • Memory bloat
  • System lockups
  • Daemon crashes

Example of Current Issue

When adding multiple MCP tools (exa, fetch, filesystem, clickhouse, playwright, etc.), each tool call can spawn new instances without limits, resulting in 100+ orphaned processes consuming resources indefinitely.

Proposed Solution

Add a .mcp-limits.yaml configuration file that allows users to define resource constraints:

# .mcp-limits.yaml
mcp:
  global:
    max_concurrent_instances: 10          # Max total instances across all tools
    max_total_memory: 2048                # MB - total memory cap
    max_total_cpu: 80                     # Percentage (0-100)
    cleanup_orphans: true
    orphan_detection_interval: 30         # seconds
    instance_timeout: 600                 # seconds (10 minutes)

  tools:
    exa:
      max_instances: 2
      max_memory: 256                     # MB per instance
      max_cpu: 25                         # Percentage per instance
      timeout: 120                        # seconds

    fetch:
      max_instances: 3
      max_memory: 512
      max_cpu: 50
      timeout: 180

    playwright:
      max_instances: 1
      max_memory: 1024
      max_cpu: 50
      timeout: 300

    clickhouse:
      max_instances: 1
      max_memory: 512
      max_cpu: 40
      timeout: 600

  # Fallback for tools not explicitly configured
  default:
    max_instances: 2
    max_memory: 256
    max_cpu: 30
    timeout: 300

Implementation Details

1. Configuration Loading

# Locations checked in order:
~/.mcp-limits.yaml              # User home
.mcp-limits.yaml                # Project root
$CAGENT_CONFIG_DIR/limits.yaml  # Environment variable
/etc/cagent/mcp-limits.yaml     # System-wide (Linux/Mac)
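A minimal loader sketch for the lookup above. The function names (`candidate_paths`, `merge_limits`), the lowest-to-highest precedence order, and the shallow-merge semantics are assumptions for illustration, not a settled API:

```python
import os
from pathlib import Path

def candidate_paths(env=os.environ):
    """Candidate config locations, assumed lowest to highest precedence."""
    paths = [Path("/etc/cagent/mcp-limits.yaml"),   # system-wide (Linux/Mac)
             Path.home() / ".mcp-limits.yaml"]      # user home
    if env.get("CAGENT_CONFIG_DIR"):
        paths.append(Path(env["CAGENT_CONFIG_DIR"]) / "limits.yaml")
    paths.append(Path.cwd() / ".mcp-limits.yaml")   # project root
    return paths

def merge_limits(*configs):
    """Shallow-merge parsed config dicts; later dicts override earlier ones."""
    merged = {}
    for cfg in configs:
        for section, values in (cfg or {}).items():
            merged.setdefault(section, {}).update(values)
    return merged
```

Each existing file would be parsed (e.g. with a YAML library) and fed to `merge_limits` in precedence order, so a project-level file can override a single key without restating the whole config.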

2. Instance Manager Enhancement

import time

class MCPInstanceManager:
    def __init__(self, config_path):
        self.config = load_config(config_path)
        self.active_instances = {}
        self.start_orphan_cleanup_task()

    def spawn_instance(self, tool_name):
        """Spawn with resource limits enforced."""
        # Check global limits ("global" is a reserved word in Python,
        # so the parsed section is exposed as `globals`)
        if len(self.active_instances) >= self.config.globals.max_concurrent_instances:
            raise MCPLimitExceeded("Max concurrent instances reached")

        # Fall back to the default section for unconfigured tools
        tool_config = self.config.tools.get(tool_name, self.config.default)

        # Check tool-specific limits
        tool_instances = sum(1 for i in self.active_instances.values()
                             if i.tool == tool_name)
        if tool_instances >= tool_config.max_instances:
            raise MCPLimitExceeded(f"Max instances for {tool_name} reached")

        # Spawn with cgroup constraints (maps to `docker run --memory/--cpus`)
        instance = self._spawn_docker_container(
            tool_name,
            memory_limit=f"{tool_config.max_memory}M",
            cpus=tool_config.max_cpu / 100,  # --cpus takes a core count, not a percentage
        )

        self.active_instances[instance.id] = instance
        return instance

    def cleanup_orphans(self):
        """Periodically remove dead/timed-out instances."""
        now = time.time()
        to_remove = []

        for instance_id, instance in self.active_instances.items():
            if now - instance.created_at > instance.config.timeout:
                instance.terminate()
                to_remove.append(instance_id)

        for instance_id in to_remove:
            del self.active_instances[instance_id]
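The `_spawn_docker_container` step can enforce limits with standard `docker run` flags. `--memory` and `--cpus` are real Docker flags; the `docker_run_args` helper, the image argument, and the `mcp-tool` label are assumptions of this sketch:

```python
def docker_run_args(tool_name, image, max_memory_mb, max_cpu_pct):
    """Translate per-tool limits into `docker run` flags.

    `--memory` takes an absolute cap; `--cpus` takes a fractional core
    count, so the percentage is converted assuming 100% == one full core
    (a simplification; a real implementation might scale by host cores).
    """
    return [
        "docker", "run", "--rm", "-d",
        "--memory", f"{max_memory_mb}m",
        "--cpus", f"{max_cpu_pct / 100:.2f}",
        "--label", f"mcp-tool={tool_name}",  # lets the orphan cleanup find strays
        image,
    ]
```

Labelling each container also gives `cleanup_orphans` a reliable way to find stray instances (`docker ps --filter label=mcp-tool`) even after the manager restarts.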

3. Monitoring & Alerts

class MCPResourceMonitor:
    def check_health(self):
        """Emit warnings when usage crosses 80% of a configured limit"""
        total_mem = sum(i.memory_usage for i in self.instances.values())
        total_cpu = sum(i.cpu_usage for i in self.instances.values())

        if total_mem > self.config.globals.max_total_memory * 0.8:
            pct = total_mem / self.config.globals.max_total_memory
            logger.warning(f"Memory usage at {total_mem}MB ({pct:.0%} of limit)")

        if total_cpu > self.config.globals.max_total_cpu * 0.8:
            logger.warning(f"CPU usage at {total_cpu}%")
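The health check could run on a background interval, e.g. reusing `orphan_detection_interval` from the global config. A stdlib sketch; `start_periodic` is a hypothetical helper, not part of cagent:

```python
import threading

def start_periodic(fn, interval_seconds):
    """Run fn immediately, then every interval_seconds on a daemon timer."""
    def tick():
        fn()
        timer = threading.Timer(interval_seconds, tick)
        timer.daemon = True  # don't keep the process alive on shutdown
        timer.start()
    tick()
```

A daemon timer keeps the monitor from blocking process exit; an asyncio task would work equally well if the gateway already runs an event loop.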

4. CLI Integration

# Show current limits
cagent mcp limits show

# Update limits
cagent mcp limits set --max-instances 5 --max-memory 2048

# Monitor usage in real-time
cagent mcp monitor

# Force cleanup
cagent mcp cleanup --force

Benefits

  1. Prevents resource exhaustion: No more 90%+ CPU spikes
  2. Production-ready: Scales safely in constrained environments
  3. User-friendly: Zero-config defaults, easy to customize
  4. Transparent: Monitor actual usage vs limits
  5. Safe: Graceful degradation instead of crashes

Testing Scenarios

# Test 1: Spawn 20 tools, should queue/fail gracefully
for i in {1..20}; do cagent call exa --query "test$i" & done

# Test 2: Monitor that cleanup removes orphans
cagent mcp monitor  # Should show instances terminating after timeout

# Test 3: Verify memory cap respected
cagent mcp limits set --max-memory 512
# Large operation should fail or queue

Backward Compatibility

  • All limits default to unlimited if no .mcp-limits.yaml is found
  • Existing configs work unchanged
  • Environment variable overrides available for CI/CD
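Overrides could be applied after file loading. The variable names below (e.g. `CAGENT_MCP_MAX_INSTANCES`) are placeholders for illustration; no such variables exist in cagent today:

```python
import os

# Hypothetical override variables -> (config section, key)
ENV_OVERRIDES = {
    "CAGENT_MCP_MAX_INSTANCES": ("global", "max_concurrent_instances"),
    "CAGENT_MCP_MAX_MEMORY": ("global", "max_total_memory"),
}

def apply_env_overrides(config, env=os.environ):
    """Let CI/CD tighten limits without shipping a config file."""
    for var, (section, key) in ENV_OVERRIDES.items():
        if var in env:
            config.setdefault(section, {})[key] = int(env[var])
    return config
```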

Related Issues

  • Similar pattern used by: Kubernetes (resource requests/limits), Docker (--memory, --cpus), Systemd (MemoryMax, CPUQuota)

Files to Modify

  1. cagent/config/limits.py - New config parser
  2. cagent/mcp/instance_manager.py - Resource enforcement
  3. cagent/mcp/monitor.py - Health checks
  4. cagent/cli/commands/mcp.py - New CLI commands
  5. docs/mcp-configuration.md - Documentation
  6. examples/.mcp-limits.yaml - Example config

Implementation Priority

  • Phase 1: Global instance/memory limits (MVP)
  • Phase 2: Per-tool limits + cleanup
  • Phase 3: CLI monitoring + auto-tuning
  • Phase 4: Integration with Docker Desktop API for WSL 2

Author: aiob3 & Gordon (Security/Performance concern)
Date: 03/12/2026
Severity: Medium (Impacts usability, not security)
Type: Enhancement/Feature Request

Labels: area/tools (features/issues/fixes related to the usage of built-in and MCP tools), kind/enhancement (new feature or request)