1 change: 0 additions & 1 deletion .dockerignore

This file was deleted.

1 change: 0 additions & 1 deletion Procfile

This file was deleted.

248 changes: 248 additions & 0 deletions RENDER_DEPLOYMENT.md
@@ -0,0 +1,248 @@
# Render Deployment Guide

This guide explains how to deploy OcotilloAPI to Render using the `render.yaml` blueprint.

## Overview

The `render.yaml` file defines two complete environments:
- **Staging**: Auto-deploys from `staging` branch
- **Production**: Manual deploys from `main` branch (requires approval)

## Architecture

### Services Created

1. **Web Services** (2)
- `ocotillo-api-staging` - Staging environment
- `ocotillo-api-production` - Production environment

2. **Databases** (2)
- `ocotillo-db-staging` - PostgreSQL 17 with PostGIS
- `ocotillo-db-production` - PostgreSQL 17 with PostGIS

3. **Environment Variable Groups** (3)
- `ocotillo-shared` - Common configuration for all environments
- `ocotillo-staging` - Staging-specific settings
- `ocotillo-production` - Production-specific settings
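The groups above map onto the `envVarGroups` section of `render.yaml`. As a rough sketch (the group names match the list above; the individual keys shown here are illustrative, not the project's actual settings):

```yaml
# Illustrative sketch only - see render.yaml for the real definitions.
envVarGroups:
  - name: ocotillo-shared
    envVars:
      - key: LOG_LEVEL        # hypothetical shared setting
        value: info
  - name: ocotillo-staging
    envVars:
      - key: AUTHENTIK_URL
        sync: false           # set manually in the dashboard
  - name: ocotillo-production
    envVars:
      - key: AUTHENTIK_URL
        sync: false
```

Variables marked `sync: false` are declared in the blueprint but never committed to git; Render prompts for their values in the dashboard.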

## Initial Deployment

### Step 1: Connect Repository

1. Log in to [Render Dashboard](https://dashboard.render.com)
2. Click **"New Blueprint Instance"**
3. Connect your GitHub repository
4. Select the branch containing `render.yaml` (e.g., `render-deploy`)

### Step 2: Configure Environment Variables

Render will automatically create the environment variable groups, but you need to set values for `sync: false` variables:

#### Staging Environment (`ocotillo-staging` group)

Set these in the Render dashboard:

```bash
# Authentik OAuth Configuration
AUTHENTIK_URL=https://your-staging-authentik-instance.com
AUTHENTIK_CLIENT_ID=your_staging_client_id
AUTHENTIK_AUTHORIZE_URL=https://your-staging-authentik-instance.com/application/o/authorize/
AUTHENTIK_TOKEN_URL=https://your-staging-authentik-instance.com/application/o/token/

# Google Cloud Storage (if using assets)
GCS_BUCKET_NAME=your-staging-bucket
GOOGLE_APPLICATION_CREDENTIALS=/path/to/staging/credentials.json
```

#### Production Environment (`ocotillo-production` group)

```bash
# Authentik OAuth Configuration
AUTHENTIK_URL=https://your-production-authentik-instance.com
AUTHENTIK_CLIENT_ID=your_production_client_id
AUTHENTIK_AUTHORIZE_URL=https://your-production-authentik-instance.com/application/o/authorize/
AUTHENTIK_TOKEN_URL=https://your-production-authentik-instance.com/application/o/token/

# Google Cloud Storage (if using assets)
GCS_BUCKET_NAME=your-production-bucket
GOOGLE_APPLICATION_CREDENTIALS=/path/to/production/credentials.json
```

**Note**: `SESSION_SECRET_KEY` is auto-generated by Render (`generateValue: true`).
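In blueprint terms, an auto-generated secret like that is declared roughly as follows (a sketch of the relevant `envVars` entry):

```yaml
envVars:
  - key: SESSION_SECRET_KEY
    generateValue: true   # Render generates a random value at first deploy
```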

### Step 3: Review and Deploy

1. Review the services that will be created
2. Click **"Apply"** to create all resources
3. Render will:
- Create PostgreSQL databases with PostGIS extension
- Build Docker images
- Run database migrations
- Deploy the applications

## Database Configuration

### PostGIS Extension

The `preDeployCommand` automatically installs PostGIS:
```bash
psql $DATABASE_URL -c "CREATE EXTENSION IF NOT EXISTS postgis;"
```
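In the blueprint, this hook lives on the web service definition, roughly like so (a sketch; the real service entry in `render.yaml` has more fields):

```yaml
services:
  - type: web
    name: ocotillo-api-staging
    preDeployCommand: psql $DATABASE_URL -c "CREATE EXTENSION IF NOT EXISTS postgis;"
```

Because it runs before each deploy, the `IF NOT EXISTS` clause makes it safe to repeat on every release.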

### Connection Details

- **DATABASE_URL** is automatically provided by Render
- Connections are internal-only by default (`ipAllowList: []`)
- To allow external connections, add IP addresses to `ipAllowList` in `render.yaml`
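If you do need external access, the allow-list entries look roughly like this (the CIDR below is a documentation example, not a recommendation):

```yaml
ipAllowList:
  - source: 203.0.113.0/24   # example office range (TEST-NET-3)
    description: office
```

Leaving `ipAllowList: []` keeps the database reachable only from services in the same Render account and region.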

## Health Checks

Both environments use the `/health/ready` endpoint:
- Checks application responsiveness
- Verifies database connectivity
- Returns 200 if healthy, 503 if not ready

## Deployment Workflow

### Staging Deployments

```bash
git checkout staging
git merge render-deploy # or your feature branch
git push origin staging
```

**Result**: Automatic deployment to staging environment

### Production Deployments

```bash
git checkout main
git merge staging
git push origin main
```

**Result**: Build triggered, but requires manual approval in Render dashboard

## Environment-Specific Configuration

### Staging
- `MODE=development`
- Auto-deploy enabled
- Authentication enabled (Authentik)
- Smaller instance size (starter plan)

### Production
- `MODE=production`
- Manual deploy (requires approval)
- Authentication enabled (Authentik)
- Larger instance size (standard plan)

## Service Plans

Current configuration in `render.yaml`:

| Service | Plan | Notes |
|---------|------|-------|
| Staging Web | Starter | Suitable for testing |
| Production Web | Standard | Recommended for production traffic |
| Staging DB | Standard | Can downgrade to Starter if needed |
| Production DB | Standard | Recommended for production data |

To change plans, edit the `plan` field in `render.yaml`.

## Scaling Configuration

The application uses **Gunicorn with 4 workers** by default (configured in `entrypoint.sh`).

To adjust workers:
1. Edit `entrypoint.sh`
2. Change `--workers 4` to desired number
3. Commit and push changes

**Recommended**: 2-4 workers per container, scale horizontally by adding more instances.

## Monitoring

### Health Check Endpoints

- `/health/live` - Basic liveness check
- `/health/ready` - Readiness check with DB connectivity
- `/health/status` - Detailed status with pool metrics
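For dashboards consuming `/health/status`, the payload shape is defined in `api/health.py` in this PR. A small sketch of parsing it (the `summarize` helper is a hypothetical convenience, not part of the API; the sample mirrors the fields the endpoint returns):

```python
def summarize(payload: dict) -> str:
    """Return a one-line summary of a /health/status response body."""
    db = payload.get("database", {})
    db_state = "up" if db.get("connected") else "down"
    return f"app={payload['status']} db={db_state}"

# Sample response body, matching the structure built in api/health.py.
sample = {
    "status": "healthy",
    "timestamp": "2025-01-01T00:00:00",
    "database": {"status": "connected", "connected": True},
    "services": {"admin": "available", "cors": "enabled"},
}
print(summarize(sample))  # → app=healthy db=up
```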

### Logs

Access logs in Render dashboard:
1. Navigate to your service
2. Click **"Logs"** tab
3. Filter by severity or search terms

## Troubleshooting

### Database Connection Issues

Check if PostGIS extension is installed:
```bash
# Via Render Shell
psql $DATABASE_URL -c "SELECT PostGIS_Version();"
```

### Migration Failures

Run migrations manually via Render Shell:
```bash
alembic upgrade head
```

### Port Binding Issues

Ensure the `$PORT` environment variable is used (already configured in `entrypoint.sh`).

## Updating the Blueprint

After modifying `render.yaml`:

1. Commit and push changes
2. Go to Render dashboard
3. Navigate to Blueprint instance
4. Click **"Sync"** to apply changes

**Warning**: Some changes may cause service restarts.

## Cost Optimization

### Staging Environment

Consider downgrading if budget-constrained:
- Web service: `starter` → `free` (with limitations)
- Database: `standard` → `starter` (1GB storage, no point-in-time recovery)

### Production Environment

Recommended minimum:
- Web service: `standard` (for multiple workers)
- Database: `standard` (for backups and PITR)

## Security Considerations

1. **Authentication**: Authentik OAuth is enabled by default
2. **Database**: Internal-only access (no public IP by default)
3. **Secrets**: All sensitive values use `sync: false` (not in git)
4. **HTTPS**: Automatically provided by Render

## Next Steps

After deployment:

1. ✅ Verify health checks are passing
2. ✅ Test API endpoints
3. ✅ Configure custom domains (optional)
4. ✅ Set up monitoring/alerting
5. ✅ Configure backup schedule
6. ✅ Review and optimize database performance

## Support

- [Render Documentation](https://render.com/docs)
- [Render Status Page](https://status.render.com)
- [Community Forum](https://community.render.com)
12 changes: 11 additions & 1 deletion alembic/env.py
@@ -56,7 +56,17 @@ def build_database_url():
return f"postgresql+pg8000://{user}@/{database}"
return f"postgresql+pg8000://{user}:{password}@/{database}"

# Default/Postgres
# Check for DATABASE_URL first (Render/Heroku standard)
database_url = os.environ.get("DATABASE_URL", "")
if database_url:
# Convert to psycopg2 driver for alembic
if database_url.startswith("postgres://"):
return database_url.replace("postgres://", "postgresql+psycopg2://", 1)
elif database_url.startswith("postgresql://"):
return database_url.replace("postgresql://", "postgresql+psycopg2://", 1)
return database_url

# Fall back to individual env vars (backward compatible)
user = os.environ.get("POSTGRES_USER", "")
password = os.environ.get("POSTGRES_PASSWORD", "")
db = os.environ.get("POSTGRES_DB", "")
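The scheme rewrite added above can be isolated as a small helper for illustration: Render and Heroku expose `DATABASE_URL` with a `postgres://` or `postgresql://` scheme, while Alembic here needs the `psycopg2` driver spelled out explicitly. A sketch mirroring the diff's logic:

```python
def to_psycopg2_url(database_url: str) -> str:
    """Rewrite a Render/Heroku-style DATABASE_URL for SQLAlchemy + psycopg2."""
    if database_url.startswith("postgres://"):
        return database_url.replace("postgres://", "postgresql+psycopg2://", 1)
    if database_url.startswith("postgresql://"):
        return database_url.replace("postgresql://", "postgresql+psycopg2://", 1)
    return database_url  # leave non-Postgres URLs untouched

print(to_psycopg2_url("postgres://user:pw@host:5432/db"))
# → postgresql+psycopg2://user:pw@host:5432/db
```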
121 changes: 121 additions & 0 deletions api/health.py
@@ -0,0 +1,121 @@
"""Health check endpoints for deployment monitoring and container orchestration."""

from datetime import datetime
from typing import Any

from fastapi import APIRouter, HTTPException, status
from sqlalchemy import text

from core.dependencies import session_dependency

router = APIRouter(prefix="/health", tags=["Health"])


@router.get("/live", summary="Liveness probe")
async def liveness_check() -> dict[str, str]:
"""
Basic liveness probe - checks if the application is running.

This endpoint should always return 200 if the app is alive.
No dependencies checked - just confirms the web server is responding.

Use this for Kubernetes/Docker liveness probes.
"""
return {"status": "ok"}


@router.get("/ready", summary="Readiness probe")
async def readiness_check(session: session_dependency) -> dict[str, Any]:
"""
Readiness probe - checks if the application is ready to serve traffic.

Verifies:
- Application is running
- Database connection is available

Returns 200 if ready, 503 if not ready.
Use this for Kubernetes/Docker readiness probes and load balancer health checks.
"""
try:
# Test database connectivity with a simple query
result = await session.execute(text("SELECT 1"))
result.scalar()

return {
"status": "ready",
"database": "connected",
"checks": {"db_connection": True},
}
except Exception as e:
raise HTTPException(
status_code=status.HTTP_503_SERVICE_UNAVAILABLE,
detail={
"status": "not_ready",
"database": "disconnected",
"error": str(e),
"checks": {"db_connection": False},
},
)


@router.get("/status", summary="Detailed status information")
async def status_check(session: session_dependency) -> dict[str, Any]:
"""
Detailed status endpoint - provides comprehensive health information.

Returns:
- Application status
- Database connectivity and pool information
- Timestamp
- Additional service status

Use this for monitoring dashboards and detailed health reports.
"""
timestamp = datetime.utcnow().isoformat()

# Check database connectivity
db_status = "disconnected"
db_connected = False
pool_info = {}

try:
result = await session.execute(text("SELECT 1"))
result.scalar()
db_status = "connected"
db_connected = True

# Get connection pool information
pool = session.get_bind().pool
pool_info = {
"size": pool.size(),
"overflow": pool.overflow() if hasattr(pool, "overflow") else None,
"checked_in": pool.checkedin() if hasattr(pool, "checkedin") else None,
"checked_out": pool.checkedout() if hasattr(pool, "checkedout") else None,
}
# Remove None values
pool_info = {k: v for k, v in pool_info.items() if v is not None}

except Exception as e:
db_status = f"error: {str(e)}"

response = {
"status": "healthy" if db_connected else "degraded",
"timestamp": timestamp,
"database": {
"status": db_status,
"connected": db_connected,
},
"services": {"admin": "available", "cors": "enabled"},
}

# Add pool info if available
if pool_info:
response["database"]["pool_info"] = pool_info

# Return 503 if database is not connected
if not db_connected:
raise HTTPException(
status_code=status.HTTP_503_SERVICE_UNAVAILABLE, detail=response
)

return response

**Check warning** — Code scanning / CodeQL

**Information exposure through an exception** (Medium)

Stack trace information flows to this location and may be exposed to an external user.

**Copilot Autofix** (AI, 6 days ago)

In general, to fix this class of problem you should avoid including raw exception messages or stack traces in API responses. Instead, log the detailed error server-side (using your logging framework) and return a generic, non-sensitive status or error description to the client.

For this specific code:

- In `readiness_check`, the `detail` field currently includes `"error": str(e)`. This should be replaced with a generic message such as `"error": "database connectivity check failed"` to avoid leaking `str(e)` to the user. Optionally, log the real exception with a logger.
- In `status_check`, the except block sets `db_status = f"error: {str(e)}"`, which is then returned to the client in `response["database"]["status"]`. Replace this with a generic description like `"error: database connectivity check failed"` and, again, optionally log the exception.

We can introduce a logging import at the top of `api/health.py` and a module logger via `logger = logging.getLogger(__name__)`. In both except blocks, call `logger.exception(...)` so developers still get full tracebacks in the logs, while the API only returns sanitized messages. No changes to function signatures or overall behavior (HTTP status codes, JSON structure, keys like `"status"`/`"database"`/`"checks"`) are needed — only the content of the error strings changes.

Suggested changeset 1 — `api/health.py`

**Autofix patch**

Run the following command in your local git repository to apply this patch:
cat << 'EOF' | git apply
diff --git a/api/health.py b/api/health.py
--- a/api/health.py
+++ b/api/health.py
@@ -2,12 +2,15 @@
 
 from datetime import datetime
 from typing import Any
+import logging
 
 from fastapi import APIRouter, HTTPException, status
 from sqlalchemy import text
 
 from core.dependencies import session_dependency
 
+logger = logging.getLogger(__name__)
+
 router = APIRouter(prefix="/health", tags=["Health"])
 
 
@@ -47,12 +50,13 @@
             "checks": {"db_connection": True},
         }
     except Exception as e:
+        logger.exception("Readiness database connectivity check failed")
         raise HTTPException(
             status_code=status.HTTP_503_SERVICE_UNAVAILABLE,
             detail={
                 "status": "not_ready",
                 "database": "disconnected",
-                "error": str(e),
+                "error": "database connectivity check failed",
                 "checks": {"db_connection": False},
             },
         )
@@ -96,7 +100,8 @@
         pool_info = {k: v for k, v in pool_info.items() if v is not None}
 
     except Exception as e:
-        db_status = f"error: {str(e)}"
+        logger.exception("Status database connectivity check failed")
+        db_status = "error: database connectivity check failed"
 
     response = {
         "status": "healthy" if db_connected else "degraded",
EOF