K3s cluster managed using Kustomize for GitOps deployment of self-hosted applications
Welcome to my homelab! This repository contains the complete Infrastructure-as-Code (IaC) for my Kubernetes homelab running on K3s. The cluster hosts various self-hosted applications for media streaming, productivity, monitoring, and AI workloads, all managed through GitOps principles using Kustomize overlays.
- K3s Kubernetes cluster with NVIDIA GPU support for AI inference
- GitOps deployment using Kustomize and FluxCD for automated continuous delivery
- Automated dependency management with Renovate creating pull requests for updates
- Secret management with SOPS and age encryption
- External access via Cloudflare Tunnels and Tailscale
- Monitoring with Prometheus and Grafana
- Multi-storage support including local, NFS, and Longhorn distributed storage
- SSL/TLS termination with Traefik ingress controller
The homelab uses a GitOps approach with FluxCD and Kustomize for automated deployment and configuration management. FluxCD continuously monitors the Git repository and automatically applies changes to the cluster, ensuring the desired state is always maintained.
- FluxCD - Automated GitOps continuous delivery and reconciliation that watches the repository for changes and automatically deploys updates
- Renovate - Automated dependency updates via pull requests for container images, Helm charts, etc.
- Kustomize overlays - Environment-specific configurations with base/staging structure
- SOPS encryption - Secure secret management with age keys integrated into GitOps workflows
- Automated reconciliation - Ensures cluster state matches Git repository at all times
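As a rough sketch of how FluxCD, Kustomize overlays, and SOPS decryption fit together, the function below prints a Flux `Kustomization` that reconciles one overlay path. The object name `apps`, the `10m` interval, the `flux-system` source name, and the `sops-age` secret name are assumptions for illustration, not values taken from this repository.

```shell
set -eu

# Print a Flux Kustomization that reconciles one path of the Git repository.
# Args: <name> <path-in-repo>
render_flux_kustomization() {
  local name="$1" path="$2"
  cat <<EOF
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: ${name}
  namespace: flux-system
spec:
  interval: 10m            # assumed reconcile interval
  path: ${path}
  prune: true              # remove cluster objects that were deleted from Git
  sourceRef:
    kind: GitRepository
    name: flux-system      # assumed name of the GitRepository source
  decryption:
    provider: sops
    secretRef:
      name: sops-age       # assumed Secret holding the age private key
EOF
}

render_flux_kustomization apps ./apps/staging
```

The output can be committed to the repository or piped to `kubectl apply -f -` on a cluster where Flux is already installed.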
This Git repository contains the following top-level directories:
📁 apps/                # Applications deployed into the cluster
├── 📁 base/            # Base application configurations
└── 📁 staging/         # Environment-specific overlays
📁 infrastructure/      # Infrastructure components and controllers
├── 📁 controllers/     # Cluster infrastructure (monitoring, storage, etc.)
└── 📁 configs/         # Configuration overlays and secrets
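To illustrate the base/overlay split, here is a minimal sketch that scaffolds the two `kustomization.yaml` files for a single app in a temporary directory. It uses `linkding` as the example app. The per-app subdirectory layout, the `deployment.yaml` and `service.yaml` file names, and the use of the app name as its namespace are assumptions, not a description of what this repository actually contains.

```shell
set -eu

# Scaffold a base and a staging overlay for one app under <root>.
# Args: <root-dir> <app-name>
scaffold_app() {
  local root="$1" app="$2"
  mkdir -p "${root}/apps/base/${app}" "${root}/apps/staging/${app}"

  # Base: environment-agnostic manifests. The two resource files are
  # placeholders (assumed names) that you would add yourself.
  cat > "${root}/apps/base/${app}/kustomization.yaml" <<EOF
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - deployment.yaml
  - service.yaml
EOF

  # Overlay: pulls in the base and sets environment-specific values.
  cat > "${root}/apps/staging/${app}/kustomization.yaml" <<EOF
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: ${app}
resources:
  - ../../base/${app}
EOF
}

demo_root="$(mktemp -d)"
scaffold_app "${demo_root}" linkding
find "${demo_root}" -name kustomization.yaml | sort
```

Once the placeholder manifests exist, `kubectl kustomize apps/staging/linkding` renders the overlay locally without touching the cluster.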
| Name | Description |
|---|---|
| K3s | Lightweight Kubernetes distribution |
| Kustomize | Kubernetes native configuration management |
| SOPS | Secrets management with age encryption |
| Traefik | Modern HTTP reverse proxy and load balancer |
| Longhorn | Cloud native distributed block storage |
| Prometheus | Systems monitoring and alerting toolkit |
| Grafana | Operational dashboards and visualization |
| cert-manager | Cloud native certificate management |
| MetalLB | Bare metal load balancer for HA services |
| Helm | The package manager for Kubernetes |
| Proxmox VE | 3-node HA cluster with Ceph storage |
| Ceph | Distributed storage across Proxmox cluster |
| TrueNAS Scale | N100 NAS with ZFS storage and application hosting |
| UniFi Network | Enterprise networking with UDM Ultra, 16-port switch, and APs |
| Application | Category | Description | Status |
|---|---|---|---|
| Jellyfin | Media Server | Self-hosted media streaming with GPU transcoding | ✅ Deployed |
| Audiobookshelf | Audio Books | Self-hosted audiobook and podcast server | ✅ Deployed |

| Application | Category | Description | Status |
|---|---|---|---|
| Linkding | Bookmark Manager | Minimal bookmark management | ✅ Deployed |
| Ollama + Open WebUI | AI/LLM | Local large language model deployment | ✅ Deployed |

| Application | Category | Description | Status |
|---|---|---|---|
| Steam Headless | Cloud Gaming | Steam with Sunshine streaming server | 📦 Archived |

| Application | Category | Description | Status |
|---|---|---|---|
| Grafana | Dashboard | Operational dashboards and monitoring | ✅ Deployed |
| Vault | Secrets Management | HashiCorp Vault for secret management | 🚧 Testing |
| Longhorn | Storage Management | Distributed storage management UI | ✅ Deployed |
| Renovate | Automation | Automated dependency updates | ✅ Deployed |
- Proxmox Cluster: 3-node HA cluster with Ceph distributed storage
  - Main Station PC: Primary node with NVIDIA RTX 3090 for GPU workloads
  - XPS 15: Laptop node with 5Gb WizDPI networking
  - Razer 15: Laptop node with 5Gb WizDPI networking
- TrueNAS Scale: N100-based NAS with 4x5Gb networking
  - Services: Jellyfin (main instance), PostgreSQL, Redis
  - Planned: S3 object storage for backups and application data
- Docker Host: N100 mini PC running various containerized services (always-on)
- Primary K3s Node: `mainkube` - VM on main PC with GPU passthrough for testing/development and AI
- Network Infrastructure:
  - UniFi Ultra: Core router/firewall/controller
  - UniFi Enterprise 16-Port PoE: Managed switching with PoE+
  - UniFi Access Points: WiFi coverage
  - 5Gb Backbone: WizDPI networking for high-speed inter-cluster communication
- Storage: Ceph cluster + TrueNAS ZFS + SMB/NFS shares
- Worker Nodes: Raspberry Pi cluster running Talos OS
- Control Plane Nodes: K3s VMs across all Proxmox cluster nodes for HA
  - Main PC VM: Primary control plane with GPU passthrough
  - XPS 15 VM: Secondary control plane node
  - Razer 15 VM: node
- Load Balancing: MetalLB for service distribution across K3s nodes
- High Availability: Multi-master K3s cluster architecture
- Storage Integration: Longhorn + TrueNAS S3 backend
- NVIDIA drivers installed on host
- NVIDIA Container Toolkit configured
- Compatible GPU with driver version available at https://download.nvidia.com/XFree86/Linux-x86_64/
# Configure K3s with NVIDIA runtime
sudo nvidia-ctk runtime configure \
  --runtime=containerd \
  --config=/var/lib/rancher/k3s/agent/etc/containerd/config.toml
sudo systemctl restart k3s

- Jellyfin: Hardware transcoding for 4K media
- Ollama: Accelerated LLM inference
- Steam Headless: GPU-accelerated game streaming
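For GPU workloads like these, a pod has to request the GPU explicitly. The sketch below prints a one-shot test pod that asks for a single `nvidia.com/gpu` and runs `nvidia-smi`. The pod name, the default image tag, and the existence of a RuntimeClass called `nvidia` are assumptions. Adjust them to whatever the cluster actually provides.

```shell
set -eu

# Print a one-shot Pod that requests a single GPU and runs nvidia-smi.
# Arg 1 (optional): container image; the default tag is an assumption.
render_gpu_test_pod() {
  local image="${1:-nvidia/cuda:12.4.1-base-ubuntu22.04}"
  cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test            # hypothetical name
spec:
  restartPolicy: Never
  runtimeClassName: nvidia        # assumes a RuntimeClass named "nvidia" exists
  containers:
    - name: cuda
      image: ${image}
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1       # resource advertised by the NVIDIA device plugin
EOF
}

render_gpu_test_pod
```

Piping the output to `kubectl apply -f -` and then reading `kubectl logs gpu-smoke-test` should show the GPU table if passthrough and the runtime are working.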
All sensitive data is encrypted using SOPS with age encryption:
- SMB/CIFS credentials for media storage
- Cloudflare tunnel certificates
- Application secrets and API keys
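One way to wire this up is a `.sops.yaml` at the repository root, so that `sops` picks the age recipient automatically and encrypts only the `data` and `stringData` fields of Kubernetes Secrets. The sketch below writes such a file. The path regex and the placeholder recipient are assumptions, and the real public key would come from `age-keygen`.

```shell
set -eu

# Write a .sops.yaml into <dir> that encrypts only Secret payload fields.
# Args: <dir> <age-public-key>
write_sops_config() {
  local dir="$1" recipient="$2"
  {
    printf 'creation_rules:\n'
    printf '  - path_regex: %s\n' '.*\.yaml$'
    printf '    encrypted_regex: %s\n' '^(data|stringData)$'
    printf '    age: %s\n' "${recipient}"
  } > "${dir}/.sops.yaml"
}

sops_demo_dir="$(mktemp -d)"
# Placeholder recipient: replace with the public key printed by age-keygen.
write_sops_config "${sops_demo_dir}" "age1replace-with-your-public-key"
cat "${sops_demo_dir}/.sops.yaml"

# Typical follow-up commands (not run here):
#   age-keygen -o age.agekey
#   sops --encrypt --in-place path/to/secret.yaml
```

Because only the payload fields are encrypted, the rest of each manifest stays readable in diffs and pull requests.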
- Cloudflare Tunnels provide secure external access to select services
- Tailscale provides secure VPN access to the entire homelab network for personal use
- Traefik handles internal routing and SSL termination
- Network isolation via Kubernetes namespaces
- local-path: Fast local storage for testing purposes and stateful apps
- longhorn: Replicated distributed storage for critical data
- nfs: Network storage for large media files
- smb: Windows SMB/CIFS shares for existing media libraries
- s3: Object storage (not implemented yet) - planned for backups, archive storage, and application data
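An application selects one of these classes through `storageClassName` on its PersistentVolumeClaim. The helper below is a small sketch that prints such a claim and rejects class names outside the implemented set listed above. The claim name, the size, and the `ReadWriteOnce` access mode are illustrative assumptions (shared NFS or SMB volumes may call for `ReadWriteMany` instead).

```shell
set -eu

# Print a PersistentVolumeClaim for one of the implemented storage classes.
# Args: <claim-name> <storage-class> <size>
render_pvc() {
  local name="$1" class="$2" size="$3"
  case "${class}" in
    local-path|longhorn|nfs|smb) ;;
    *) echo "unsupported storage class: ${class}" >&2; return 1 ;;
  esac
  cat <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ${name}
spec:
  accessModes:
    - ReadWriteOnce        # assumed; adjust per workload
  storageClassName: ${class}
  resources:
    requests:
      storage: ${size}
EOF
}

render_pvc linkding-data longhorn 1Gi
```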
- Proxmox VM snapshots for complete node backup and disaster recovery
- Longhorn snapshots for critical application data with automated scheduling
- Git repository contains all configuration as code with encrypted secrets
- Automated backup pipeline to S3 storage (planned)
# Verify GPU is available in cluster
kubectl describe nodes | grep nvidia.com/gpu
# Check NVIDIA runtime configuration
sudo nvidia-ctk runtime configure --runtime=containerd --config=/var/lib/rancher/k3s/agent/etc/containerd/config.toml

- Add Raspberry Pi workers - Deploy Talos OS on RPi cluster for HA
- Add laptop control plane nodes - Configure XPS 15 and Razer 15 as K3s masters/workers for HA control plane
- MetalLB implementation - Load balancer for service distribution across K3s nodes
- Deploy Vault - Centralized secret management across cluster
- Docker container migration - Move services from N100 mini PC to K3s cluster
- S3 storage backend - Implement object storage on TrueNAS Scale
- Tailscale operator - Native Kubernetes integration for VPN access
- Production environment - Create production overlay configurations