> **Note:** The code included in this repository is not meant to be run as-is. It is a collection of infrastructure code and Kubernetes manifests used to deploy the goingdark.social Kubernetes cluster; you will need to adapt it to your own needs and environment.
This repository contains the complete infrastructure-as-code setup for deploying our production Kubernetes cluster on Hetzner Cloud. We run a Mastodon community focused on homelabs, self-hosting, and privacy advocacy.
The project uses:
- OpenTofu - Infrastructure provisioning and management
- Talos Linux - Kubernetes-optimized operating system
- ArgoCD - GitOps continuous deployment
- Cilium - eBPF-based container networking
- VictoriaMetrics - Monitoring and observability stack
- Gateway API - Modern ingress management
The infrastructure follows GitOps principles with ArgoCD managing application deployments from the kubernetes/ directory.
- `opentofu/` - Hetzner Cloud infrastructure code
- `kubernetes/apps/argocd/` - GitOps deployment controller
- `kubernetes/apps/base-system/` - Core cluster services (networking, monitoring, certificates)
- `kubernetes/apps/platform/` - Community applications
  - `mastodon/` - Our Mastodon instance (glitch-soc)
    - `base/` - Shared manifests and generators used by every environment
    - `overlays/prod/` - Production overlay, identical to the previous single-environment layout
    - `overlays/dev/` - Lightweight dev slice with trimmed resources and dev-only hostnames
  - `cryptpad/` - Privacy-respecting collaborative editor (`base/` plus `overlays/prod` for ArgoCD)
  - `hypebot/` - Community engagement automation (`base/` plus `overlays/prod`)
  - `elastic/` - Elastic operator managed via Helm (`base/` plus `overlays/prod`), deployed into the shared `elastic-system` namespace
- `kubernetes/apps/database/` - Database operators and tooling
Install OpenTofu first, then provision the Hetzner Cloud infrastructure:
```shell
cd opentofu
tofu init -upgrade
tofu plan   # Review planned changes
tofu apply  # Deploy infrastructure
```

This creates the Kubernetes cluster, networking, storage, and security groups as defined in the OpenTofu configuration files. As part of the apply step, Talos also syncs our Cilium L2 announcement policy, load balancer IP pool, and a ConfigMap with the chart values, so the cluster comes up with our Cilium settings before ArgoCD takes over.
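The bootstrapped Cilium resources can be sketched roughly as follows. This is a hypothetical minimal version: the resource names, interface, and CIDR block are placeholders, not the repo's actual values.

```yaml
# Sketch of the L2 announcement policy and load balancer IP pool
# applied during bootstrap, before ArgoCD takes over. Values are illustrative.
apiVersion: cilium.io/v2alpha1
kind: CiliumLoadBalancerIPPool
metadata:
  name: default-pool          # illustrative name
spec:
  blocks:
    - cidr: 10.0.10.0/28      # illustrative range
---
apiVersion: cilium.io/v2alpha1
kind: CiliumL2AnnouncementPolicy
metadata:
  name: l2-announcements      # illustrative name
spec:
  loadBalancerIPs: true       # announce load balancer IPs from the pool above
  interfaces:
    - eth0                    # announce on the nodes' primary interface
```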
After infrastructure deployment, bootstrap the cluster with applications:
```shell
# Set up cluster access
export TALOSCONFIG=./opentofu/talosconfig
export KUBECONFIG=./opentofu/kubeconfig

# Deploy all applications via ArgoCD
kubectl apply -f kubernetes/application-set.yaml
```

This bootstrap process installs:
- ArgoCD for GitOps deployments
- Core networking (Cilium with encryption)
- Certificate management (cert-manager)
- Monitoring stack (VictoriaMetrics, Grafana)
- External secrets management
- Our community applications (Mastodon, CryptPad, Hypebot)
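An ApplicationSet driving this kind of bootstrap typically uses a git directory generator scanning `kubernetes/apps/`. The sketch below is an assumption about the shape of `kubernetes/application-set.yaml`, not its actual contents; the repo URL, project, and naming template are placeholders.

```yaml
# Hypothetical ApplicationSet: one ArgoCD Application per app directory.
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: apps
  namespace: argocd
spec:
  generators:
    - git:
        repoURL: https://github.com/example/infra.git   # placeholder URL
        revision: main
        directories:
          - path: kubernetes/apps/*/*                   # e.g. base-system/cilium
  template:
    metadata:
      name: '{{path[1]}}-{{path.basename}}'             # e.g. base-system-cilium
    spec:
      project: default
      source:
        repoURL: https://github.com/example/infra.git   # placeholder URL
        targetRevision: main
        path: '{{path}}'
      destination:
        server: https://kubernetes.default.svc
        namespace: '{{path.basename}}'
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
```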
Once deployed, the cluster hosts:
- Mastodon - Our community social platform with 1,000-character posts
- CryptPad - Collaborative document editing without surveillance
- Hypebot - Automated community engagement and post boosting
- Grafana - Infrastructure monitoring and alerting
- PostgreSQL - Primary database for Mastodon
- Redis - Caching layer for improved performance
All applications are managed through ArgoCD and deploy automatically when changes are pushed to the kubernetes/ directory.
The Mastodon app now follows a standard Kustomize structure: a reusable base/ and dedicated overlays for each namespace. ArgoCD only syncs the overlays, so the production overlay stays unchanged while the new overlays/dev/ overlay runs a smaller copy with reduced autoscaling limits and separate hostnames inside the same cluster.
Dev secrets are isolated from production: every ExternalSecret in the dev overlay reads `*-dev` keys from Bitwarden, and the CNPG SecretStore points at the `mastodon-dev` namespace, so you need to provision those credentials before enabling sync. The platform ApplicationSet now auto-syncs only non-prod overlays; `platform-mastodon-prod` stays manual, while other environments land in `<app>-<env>` namespaces.
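A dev-overlay ExternalSecret following that convention might look like the sketch below. The secret name, store name, and Bitwarden key are illustrative; only the `*-dev` key suffix and the `mastodon-dev` namespace come from the setup described above.

```yaml
# Hypothetical dev-overlay ExternalSecret reading a *-dev key from Bitwarden.
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: mastodon-app-secrets        # illustrative name
  namespace: mastodon-dev
spec:
  secretStoreRef:
    name: bitwarden                 # illustrative store name
    kind: ClusterSecretStore
  target:
    name: mastodon-app-secrets      # Kubernetes Secret created in mastodon-dev
  data:
    - secretKey: SECRET_KEY_BASE
      remoteRef:
        key: mastodon-secret-key-base-dev   # dev overlay reads *-dev keys
```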
- The `mastodon-web` Service now exposes port 9394 so each pod's `/metrics` endpoint is reachable inside the cluster.
- VictoriaMetrics scrapes that endpoint through a `VMServiceScrape`, and a Prometheus adapter publishes custom metrics for queue latency, backlog, and request rate.
- The web autoscaler scales when p95 queue time stays over 35 ms or the backlog rises above three requests, and it keeps an 80% memory target as a safety net.
- Scale-ups can add two pods every 30 seconds, while scale-downs wait three minutes before stepping back to avoid flapping.
- Sidekiq default and federation workers scale on the `sidekiq_queue_latency_seconds` metric (10 seconds for default, 30 seconds for federation) so they grow only when the queues back up.
- Streaming workers follow the `mastodon_streaming_connected_clients` metric and add capacity once a pod carries around 200 live connections.
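The web autoscaler behavior described above can be sketched as an `autoscaling/v2` HPA. The adapter metric name and replica bounds are assumptions (the backlog metric is omitted for brevity); the 35 ms target, 80% memory target, and scale-up/scale-down pacing come from the bullets above.

```yaml
# Hypothetical HPA for the web deployment; metric name and bounds are illustrative.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: mastodon-web
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: mastodon-web
  minReplicas: 2                  # illustrative bounds
  maxReplicas: 8
  metrics:
    - type: Pods
      pods:
        metric:
          name: mastodon_request_queue_p95_seconds   # hypothetical adapter metric
        target:
          type: AverageValue
          averageValue: 35m       # 0.035 s, i.e. the 35 ms p95 queue-time target
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80  # memory safety net
  behavior:
    scaleUp:
      policies:
        - type: Pods
          value: 2
          periodSeconds: 30       # add at most two pods every 30 seconds
    scaleDown:
      stabilizationWindowSeconds: 180   # wait three minutes before scaling down
```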
- The autoscaler node pool is now limited to stateless deployments that the descheduler can freely evict.
- Stateful components like PostgreSQL, Redis, and Elasticsearch, along with single-replica Sidekiq and streaming workers, are pinned to the fixed worker pool so scale-down drains stay possible.
- The descheduler policy now treats nodes below roughly 40% utilization as underused and balances pods away from the autoscaler nodes so Cluster Autoscaler can remove idle machines.
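A descheduler policy along these lines would express that behavior. This sketch assumes the `v1alpha1` policy API; the 40% thresholds come from the text, while the target thresholds are illustrative.

```yaml
# Hypothetical descheduler policy: evict from nodes under ~40% utilization.
apiVersion: descheduler/v1alpha1
kind: DeschedulerPolicy
strategies:
  LowNodeUtilization:
    enabled: true
    params:
      nodeResourceUtilizationThresholds:
        thresholds:           # nodes below these values count as underused
          cpu: 40
          memory: 40
          pods: 40
        targetThresholds:     # illustrative upper bounds for rebalancing
          cpu: 60
          memory: 60
          pods: 60
```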
- Talos applies `vm.max_map_count=262144` to every worker through OpenTofu, so Elasticsearch and other mmap-heavy services come up cleanly on fresh nodes with no manual tuning.
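In Talos machine configuration, that sysctl is a one-line patch; a minimal sketch of the fragment OpenTofu would template into the worker config:

```yaml
# Talos machine-config fragment: kernel sysctls applied at boot on each worker.
machine:
  sysctls:
    vm.max_map_count: "262144"   # required for Elasticsearch's mmap usage
```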
- `kube-state-metrics` now ships the Kuadrant CustomResourceState bundle, so VictoriaMetrics receives the `gatewayapi_*` series for GatewayClasses, Gateways, HTTPRoutes, TCPRoutes, TLSRoutes, GRPCRoutes, and UDPRoutes.
- The Grafana sidecar auto-imports the Gateway API dashboards that live in `kubernetes/apps/base-system/victoriametrics/dashboards/`; each ConfigMap is labeled `grafana_dashboard=1` so the new boards show up without manual imports.
- All dashboards point at the VictoriaMetrics datasource, so the existing scrape jobs and retention settings still apply; no extra Prometheus configuration is required.
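A dashboard ConfigMap picked up by the sidecar looks roughly like this. The name, namespace, and dashboard body are placeholders; only the `grafana_dashboard: "1"` label comes from the setup above.

```yaml
# Hypothetical dashboard ConfigMap auto-imported by the Grafana sidecar.
apiVersion: v1
kind: ConfigMap
metadata:
  name: gateway-api-dashboard      # illustrative name
  namespace: victoriametrics       # assumed namespace
  labels:
    grafana_dashboard: "1"         # label the sidecar watches for
data:
  gateway-api.json: |
    { "title": "Gateway API", "panels": [] }   # placeholder dashboard JSON
```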
- The external gateway exposes an HTTP listener on port 80 so Tor traffic reaches the cluster without onion TLS termination.
- A dedicated HTTPRoute publishes the `.onion` hostname, sends `/api/v1/streaming` requests to the streaming service, routes everything else to the web pods, and adds an `Onion-Location` response header for Tor Browser.
- A Tor hidden-service deployment forwards onion requests to the gateway load balancer and stores the generated hostname on a persistent volume so it survives pod restarts.
- The `mastodon-app-secrets` ExternalSecret carries the onion hostname from Bitwarden, so the value stays out of the repo and can be rotated alongside the Tor key material.
- A Tor HTTP proxy runs inside the cluster on port 8118, and Mastodon points both `http_proxy` and `http_hidden_proxy` at it for federation with onion-only peers.
- Mastodon sets `ALLOW_ACCESS_TO_HIDDEN_SERVICE=true` so it accepts the onion host while keeping HTTPS for the public domain.
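The onion HTTPRoute described above can be sketched as follows. The gateway name, listener section, backend service names, ports, and the `.onion` hostname are all placeholders; the routing rules and the `Onion-Location` header follow the bullets above.

```yaml
# Hypothetical HTTPRoute for the onion hostname; names and ports are illustrative.
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: mastodon-onion
spec:
  parentRefs:
    - name: external-gateway        # assumed gateway name
      sectionName: http             # the plain-HTTP port-80 listener
  hostnames:
    - exampleonionaddressxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.onion   # placeholder
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /api/v1/streaming
      backendRefs:
        - name: mastodon-streaming  # assumed service name
          port: 4000                # assumed streaming port
    - backendRefs:
        - name: mastodon-web        # assumed service name
          port: 3000                # assumed web port
      filters:
        - type: ResponseHeaderModifier
          responseHeaderModifier:
            set:
              - name: Onion-Location
                value: http://exampleonionaddressxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.onion   # placeholder
```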
- Infrastructure: Hetzner Cloud, Talos Linux, OpenTofu
- Orchestration: Kubernetes, ArgoCD, Cilium
- Monitoring: VictoriaMetrics, VictoriaLogs, Grafana
- Security: External Secrets, cert-manager, Gateway API