Logtrain is a system for dynamically forwarding and transforming logs, similar to fluentd but more specialized, built to solve two issues:
- Keep overhead very low (e.g., less than 64Mi of memory).
- Dynamically route logs based on various data sources.
```
es+https://user:password@host?[auth=apikey|bearer|basic]&[index=...]&[insecure=true]
es+http://user:password@host?[auth=apikey|bearer|basic]&[index=...]&[insecure=true]
```
The bearer token is taken from the password portion of the URL. For API keys, the API ID should be used as the username and the API key as the password. Setting `insecure=true` ignores certificate failures.
```
http://host/path
https://host/path?[insecure=true]
```
Setting `insecure=true` ignores certificate failures.
```
syslog+tls://host:port?[ca=]
syslog+http://host:port
syslog+https://host:port
syslog+tcp://host:port
syslog+udp://host:port (alias: syslog://)
```
```
persistent://key
```
Persistent drains will fail if persistent storage is not configured (see below). The key may be any value up to 128 characters. The persistent key may only be used on finite resources such as pods, and cannot be set on deployments, statefulsets, etc.
TODO
Below is the recommended way of deploying logtrain. Note that it must run as a privileged container
so that it can read files from `/var/log/containers`. The daemonset also contains an initContainer that
will change the sysctl `fs.inotify.max_user_instances` to 2048 on your nodes (usually from 128).
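The initContainer's job can be sketched roughly as follows; the container name and image here are illustrative, not necessarily what the shipped manifest uses:

```yaml
initContainers:
  - name: set-inotify-limit   # hypothetical name
    image: busybox            # hypothetical image
    securityContext:
      privileged: true        # required to write node-level sysctls
    command: ["sysctl", "-w", "fs.inotify.max_user_instances=2048"]
```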
```
kubectl apply -f ./deployments/kubernetes/logtrain-serviceaccount.yaml
kubectl apply -f ./deployments/kubernetes/logtrain-service.yaml
kubectl apply -f ./deployments/kubernetes/logtrain-daemonset.yaml
```
Once deployed, you can use the following annotations on deployments, daemonsets, or statefulsets to forward logs.
- `logtrain.akkeris.io/drains` - A comma-delimited list of drains (see Drain Types above).
- `logtrain.akkeris.io/hostname` - Explicitly set the hostname used when reading in logs from Kubernetes; if not set, this defaults to `name.namespace`.
- `logtrain.akkeris.io/tag` - Explicitly set the tag when reading in logs from Kubernetes; if not set, this defaults to the pod name.
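For example, a deployment that forwards its logs to a TLS syslog drain might carry annotations like these (the app name and drain endpoint are hypothetical):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app            # hypothetical app
  namespace: default
  annotations:
    logtrain.akkeris.io/drains: "syslog+tls://logs.example.com:9004"
    logtrain.akkeris.io/hostname: "my-app.default"
    logtrain.akkeris.io/tag: "web"
```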
TODO
- `HTTP_PORT` - The port to use for the HTTP server, shared by any http (payload) and http (syslog) inputs.
Whether to watch a postgres database for information on where to forward logs to.
- `POSTGRES` - set to `true`
- `DATABASE_URL` - The database URL to use to listen for drain changes.
Whether to watch kubernetes deployments, statefulsets, and daemonsets for annotations indicating where logs should be forwarded to.
- `KUBERNETES_DATASOURCE` - set to `true`
Persistent log storage can be done via a postgres database. Set `PERSISTENT_DATABASE_URL` to specify the database to store logs in.
Stored logs can be retrieved directly from the database: the payload is in the `logs.data` column, keyed by the `logs.id` column. In addition,
persisted logs can be retrieved via the `/logs/:key` endpoint.
- `PERSISTENT` - set to `true`
- `PERSISTENT_DATABASE_URL` - A postgres database to store logs in, in the format `postgres://user:pass@host:5432/dbname`.
- `PERSISTENT_PATH` - The path on the HTTP endpoint to respond to log requests; defaults to `/logs/`.
Whether to watch the `KUBERNETES_LOG_PATH` directory for pod logs and forward them.
- `KUBERNETES` - set to `true`
- `KUBERNETES_LOG_PATH` - optional, the path on each node to look for logs. Defaults to `/var/log/containers`.
- `EXCLUDE_NAMESPACES` - optional, a comma-separated list of namespaces to ignore.
Whether to open a gRPC access log stream endpoint for istio/envoy to stream HTTP log traffic to.
- `ENVOY` - set to `true`
- `ENVOY_PORT` - The port number to listen on for gRPC access log streams (default is `9001`).
- `HTTP_EVENTS` - set to `true`
- `HTTP_EVENTS_PATH` - optional, the path on the HTTP server to receive HTTP event payloads; defaults to `/events`.

Note: the port is inherited from `HTTP_PORT`. The endpoint only allows one event per request body, and the body must
be in the format defined by `pkg/output/packet/packet.go`.
- `HTTP_SYSLOG` - set to `true`
- `HTTP_SYSLOG_PATH` - optional, the path on the HTTP server to receive syslog streams over HTTP; defaults to `/syslog`.

Note: the port is inherited from `HTTP_PORT`.
- `SYSLOG_TCP` - set to `true`
- `SYSLOG_TCP_PORT` - optional, defaults to `9002`.
- `SYSLOG_UDP` - set to `true`
- `SYSLOG_UDP_PORT` - optional, defaults to `9003`.
- `SYSLOG_TLS` - set to `true`
- `SYSLOG_TLS_CERT_PEM` - The PEM-encoded certificate.
- `SYSLOG_TLS_CA_PEM` - The PEM-encoded certificate authority (optional).
- `SYSLOG_TLS_KEY_PEM` - The PEM-encoded certificate key.
- `SYSLOG_TLS_SERVER_NAME` - The server name the TLS server should use for SNI.
- `SYSLOG_TLS_PORT` - optional, defaults to `9004`.
- `AKKERIS=true` - for Akkeris formatting of output.
- `ONLY_AKKERIS=true` - optional, ignore any other Kubernetes pods.
Logtrain has been tested to stay below 64MB of memory (avg 59MB) and under 100m (5%) CPU for 52 pods on a node, with 1500+ deployments being watched. With more pods per node or more deployments than this benchmark, expect higher memory usage and adjust any limits/requests accordingly.
While logtrain targets a 64MB ceiling, it should be given a memory limit of 128MB.
If you receive a "too many open files" error message on startup, you'll need to increase
`fs.inotify.max_user_instances` and `user.max_inotify_instances`. These are generally set to
128 by default; depending on how many pods are running, this may be insufficient.
```
sysctl -w fs.inotify.max_user_instances=2048
sysctl -w user.max_inotify_instances=2048
```

```
go build -o logtrain github.com/akkeris/logtrain/cmd/logtrain
go build -o logtail github.com/akkeris/logtrain/cmd/logtail
```

```
go test -v ./...
```
(Note: if you're using GoConvey, it's best to set the parallel packages to 1 via `-packages 1`.)

```
go test -coverprofile cover.out -v ./... && go tool cover -html=cover.out && rm cover.out
```