PART-3 – Monitoring
As we want to know how our ECK-managed cluster is performing, we also want to monitor it with the built-in "Stack Monitoring" feature.
Prepare Filebeat For Monitoring The Cluster
For this setup you can use the recipes at https://github.com/elastic/cloud-on-k8s/tree/master/config/recipes/beats as a reference. As our "ELK" stack lives in its own namespace and also has a different name, we have to change a few lines:
# filebeat may take a while until running in a healthy state
apiVersion: beat.k8s.elastic.co/v1beta1
kind: Beat
metadata:
  name: filebeat
  # change it if needed
  namespace: elk
spec:
  type: filebeat
  version: 7.10.0
  # change the references if needed
  elasticsearchRef:
    name: elk
  kibanaRef:
    name: kibana
  config:
    monitoring:
      # we use the NOT deprecated internal monitoring feature; it sends the metrics to Elasticsearch
      enabled: true
      # GET / - shows the UUID
      # without it, stack monitoring will show you a "Standalone Cluster"
      cluster_uuid: n2KDDWUMS2q4h8P5F3_Z8Q
      elasticsearch:
        hosts: ["https://elk-es-http:9200"]
        username: ${MONITORED_ES_USERNAME}
        password: ${MONITORED_ES_PASSWORD}
        # TODO: use the Elasticsearch CA from its secret
        ssl.verification_mode: none
    filebeat:
      autodiscover:
        providers:
        - type: kubernetes
          node: ${NODE_NAME}
          hints:
            enabled: true
            default_config:
              type: container
              paths:
              - /var/log/containers/*${data.kubernetes.container.id}.log
          templates:
          - condition:
              contains:
                common.k8s.elastic.co/type: elasticsearch
            config:
            - module: elasticsearch
              server:
                enabled: true
                var.paths:
                - /var/log/containers/*${data.kubernetes.container.id}.log
              gc:
                var.paths:
                - /var/log/containers/*${data.kubernetes.container.id}.log
              # if you have audit-logging enabled and a proper license (trial or enterprise)
              audit:
                var.paths:
                - /var/log/containers/*${data.kubernetes.container.id}.log
              slowlog:
                var.paths:
                - /var/log/containers/*${data.kubernetes.container.id}.log
              deprecation:
                var.paths:
                - /var/log/containers/*${data.kubernetes.container.id}.log
          - condition:
              contains:
                common.k8s.elastic.co/type: kibana
            config:
            - module: kibana
              log:
                var.paths:
                - /var/log/containers/*${data.kubernetes.container.id}.log
    processors:
    - add_cloud_metadata: {}
    - add_host_metadata: {}
    - drop_event:
        # we drop all events that contain the following content in the log line (in case audit-logging is on and you have the proper license ;)
        # this is very rudimentary and will only help to keep it running for the first few hours - you have to set a proper configuration via the API
        # to restrict what is being logged. Search for elasticsearch audit ignore policies.
        # If you do not set ignore policies, this will start the loop of doom ;) - Elasticsearch creating logs, Filebeat ingesting the logs,
        # Elasticsearch creating even more logs, Filebeat not keeping up and not closing filehandles, ...
        when:
          and:
          - contains:
              common.k8s.elastic.co/type: elasticsearch
          - or:
            - regexp:
                message: '"action":".*data/write'
            - regexp:
                message: '"action":".*data\/read'
            - regexp:
                message: '"action":".*monitor'
    - add_tags:
        # we add a nice tag
        when:
          contains:
            or:
            - common.k8s.elastic.co/type: elasticsearch
            - common.k8s.elastic.co/type: kibana
        tags: [ "compliance" ]
  daemonSet:
    podTemplate:
      spec:
        serviceAccountName: filebeat
        automountServiceAccountToken: true
        terminationGracePeriodSeconds: 30
        dnsPolicy: ClusterFirstWithHostNet
        hostNetwork: true # Allows to provide richer host metadata
        containers:
        - name: filebeat
          securityContext:
            runAsUser: 0
            # If using Red Hat OpenShift uncomment this:
            #privileged: true
          volumeMounts:
          - name: varlogcontainers
            mountPath: /var/log/containers
          - name: varlogpods
            mountPath: /var/log/pods
          - name: varlibdockercontainers
            mountPath: /var/lib/docker/containers
          - name: data
            mountPath: /usr/share/filebeat/data
          env:
          - name: NODE_NAME
            valueFrom:
              fieldRef:
                fieldPath: spec.nodeName
          - name: MONITORED_ES_USERNAME
            value: elastic
          - name: MONITORED_ES_PASSWORD
            valueFrom:
              secretKeyRef:
                key: elastic
                # change if needed
                name: elk-es-elastic-user
          resources:
            requests:
              cpu: 100m
              memory: 1024Mi
            limits:
              cpu: 100m
              memory: 1024Mi
        volumes:
        - name: varlogcontainers
          hostPath:
            path: /var/log/containers
        - name: varlogpods
          hostPath:
            path: /var/log/pods
        - name: varlibdockercontainers
          hostPath:
            path: /var/lib/docker/containers
        # the data folder stores a registry of read status for all files, so we don't send everything again on a Filebeat pod restart
        - name: data
          hostPath:
            path: /var/lib/filebeat-data
            type: DirectoryOrCreate
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: filebeat
rules:
- apiGroups: [""] # "" indicates the core API group
  resources:
  - namespaces
  - pods
  verbs:
  - get
  - watch
  - list
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: filebeat
  # change it if needed
  namespace: elk
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: filebeat
subjects:
- kind: ServiceAccount
  name: filebeat
  # change it if needed
  namespace: elk
roleRef:
  kind: ClusterRole
  name: filebeat
  apiGroup: rbac.authorization.k8s.io
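The `cluster_uuid` pinned under `monitoring:` must be the UUID of your own cluster, otherwise Stack Monitoring lists the Beats under a "Standalone Cluster". A small sketch of extracting it from the response of `GET /` - the pod, secret, and file names here are assumptions, adjust them to your cluster:

```shell
# Hypothetical: fetch the root response of the monitored cluster.
# The kubectl/curl part is commented out; we parse a saved copy instead.
#
# PW=$(kubectl get secret elk-es-elastic-user -n elk \
#   -o go-template='{{index .data "elastic" | base64decode}}')
# kubectl exec -n elk elk-es-default-0 -c elasticsearch -- \
#   curl -s -k -u "elastic:$PW" https://localhost:9200/ > root.json

# sample response body of `GET /` (shortened)
cat > root.json <<'EOF'
{"name":"elk-es-default-0","cluster_name":"elk","cluster_uuid":"n2KDDWUMS2q4h8P5F3_Z8Q"}
EOF

# pull out the UUID to paste into monitoring.cluster_uuid
sed -n 's/.*"cluster_uuid":"\([^"]*\)".*/\1/p' root.json
```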
Prepare Metricbeat For Monitoring The Cluster
Metricbeat can use substantial resources in larger environments!
Metricbeat will scrape the metrics from the given pods if they have a certain label set.
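The label the autodiscover conditions below match on (`stackmonitoring: elasticsearch|kibana|logstash`) is not set by ECK itself; you add it through the podTemplate of the monitored resources. A sketch for the Elasticsearch resource, assuming the names from the previous parts (nodeSet name and count are placeholders):

```yaml
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: elk
  namespace: elk
spec:
  version: 7.10.0
  nodeSets:
  - name: default
    count: 3
    podTemplate:
      metadata:
        labels:
          # matched by Metricbeat's autodiscover condition below
          stackmonitoring: elasticsearch
```

The same pattern applies to the Kibana and Logstash pod templates with `stackmonitoring: kibana` and `stackmonitoring: logstash`.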
apiVersion: beat.k8s.elastic.co/v1beta1
kind: Beat
metadata:
  name: metricbeat
  namespace: elk
spec:
  type: metricbeat
  version: 7.10.0
  elasticsearchRef:
    # in a production environment, you should use a dedicated monitoring cluster.
    # here we use the same stack
    name: elk
  config:
    monitoring:
      # we use the NOT deprecated internal monitoring feature; it sends the metrics to Elasticsearch
      enabled: true
      # GET / shows the UUID in the output - if you do not define it,
      # stack monitoring will show you a "Standalone Cluster"
      cluster_uuid: n2KDDWUMS2q4h8P5F3_Z8Q
      elasticsearch:
        hosts: ["https://elk-es-http:9200"]
        username: ${MONITORED_ES_USERNAME}
        password: ${MONITORED_ES_PASSWORD}
        ssl.verification_mode: none
    metricbeat:
      autodiscover:
        providers:
        - type: kubernetes
          scope: cluster
          node: ${NODE_NAME}
          hints:
            enabled: true
          templates:
          # this will monitor elasticsearch pods that have this label
          - condition:
              contains:
                kubernetes.labels.stackmonitoring: elasticsearch
            config:
            - module: elasticsearch
              metricsets:
              - ccr
              - cluster_stats
              - enrich
              - index
              - index_recovery
              - index_summary
              - ml_job
              - node
              - node_stats
              - pending_tasks
              - shard
              period: 10s
              hosts: "https://${data.host}:9200"
              username: ${MONITORED_ES_USERNAME}
              password: ${MONITORED_ES_PASSWORD}
              # WARNING: disables TLS verification as the default certificate is not valid for the pod FQDN
              # TODO: switch this to "certificate" when available: https://github.com/elastic/beats/issues/8164
              ssl.verification_mode: "none"
              # so the metrics land in a ".monitoring" index which is used by Kibana's "Stack Monitoring" app
              xpack.enabled: true
          # monitoring kibana pods
          - condition:
              contains:
                kubernetes.labels.stackmonitoring: kibana
            config:
            - module: kibana
              metricsets:
              - stats
              - status
              period: 10s
              hosts: "https://${data.host}:5601"
              username: ${MONITORED_ES_USERNAME}
              password: ${MONITORED_ES_PASSWORD}
              # WARNING: disables TLS verification as the default certificate is not valid for the pod FQDN
              # TODO: switch this to "certificate" when available: https://github.com/elastic/beats/issues/8164
              ssl.verification_mode: "none"
              xpack.enabled: true
          # monitoring logstash pods
          - condition:
              contains:
                kubernetes.labels.stackmonitoring: logstash
            config:
            - module: logstash
              metricsets:
              - node
              - node_stats
              period: 10s
              hosts: "http://${data.host}:9600"
              #username: ${MONITORED_ES_USERNAME}
              #password: ${MONITORED_ES_PASSWORD}
              #ssl.verification_mode: "none"
              xpack.enabled: true
      modules:
      - module: system
        period: 10s
        metricsets:
        - cpu
        - load
        - memory
        - network
        - process
        - process_summary
        process:
          include_top_n:
            by_cpu: 5
            by_memory: 5
        processes:
        - .*
      - module: system
        period: 1m
        metricsets:
        - filesystem
        - fsstat
        processors:
        - drop_event:
            when:
              regexp:
                system:
                  filesystem:
                    mount_point: ^/(sys|cgroup|proc|dev|etc|host|lib)($|/)
      - module: kubernetes
        period: 10s
        host: ${NODE_NAME}
        hosts:
        - https://${NODE_NAME}:10250
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        ssl:
          verification_mode: none
        metricsets:
        - node
        - system
        - pod
        - container
        - volume
    processors:
    - add_cloud_metadata: {}
    - add_host_metadata: {}
    logging.json: true
  daemonSet:
    podTemplate:
      spec:
        serviceAccountName: metricbeat
        automountServiceAccountToken: true
        containers:
        - args:
          - -e
          - -c
          - /etc/beat.yml
          - -system.hostfs=/hostfs
          name: metricbeat
          volumeMounts:
          - mountPath: /hostfs/sys/fs/cgroup
            name: cgroup
          - mountPath: /var/run/docker.sock
            name: dockersock
          - mountPath: /hostfs/proc
            name: proc
          env:
          - name: NODE_NAME
            valueFrom:
              fieldRef:
                fieldPath: spec.nodeName
          - name: MONITORED_ES_USERNAME
            value: elastic
          - name: MONITORED_ES_PASSWORD
            valueFrom:
              secretKeyRef:
                key: elastic
                # change it according to your cluster name
                name: elk-es-elastic-user
          # metricbeat can peak quite high with its RAM usage - observe it and set a proper value.
          # The default of 200Mi is not enough once it monitors more than a few pods
          resources:
            requests:
              cpu: 100m
              memory: 2048Mi
            limits:
              cpu: 100m
              memory: 2048Mi
        dnsPolicy: ClusterFirstWithHostNet
        hostNetwork: true # Allows to provide richer host metadata
        # required to read /etc/beat.yml
        securityContext:
          runAsUser: 0
        terminationGracePeriodSeconds: 30
        volumes:
        - name: cgroup
          hostPath:
            path: /sys/fs/cgroup
        - name: dockersock
          hostPath:
            path: /var/run/docker.sock
        - name: proc
          hostPath:
            path: /proc
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: metricbeat
rules:
- apiGroups:
  - ""
  resources:
  - nodes
  - namespaces
  - events
  - pods
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - "extensions"
  resources:
  - replicasets
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - apps
  resources:
  - statefulsets
  - deployments
  - replicasets
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - nodes/stats
  verbs:
  - get
- nonResourceURLs:
  - /metrics
  verbs:
  - get
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: metricbeat
  namespace: elk
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: metricbeat
subjects:
- kind: ServiceAccount
  name: metricbeat
  # change it if needed
  namespace: elk
roleRef:
  kind: ClusterRole
  name: metricbeat
  apiGroup: rbac.authorization.k8s.io
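As a quick sanity check of the `drop_event` regexp in the system/filesystem module above: it drops every mount point below the usual pseudo and host filesystems, so only real data mounts get reported. A self-contained sketch of what the pattern matches:

```shell
# evaluate the mount_point pattern from the drop_event processor against a few paths
pattern='^/(sys|cgroup|proc|dev|etc|host|lib)($|/)'
for mp in / /data /sys/fs/cgroup /proc/self /etc/hosts; do
  if echo "$mp" | grep -Eq "$pattern"; then
    echo "drop $mp"
  else
    echo "keep $mp"
  fi
done
# keeps "/" and "/data", drops the three pseudo/host filesystem paths
```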