Master nodes will be worker nodes too (limited resources, you know :wink:)
We will use Keepalived and HAproxy as a load balancer for the Kubernetes control-plane traffic. The whole Kubernetes cluster won’t use Docker anymore.
Prerequisites
Three machines set up with a minimal installation of Fedora 33 Server. I will call them:
- vkube-001 (192.168.100.120)
- vkube-002 (192.168.100.121)
- vkube-003 (192.168.100.122)
One VIP (192.168.100.123) which we will use for the load-balancer
All nodes should be able to resolve each other and the VIP – put this, for example, into /etc/hosts:
192.168.100.120 vkube-001
192.168.100.121 vkube-002
192.168.100.122 vkube-003
192.168.100.123 vip-vkube
Disable swap, firewalld, SELinux and cgroups v2! (A sketch for firewalld/SELinux and for verifying follows below.)
If ZRAM swap is in use, you can simply disable it by overriding the default config with an empty file:
touch /etc/systemd/zram-generator.conf
Reboot and verify that swap is off. cgroups v2 has to be disabled via:
grubby --update-kernel=ALL --args="systemd.unified_cgroup_hierarchy=0"
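For completeness, here is roughly how I disable firewalld and SELinux and verify the swap and cgroup state after the reboot (just a sketch – the SELinux sed line is only one of several ways to persist the change):
# Disable firewalld
systemctl disable --now firewalld
# SELinux: permissive right now, disabled after the next reboot
setenforce 0
sed -i 's/^SELINUX=enforcing$/SELINUX=disabled/' /etc/selinux/config
# After the reboot: no output from swapon means swap is off
swapon --show
# tmpfs = cgroups v1 (what we want), cgroup2fs = still cgroups v2
stat -fc %T /sys/fs/cgroup/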
Add the Kubernetes repo:
cat <<EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://packages.cloud.google.com/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://packages.cloud.google.com/yum/doc/yum-key.gpg https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg
EOF
Installing & Setting Up Keepalived and HAproxy
Parts are reused from the very good how-to https://www.linuxtechi.com/setup-highly-available-kubernetes-cluster-kubeadm/
Install them simply by executing: dnf install keepalived haproxy
Keepalived
Create a /etc/keepalived/check_apiserver.sh script (do not forget to make it executable – the chmod follows right after the script):
#!/bin/sh

APISERVER_VIP=192.168.100.123
APISERVER_DEST_PORT=6443

errorExit() {
    echo "*** $*" 1>&2
    exit 1
}

curl --silent --max-time 2 --insecure https://localhost:${APISERVER_DEST_PORT}/ -o /dev/null || errorExit "Error GET https://localhost:${APISERVER_DEST_PORT}/"

if ip addr | grep -q ${APISERVER_VIP}; then
    curl --silent --max-time 2 --insecure https://${APISERVER_VIP}:${APISERVER_DEST_PORT}/ -o /dev/null || errorExit "Error GET https://${APISERVER_VIP}:${APISERVER_DEST_PORT}/"
fi
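As mentioned, make the script executable:
chmod +x /etc/keepalived/check_apiserver.sh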
Set up, for example, the following /etc/keepalived/keepalived.conf:
########### VRRP CHECK SCRIPT
vrrp_script check_apiserver {
    script "/etc/keepalived/check_apiserver.sh"
    interval 3
    weight -2
    fall 10
    rise 2
}

########### VRRP INSTANCE CONFIG
vrrp_instance vip-vkube {
    state MASTER
    # Define the proper interface
    interface enp1s0
    # Must be unique for each VRRP instance you define
    virtual_router_id 51
    # Highest value on the master!
    priority 200
    # Set advertisement interval to 1 second
    advert_int 1
    authentication {
        auth_type PASS
        # Only 8 characters are used
        auth_pass kadpass1
    }
    # The IP of the node keepalived is running on
    unicast_src_ip 192.168.100.120
    # Set the peers accordingly
    unicast_peer {
        192.168.100.121
        192.168.100.122
    }
    # The VIP that should move between the hosts in case of a failure
    virtual_ipaddress {
        192.168.100.123/24
    }
    track_script {
        check_apiserver
    }
}
As noted in the comments, do not forget to adjust the values according to the node on which you configure keepalived.conf (MASTER vs. BACKUP) – see the sketch below.
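For example, on vkube-002 (running as BACKUP) the relevant lines would roughly look like this – the priority of 150 is just an example value lower than the MASTER's 200:
state BACKUP
# Lower than the 200 on the MASTER (vkube-001)
priority 150
# This node's own IP
unicast_src_ip 192.168.100.121
# The other two nodes
unicast_peer {
    192.168.100.120
    192.168.100.122
}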
HAproxy
Configure HAproxy – the /etc/haproxy/haproxy.cfg should look like this:
#---------------------------------------------------------------------
# Example configuration for a possible web application. See the
# full configuration options online.
#
# https://www.haproxy.org/download/1.8/doc/configuration.txt
#
#---------------------------------------------------------------------
#---------------------------------------------------------------------
# Global settings
#---------------------------------------------------------------------
global
    # to have these messages end up in /var/log/haproxy.log you will
    # need to:
    #
    # 1) configure syslog to accept network log events. This is done
    #    by adding the '-r' option to the SYSLOGD_OPTIONS in
    #    /etc/sysconfig/syslog
    #
    # 2) configure local2 events to go to the /var/log/haproxy.log
    #    file. A line like the following can be added to
    #    /etc/sysconfig/syslog
    #
    #    local2.*    /var/log/haproxy.log
    #
    log 127.0.0.1 local2

    chroot /var/lib/haproxy
    pidfile /var/run/haproxy.pid
    maxconn 4000
    user haproxy
    group haproxy
    daemon

    # turn on stats unix socket
    stats socket /var/lib/haproxy/stats

    # utilize system-wide crypto-policies
    ssl-default-bind-ciphers PROFILE=SYSTEM
    ssl-default-server-ciphers PROFILE=SYSTEM
#---------------------------------------------------------------------
# common defaults that all the 'listen' and 'backend' sections will
# use if not designated in their block
#---------------------------------------------------------------------
defaults
    mode http
    log global
    option httplog
    option dontlognull
    option http-server-close
    option forwardfor except 127.0.0.0/8
    option redispatch
    retries 3
    timeout http-request 10s
    timeout queue 1m
    timeout connect 10s
    timeout client 1m
    timeout server 1m
    timeout http-keep-alive 10s
    timeout check 10s
    maxconn 3000
#---------------------------------------------------------------------
# apiserver frontend which proxys to the masters
#---------------------------------------------------------------------
frontend apiserver
    bind *:8443 interface enp1s0
    mode tcp
    option tcplog
    default_backend apiserver
#---------------------------------------------------------------------
# round robin balancing for apiserver
#---------------------------------------------------------------------
backend apiserver
    option httpchk GET /healthz
    http-check expect status 200
    mode tcp
    option ssl-hello-chk
    balance roundrobin
    server k8s-master-1 192.168.100.120:6443 check
    server k8s-master-2 192.168.100.121:6443 check
    server k8s-master-3 192.168.100.122:6443 check
You can now enable and start HAproxy and Keepalived:
systemctl enable haproxy --now
systemctl enable keepalived --now
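To verify the load-balancer setup, you can check that the VIP is assigned on the MASTER node and that HAproxy is listening on the frontend port on all nodes:
# The VIP should only show up on the current MASTER
ip addr show enp1s0 | grep 192.168.100.123
# HAproxy should listen on 8443 everywhere
ss -tlnp | grep 8443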
Install Containerd
Install it simply by executing: dnf install containerd
We also need some additional configuration. Two kernel modules must be loaded:
# Load the modules
modprobe overlay
modprobe br_netfilter
# Make sure they load after a reboot
cat <<EOF | sudo tee /etc/modules-load.d/containerd.conf
overlay
br_netfilter
EOF
And we need additional sysctl parameters:
echo "net.ipv4.ip_forward = 1" >> /etc/sysctl.conf
echo "net.bridge.bridge-nf-call-iptables = 1" >> /etc/sysctl.conf
echo "net.bridge.bridge-nf-call-ip6tables = 1" >> /etc/sysctl.conf
# Verify the settings
sysctl -p
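You can double-check that the modules are loaded and the sysctls are active:
lsmod | grep -e overlay -e br_netfilter
sysctl net.ipv4.ip_forward net.bridge.bridge-nf-call-iptables net.bridge.bridge-nf-call-ip6tables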
Then create a default containerd config:
mv /etc/containerd/config.toml /etc/containerd/config.toml.orig
containerd config default > /etc/containerd/config.toml
In config.toml, configure containerd to use the systemd cgroup driver:
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
...
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
SystemdCgroup = true
After this, start and enable containerd: systemctl enable containerd --now
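A quick sanity check that the daemon and its socket work (ctr version prints both the client and the server version):
systemctl is-active containerd
ctr version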
Install Kubernetes
On all nodes install kubeadm, kubelet and kubectl:
dnf install kubeadm kubelet kubectl
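The official kubeadm install instructions also enable the kubelet service at this point – it will crash-loop until kubeadm init or kubeadm join hands it a configuration, which is expected:
systemctl enable kubelet --now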
Create a folder kubernetes (as I am working as root, it will be in root's home directory) and in it create e.g. a config.yaml file with the following content on the first node. We will need it for the kubeadm init command later:
apiVersion: kubeadm.k8s.io/v1beta2
bootstrapTokens:
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  token: fu0j0s.qwd21l9wyn9hc9bj
  ttl: 24h0m0s
  usages:
  - signing
  - authentication
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 192.168.100.120
  bindPort: 6443
nodeRegistration:
  criSocket: /run/containerd/containerd.sock
  name: vkube-001
  taints:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
---
apiServer:
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta2
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controlPlaneEndpoint: vip-vkube:8443
controllerManager: {}
dns:
  type: CoreDNS
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: k8s.gcr.io
kind: ClusterConfiguration
kubernetesVersion: v1.20.2
networking:
  dnsDomain: cluster.local
  podSubnet: 10.244.0.0/16
  serviceSubnet: 10.96.0.0/12
scheduler: {}
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cgroupDriver: systemd
On the other two nodes we will have to create different YAML files later.
On the first node you can now initialize the first Kubernetes master node:
kubeadm init --upload-certs --config="/root/kubernetes/config.yaml"
Save the output:
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Alternatively, if you are the root user, you can run:
export KUBECONFIG=/etc/kubernetes/admin.conf
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
You can now join any number of the control-plane node running the following command on each as root:
kubeadm join vip-vkube:8443 --token fu0j0s.qwd21l9wyn9hc9bj \
--discovery-token-ca-cert-hash sha256:1ba22cc522fcb76991496ccb648c3bf0e8251da6f79dea64beea080ecbdd806d \
--control-plane --certificate-key 66d312f0f27c198dc40fa2497de968de0669fef7cb6ae267a3dd084e62bba523
Please note that the certificate-key gives access to cluster sensitive data, keep it secret!
As a safeguard, uploaded-certs will be deleted in two hours; If necessary, you can use
"kubeadm init phase upload-certs --upload-certs" to reload certs afterward.
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join vip-vkube:8443 --token fu0j0s.qwd21l9wyn9hc9bj \
--discovery-token-ca-cert-hash sha256:1ba22cc522fcb76991496ccb648c3bf0e8251da6f79dea64beea080ecbdd806d
Execute the commands mentioned in the output so you can configure the cluster with kubectl. Now it is time to install a pod network solution – I prefer Calico:
wget https://docs.projectcalico.org/manifests/calico.yaml
# open the file, find CALICO_IPV4POOL_CIDR and set it to the same value as `podSubnet`, i.e. 10.244.0.0/16
kubectl apply -f calico.yaml
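For reference, after the edit the relevant environment variable of the calico-node DaemonSet in calico.yaml should look roughly like this (in the manifest I downloaded it is commented out by default; the indentation has to match the surrounding env entries):
- name: CALICO_IPV4POOL_CIDR
  value: "10.244.0.0/16"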
Wait a few seconds, then check the "cluster" state with kubectl get nodes. The one and only node should become Ready after some time.
Then, on the next nodes, create a join-config.yaml file – do not forget to edit controlPlane.localAPIEndpoint.advertiseAddress and nodeRegistration.name depending on which node you want to join. The documentation about creating those config YAML files is IMHO very bad at the moment.
apiVersion: kubeadm.k8s.io/v1beta2
kind: JoinConfiguration
caCertPath: /etc/kubernetes/pki/ca.crt
controlPlane:
  certificateKey: 66d312f0f27c198dc40fa2497de968de0669fef7cb6ae267a3dd084e62bba523
  localAPIEndpoint:
    advertiseAddress: 192.168.100.121
    bindPort: 6443
discovery:
  bootstrapToken:
    apiServerEndpoint: vip-vkube:8443
    caCertHashes:
    - sha256:1ba22cc522fcb76991496ccb648c3bf0e8251da6f79dea64beea080ecbdd806d
    token: fu0j0s.qwd21l9wyn9hc9bj
  timeout: 5m0s
  tlsBootstrapToken: fu0j0s.qwd21l9wyn9hc9bj
nodeRegistration:
  criSocket: /run/containerd/containerd.sock
  name: vkube-002
  taints:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cgroupDriver: systemd
Now join the node using the config file – the --v=2 flag shows more verbose logs:
kubeadm join --config="/root/kubernetes/join-config.yaml" --v=2
As we want to use those nodes as workers too, execute the following command to "untaint" them: kubectl taint nodes --all node-role.kubernetes.io/master-
And we also label them properly so kubectl get nodes shows everything nicely: kubectl label node --all node-role.kubernetes.io/worker=''
Join Worker Node
Create, for example, a join-worker-config.yaml:
apiVersion: kubeadm.k8s.io/v1beta2
kind: JoinConfiguration
caCertPath: /etc/kubernetes/pki/ca.crt
discovery:
  bootstrapToken:
    apiServerEndpoint: vip-vkube:8443
    token: fu0j0s.qwd21l9wyn9hc9bj
    caCertHashes:
    - sha256:1ba22cc522fcb76991496ccb648c3bf0e8251da6f79dea64beea080ecbdd806d
  timeout: 5m0s
  tlsBootstrapToken: fu0j0s.qwd21l9wyn9hc9bj
nodeRegistration:
  criSocket: /run/containerd/containerd.sock
  name: vkube-003
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cgroupDriver: systemd
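Then join the worker with the same command pattern as before (assuming the file is also placed under /root/kubernetes/):
kubeadm join --config="/root/kubernetes/join-worker-config.yaml" --v=2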
Additional Info About kubeadm Config Files
You can get some sparse default configs via:
kubeadm config print join-defaults
kubeadm config print init-defaults
Troubleshooting If You Removed A Master And Want To Rejoin It As Master
This issue may have been caused by initially using Keepalived as a load balancer (in NAT mode). I was already wondering in the beginning about that config, which I found somewhere else – NAT mode in the same subnet? :open_mouth:
In the course of testing the worker-node join, I removed the last node (kubectl delete node vkube-003, then a kubeadm reset on that node) and joined it as a worker node, which worked fine.
But when I wanted to rejoin it as a master, I stumbled upon the following problem: somehow kubectl delete node did not remove the node from the etcd configuration.
I first had to edit the kubeadm-config ConfigMap and remove the node from there: kubectl edit configmaps -n kube-system kubeadm-config
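The node typically shows up under the ClusterStatus key of that ConfigMap (kubeadm v1.20 still stores it there); the entry to delete looks roughly like this (a sketch from memory, the exact layout depends on the kubeadm version):
# Excerpt of the ClusterStatus key in the kubeadm-config ConfigMap
apiEndpoints:
  vkube-003:                     # stale entry of the removed node
    advertiseAddress: 192.168.100.122
    bindPort: 6443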
Additionally, I had to install etcdctl (part of the etcd package) and execute the following commands:
# get list of etcd members
etcdctl --endpoints https://192.168.100.120:2379 --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key member list
# remove member by id
etcdctl --endpoints https://192.168.100.120:2379 --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key member remove 1dbfb0d823481fd9
Containerd Not Killing Containers After Being Stopped
As I am not a Kubernetes über-pro, I am not sure if this is smart :expressionless:
This seems to be the default behaviour. Restarting a node will take ages. If you stop kubelet, the containers will still be running and even listening on their ports. The load balancer will still send traffic to the node because port 6443 still answers, while kubectl get nodes will show the node as NotReady.
To also stop the containers, it seems you need to override the KillMode in the containerd systemd unit. Do this by executing systemctl edit containerd and pasting, for example, this content:
[Service]
# Default - seems to let containers run even after containerd is stopped
# as commented in https://github.com/containerd/containerd/issues/386#issuecomment-304837687
# mixed should also kill containers
# process is default
#KillMode=process
KillMode=mixed
So to temporarily stop a node completely, you can now do:
systemctl stop kubelet
systemctl stop containerd
To get the node back online
systemctl start containerd
systemctl start kubelet
Exploring Containerd
Only a short glance into containerd.
As this setup no longer uses Docker as the container runtime, you can check out things run by containerd with the ctr command.
For example:
- list namespaces: ctr namespaces list
- list images available to k8s: ctr --namespace k8s.io images list
- list running containers: ctr --namespace k8s.io containers list