The master nodes will be worker nodes too (resources, you know :wink:)

We will use Keepalived and HAproxy as the load balancer for the Kubernetes control-plane traffic. The whole Kubernetes cluster won't use Docker anymore.

Prerequisites

Three machines set up with a minimal installation of Fedora 33 Server. I will call them:

  • vkube-001 (192.168.100.120)
  • vkube-002 (192.168.100.121)
  • vkube-003 (192.168.100.122)

One VIP (192.168.100.123), which we will use for the load balancer.

All nodes should be able to resolve each other and the VIP – put this, for example, into /etc/hosts:

192.168.100.120 vkube-001
192.168.100.121 vkube-002
192.168.100.122 vkube-003
192.168.100.123 vip-vkube

Disable swap, firewalld, SELinux and cgroups v2!

If ZRAM swap is in use you can simply disable it by overriding the default config with an empty file: touch /etc/systemd/zram-generator.conf. Reboot and verify that swap is off.

cgroupv2 has to be disabled via: grubby --update-kernel=ALL --args="systemd.unified_cgroup_hierarchy=0"
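
The remaining items from the list above can be handled like this – a minimal sketch (SELinux is set to permissive here, as the kubeadm install docs suggest; adjust if you really want it fully disabled):

# verify swap is off (no output expected)
swapon --show

# disable firewalld
systemctl disable --now firewalld

# set SELinux to permissive, now and after reboots
setenforce 0
sed -i 's/^SELINUX=enforcing/SELINUX=permissive/' /etc/selinux/config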

Add the Kubernetes-repo

cat <<EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://packages.cloud.google.com/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://packages.cloud.google.com/yum/doc/yum-key.gpg https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg
EOF

Installing & Setting Up Keepalived and HAproxy

Parts reused from the very good howto https://www.linuxtechi.com/setup-highly-available-kubernetes-cluster-kubeadm/

Install both simply by executing: dnf install keepalived haproxy

Keepalived

Create a /etc/keepalived/check_apiserver.sh script (do not forget to make it executable):

#!/bin/sh
APISERVER_VIP=192.168.100.123
APISERVER_DEST_PORT=6443

errorExit() {
    echo "*** $*" 1>&2
    exit 1
}

curl --silent --max-time 2 --insecure https://localhost:${APISERVER_DEST_PORT}/ -o /dev/null || errorExit "Error GET https://localhost:${APISERVER_DEST_PORT}/"
if ip addr | grep -q ${APISERVER_VIP}; then
    curl --silent --max-time 2 --insecure https://${APISERVER_VIP}:${APISERVER_DEST_PORT}/ -o /dev/null || errorExit "Error GET https://${APISERVER_VIP}:${APISERVER_DEST_PORT}/"
fi
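
As mentioned above, make the script executable:

chmod +x /etc/keepalived/check_apiserver.sh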

Set up, for example, the following /etc/keepalived/keepalived.conf:

########### VRRP CHECK SCRIPT
vrrp_script check_apiserver {
  script "/etc/keepalived/check_apiserver.sh"
  interval 3
  weight -2
  fall 10
  rise 2
}

########### VRRP INSTANCE CONFIG
vrrp_instance vip-vkube {
    state MASTER

    # Define proper interface
    interface enp1s0

    # Must be unique for each VRRP-instance you define
    virtual_router_id 51

    # Highest value on master!
    priority 200

    # Set advertisement interval to 1 second
    advert_int 1

    authentication {
        auth_type PASS
        # Only 8 characters are used
        auth_pass kadpass1
    }

    # The IP of the node where keepalived is running on
    unicast_src_ip 192.168.100.120

    # Set the peers accordingly
    unicast_peer {
        192.168.100.121
        192.168.100.122
    }

    # The VIP that should move between the hosts in case of a failure
    virtual_ipaddress {
        192.168.100.123/24
    }

    track_script {
        check_apiserver
    }
}

As noted in the comments, do not forget to adjust the values according to the node on which you place this keepalived.conf (state MASTER vs. BACKUP, priority, unicast_src_ip and unicast_peer).
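
For example, on vkube-002 the relevant parts would differ roughly like this (just a sketch – the priority value is my own choice, it only has to be lower than on the MASTER):

vrrp_instance vip-vkube {
    state BACKUP
    priority 100

    unicast_src_ip 192.168.100.121
    unicast_peer {
        192.168.100.120
        192.168.100.122
    }

    # everything else stays the same as on vkube-001
}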

HAproxy

Configure HAproxy – the /etc/haproxy/haproxy.cfg should look like this:

#---------------------------------------------------------------------
# Example configuration for a possible web application.  See the
# full configuration options online.
#
#   https://www.haproxy.org/download/1.8/doc/configuration.txt
#
#---------------------------------------------------------------------

#---------------------------------------------------------------------
# Global settings
#---------------------------------------------------------------------
global
    # to have these messages end up in /var/log/haproxy.log you will
    # need to:
    #
    # 1) configure syslog to accept network log events.  This is done
    #    by adding the '-r' option to the SYSLOGD_OPTIONS in
    #    /etc/sysconfig/syslog
    #
    # 2) configure local2 events to go to the /var/log/haproxy.log
    #   file. A line like the following can be added to
    #   /etc/sysconfig/syslog
    #
    #    local2.*                       /var/log/haproxy.log
    #
    log         127.0.0.1 local2

    chroot      /var/lib/haproxy
    pidfile     /var/run/haproxy.pid
    maxconn     4000
    user        haproxy
    group       haproxy
    daemon

    # turn on stats unix socket
    stats socket /var/lib/haproxy/stats

    # utilize system-wide crypto-policies
    ssl-default-bind-ciphers PROFILE=SYSTEM
    ssl-default-server-ciphers PROFILE=SYSTEM

#---------------------------------------------------------------------
# common defaults that all the 'listen' and 'backend' sections will
# use if not designated in their block
#---------------------------------------------------------------------
defaults
    mode                    http
    log                     global
    option                  httplog
    option                  dontlognull
    option http-server-close
    option forwardfor       except 127.0.0.0/8
    option                  redispatch
    retries                 3
    timeout http-request    10s
    timeout queue           1m
    timeout connect         10s
    timeout client          1m
    timeout server          1m
    timeout http-keep-alive 10s
    timeout check           10s
    maxconn                 3000

#---------------------------------------------------------------------
# apiserver frontend which proxies to the masters
#---------------------------------------------------------------------
frontend apiserver
    bind *:8443 interface enp1s0
    mode tcp
    option tcplog
    default_backend apiserver
#---------------------------------------------------------------------
# round robin balancing for apiserver
#---------------------------------------------------------------------
backend apiserver
    option httpchk GET /healthz
    http-check expect status 200
    mode tcp
    option ssl-hello-chk
    balance     roundrobin
        server k8s-master-1 192.168.100.120:6443 check
        server k8s-master-2 192.168.100.121:6443 check
        server k8s-master-3 192.168.100.122:6443 check

You can now enable and start HAproxy and Keepalived:

systemctl enable haproxy --now
systemctl enable keepalived --now
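
A quick sanity check – the VIP should only show up on the current MASTER node, and HAproxy should be listening on port 8443 on every node (the API server health checks will of course keep failing until Kubernetes is actually running):

# VIP assigned to the interface on the active node?
ip addr show enp1s0 | grep 192.168.100.123

# HAproxy listening on the frontend port?
ss -tlnp | grep 8443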

Install Containerd

Install it just by executing: dnf install containerd

We also need some additional configurations. Two kernel modules must be loaded:

# Load the modules
modprobe overlay
modprobe br_netfilter

# Make sure they load after a reboot
cat <<EOF | sudo tee /etc/modules-load.d/containerd.conf
overlay
br_netfilter
EOF

And we need some additional sysctl parameters:

echo "net.ipv4.ip_forward = 1" >> /etc/sysctl.conf
echo "net.bridge.bridge-nf-call-iptables = 1" >> /etc/sysctl.conf
echo "net.bridge.bridge-nf-call-ip6tables = 1" >> /etc/sysctl.conf
# Verify the settings
sysctl -p
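
Alternatively, you could keep these settings in a dedicated drop-in file instead of appending to /etc/sysctl.conf (the file name below is just my choice) and apply everything with sysctl --system:

cat <<EOF > /etc/sysctl.d/99-kubernetes-cri.conf
net.ipv4.ip_forward                 = 1
net.bridge.bridge-nf-call-iptables  = 1
net.bridge.bridge-nf-call-ip6tables = 1
EOF
sysctl --system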

Then create a default containerd config:

mv /etc/containerd/config.toml /etc/containerd/config.toml.orig
containerd config default > /etc/containerd/config.toml

Configure the systemd cgroup driver in config.toml:

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
  ...
  [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
    SystemdCgroup = true
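
If you do not want to edit the file by hand, flipping the flag with sed should work as well – assuming the freshly generated default config contains a "SystemdCgroup = false" line (otherwise add the option manually as shown above):

sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml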

After this, start and enable containerd: systemctl enable containerd --now

Install Kubernetes

On all nodes install kubeadm, kubelet and kubectl:

dnf install kubeadm kubelet kubectl
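
The upstream kubeadm install guide also enables the kubelet service right away – it will crash-loop until kubeadm has configured it, which is expected:

systemctl enable kubelet --now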

Create a folder kubernetes (as I am working as root, it will be in root's home directory) and, on the first node, create e.g. a config.yaml file with the following content. We will need it for the kubeadm init command later.

apiVersion: kubeadm.k8s.io/v1beta2
bootstrapTokens:
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  token: fu0j0s.qwd21l9wyn9hc9bj
  ttl: 24h0m0s
  usages:
  - signing
  - authentication
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 192.168.100.120
  bindPort: 6443
nodeRegistration:
  criSocket: /run/containerd/containerd.sock
  name: vkube-001
  taints:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
---
apiServer:
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta2
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controlPlaneEndpoint: vip-vkube:8443
controllerManager: {}
dns:
  type: CoreDNS
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: k8s.gcr.io
kind: ClusterConfiguration
kubernetesVersion: v1.20.2
networking:
  dnsDomain: cluster.local
  podSubnet: 10.244.0.0/16
  serviceSubnet: 10.96.0.0/12
scheduler: {}
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cgroupDriver: systemd

On the other two nodes we will have to create different YAML files later.
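
Optionally, you can pre-pull the control-plane images on the first node before running init (assuming the config file path from above):

kubeadm config images pull --config="/root/kubernetes/config.yaml"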

On the first node you can now initialize the first Kubernetes master node:

kubeadm init --upload-certs --config="/root/kubernetes/config.yaml"

Save the output:

Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

Alternatively, if you are the root user, you can run:

  export KUBECONFIG=/etc/kubernetes/admin.conf

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

You can now join any number of the control-plane node running the following command on each as root:

  kubeadm join vip-vkube:8443 --token fu0j0s.qwd21l9wyn9hc9bj \
    --discovery-token-ca-cert-hash sha256:1ba22cc522fcb76991496ccb648c3bf0e8251da6f79dea64beea080ecbdd806d \
    --control-plane --certificate-key 66d312f0f27c198dc40fa2497de968de0669fef7cb6ae267a3dd084e62bba523

Please note that the certificate-key gives access to cluster sensitive data, keep it secret!
As a safeguard, uploaded-certs will be deleted in two hours; If necessary, you can use
"kubeadm init phase upload-certs --upload-certs" to reload certs afterward.

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join vip-vkube:8443 --token fu0j0s.qwd21l9wyn9hc9bj \
    --discovery-token-ca-cert-hash sha256:1ba22cc522fcb76991496ccb648c3bf0e8251da6f79dea64beea080ecbdd806d

Execute the commands mentioned in the output so you can configure the cluster with kubectl. Now it is time to install a pod network solution – I prefer Calico:

wget https://docs.projectcalico.org/manifests/calico.yaml
# open the file, find CALICO_IPV4POOL_CIDR and set it to the same value as `podSubnet`, i.e. `10.244.0.0/16`
kubectl apply -f calico.yaml

Wait a few seconds, then check the "cluster" state with kubectl get nodes. The one and only node should become Ready after some time.
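
You can watch the Calico and the other kube-system pods come up in the meantime, for example with:

kubectl get pods -n kube-system -w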

Then, on the next nodes, create a join-config.yaml file – do not forget to edit controlPlane.localAPIEndpoint.advertiseAddress and nodeRegistration.name for each node you want to join.

The documentation about creating those config YAML files is IMHO very bad at the moment.

apiVersion: kubeadm.k8s.io/v1beta2
kind: JoinConfiguration
caCertPath: /etc/kubernetes/pki/ca.crt
controlPlane:
  certificateKey: 66d312f0f27c198dc40fa2497de968de0669fef7cb6ae267a3dd084e62bba523
  localAPIEndpoint:
    advertiseAddress: 192.168.100.121
    bindPort: 6443
discovery:
  bootstrapToken:
    apiServerEndpoint: vip-vkube:8443
    caCertHashes:
    - sha256:1ba22cc522fcb76991496ccb648c3bf0e8251da6f79dea64beea080ecbdd806d
    token: fu0j0s.qwd21l9wyn9hc9bj
  timeout: 5m0s
  tlsBootstrapToken: fu0j0s.qwd21l9wyn9hc9bj
nodeRegistration:
  criSocket: /run/containerd/containerd.sock
  name: vkube-002
  taints:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cgroupDriver: systemd

Then join the node – use --v=2 to see more verbose logs:

kubeadm join --config="/root/kubernetes/join-config.yaml" --v=2

As we want to use those nodes as workers too, execute the following command to "untaint" them: kubectl taint nodes --all node-role.kubernetes.io/master-

And we also label them properly so that kubectl get nodes shows everything nicely: kubectl label node --all node-role.kubernetes.io/worker=''
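
After that, every node should report both a control-plane/master and a worker role:

kubectl get nodes -o wide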

Join Worker Node

Create for example a join-worker-config.yaml:

apiVersion: kubeadm.k8s.io/v1beta2
kind: JoinConfiguration
caCertPath: /etc/kubernetes/pki/ca.crt
discovery:
  bootstrapToken:
    apiServerEndpoint: vip-vkube:8443
    token: fu0j0s.qwd21l9wyn9hc9bj
    caCertHashes:
    - sha256:1ba22cc522fcb76991496ccb648c3bf0e8251da6f79dea64beea080ecbdd806d
  timeout: 5m0s
  tlsBootstrapToken: fu0j0s.qwd21l9wyn9hc9bj
nodeRegistration:
  criSocket: /run/containerd/containerd.sock
  name: vkube-003
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cgroupDriver: systemd
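
Then join the node with the same kind of command as before (assuming the file is again placed under /root/kubernetes/):

kubeadm join --config="/root/kubernetes/join-worker-config.yaml" --v=2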

Additional Info About kubeadm Config-Files

You can get some sparse default configs via:

  • kubeadm config print join-defaults
  • kubeadm config print init-defaults

Troubleshooting If You Removed A Master And Want To Rejoin It As Master

This issue may have been caused by initially using Keepalived as the load balancer (in NAT mode). I was already wondering in the beginning about that config, which I had found somewhere else – NAT mode in the same subnet? :open_mouth:

In the course of testing the worker-node join, I removed the last node (kubectl delete node vkube-003 and then ran kubeadm reset on that node) and joined it as a worker node, which worked fine.

But when I wanted to rejoin it as a master again, I stumbled upon the following problem: somehow kubectl delete node did not remove the node from the etcd configuration.

I first had to edit the kubeadm-config ConfigMap and remove the node from there: kubectl edit configmaps -n kube-system kubeadm-config

Additionally, I had to install etcdctl (part of the etcd package) and execute the following commands:

# get list of etcd members
etcdctl --endpoints https://192.168.100.120:2379 --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key member list
# remove member by id
etcdctl --endpoints https://192.168.100.120:2379 --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key member remove 1dbfb0d823481fd9

Containerd Not Killing Containers After Being Stopped

As I am not a Kubernetes Uber-Pro I am not sure if this is smart :expressionless:

This seems to be the default behaviour. Restarting a node will take ages. If you stop kubelet, the containers will still be running and even listening on their ports. The load balancer will still send traffic to the node as port 6443 is still answering, while kubectl get nodes will show the node as NotReady.

To also stop the containers, it seems you need to override the KillMode in the containerd systemd unit – do this by executing systemctl edit containerd and paste, for example, this content:

[Service]
# Default - seems to let containers run even after containerd is stopped
# as commented in https://github.com/containerd/containerd/issues/386#issuecomment-304837687
# mixed should also kill containers
# process is default
#KillMode=process
KillMode=mixed

So, to completely stop a node temporarily, you can now do:

systemctl stop kubelet
systemctl stop containerd

To get the node back online:

systemctl start containerd
systemctl start kubelet

Exploring Containerd

Only a short glimpse into containerd.

As this setup is not using Docker as the container runtime anymore, you can check out the things run by containerd with the ctr command.

For example:

  • list namespaces: ctr namespaces list
  • list images available to k8s: ctr --namespace k8s.io images list
  • list running containers: ctr --namespace k8s.io containers list