Upgrading the Kubernetes cluster (1.19.x) -> 1.20.x
1. Upgrade kubeadm
apt install kubeadm=1.20.15-00
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following packages will be upgraded:
kubeadm
1 upgraded, 0 newly installed, 0 to remove and 329 not upgraded.
Need to get 7,705 kB of archives.
After this operation, 28.7 kB disk space will be freed.
Get:1 http://mirrors.tencentyun.com/kubernetes/apt kubernetes-xenial/main amd64 kubeadm amd64 1.20.15-00 [7,705 kB]
Fetched 7,705 kB in 1s (9,420 kB/s)
(Reading database ... 118717 files and directories currently installed.)
Preparing to unpack .../kubeadm_1.20.15-00_amd64.deb ...
Unpacking kubeadm (1.20.15-00) over (1.20.5-00) ...
Setting up kubeadm (1.20.15-00) ...
2. Pre-upgrade check
kubeadm upgrade plan
root@l2:~# kubeadm upgrade plan
[upgrade/config] Making sure the configuration is correct:
[upgrade/config] Reading configuration from the cluster...
[upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[preflight] Running pre-flight checks.
[upgrade] Running cluster health checks
[upgrade] Fetching available versions to upgrade to
[upgrade/versions] Cluster version: v1.19.16
[upgrade/versions] kubeadm version: v1.20.15
I0810 21:20:17.404258 1175025 version.go:254] remote version is much newer: v1.27.4; falling back to: stable-1.20
[upgrade/versions] Latest stable version: v1.20.15
[upgrade/versions] Latest version in the v1.19 series: v1.19.16
Components that must be upgraded manually after you have upgraded the control plane with 'kubeadm upgrade apply':
COMPONENT CURRENT AVAILABLE
kubelet 2 x v1.19.16 v1.20.15
1 x v1.20.5 v1.20.15
Upgrade to the latest stable version:
COMPONENT CURRENT AVAILABLE
kube-apiserver v1.19.16 v1.20.15
kube-controller-manager v1.19.16 v1.20.15
kube-scheduler v1.19.16 v1.20.15
kube-proxy v1.19.16 v1.20.15
CoreDNS 1.7.0 1.7.0
etcd 3.4.13-0 3.4.13-0
You can now apply the upgrade by executing the following command:
kubeadm upgrade apply v1.20.15
_____________________________________________________________________
The table below shows the current state of component configs as understood by this version of kubeadm.
Configs that have a "yes" mark in the "MANUAL UPGRADE REQUIRED" column require manual config upgrade or
resetting to kubeadm defaults before a successful upgrade can be performed. The version to manually
upgrade to is denoted in the "PREFERRED VERSION" column.
API GROUP CURRENT VERSION PREFERRED VERSION MANUAL UPGRADE REQUIRED
kubeproxy.config.k8s.io v1alpha1 v1alpha1 no
kubelet.config.k8s.io v1beta1 v1beta1 no
_____________________________________________________________________
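Not captured in this transcript, but per the plan output above the next step is to apply the upgrade on the control-plane node. The standard kubeadm flow (assumed from the plan output and the upstream procedure, not from this log) is:

```shell
# On the control-plane node (l2 in this log).
# Draining first is the documented procedure, though often skipped on a small homelab.
kubectl drain l2 --ignore-daemonsets
kubeadm upgrade apply v1.20.15
kubectl uncordon l2
```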
3. Upgrade worker nodes
3.1 Upgrade kubeadm
apt install kubeadm=1.20.15-00
kubeadm upgrade node
[upgrade] Reading configuration from the cluster...
[upgrade] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[preflight] Running pre-flight checks
[preflight] Skipping prepull. Not a control plane node.
[upgrade] Skipping phase. Not a control plane node.
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[upgrade] The configuration for this node was successfully updated!
[upgrade] Now you should go ahead and upgrade the kubelet package using your package manager.
3.2 Upgrade kubelet and kubectl
apt install kubelet=1.20.15-00 kubectl=1.20.15-00
sudo systemctl daemon-reload && sudo systemctl restart kubelet
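This log restarts the kubelet directly; the upstream procedure additionally drains each worker before upgrading it and uncordons it afterwards (run from a machine with kubectl access; l3 is used as an example):

```shell
kubectl drain l3 --ignore-daemonsets   # evict workloads before upgrading node l3
# ... run the apt install / kubeadm upgrade node / kubelet restart on l3 ...
kubectl uncordon l3                    # mark l3 schedulable again
```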
4. Upgrade kubelet and kubectl on the master node
apt install kubelet=1.20.15-00 kubectl=1.20.15-00
Upgrading the Kubernetes cluster (1.20.x) -> 1.21.x
The steps are the same as above; only the reference commands are listed.
apt install kubeadm=1.21.13-00
apt install kubelet=1.21.13-00 kubectl=1.21.13-00 kubeadm=1.21.13-00
kubeadm upgrade node
sudo systemctl daemon-reload && sudo systemctl restart kubelet
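kubeadm only supports upgrading one minor release at a time, which is why the same pattern repeats for every hop below (1.20 -> 1.21 -> 1.22 -> ...). A throwaway helper (hypothetical, not part of kubeadm) that computes the next minor series from a full version string:

```shell
# next_minor: given "MAJOR.MINOR.PATCH", print "MAJOR.MINOR+1".
next_minor() {
  major=${1%%.*}       # "1.20.15" -> "1"
  rest=${1#*.}         # "1.20.15" -> "20.15"
  minor=${rest%%.*}    # "20.15"   -> "20"
  echo "${major}.$((minor + 1))"
}

next_minor 1.20.15   # -> 1.21
```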
Upgrading the Kubernetes cluster (1.21.x) -> 1.22.x
apt install kubeadm=1.22.10-00
apt install kubelet=1.22.10-00 kubectl=1.22.10-00 kubeadm=1.22.10-00 && kubeadm upgrade node && sudo systemctl daemon-reload && sudo systemctl restart kubelet
Upgrading the Kubernetes cluster (1.22.x) -> 1.23.x
apt install kubeadm=1.23.17-00
apt install kubelet=1.23.17-00 kubectl=1.23.17-00 kubeadm=1.23.17-00 && kubeadm upgrade node && sudo systemctl daemon-reload && sudo systemctl restart kubelet
Upgrading the Kubernetes cluster (1.23.x) -> 1.24.x
Starting with 1.24, the Docker runtime (dockershim) is no longer supported, so upgrade with care.
Reference post: https://www.lisenet.com/2022/upgrading-homelab-kubernetes-cluster-from-1-23-to-1-24/
TODO...
apt install kubeadm=1.24.16-00
apt install kubelet=1.24.16-00 kubectl=1.24.16-00 kubeadm=1.24.16-00 && kubeadm upgrade node && sudo systemctl daemon-reload && sudo systemctl restart kubelet
Troubleshooting
Error: master NotReady (hit while upgrading to 1.22.x; note that "kg" below is a shell alias for "kubectl get")
root@l2:~# kg nodes
NAME STATUS ROLES AGE VERSION
l2 NotReady control-plane,master 25h v1.22.10
l3 Ready <none> 24h v1.22.10
l4 Ready <none> 24h v1.22.10
Check the system log: tail -f /var/log/syslog
Aug 10 22:12:09 l2 kubelet[1218528]: E0810 22:12:09.823103 1218528 kubelet.go:2376] "Container runtime network not ready" networkReady="NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized"
Aug 10 22:12:12 l2 kubelet[1218528]: I0810 22:12:12.309685 1218528 cni.go:204] "Error validating CNI config list" configList="{\n \"name\": \"cbr0\",\n \"cniVersion\": \"0.3.1\",\n \"plugins\": [\n {\n \"type\": \"flannel\",\n \"delegate\": {\n \"hairpinMode\": true,\n \"isDefaultGateway\": true\n }\n },\n {\n \"type\": \"portmap\",\n \"capabilities\": {\n \"portMappings\": true\n }\n }\n ]\n}\n" err="[failed to find plugin \"flannel\" in path [/opt/cni/bin]]"
The flannel CNI plugin is not ready. Delete the flannel pods and let the DaemonSet recreate them:
root@l2:~# kg pods -n kube-flannel
NAME READY STATUS RESTARTS AGE
kube-flannel-ds-4kcc6 1/1 Running 0 25h
kube-flannel-ds-6hq7z 1/1 Running 0 25h
kube-flannel-ds-s4jx8 1/1 Running 0 25h
kubectl delete pod kube-flannel-ds-4kcc6 kube-flannel-ds-6hq7z kube-flannel-ds-s4jx8 -n kube-flannel
Wait for the pods to be recreated, after which the node returns to Ready...
Error when upgrading to v1.24.16 (unresolved; kept here for the record)
The error log is as follows:
[ERROR ImagePull]: failed to pull image registry.cn-hangzhou.aliyuncs.com/google_containers/kube-apiserver:v1.24.16: output: time="2023-08-10T22:23:52+08:00" level=fatal msg="validate service connection: CRI v1 image API is not implemented for endpoint \"unix:///var/run/dockershim.sock\": rpc error: code = Unimplemented desc = unknown service runtime.v1.ImageService"
Cause: starting with 1.24, Kubernetes no longer supports the Docker runtime (dockershim was removed).
apt install containerd
root@l2:~# apt install containerd
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following packages were automatically installed and are no longer required:
aufs-tools cgroupfs-mount docker-ce-cli pigz
Use 'apt autoremove' to remove them.
The following additional packages will be installed:
runc
The following packages will be REMOVED:
containerd.io docker-ce
The following NEW packages will be installed:
containerd runc
0 upgraded, 2 newly installed, 2 to remove and 330 not upgraded.
Need to get 36.3 MB of archives.
After this operation, 86.6 MB disk space will be freed.
Do you want to continue? [Y/n] y
Get:1 http://mirrors.tencentyun.com/ubuntu focal-updates/main amd64 runc amd64 1.1.7-0ubuntu1~20.04.1 [3,819 kB]
Get:2 http://mirrors.tencentyun.com/ubuntu focal-updates/main amd64 containerd amd64 1.7.2-0ubuntu1~20.04.1 [32.5 MB]
Fetched 36.3 MB in 1s (27.8 MB/s)
(Reading database ... 118718 files and directories currently installed.)
Removing docker-ce (5:19.03.15~3-0~debian-stretch) ...
Removing containerd.io (1.4.3-1) ...
Selecting previously unselected package runc.
(Reading database ... 118696 files and directories currently installed.)
Preparing to unpack .../runc_1.1.7-0ubuntu1~20.04.1_amd64.deb ...
Unpacking runc (1.1.7-0ubuntu1~20.04.1) ...
Selecting previously unselected package containerd.
Preparing to unpack .../containerd_1.7.2-0ubuntu1~20.04.1_amd64.deb ...
Unpacking containerd (1.7.2-0ubuntu1~20.04.1) ...
Setting up runc (1.1.7-0ubuntu1~20.04.1) ...
Setting up containerd (1.7.2-0ubuntu1~20.04.1) ...
Processing triggers for man-db (2.9.1-1) ...
root@l2:~# docker ps
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
mkdir -p /etc/containerd
containerd config default | sudo tee /etc/containerd/config.toml
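One follow-up this log does not show, and a common step in kubeadm-on-containerd guides (stated here as an assumption about this setup, not something confirmed in the log): enable the systemd cgroup driver in the generated config and restart containerd. The edit is shown on a scratch copy so it is safe to run anywhere; on the node the target is /etc/containerd/config.toml.

```shell
tmp=$(mktemp)
printf 'SystemdCgroup = false\n' > "$tmp"                # line as emitted by 'containerd config default'
sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' "$tmp"
grep SystemdCgroup "$tmp"                                # now shows: SystemdCgroup = true
# On the real node, run the sed against /etc/containerd/config.toml, then:
#   sudo systemctl restart containerd
```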
cat /var/lib/kubelet/kubeadm-flags.env
sudo sed -i 's/--network-plugin=cni/--container-runtime=remote\ --container-runtime-endpoint=unix\:\/\/\/run\/containerd\/containerd.sock/g' /var/lib/kubelet/kubeadm-flags.env
cat /var/lib/kubelet/kubeadm-flags.env
root@l2:~# cat /var/lib/kubelet/kubeadm-flags.env
KUBELET_KUBEADM_ARGS="--network-plugin=cni --pod-infra-container-image=registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.2"
root@l2:~# sudo sed -i 's/--network-plugin=cni/--container-runtime=remote\ --container-runtime-endpoint=unix\:\/\/\/run\/containerd\/containerd.sock/g' /var/lib/kubelet/kubeadm-flags.env
root@l2:~# cat /var/lib/kubelet/kubeadm-flags.env
KUBELET_KUBEADM_ARGS="--container-runtime=remote --container-runtime-endpoint=unix:///run/containerd/containerd.sock --pod-infra-container-image=registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.2"
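As an aside, the same rewrite reads more clearly with '|' as the sed delimiter, which removes all the backslash escaping. Demonstrated on a scratch copy of the file contents shown above; on the node the target is /var/lib/kubelet/kubeadm-flags.env.

```shell
tmp=$(mktemp)
printf '%s\n' 'KUBELET_KUBEADM_ARGS="--network-plugin=cni --pod-infra-container-image=registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.2"' > "$tmp"
sed -i 's|--network-plugin=cni|--container-runtime=remote --container-runtime-endpoint=unix:///run/containerd/containerd.sock|' "$tmp"
cat "$tmp"   # same result as the escaped version above
```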
This issue was never resolved; the cluster machines ran into other problems as well, so I simply reinstalled everything. For the installation steps, see the Kubernetes installation guide.