Deploying a Kubernetes 1.29 Cluster with kubeadm (CentOS 7.9, Final Edition)
1. Environment Preparation
1.1 Operating System Requirements
OS: CentOS 7.9 (minimal install, with basic networking already configured)
Kernel: 3.10.0-1160.el7.x86_64 or later (verify with uname -r; upgrade the kernel if it is older)
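For a quick check, the one-liner below (a minimal sketch; it compares against the minimum version given above) reports whether the running kernel is new enough:
# Compare the running kernel with the required minimum (3.10.0-1160)
[ "$(printf '%s\n' 3.10.0-1160 "$(uname -r)" | sort -V | head -n1)" = "3.10.0-1160" ] && echo "kernel OK: $(uname -r)" || echo "kernel too old: $(uname -r)"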
1.2 Cluster Plan

| Hostname | IP Address | Memory | Disk | Role |
| --- | --- | --- | --- | --- |
| k8s-master | 192.168.56.100 | 4 GB or more | 50 GB or more | Master node |
| k8s-worker-01 | 192.168.56.101 | 4 GB or more | 50 GB or more | Worker node |
| k8s-worker-02 | 192.168.56.102 | 4 GB or more | 50 GB or more | Worker node |
1.3 Software Versions

| Component | Version | Purpose |
| --- | --- | --- |
| Kubernetes | 1.29.0 | Core cluster components |
| Docker CE | 26.1.4 | Container runtime (version must be pinned) |
| cri-dockerd | 0.3.8 | Adapter between Docker and Kubernetes |
| Calico | 3.24.1 | Container network plugin (matches the 10.244.0.0/16 Pod CIDR) |
2. Base Configuration on All Nodes (Required)
Note: run the steps in this section on the Master and every Worker node; using the root user is recommended to avoid permission issues.
2.1 Hostnames
Master node
hostnamectl set-hostname k8s-master && bash
Worker01 node
hostnamectl set-hostname k8s-worker-01 && bash
Worker02 node
hostnamectl set-hostname k8s-worker-02 && bash
2.2 Hostname Mapping
vi /etc/hosts
Add the following entries (identical on every node):
192.168.56.100 k8s-master
192.168.56.101 k8s-worker-01
192.168.56.102 k8s-worker-02
Verification: run ping k8s-master; a successful reply means the mapping works.
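To check all three mappings at once, a small loop such as the following (assuming the hostnames above) can be run on any node:
# Ping every cluster hostname once to confirm the /etc/hosts entries resolve
for h in k8s-master k8s-worker-01 k8s-worker-02; do
  ping -c 1 -W 1 "$h" > /dev/null && echo "$h reachable" || echo "$h UNREACHABLE"
done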
2.3 Disable the Firewall
# Stop and disable firewalld
systemctl disable firewalld && systemctl stop firewalld
# Verify the status (should print "not running")
firewall-cmd --state
2.4 Disable SELinux
# Disable temporarily
setenforce 0
# Disable permanently (edit the config file)
sed -i "s/SELINUX=enforcing/SELINUX=disabled/g" /etc/selinux/config
# Reboot for the change to take effect (required)
reboot
After the reboot, verify with sestatus; the output should show SELinux status: disabled.
2.5 Configure a Working yum Repository (fixes mirror resolution failures)
Note: fix the yum repositories first so that installations do not fail because mirrors cannot be resolved; this applies to CentOS 7.
# 1. Remove the existing, broken yum repo files
rm -rf /etc/yum.repos.d/*
# 2. Configure the Aliyun CentOS 7 repository (stable inside China)
curl -o /etc/yum.repos.d/CentOS-Base.repo https://mirrors.aliyun.com/repo/Centos-7.repo
# 3. Drop unreachable mirror addresses, keeping the usable ones
sed -i -e '/mirrors.cloud.aliyuncs.com/d' -e '/mirrors.aliyuncs.com/d' /etc/yum.repos.d/CentOS-Base.repo
# 4. Rebuild the yum cache and refresh the repository index
yum clean all && yum makecache
# 5. Verify the repository works (no errors means it is usable)
yum repolist
2.6 Time Synchronization (full cleanup + fresh install)
Note: using the yum repository configured above, remove any existing chrony leftovers and install it from scratch; the configuration matches the cluster network 192.168.56.0/24.
Step 1: uninstall chrony and remove leftover files
# 1. Stop the chronyd service
systemctl stop chronyd
# 2. Disable chronyd at boot
systemctl disable chronyd
# 3. Remove chrony completely
yum remove -y chrony
# 4. Delete leftover configuration files
rm -rf /etc/chrony.conf /var/lib/chrony/*
# 5. Clean the related logs (CentOS 7)
systemctl restart systemd-journald
journalctl --vacuum-time=1s
Step 2: reinstall and configure chrony
# 1. Reinstall chrony
yum install -y chrony
# 2. Create a clean chrony.conf
cat > /etc/chrony.conf << EOF
# Aliyun NTP servers (multiple entries for availability)
server ntp1.aliyun.com iburst
server ntp2.aliyun.com iburst
server ntp3.aliyun.com iburst
server ntp4.aliyun.com iburst
# Allow nodes in 192.168.56.0/24 to sync time from this host
allow 192.168.56.0/24
# Fall back to the local clock when external NTP servers are unreachable
local stratum 10
# Logging
logdir /var/log/chrony
log measurements statistics tracking
EOF
# 3. Start chronyd and enable it at boot
systemctl start chronyd
systemctl enable chronyd
Step 3: verify the result
# 1. Check the chronyd service status (active (running) means OK)
systemctl status chronyd -l
# 2. Check the NTP source status (a leading * means the source is synchronized)
chronyc sources -v
# 3. Force an immediate time step
chronyc makestep
# 4. Show the current system time
date
2.7 Kernel Parameters (IP forwarding + bridge filtering)
# Create the kernel parameter file
cat > /etc/sysctl.d/k8s.conf << EOF
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward = 1
# Do not use swap
vm.swappiness = 0
EOF
# Apply the kernel parameters
sysctl --system
# Load the br_netfilter module
modprobe br_netfilter
# Verify the module is loaded (any output means success)
lsmod | grep br_netfilter
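Note that modprobe does not persist across reboots. As an optional extra step (a sketch using systemd's modules-load.d mechanism, not part of the original procedure), the module can be loaded automatically at boot:
# Load br_netfilter automatically on every boot
cat > /etc/modules-load.d/br_netfilter.conf << EOF
br_netfilter
EOF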
2.8 IPVS Configuration
# Install the ipvsadm tool
yum install -y ipvsadm
# Create the module-loading script
mkdir -p /etc/sysconfig/ipvsadm
cat > /etc/sysconfig/ipvsadm/ipvs.modules << EOF
#!/bin/bash
modprobe -- ip_vs
modprobe -- ip_vs_rr
modprobe -- ip_vs_wrr
modprobe -- ip_vs_sh
modprobe -- nf_conntrack_ipv4  # CentOS 7 kernel module name
EOF
# Make the script executable and run it
chmod 755 /etc/sysconfig/ipvsadm/ipvs.modules && bash /etc/sysconfig/ipvsadm/ipvs.modules
# Verify (matching output means the modules are loaded)
lsmod | grep -e ip_vs -e nf_conntrack
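The ipvs.modules script above only runs when invoked manually. If you also want the IPVS modules loaded automatically after a reboot, an optional modules-load.d entry (a sketch, not part of the original procedure) looks like this:
# Load the IPVS-related modules on every boot
cat > /etc/modules-load.d/ipvs.conf << EOF
ip_vs
ip_vs_rr
ip_vs_wrr
ip_vs_sh
nf_conntrack_ipv4
EOF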
2.9 Disable Swap
# Disable swap immediately
swapoff -a
# Disable it permanently (comment out the swap line)
sed -i '/swap/s/^/#/' /etc/fstab
# Verify (no output means no active swap entry remains)
grep -v '^#' /etc/fstab | grep swap
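As an additional runtime check, free should report 0B of swap once swapoff has been run:
# The Swap line should show 0B total after swap is disabled
free -h | grep -i swap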
3. Install Docker and cri-dockerd on All Nodes
3.1 Install Docker CE 26.1.4
# Install prerequisite tools
yum install -y yum-utils device-mapper-persistent-data lvm2 vim net-tools wget
# Configure the Aliyun Docker yum repository
yum-config-manager --add-repo https://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo
sed -i 's+download.docker.com+mirrors.aliyun.com/docker-ce+' /etc/yum.repos.d/docker-ce.repo
# Install the pinned Docker version (docker-ce only, locked to 26.1.4)
yum install -y docker-ce-26.1.4
3.2 Configure Docker Registry Mirrors
Note: only the registry mirrors are configured, with no extra options, so that Docker starts reliably.
# Configure the registry mirrors (use the list below)
cat > /etc/docker/daemon.json << EOF
{
  "registry-mirrors": [
    "https://docker.hpcloud.cloud",
    "https://docker.m.daocloud.io",
    "https://docker.unsee.tech",
    "https://docker.1panel.live",
    "http://mirrors.ustc.edu.cn",
    "https://docker.chenby.cn",
    "http://mirror.azure.cn",
    "https://dockerpull.org",
    "https://dockerhub.icu",
    "https://hub.rat.dev",
    "https://proxy.1panel.live",
    "https://docker.1panel.top",
    "https://docker.1ms.run",
    "https://docker.ketches.cn"
  ]
}
EOF
# Restart Docker to apply the configuration and enable it at boot
systemctl daemon-reload
systemctl restart docker
systemctl enable docker
# Verify the Docker status (active (running) means success)
systemctl status docker -l
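Section 4 configures kubelet with the systemd cgroup driver, and kubelet and Docker must agree on the driver. As an optional check (the exec-opts line is an addition, not part of the daemon.json above), confirm which driver Docker is using and align it if needed:
# Show the cgroup driver Docker is currently using (cgroupfs or systemd)
docker info --format '{{.CgroupDriver}}'
# If it prints cgroupfs, consider adding the following key to /etc/docker/daemon.json and restarting Docker
#   "exec-opts": ["native.cgroupdriver=systemd"]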
3.3 Install cri-dockerd 0.3.8
# Download the cri-dockerd RPM
wget https://github.com/Mirantis/cri-dockerd/releases/download/v0.3.8/cri-dockerd-0.3.8-3.el7.x86_64.rpm
# Install cri-dockerd
yum -y install cri-dockerd-0.3.8-3.el7.x86_64.rpm
# Point cri-dockerd at a domestic pause image
sed -i 's#^ExecStart=/usr/bin/cri-dockerd --container-runtime-endpoint fd://#ExecStart=/usr/bin/cri-dockerd --pod-infra-container-image=registry.aliyuncs.com/google_containers/pause:3.9 --container-runtime-endpoint fd://#g' /usr/lib/systemd/system/cri-docker.service
# Reload the unit files, start the service, and enable it at boot
systemctl daemon-reload && systemctl start cri-docker && systemctl enable cri-docker
# Verify (active (running) means success)
systemctl status cri-docker
4. Install the Kubernetes Components on All Nodes (kubelet/kubeadm/kubectl)
# Configure the Kubernetes yum repository
cat <<EOF | tee /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes-new/core/stable/v1.29/rpm/
enabled=1
gpgcheck=1
gpgkey=https://mirrors.aliyun.com/kubernetes-new/core/stable/v1.29/rpm/repodata/repomd.xml.key
EOF
# Install via yum
yum install -y kubelet kubeadm kubectl
# Configure the kubelet cgroup driver to match Docker
cat > /etc/sysconfig/kubelet <<EOF
KUBELET_EXTRA_ARGS="--cgroup-driver=systemd"
EOF
# Enable kubelet at boot
systemctl daemon-reload && systemctl enable kubelet
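The command above installs the newest 1.29.x packages available in the repository (the node versions shown later, v1.29.15, reflect this). If you prefer to pin an exact patch release instead, a hedged alternative is:
# Optional: pin kubelet/kubeadm/kubectl to a specific 1.29 patch version
yum install -y kubelet-1.29.0 kubeadm-1.29.0 kubectl-1.29.0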
5. Cluster Deployment (core steps on the Master node)
5.1 Pull the Core Kubernetes Images
Create an image-pull script; it stores the image list in a bash array to avoid parsing problems:
# Create the script (array-based image list, clear error messages, fails fast)
cat > images_download_k8s.sh << 'EOF'
#!/bin/bash
# Aliyun image list (stored in an array to avoid newline/whitespace parsing issues)
images_list=(
"registry.aliyuncs.com/google_containers/kube-apiserver:v1.29.0"
"registry.aliyuncs.com/google_containers/kube-controller-manager:v1.29.0"
"registry.aliyuncs.com/google_containers/kube-scheduler:v1.29.0"
"registry.aliyuncs.com/google_containers/kube-proxy:v1.29.0"
"registry.aliyuncs.com/google_containers/coredns:v1.11.1"
"registry.aliyuncs.com/google_containers/pause:3.9"
"registry.aliyuncs.com/google_containers/etcd:3.5.12-0"
)
# Pull each image; exit immediately with a clear error if a pull fails
echo "Pulling the 7 core Kubernetes images..."
for image in "${images_list[@]}"
do
echo "Pulling: $image"
docker pull "$image"
if [ $? -ne 0 ]; then
echo "ERROR: failed to pull image, check the network or the image name: $image"
exit 1
fi
done
# Re-tag the images to the default names Kubernetes expects
echo -e "\nTagging images..."
docker tag registry.aliyuncs.com/google_containers/kube-apiserver:v1.29.0 registry.k8s.io/kube-apiserver:v1.29.0
docker tag registry.aliyuncs.com/google_containers/kube-controller-manager:v1.29.0 registry.k8s.io/kube-controller-manager:v1.29.0
docker tag registry.aliyuncs.com/google_containers/kube-scheduler:v1.29.0 registry.k8s.io/kube-scheduler:v1.29.0
docker tag registry.aliyuncs.com/google_containers/kube-proxy:v1.29.0 registry.k8s.io/kube-proxy:v1.29.0
docker tag registry.aliyuncs.com/google_containers/etcd:3.5.12-0 registry.k8s.io/etcd:3.5.12-0
docker tag registry.aliyuncs.com/google_containers/coredns:v1.11.1 registry.k8s.io/coredns/coredns:v1.11.1
docker tag registry.aliyuncs.com/google_containers/pause:3.9 registry.k8s.io/pause:3.9
# Optional: remove the Aliyun-tagged images to save space (uncomment to enable)
# echo -e "\nRemoving the original Aliyun-tagged images..."
# for image in "${images_list[@]}"
# do
# docker rmi "$image"
# done
echo -e "\nImage pull and tagging complete!"
EOF
# Make the script executable and run it (running it separately makes errors easy to spot)
chmod +x images_download_k8s.sh
echo "Running the image pull script..."
bash images_download_k8s.sh
After the script finishes, verify the images with the following snippet:
#!/bin/bash
# Verify the core Kubernetes images (the count should be 7)
echo -e "\n==================== Image verification ====================="
image_count=$(docker images | grep registry.k8s.io | wc -l)
# Show every re-tagged core image
echo -e "\n[Installed core Kubernetes images]"
docker images | grep registry.k8s.io
echo -e "\nFound $image_count core Kubernetes images (expected: 7)"
# Integer comparison on the image count
if [ "$image_count" -eq 7 ]; then
echo -e "\n✅ Image verification passed, you can continue with cluster initialization!"
else
echo -e "\n❌ Warning: unexpected image count ($image_count); pulling or tagging may have failed!"
echo "Check: 1. whether the node can reach the Aliyun registry 2. the pull script output for pull errors"
fi
Important: after this step completes, it is recommended to snapshot the Master node so that a failed deployment does not require reconfiguring from scratch.
5.2 Cluster Initialization (Master node)
kubeadm init --kubernetes-version=v1.29.0 \
--pod-network-cidr=10.244.0.0/16 \
--service-cidr=10.96.0.0/12 \
--apiserver-advertise-address=192.168.56.100 \
--cri-socket=unix:///var/run/cri-dockerd.sock \
--image-repository=registry.aliyuncs.com/google_containers
On success, the output includes the command that Worker nodes use to join the cluster. Copy and save it for later (the token and hash below are examples; use the values from your own output):
kubeadm join 192.168.56.100:6443 --token 5x76cq.zhm4xb91kgg8wwsx \
--discovery-token-ca-cert-hash sha256:4168af487686e7cd26f6f8034ac831beeca3f6ecfae550651d5ab6fd79acfe72 \
--cri-socket=unix:///var/run/cri-dockerd.sock
5.3 Configure the kubectl Command-Line Tool
# Create the kubectl config directory
mkdir -p $HOME/.kube
# Copy the admin config
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
# Fix the ownership
sudo chown $(id -u):$(id -g) $HOME/.kube/config
# Set the environment variable (for the root user)
export KUBECONFIG=/etc/kubernetes/admin.conf
# Verify kubectl (cluster information in the output means success)
kubectl cluster-info
5.4 Deploy the Calico Network Plugin
Note: pull the Calico images ahead of time, either online through the registry mirrors configured in section 3.2 or from tar files in offline environments; no registry login is required. Run the image steps on every node, Master first.
# ========== Option 1: pull the official Calico images online (run on all nodes, Master and Worker) ==========
# Run on the Master node (copy these 5 lines to each Worker node as well)
docker pull docker.io/calico/cni:v3.24.1
docker pull docker.io/calico/pod2daemon-flexvol:v3.24.1
docker pull docker.io/calico/node:v3.24.1
docker pull docker.io/calico/kube-controllers:v3.24.1
docker pull docker.io/calico/typha:v3.24.1
# ========== Option 2: load the images from tar files (offline environments; run on all nodes, Master and Worker) ==========
# Note: upload the tar files to any directory on the node (e.g. /root), cd into that directory, then run:
# Load the calico/cni image
docker load -i calico_cni_v3.24.1.tar
# Load the calico/kube-controllers image
docker load -i calico_kube-controllers_v3.24.1.tar
# Load the calico/node image
docker load -i calico_node_v3.24.1.tar
# Load the calico/pod2daemon-flexvol image
docker load -i calico_pod2daemon-flexvol_v3.24.1.tar
# Load the calico/typha image
docker load -i calico_typha_v3.24.1.tar
# Tag the images to match the image names used in the Calico manifest
docker tag docker.io/calico/cni:v3.24.1 calico/cni:v3.24.1
docker tag docker.io/calico/pod2daemon-flexvol:v3.24.1 calico/pod2daemon-flexvol:v3.24.1
docker tag docker.io/calico/node:v3.24.1 calico/node:v3.24.1
docker tag docker.io/calico/kube-controllers:v3.24.1 calico/kube-controllers:v3.24.1
docker tag docker.io/calico/typha:v3.24.1 calico/typha:v3.24.1
# ========== Deploy the Calico manifest (same for both image options; Master node only) ==========
# Download the official manifest and change the Pod CIDR to match the kubeadm init value
wget https://raw.githubusercontent.com/projectcalico/calico/v3.24.1/manifests/calico.yaml -O /root/calico.yaml
sed -i 's/192.168.0.0\/16/10.244.0.0\/16/g' /root/calico.yaml
# Note: in this manifest the CALICO_IPV4POOL_CIDR variable is commented out by default; uncomment it as well if you want the CIDR set explicitly
# Deploy Calico
kubectl apply -f /root/calico.yaml
# Check the deployment status
echo -e "\nCalico Pod status (all Pods should be Running):"
kubectl get pods -n kube-system | grep calico
6. Join the Worker Nodes to the Cluster
Make sure each Worker node has completed all of the "all nodes" steps above (base configuration, Docker, cri-dockerd, Kubernetes components, image pulls);
then run the join command saved from the Master initialization output (if it was lost, regenerate it on the Master):
# Regenerate the join command on the Master (a token created with --ttl 0 never expires)
kubeadm token create --ttl 0 --print-join-command
# Run the join command on the Worker node (replace with your actual values and append the cri-socket flag)
kubeadm join 192.168.56.100:6443 --token xxxx.xxxx \
--discovery-token-ca-cert-hash sha256:xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx \
--cri-socket=unix:///var/run/cri-dockerd.sock
# Check the node status on the Master (all Ready means success; NotReady usually means Calico is still starting)
kubectl get nodes
# Example of a successful result (from an actual run):
# NAME STATUS ROLES AGE VERSION
# k8s-master Ready control-plane 57m v1.29.15
# k8s-worker-01 Ready <none> 54m v1.29.15
# Label the Worker nodes (optional; adds a worker role for easier management)
kubectl label node k8s-worker-01 node-role.kubernetes.io/worker=worker
kubectl label node k8s-worker-02 node-role.kubernetes.io/worker=worker
# Verify the labels (the Worker nodes now show the worker role)
kubectl get nodes
# Example output after labeling:
# NAME STATUS ROLES AGE VERSION
# k8s-master Ready control-plane 58m v1.29.15
# k8s-worker-01 Ready worker 55m v1.29.15
# Inspect node details and confirm there are no abnormal events
kubectl describe node k8s-worker-01 | grep -A 10 "Conditions"
# For any additional Worker node (e.g. k8s-worker-02), repeat the join and label steps
Note: output similar to the following after the join command means the node joined successfully:
[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...
This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.
Run 'kubectl get nodes' on the control-plane to see this node join the cluster.
Observed result: after running the join command, kubectl get nodes on the Master lists k8s-worker-01 and both nodes are Ready, which confirms that 1. the node joined the cluster successfully, 2. the Calico network plugin is working and node-to-node networking is healthy, and 3. the core cluster components are stable, so you can move on to the availability checks.
Once all nodes show Ready in kubectl get nodes on the Master, proceed to the cluster availability checks.
7. Cluster Availability Checks
7.1 Basic Status Checks
# 1. Check the health of the control-plane components
kubectl get cs
# Note: the v1 ComponentStatus API is deprecated; the warning can be ignored as long as the status is Healthy
# 2. Check the system Pods (all should be Running)
kubectl get pods -n kube-system -o wide
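Because ComponentStatus is deprecated, the API server's aggregated health endpoints are another way to check control-plane health:
# Query the API server health endpoints (every check should report ok)
kubectl get --raw='/readyz?verbose'
kubectl get --raw='/livez?verbose'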
7.2 Deploy Nginx to Test the Pod Network
# 1. Pre-pull the image (avoids the deployment failing because the image cannot be pulled in time)
docker pull nginx:1.24
# 2. Create an Nginx Deployment with 2 replicas
kubectl create deployment nginx --image=nginx:1.24 --replicas=2
# 3. Expose it as a NodePort Service (reachable from outside the cluster)
kubectl expose deployment nginx --port=80 --type=NodePort
# 4. Check the Deployment and Service
kubectl get deployments,svc
# Example output: service/nginx NodePort 10.106.52.54 <none> 80:31423/TCP 30s
# 31423 is the NodePort (randomly assigned; note down your own value)
# 5. Test access (any node IP + the NodePort)
curl 192.168.56.100:31423  # Master node IP + port
curl 192.168.56.101:31423  # Worker01 node IP + port
# The Nginx welcome page means the Pod network and the Service are both working
# 6. Clean up the test resources (optional)
kubectl delete deployment nginx
kubectl delete svc nginx
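As an extra, optional check (not part of the original steps), in-cluster DNS can be verified with a one-off busybox Pod; the busybox:1.28 tag is just an example:
# Resolve the kubernetes Service name from inside the cluster (should return 10.96.0.1)
kubectl run dns-test --rm -it --restart=Never --image=busybox:1.28 -- nslookup kubernetes.default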
8. Resetting the Cluster (if deployment fails)
# Reset on every node (the cri-socket flag is required because both the containerd and cri-dockerd sockets exist)
kubeadm reset -f --cri-socket=unix:///var/run/cri-dockerd.sock
# Remove the CNI and kubectl configuration
rm -rf /etc/cni/net.d
rm -rf $HOME/.kube
# Restart kubelet and cri-docker
systemctl restart kubelet cri-docker
# Then re-run kubeadm init on the Master
9. Notes
All nodes must be able to reach each other on the 192.168.56.0/24 network (the firewall is disabled, so no extra rules are needed).
If an image pull fails, check whether the node can reach the Aliyun repositories and the registry mirrors configured in section 3.2; the scripts in this guide already fall back across multiple sources and can be reused as-is.
After the cluster is up, back up the Master node's /etc/kubernetes directory and $HOME/.kube/config file for recovery, as shown in the sketch below.
Keep CentOS 7.9 fully patched (yum update -y) to avoid problems caused by kernel bugs.
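A minimal backup sketch (the archive path and name are only examples):
# Archive the cluster PKI/manifests and the kubectl admin config (run on the Master node)
tar -czf /root/k8s-backup-$(date +%F).tar.gz /etc/kubernetes $HOME/.kube/config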