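This article follows on from "Building your own etcd image" and uses systemd together with that self-built etcd image to quickly set up a highly available etcd cluster.
Core configuration files
The test cluster has three etcd nodes. The configuration below is for node1; the other two nodes are similar and differ only in their node-specific values such as NAME and MYHOST.
Adding a custom systemd service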
# cat /etc/etcd/etcd.conf
NAME="etcd-1"
DATADIR="/export/etcd_data"
MYHOST="http://10.0.0.1"
PORT="2379"
CLUSTER_PORT="2380"
CLUSTER="etcd-1=http://10.0.0.1:2380,etcd-2=http://10.0.0.2:2380,etcd-3=http://10.0.0.3:2380"
CLUSTER_TOKEN="my-etcd-token"
CLUSTER_STATE="new"
# cat /etc/systemd/system/etcd-node.service
[Unit]
Description=etcd node
After=docker.service
Requires=docker.service
[Service]
User=root
EnvironmentFile=-/etc/etcd/etcd.conf
PermissionsStartOnly=true
ExecStart=/usr/bin/docker run -itd --net=host --name=etcd-node -e NAME=${NAME} -e DATADIR=${DATADIR} -e MYHOST=${MYHOST} -e PORT=${PORT} -e CLUSTER_PORT=${CLUSTER_PORT} -e CLUSTER=${CLUSTER} -e CLUSTER_TOKEN=${CLUSTER_TOKEN} -e CLUSTER_STATE=${CLUSTER_STATE} xxbandy123/etcd:3.0.10
#ExecStop=/usr/bin/docker rm -f etcd-node
#Restart=always
#RestartSec=10
[Install]
WantedBy=multi-user.target
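Note 1: on each node, set NAME and MYHOST in /etc/etcd/etcd.conf to that node's own values, and make sure CLUSTER lists all three members.
Note 2: when writing the systemd unit, do not set a restart policy and do not set ExecStop (both are left commented out in the unit file). Bootstrapping the cluster requires the nodes to start at about the same time and discover each other. If an instance is later recreated and tries to register again, etcd finds that the member already exists in the cluster, so the instance cannot rejoin and fails to start. The proper procedure when an instance fails is to remove that member from the cluster first, add the node back to the cluster, and then start the instance again with the parameters the cluster returns.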
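Since the per-node files differ only in a couple of values, they can be generated rather than edited by hand. The following is a minimal sketch, not part of the original setup: `render_etcd_conf` is a hypothetical helper name, and it assumes the members are named etcd-1, etcd-2, ... in the order the IP addresses are given, with the same ports, token and data directory as the node1 example.

```shell
#!/bin/sh
# Hypothetical helper: print the etcd.conf for node number $1 (1-based),
# given the IP addresses of all cluster nodes as the remaining arguments.
render_etcd_conf() {
    index=$1
    shift
    cluster=""
    myhost=""
    i=1
    for ip in "$@"; do
        # Every node gets the same CLUSTER list of peer URLs.
        cluster="${cluster:+$cluster,}etcd-$i=http://$ip:2380"
        # MYHOST is the address of the node this file is for.
        if [ "$i" -eq "$index" ]; then
            myhost="http://$ip"
        fi
        i=$((i + 1))
    done
    cat <<EOF
NAME="etcd-$index"
DATADIR="/export/etcd_data"
MYHOST="$myhost"
PORT="2379"
CLUSTER_PORT="2380"
CLUSTER="$cluster"
CLUSTER_TOKEN="my-etcd-token"
CLUSTER_STATE="new"
EOF
}

# Example: print the configuration for node2.
render_etcd_conf 2 10.0.0.1 10.0.0.2 10.0.0.3
```

Redirecting the output to /etc/etcd/etcd.conf on the matching node would give the file the systemd unit reads through EnvironmentFile.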
etcd operations
Startup verification
Start the docker instance on each of the three nodes
systemctl daemon-reload
systemctl start etcd-node
# etcdctl cluster-health
2017-11-03 09:15:49.006358 I | warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
2017-11-03 09:15:49.007093 I | warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
member 911c5e15a35cdb8f is healthy: got healthy result from http://10.0.0.1:2379
member b01d138087dbe547 is healthy: got healthy result from http://10.0.0.3:2379
member c22c1c7b5a4c9f9b is healthy: got healthy result from http://10.0.0.2:2379
cluster is healthy
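For scripted checks it can help to reduce the `etcdctl cluster-health` output to two numbers. This is a rough sketch that assumes the output keeps the `member <id> is <status>` line format shown in this article; `summarize_health` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read `etcdctl cluster-health` output on stdin and
# print "<healthy members> <total members>".
summarize_health() {
    awk '
        # Any per-member status line counts toward the total.
        /^member [0-9a-f]+ is /        { total++ }
        # Only members reported as healthy count as healthy.
        /^member [0-9a-f]+ is healthy/ { healthy++ }
        END { printf "%d %d\n", healthy, total }
    '
}

# Example with output in the format shown in this article.
summarize_health <<'EOF'
member 9923f8b86d3ce7a6 is healthy: got healthy result from http://10.0.0.1:2379
member b01d138087dbe547 is healthy: got healthy result from http://10.0.0.3:2379
member c22c1c7b5a4c9f9b is unreachable: [http://10.0.0.2:2379] are all unreachable
cluster is healthy
EOF
```

On a live node the same function would be fed with `etcdctl cluster-health | summarize_health`.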
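Cluster maintenance
Simulate one node of the cluster going down, then restore the cluster
Check the current cluster status and availability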
etcdctl member list
2017-11-04 09:04:14.150103 I | warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
9923f8b86d3ce7a6: name=etcd-1 peerURLs=http://10.0.0.1:2380 clientURLs=http://172.25.44.6:2379 isLeader=false
b01d138087dbe547: name=etcd-3 peerURLs=http://10.0.0.3:2380 clientURLs=http://172.25.47.78:2379 isLeader=false
c22c1c7b5a4c9f9b: name=etcd-2 peerURLs=http://10.0.0.2:2380 clientURLs=http://172.25.47.77:2379 isLeader=true
[root@hc-25-44-6 pe]# etcdctl set bgops biaoge
2017-11-04 09:04:40.234434 I | warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
biaoge
# etcdctl get bgops
2017-11-04 09:05:22.059435 I | warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
biaoge
Delete the container instance on node2
# docker rm -f -v etcd-node
# etcdctl cluster-health
member 9923f8b86d3ce7a6 is healthy: got healthy result from http://10.0.0.1:2379
member b01d138087dbe547 is healthy: got healthy result from http://10.0.0.3:2379
failed to check the health of member c22c1c7b5a4c9f9b on http://10.0.0.2:2379: Get http://172.25.47.77:2379/health: dial tcp 172.25.47.77:2379: getsockopt: connection refused
member c22c1c7b5a4c9f9b is unreachable: [http://10.0.0.2:2379] are all unreachable
cluster is healthy
# etcdctl get bgops
2017-11-04 09:10:40.658013 I | warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
biaoge
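As shown above, node2 has lost contact. Because this is a 3-member etcd cluster, it still has a quorum (2 of 3), so the cluster as a whole remains healthy and usable.
Restore the node2 instance to the cluster
Note: simply starting the instance on node2 again with the original configuration fails to join the cluster, because the member ID c22c1c7b5a4c9f9b is already registered.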
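The reason a 3-member cluster survives one failure is the majority (quorum) rule: a cluster of n members needs n/2 + 1 of them (integer division) to agree. A small sketch of that arithmetic, with hypothetical function names:

```shell
#!/bin/sh
# Quorum (majority) size for a cluster of $1 members.
quorum() { echo $(( $1 / 2 + 1 )); }

# Number of members that can fail while a quorum still remains.
fault_tolerance() { echo $(( $1 - ($1 / 2 + 1) )); }

for n in 1 3 5; do
    echo "members=$n quorum=$(quorum "$n") tolerated_failures=$(fault_tolerance "$n")"
done
```

For n=3 this gives a quorum of 2 and one tolerated failure, which matches the behavior seen after removing the node2 container.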
# docker logs etcd-node
....
2017-11-04 01:12:25.307452 C | etcdmain: member c22c1c7b5a4c9f9b has already been bootstrapped
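A script can recognize this failure mode by looking for the message in the container log. This is only a sketch: `already_bootstrapped` is a hypothetical helper, and it assumes the log wording matches the line shown in this article.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the log text on stdin contains the
# "has already been bootstrapped" error shown in this article.
already_bootstrapped() {
    grep -q 'has already been bootstrapped'
}

# Example with the log line from this article.
if printf '%s\n' '2017-11-04 01:12:25.307452 C | etcdmain: member c22c1c7b5a4c9f9b has already been bootstrapped' | already_bootstrapped; then
    echo "member must be removed and re-added before this node can start"
fi
```

On the affected node the input would come from `docker logs etcd-node 2>&1`.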
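The correct way to recover
1. Remove the failed member from the cluster
2. Add a member to the cluster (the failed node joins again as a new member)
3. Restart the failed node using the information the cluster returns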
# etcdctl member remove c22c1c7b5a4c9f9b
2017-11-04 09:15:48.488427 I | warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
Removed member c22c1c7b5a4c9f9b from cluster
# etcdctl member add etcd-2 http://10.0.0.2:2380
2017-11-04 09:16:50.704247 I | warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
Added member named etcd-2 with ID 51e807366fadaded to cluster
ETCD_NAME="etcd-2"
ETCD_INITIAL_CLUSTER="etcd-2=http://10.0.0.2:2380,etcd-1=http://10.0.0.1:2380,etcd-3=http://10.0.0.3:2380"
ETCD_INITIAL_CLUSTER_STATE="existing"
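The member ID passed to `etcdctl member remove` can be looked up by name instead of copied by hand. A minimal sketch, assuming the `etcdctl member list` line format shown in this article (`<id>: name=<name> ...`); `member_id_by_name` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read `etcdctl member list` output on stdin and
# print the ID of the member whose name is $1 (nothing if not found).
member_id_by_name() {
    awk -v want="name=$1" '
        # The first field is "<id>:" and the second is "name=<name>".
        $2 == want { sub(/:$/, "", $1); print $1 }
    '
}

# Example with member list output in the format shown in this article.
member_id_by_name etcd-2 <<'EOF'
9923f8b86d3ce7a6: name=etcd-1 peerURLs=http://10.0.0.1:2380 clientURLs=http://172.25.44.6:2379 isLeader=false
b01d138087dbe547: name=etcd-3 peerURLs=http://10.0.0.3:2380 clientURLs=http://172.25.47.78:2379 isLeader=false
c22c1c7b5a4c9f9b: name=etcd-2 peerURLs=http://10.0.0.2:2380 clientURLs=http://172.25.47.77:2379 isLeader=true
EOF
```

On a healthy node the lookup could be combined with the removal, for example `etcdctl member remove "$(etcdctl member list | member_id_by_name etcd-2)"`. Warning lines in the output do not match the pattern and are skipped.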
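Recover the failed node using the returned information (it gives three parameters: name, cluster and state)
Because the node is rejoining an existing cluster, change the cluster state on node2 to existing and then start the instance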
# grep STATE /etc/etcd/etcd.conf
CLUSTER_STATE="existing"
# systemctl restart etcd-node
# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
9fb8b1544642 xxbandy123/etcd:3.0.10 "/docker-entrypoint.s" 33 seconds ago Up 32 seconds etcd-node
# etcdctl cluster-health
2017-11-04 09:23:44.053605 I | warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
2017-11-04 09:23:44.054497 I | warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
member 51e807366fadaded is healthy: got healthy result from http://10.0.0.2:2379
member 9923f8b86d3ce7a6 is healthy: got healthy result from http://10.0.0.1:2379
member b01d138087dbe547 is healthy: got healthy result from http://10.0.0.3:2379
cluster is healthy
# etcdctl get bgops
2017-11-04 09:23:38.541355 I | warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
biaoge
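The CLUSTER_STATE change that precedes the restart can also be scripted. This is a sketch under the assumption that the file contains a single `CLUSTER_STATE=` line as in the node1 example; `set_cluster_state` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: set CLUSTER_STATE in the config file $1 to $2,
# which must be "new" or "existing".
set_cluster_state() {
    conf=$1
    state=$2
    case "$state" in
        new|existing) ;;
        *) echo "state must be 'new' or 'existing'" >&2; return 1 ;;
    esac
    # Write the edited copy to a temp file first, so a failed sed does
    # not leave a truncated config behind.
    tmp=$(mktemp) || return 1
    sed "s/^CLUSTER_STATE=.*/CLUSTER_STATE=\"$state\"/" "$conf" > "$tmp" \
        && cat "$tmp" > "$conf"
    status=$?
    rm -f "$tmp"
    return $status
}

# Usage on node2, before restarting the service:
#   set_cluster_state /etc/etcd/etcd.conf existing
#   systemctl restart etcd-node
```

Copying the content back with `cat` rather than moving the temp file keeps the owner and permissions of the original config file.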
At this point node2 has been restored to the etcd cluster and is serving requests normally.
Author
BGBiao
LastMod
2020-11-01
License
Original article. If you repost it, please credit the author and the source. Thanks!