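This article follows on from "Building your own etcd image". It uses systemd together with that self-built etcd image to quickly set up a highly available etcd cluster.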

Core configuration files

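The test cluster is a three-node etcd cluster. The configuration for node1 is shown below; the other two nodes are similar and only need NAME and MYHOST changed. The container is registered as a custom systemd service.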

# cat /etc/etcd/etcd.conf
NAME="etcd-1"
DATADIR="/export/etcd_data"
MYHOST="http://10.0.0.1"
PORT="2379"
CLUSTER_PORT="2380"
CLUSTER="etcd-1=http://10.0.0.1:2380,etcd-2=http://10.0.0.2:2380,etcd-3=http://10.0.0.3:2380"
CLUSTER_TOKEN="my-etcd-token"
CLUSTER_STATE="new"

# cat /etc/systemd/system/etcd-node.service
[Unit]
Description=etcd node
After=docker.service
Requires=docker.service

[Service]
User=root
EnvironmentFile=-/etc/etcd/etcd.conf
PermissionsStartOnly=true
ExecStart=/usr/bin/docker run -itd --net=host --name=etcd-node -e NAME=${NAME} -e DATADIR=${DATADIR} -e MYHOST=${MYHOST} -e PORT=${PORT} -e CLUSTER_PORT=${CLUSTER_PORT} -e CLUSTER=${CLUSTER} -e CLUSTER_TOKEN=${CLUSTER_TOKEN} -e CLUSTER_STATE=${CLUSTER_STATE} xxbandy123/etcd:3.0.10

# ExecStop and the restart policy are deliberately left disabled (see Note 2).
#ExecStop=/usr/bin/docker rm -f etcd-node
#Restart=always
#RestartSec=10

[Install]
WantedBy=multi-user.target

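Note 1: On each node, set the three variables NAME, MYHOST and CLUSTER in /etc/etcd/etcd.conf. NAME and MYHOST differ per node; CLUSTER lists all three members.

Note 2: When writing the systemd service configuration, do not set a restart policy and do not set ExecStop. Initializing the cluster requires all nodes to start at the same time and discover each other. If an instance is later restarted with its bootstrap configuration, it tries to register again, finds that the member already exists in the cluster, cannot join, and fails to start. The correct approach when an instance fails is to remove the node from the cluster first, add it back to the cluster in its current state, and then start the instance using the information the cluster returns (see: etcd operations).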
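Since only a few values differ between nodes, the per-node file can be generated rather than edited by hand. The following is a minimal sketch, assuming the same three addresses as the example above; the function name `render_etcd_conf` is hypothetical and not part of the original setup.

```shell
#!/bin/sh
# Hypothetical helper: print an etcd.conf for one node.
# Variable names match the etcd.conf shown in this article; the function
# name and the fixed three-node address list are assumptions.
CLUSTER="etcd-1=http://10.0.0.1:2380,etcd-2=http://10.0.0.2:2380,etcd-3=http://10.0.0.3:2380"

render_etcd_conf() {
    name="$1"   # member name, e.g. etcd-2
    ip="$2"     # node address, e.g. 10.0.0.2
    cat <<EOF
NAME="${name}"
DATADIR="/export/etcd_data"
MYHOST="http://${ip}"
PORT="2379"
CLUSTER_PORT="2380"
CLUSTER="${CLUSTER}"
CLUSTER_TOKEN="my-etcd-token"
CLUSTER_STATE="new"
EOF
}

# Example: print the file for node2.
# On the node itself the output would be redirected to /etc/etcd/etcd.conf.
render_etcd_conf etcd-2 10.0.0.2
```

Only NAME and MYHOST change between calls; CLUSTER stays identical on every node.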

Startup verification

Start the three Docker instances, one on each node

systemctl daemon-reload
systemctl start etcd-node

# etcdctl cluster-health
2017-11-03 09:15:49.006358 I | warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
2017-11-03 09:15:49.007093 I | warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
member 911c5e15a35cdb8f is healthy: got healthy result from http://10.0.0.1:2379
member b01d138087dbe547 is healthy: got healthy result from http://10.0.0.3:2379
member c22c1c7b5a4c9f9b is healthy: got healthy result from http://10.0.0.2:2379
cluster is healthy
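For scripted checks, the health output can be reduced to a count of healthy members. This is a sketch that only parses text in the format shown above; the function name `count_healthy` is hypothetical, and the sample input is copied from the output in this article.

```shell
#!/bin/sh
# count_healthy: read `etcdctl cluster-health` output on stdin and print
# the number of members reported as healthy. Warning lines and the final
# "cluster is healthy" line are ignored because they do not start with "member".
count_healthy() {
    grep -c '^member [0-9a-f]* is healthy' || true
}

sample='member 911c5e15a35cdb8f is healthy: got healthy result from http://10.0.0.1:2379
member b01d138087dbe547 is healthy: got healthy result from http://10.0.0.3:2379
member c22c1c7b5a4c9f9b is healthy: got healthy result from http://10.0.0.2:2379
cluster is healthy'

printf '%s\n' "$sample" | count_healthy   # prints 3
```

On a live node the input would come from `etcdctl cluster-health | count_healthy` instead of the sample text.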

Cluster maintenance

Simulate the failure of one node and recover the cluster

Check the current cluster state and availability

# etcdctl member list
2017-11-04 09:04:14.150103 I | warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
9923f8b86d3ce7a6: name=etcd-1 peerURLs=http://10.0.0.1:2380 clientURLs=http://172.25.44.6:2379 isLeader=false
b01d138087dbe547: name=etcd-3 peerURLs=http://10.0.0.3:2380 clientURLs=http://172.25.47.78:2379 isLeader=false
c22c1c7b5a4c9f9b: name=etcd-2 peerURLs=http://10.0.0.2:2380 clientURLs=http://172.25.47.77:2379 isLeader=true
# etcdctl set bgops biaoge
2017-11-04 09:04:40.234434 I | warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
biaoge

# etcdctl get bgops
2017-11-04 09:05:22.059435 I | warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
biaoge
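Removing a member later requires its ID, which `etcdctl member list` prints at the start of each line. A small sketch for looking the ID up by member name follows; the function name `member_id_by_name` is hypothetical, and it assumes the v2-style output format shown above.

```shell
#!/bin/sh
# member_id_by_name NAME: read `etcdctl member list` output on stdin and
# print the ID of the member whose second field is name=NAME.
member_id_by_name() {
    awk -v want="name=$1" '$2 == want { sub(/:$/, "", $1); print $1 }'
}

sample='9923f8b86d3ce7a6: name=etcd-1 peerURLs=http://10.0.0.1:2380 clientURLs=http://172.25.44.6:2379 isLeader=false
c22c1c7b5a4c9f9b: name=etcd-2 peerURLs=http://10.0.0.2:2380 clientURLs=http://172.25.47.77:2379 isLeader=true'

printf '%s\n' "$sample" | member_id_by_name etcd-2   # prints c22c1c7b5a4c9f9b
```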

Remove the container instance on node2

# docker rm -f -v etcd-node
# etcdctl cluster-health
member 9923f8b86d3ce7a6 is healthy: got healthy result from http://10.0.0.1:2379
member b01d138087dbe547 is healthy: got healthy result from http://10.0.0.3:2379
failed to check the health of member c22c1c7b5a4c9f9b on http://10.0.0.2:2379: Get http://172.25.47.77:2379/health: dial tcp 172.25.47.77:2379: getsockopt: connection refused
member c22c1c7b5a4c9f9b is unreachable: [http://10.0.0.2:2379] are all unreachable
cluster is healthy

# etcdctl get bgops
2017-11-04 09:10:40.658013 I | warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
biaoge

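As shown above, node2 has lost contact. Because this etcd cluster has three members, it still has a quorum (2 of 3) with one member down, so the cluster as a whole remains healthy and usable.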
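The arithmetic behind this is the usual majority rule: a cluster of n members needs floor(n/2)+1 of them to agree, so it tolerates floor((n-1)/2) failures. A small sketch of that calculation, with hypothetical function names:

```shell
#!/bin/sh
# quorum N: smallest majority of an N-member cluster.
quorum() {
    echo $(( $1 / 2 + 1 ))
}

# tolerated_failures N: how many members can be down while a majority remains.
tolerated_failures() {
    echo $(( ($1 - 1) / 2 ))
}

quorum 3               # prints 2
tolerated_failures 3   # prints 1
```

A four-member cluster also tolerates only one failure, which is why odd cluster sizes are the common choice.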

Restore the node2 instance to the cluster

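Note: if you simply start the instance on node2 with its original configuration, it will fail to join the cluster, because the member ID c22c1c7b5a4c9f9b is already registered.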

# docker logs etcd-node
 ....
2017-11-04 01:12:25.307452 C | etcdmain: member c22c1c7b5a4c9f9b has already been bootstrapped

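The correct recovery procedure:
1. Remove the failed node from the cluster
2. Add the node to the cluster again (the failed node joins as a new member)
3. Restart the failed node using the information returned by the cluster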
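The first two steps can be wrapped in a small script. This is a sketch only: the function names and the DRY_RUN switch are hypothetical, and the `etcdctl member remove` and `etcdctl member add` calls use the same v2-style syntax as the rest of this article. With DRY_RUN=1 the commands are printed rather than executed.

```shell
#!/bin/sh
# run CMD...: execute the command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# recover_member ID NAME PEER_URL
# Step 1: remove the failed member. Step 2: add it back as a new member.
# Step 3 (restart on the failed node) is left to the operator.
recover_member() {
    member_id="$1"; name="$2"; peer_url="$3"
    run etcdctl member remove "$member_id" || return 1
    run etcdctl member add "$name" "$peer_url" || return 1
    echo "next: set CLUSTER_STATE=\"existing\" on ${name} and restart etcd-node"
}

# Dry run with the values from this article.
DRY_RUN=1
recover_member c22c1c7b5a4c9f9b etcd-2 http://10.0.0.2:2380
```

Stopping at the first failing command matters here: adding a member without having removed the old ID would leave the cluster with a stale entry.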

# etcdctl member remove c22c1c7b5a4c9f9b
2017-11-04 09:15:48.488427 I | warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
Removed member c22c1c7b5a4c9f9b from cluster

# etcdctl member add etcd-2 http://10.0.0.2:2380
2017-11-04 09:16:50.704247 I | warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
Added member named etcd-2 with ID 51e807366fadaded to cluster

ETCD_NAME="etcd-2"
ETCD_INITIAL_CLUSTER="etcd-2=http://10.0.0.2:2380,etcd-1=http://10.0.0.1:2380,etcd-3=http://10.0.0.3:2380"
ETCD_INITIAL_CLUSTER_STATE="existing"

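Recover the failed node using the returned information, which gives three parameters: name, cluster and state. Because the node is being restored into an existing cluster, the state on node2 must be changed to "existing" before the instance is started.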
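The state change amounts to rewriting one line of etcd.conf. A minimal sketch follows, assuming the file layout used in this article; the function name `set_cluster_state` is hypothetical, and the demonstration runs against a temporary file rather than /etc/etcd/etcd.conf.

```shell
#!/bin/sh
# set_cluster_state FILE STATE: rewrite the CLUSTER_STATE line in FILE.
# The file is rewritten in place via a temporary copy, so its
# ownership and permissions are preserved.
set_cluster_state() {
    file="$1"; state="$2"
    tmp="$(mktemp)" || return 1
    sed "s|^CLUSTER_STATE=.*|CLUSTER_STATE=\"${state}\"|" "$file" > "$tmp" \
        && cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return $status
}

# Demonstration on a scratch file.
demo="$(mktemp)"
printf 'NAME="etcd-2"\nCLUSTER_STATE="new"\n' > "$demo"
set_cluster_state "$demo" existing
grep STATE "$demo"   # prints CLUSTER_STATE="existing"
rm -f "$demo"
```

On node2 the call would be `set_cluster_state /etc/etcd/etcd.conf existing`, followed by the restart.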

# grep STATE  /etc/etcd/etcd.conf
CLUSTER_STATE="existing"
# systemctl restart  etcd-node
# docker ps
CONTAINER ID        IMAGE                               COMMAND                  CREATED             STATUS              PORTS               NAMES
9fb8b1544642        xxbandy123/etcd:3.0.10   "/docker-entrypoint.s"   33 seconds ago      Up 32 seconds                           etcd-node

# etcdctl cluster-health
2017-11-04 09:23:44.053605 I | warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
2017-11-04 09:23:44.054497 I | warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
member 51e807366fadaded is healthy: got healthy result from http://10.0.0.2:2379
member 9923f8b86d3ce7a6 is healthy: got healthy result from http://10.0.0.1:2379
member b01d138087dbe547 is healthy: got healthy result from http://10.0.0.3:2379
cluster is healthy
# etcdctl get bgops
2017-11-04 09:23:38.541355 I | warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
biaoge

At this point, node2 has been successfully restored to the etcd cluster and is serving requests normally.