
LBaaS High Availability: Analysis and Implementation

Tencent Cloud TStack · Updated 2017-09-26

Author: Pan Xiaodong

LBaaS is OpenStack's load-balancing service. By default it uses HAProxy as the driver that implements the actual balancing, and out of the box it provides no high availability: if an LBaaS instance fails, the load balancing of the services behind it may be disrupted. This article discusses how to make LBaaS highly available.

Principles

To make LBaaS highly available, we first need to understand how it is implemented, and the most direct way is to read the code, which lives under /usr/lib/python2.7/site-packages/neutron_lbaas. Every LBaaS agent shows up in neutron agent-list, but when multiple agents are running, how are LBaaS service instances assigned to them: round-robin, every agent taking over every instance, or some other scheme? Only by knowing how pools are taken over can we reason about failover, so let's look at services/loadbalancer/agent_scheduler.py:

Code language: python

class ChanceScheduler(object):
    """Allocate a loadbalancer agent for a vip in a random way."""

    def schedule(self, plugin, context, pool, device_driver):
        """Schedule the pool to an active loadbalancer agent if there
        is no enabled agent hosting it.
        """
        with context.session.begin(subtransactions=True):
            lbaas_agent = plugin.get_lbaas_agent_hosting_pool(
                context, pool['id'])
            if lbaas_agent:
                LOG.debug('Pool %(pool_id)s has already been hosted'
                          ' by lbaas agent %(agent_id)s',
                          {'pool_id': pool['id'],
                           'agent_id': lbaas_agent['id']})
                return

            active_agents = plugin.get_lbaas_agents(context, active=True)  # fetch the live agents
            if not active_agents:
                LOG.warn(_LW('No active lbaas agents for pool %s'), pool['id'])
                return

            candidates = plugin.get_lbaas_agent_candidates(device_driver,
                                                           active_agents)  # filter live agents by driver
            if not candidates:
                LOG.warn(_LW('No lbaas agent supporting device driver %s'),
                         device_driver)
                return

            chosen_agent = random.choice(candidates)  # pick one agent at random
            binding = PoolLoadbalancerAgentBinding()  # bind the pool to it
            binding.agent = chosen_agent
            binding.pool_id = pool['id']
            context.session.add(binding)

From this code, scheduling an LBaaS pool means picking one agent at random from the live ones and binding the pool to it; once bound, the binding never changes. Looking at the database, the table poolloadbalanceragentbindings is exactly where this binding is recorded.

Code language: sql

MariaDB [neutron]> select * from poolloadbalanceragentbindings;
+--------------------------------------+--------------------------------------+
| pool_id                              | agent_id                             |
+--------------------------------------+--------------------------------------+
| 06db8082-2c49-49d2-a0dd-f857bc3db380 | 421d7ae3-24f9-4ef4-be7e-d7a5555686e6 |
+--------------------------------------+--------------------------------------+

Beyond this, no other table records related state. From the analysis above, creating a pool means selecting one live LBaaS agent and binding the pool to it: LBaaS is therefore distributed and horizontally scalable, but it has no high-availability capability. To achieve high availability, a pool must be re-bound to another agent, that is, the rows of poolloadbalanceragentbindings must be updated.
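As a minimal illustration of what re-binding means at the database level, the sketch below re-points every row of poolloadbalanceragentbindings at a chosen agent. The connection parameters and new_agent_id are placeholders for your environment; the complete failover script appears later in this article.

Code language: python

# Minimal sketch, assuming direct MySQL access to the Neutron database;
# host, credentials and new_agent_id are placeholders.
import MySQLdb

def rebind_all_pools(new_agent_id):
    conn = MySQLdb.connect(host="127.0.0.1", port=3306, db="neutron",
                           user="neutron", passwd="xxx")
    try:
        cur = conn.cursor()
        # Re-point every pool at the chosen (surviving) agent.
        cur.execute("UPDATE poolloadbalanceragentbindings SET agent_id = %s",
                    (new_agent_id,))
        conn.commit()
    finally:
        conn.close()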

Networking

Can LBaaS high availability be achieved simply by changing the binding? Let's analyze this at the network level. To build a pool, LBaaS first creates a VIP and then binds a number of backends to it. Here is what that looks like on the network:

Code language: bash

[root@con01 neutron_lbaas(keystone_admin)]$ neutron lb-pool-list
+--------------------------------------+------+----------+-------------+----------+----------------+--------+
| id                                   | name | provider | lb_method   | protocol | admin_state_up | status |
+--------------------------------------+------+----------+-------------+----------+----------------+--------+
| 06db8082-2c49-49d2-a0dd-f857bc3db380 | pl04 | haproxy  | ROUND_ROBIN | TCP      | True           | ACTIVE |
+--------------------------------------+------+----------+-------------+----------+----------------+--------+
[root@con01 neutron_lbaas(keystone_admin)]$ neutron lb-agent-hosting-pool 06db8082-2c49-49d2-a0dd-f857bc3db380
+--------------------------------------+-------+----------------+-------+
| id                                   | host  | admin_state_up | alive |
+--------------------------------------+-------+----------------+-------+
| 421d7ae3-24f9-4ef4-be7e-d7a5555686e6 | con02 | True           | :-)   |
+--------------------------------------+-------+----------------+-------+

The output above shows that the pool is bound to con02. On that node:

Code language: bash

[root@con02 ~]# ip netns list
qdhcp-d267e13d-703e-43a5-863c-e6878390562d
qdhcp-bf124fc6-92f0-4453-bdcc-4c6c39de67a4
qrouter-8bbd8b10-2284-4d9c-8915-6d2c96d9a81b
qlbaas-06db8082-2c49-49d2-a0dd-f857bc3db380

[root@con02 ~]# ip netns exec qlbaas-06db8082-2c49-49d2-a0dd-f857bc3db380 ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
14: tap97031725-2b: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN
    link/ether fa:16:3e:d9:e2:3a brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.6/24 brd 192.168.1.255 scope global tap97031725-2b
       valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:fed9:e23a/64 scope link
       valid_lft forever preferred_lft forever

So con02 has a qlbaas-{id} network namespace, and inside it an interface tap97031725-2b configured with 192.168.1.6, which is exactly the VIP address: this interface is the VIP port.
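A quick way to check by script whether the local node currently hosts a pool's VIP namespace is to list the network namespaces, as in this minimal sketch (it assumes the ip utility is available and that the script runs on the node being checked):

Code language: python

# Minimal sketch: True if this node hosts the qlbaas namespace of the pool.
import subprocess

def hosts_pool_namespace(pool_id):
    output = subprocess.check_output(["ip", "netns", "list"])
    return ("qlbaas-%s" % pool_id) in output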

Code language: bash

[root@con02 ~]# ovs-vsctl show
167f700f-d100-4543-bd5c-2bd1912f1fa1
    Bridge "br-eth1"
        Port "br-eth1"
            Interface "br-eth1"
                type: internal
        Port "eth1"
            Interface "eth1"
        Port "phy-br-eth1"
            Interface "phy-br-eth1"
                type: patch
                options: {peer="int-br-eth1"}
    Bridge br-int
        fail_mode: secure
        Port "qr-2555d9a4-b1"
            tag: 2
            Interface "qr-2555d9a4-b1"
                type: internal
        Port "tap5dc6c7f3-8a"
            tag: 2
            Interface "tap5dc6c7f3-8a"
                type: internal
        Port br-int
            Interface br-int
                type: internal
        Port "int-br-eth1"
            Interface "int-br-eth1"
                type: patch
                options: {peer="phy-br-eth1"}
        Port "tap9f060cb7-b1"
            tag: 3
            Interface "tap9f060cb7-b1"
                type: internal
        Port "tap97031725-2b"        # the VIP port
            tag: 2
            Interface "tap97031725-2b"
                type: internal
        Port "ha-634bd8e2-ed"
            tag: 1
            Interface "ha-634bd8e2-ed"
                type: internal

Looking at the bridges, tap97031725-2b is attached to br-int, on the same network plane as the router, so a packet that reaches tap97031725-2b can in principle be forwarded to any internal network. Failing over to another LBaaS agent therefore requires recreating tap97031725-2b on the target machine, attaching it to br-int, and configuring the VIP address on it. All of this could be driven by keepalived, but reading the LBaaS code reveals a simpler way: whenever the LBaaS agent restarts, it reloads every pool. See services/loadbalancer/agent/agent_manager.py:

Code language: python

def _reload_pool(self, pool_id):
    try:
        logical_config = self.plugin_rpc.get_logical_device(pool_id)
        driver_name = logical_config['driver']
        LOG.info("xx: driver_name %s" % driver_name)
        if driver_name not in self.device_drivers:
            LOG.error(_LE('No device driver on agent: %s.'), driver_name)
            self.plugin_rpc.update_status(
                'pool', pool_id, constants.ERROR)
            return

        self.device_drivers[driver_name].deploy_instance(logical_config)  # deploy every pool
        self.instance_mapping[pool_id] = driver_name
        self.plugin_rpc.pool_deployed(pool_id)
    except Exception:
        LOG.exception(_LE('Unable to deploy instance for pool: %s'),
                      pool_id)
        self.needs_resync = True

And drivers/haproxy/namespace_driver.py:

Code language: python

@n_utils.synchronized('haproxy-driver')
def deploy_instance(self, loadbalancer):
    """Deploys loadbalancer if necessary

    :return: True if loadbalancer was deployed, False otherwise
    """
    LOG.info("xx: deploy_instance, %s" % str(loadbalancer))
    if not self.deployable(loadbalancer):
        LOG.info(_LI("Loadbalancer %s is not deployable.") %
                 loadbalancer.id)
        return False

    if self.exists(loadbalancer.id):
        self.update(loadbalancer)  # instance exists: update it
    else:
        self.create(loadbalancer)  # instance missing: create it
    return True

The HAProxy driver thus shows that LBaaS already has a sync mechanism for every pool: if the instance exists it is updated, and if it does not it is created. Everything described above can therefore be handled by LBaaS's own machinery rather than by keepalived managing interfaces directly, which keeps the implementation simple.
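In other words, once the bindings have been re-pointed at the surviving agent, restarting that agent is enough to trigger the rebuild. A minimal sketch of that trigger, matching the restart command the failover script below uses:

Code language: python

# Minimal sketch: restarting the agent makes _reload_pool() walk every pool
# bound to it, and deploy_instance() then recreates the qlbaas namespace,
# the tap port on br-int and the haproxy process wherever they are missing.
import subprocess

def trigger_resync():
    subprocess.check_call(["service", "neutron-lbaas-agent", "restart"])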

Implementation

For failure detection we use keepalived to monitor the LBaaS service; since the underlying cloud already runs keepalived, it can simply be reused. The keepalived configuration is as follows:

Code language: bash

[root@con01 neutron_lbaas(keystone_admin)]$ cat /etc/keepalived/keepalived.conf
vrrp_script chk_haproxy {
    script "/usr/bin/killall -0 haproxy"
    interval 1
}
vrrp_instance VI_PUBLIC {
    interface br-ex
    state BACKUP
    virtual_router_id 66
    priority 103
    virtual_ipaddress {
        172.16.154.50 dev br-ex
    }
    track_script {
        chk_haproxy
    }

    notify_master "/etc/keepalived/lbaas_state_manager.py MASTER"   # the LBaaS failover script
    notify_backup "/etc/keepalived/lbaas_state_manager.py BACKUP"
    notify_fault  "/etc/keepalived/lbaas_state_manager.py FAULT"
}
vrrp_sync_group VG1 {
    group {
        VI_PUBLIC
    }
}
Code language: python

[root@con01 neutron_lbaas(keystone_admin)]$ cat /etc/keepalived/lbaas_state_manager.py
#!/usr/bin/env python
# coding: utf-8
import MySQLdb
import socket
import time
import os
import logging
import sys

DB_HOST = "172.16.154.50"
DB_PORT = 3306
DB_NAME = "neutron"
DB_USER = "neutron"
DB_PASS = "xxx"

UNBIND_TIMEOUT = 3
UNBIND_NUM = 3
LOG_FILE = "/var/log/neutron/lbaas-state.log"
STATE_FILE = "/etc/keepalived/lbaas_state"


fmt = "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
logging.basicConfig(format=fmt, filename=LOG_FILE, level=logging.INFO)

# Database helper, body omitted
class DBManger:
    def __init__(self, host, port, db, user, password, debug=False):
        ...

def get_hostname():
    return socket.gethostname()

# get the lbaas agent id of this host
def get_lbid(db, hostname):
    sql = "select id from agents where host='%s' and topic='n-lbaas_agent'" % hostname
    data = db.exec_sql_ret_one(sql)
    if data:
        return data['id']
    else:
        return None

# change to master: mark this host's agent as up
def change_master_state(db, hostname):
    sql = "update agents set admin_state_up=1 where host='%s' and topic='n-lbaas_agent'" % hostname
    db.exec_sql(sql)

# change to backup: mark this host's agent as down
def change_backup_state(db, hostname):
    sql = "update agents set admin_state_up=0 where host='%s' and topic='n-lbaas_agent'" % hostname
    db.exec_sql(sql)

# re-bind every pool to the master's agent
def bind_to_master(db, lbid):
    sql = "update poolloadbalanceragentbindings set agent_id='%s'" % lbid
    db.exec_sql(sql)

def is_bind(db, lbid):
    sql = "select * from poolloadbalanceragentbindings where agent_id='%s'" % lbid
    if db.exec_sql_ret_one(sql):
        return True
    else:
        return False

def write_state(state):
    with open(STATE_FILE, "w") as fd:
        fd.write(state)

def service_reload():
    cmd = "service neutron-lbaas-agent restart"
    os.system(cmd)

def main(state):
    db = DBManger(DB_HOST, DB_PORT, DB_NAME, DB_USER, DB_PASS, True)
    hostname = get_hostname()
    lbid = get_lbid(db, hostname)
    print lbid
    if state == "MASTER":
        change_master_state(db, hostname)
        bind_to_master(db, lbid)
        service_reload()
        logging.info("change state to master succ")
        write_state("MASTER")
    else:
        change_backup_state(db, hostname)
        i = 0
        while is_bind(db, lbid):
            if i < UNBIND_NUM:
                time.sleep(UNBIND_TIMEOUT)
                i += 1
            else:
                logging.error("change to backup error, unbind failed")
                break
        service_reload()
        if not is_bind(db, lbid):
            logging.info("change to backup succ")
            write_state("BACKUP")
        else:
            write_state("UNBIND ERROR")


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print "usage: lbaas_state_manager.py MASTER|BACKUP|FAULT"
        sys.exit(1)
    else:
        state = sys.argv[1]
        main(state)

This is the LBaaS failover program. When keepalived detects a failure, it performs a switchover and elects a new master. On the master, the script binds all pools to the master's LBaaS agent; on the backup machines, it sets admin_state_up to false, so that when LBaaS schedules new pools it no longer picks an agent at random but defaults to the master. Once the changes are made, the LBaaS agent is restarted so the service keeps running correctly. Experiments show that the switchover completes as expected.
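To confirm that a switchover has completed, one can check that no binding still points elsewhere. Below is a minimal verification sketch reusing the same database constants as the script above:

Code language: python

# Minimal sketch: returns True when every pool is bound to the given agent.
import MySQLdb

def all_pools_bound_to(agent_id):
    conn = MySQLdb.connect(host="172.16.154.50", port=3306, db="neutron",
                           user="neutron", passwd="xxx")
    try:
        cur = conn.cursor()
        cur.execute("SELECT COUNT(*) FROM poolloadbalanceragentbindings"
                    " WHERE agent_id != %s", (agent_id,))
        (stray,) = cur.fetchone()
        return stray == 0
    finally:
        conn.close()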

Summary

LBaaS high availability is not supported in OpenStack itself, and material found online does not offer a complete method either. By analyzing how LBaaS works, this article achieves LBaaS high availability with keepalived plus a failover script, with no changes to the LBaaS code and no shared storage. Experiments show that a failover performed this way completes within seconds; more testing will follow to validate its stability.
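To put a number on the seconds-level claim, one can probe the VIP in a loop while forcing a failover. The following minimal sketch measures the gap between the first failed and first subsequent successful TCP connect; the VIP address and port are assumptions for a test environment:

Code language: python

# Minimal sketch: probe the VIP until it goes down and comes back,
# then report the downtime in seconds. Address/port are placeholders.
import socket
import time

def measure_downtime(vip="192.168.1.6", port=80, interval=0.2):
    down_since = None
    while True:
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.settimeout(0.5)
        try:
            s.connect((vip, port))
            if down_since is not None:
                return time.time() - down_since
        except socket.error:
            if down_since is None:
                down_since = time.time()
        finally:
            s.close()
        time.sleep(interval)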

Source: the TStack WeChat official account.

Original statement: this article is published on the Tencent Cloud Developer Community with the author's authorization; reproduction without permission is prohibited.

For infringement concerns, contact cloudcommunity@tencent.com for removal.
