designateでworkerを2以上に増やすとservice_statusが更新されない原因と解決策
English version -> medium.com
designateの各サービスのworker数をデフォルトの1から2以上にした時、service_statusが更新されないのでその原因を調べた。
まず、worker数が1の時の起動の流れを確認 すると、designate/service_status.HeartBeatEmitter.start
が実行されているのがわかります。
workerが1の場合
2019-04-04 12:14:52.670 96400 DEBUG designate.service_status [-] designate/service_status.HeartBeatEmitter.__init__: 96400 __init__ /usr/lib/python2.7/site-packages/designate/service_status.py:53 2019-04-04 12:14:52.730 96400 DEBUG designate.service_status [-] designate/service_status.HeartBeatEmitter._emit_heartbeat: 96400 _emit_heartbeat /usr/lib/python2.7/site-packages/designate/service_status.py:75 2019-04-04 12:14:52.730 96400 DEBUG designate.service_status [-] designate/service_status.HeartBeatEmitter._emit_heartbeat: _running=False _emit_heartbeat /usr/lib/python2.7/site-packages/designate/service_status.py:76 2019-04-04 12:14:52.731 96400 DEBUG designate.service_status [-] designate/service_status.HeartBeatEmitter.start: 96400 start /usr/lib/python2.7/site-packages/designate/service_status.py:100 2019-04-04 12:14:52.732 96400 DEBUG designate.service_status [-] designate/service_status.HeartBeatEmitter.start: _running=True start /usr/lib/python2.7/site-packages/designate/service_status.py:102 2019-04-04 12:14:57.735 96400 DEBUG designate.service_status [-] designate/service_status.HeartBeatEmitter._emit_heartbeat: 96400 _emit_heartbeat /usr/lib/python2.7/site-packages/designate/service_status.py:75 2019-04-04 12:14:57.735 96400 DEBUG designate.service_status [-] designate/service_status.HeartBeatEmitter._emit_heartbeat: _running=True _emit_heartbeat /usr/lib/python2.7/site-packages/designate/service_status.py:76 2019-04-04 12:15:02.732 96400 DEBUG designate.service_status [req-2b1269ed-5802-4787-a2f2-8ef583d01bd7 - - - - -] designate/service_status.HeartBeatEmitter._emit_heartbeat: 96400 _emit_heartbeat /usr/lib/python2.7/site-packages/designate/service_status.py:75 2019-04-04 12:15:02.733 96400 DEBUG designate.service_status [req-2b1269ed-5802-4787-a2f2-8ef583d01bd7 - - - - -] designate/service_status.HeartBeatEmitter._emit_heartbeat: _running=True _emit_heartbeat /usr/lib/python2.7/site-packages/designate/service_status.py:76 2019-04-04 12:15:07.731 96400 DEBUG designate.service_status [req-b3c89e7e-a321-4c4c-8a0b-d8bb880962bf - - - - -] designate/service_status.HeartBeatEmitter._emit_heartbeat: 96400 _emit_heartbeat /usr/lib/python2.7/site-packages/designate/service_status.py:75 2019-04-04 12:15:07.732 96400 DEBUG designate.service_status [req-b3c89e7e-a321-4c4c-8a0b-d8bb880962bf - - - - -] designate/service_status.HeartBeatEmitter._emit_heartbeat: _running=True _emit_heartbeat /usr/lib/python2.7/site-packages/designate/service_status.py:76
MariaDB [designate]> select * from service_statuses where service_name='api'\G *************************** 1. row *************************** id: 4e24db8d45be496f8119fda6eb706d96 created_at: 2019-04-04 03:14:57 updated_at: 2019-04-04 03:37:18 service_name: api hostname: dns001.host heartbeated_at: 2019-04-04 03:37:18 status: UP stats: {} capabilities: {} 1 row in set (0.00 sec)
ではworker数が2以上になった場合はどうなるのか。
わかりやすいように以下のようにデバック用のログが表示されるようにして起動します。
# diff -u /usr/lib/python2.7/site-packages/designate/service_status.py.org /usr/lib/python2.7/site-packages/designate/service_status.py --- /usr/lib/python2.7/site-packages/designate/service_status.py.org 2019-04-03 17:48:46.892646687 +0900 +++ /usr/lib/python2.7/site-packages/designate/service_status.py 2019-04-03 17:47:35.032081139 +0900 @@ -12,6 +12,7 @@ # License for the specific language governing permissions and limitations # under the License. import abc +import os from oslo_config import cfg from oslo_log import log as logging @@ -49,6 +50,7 @@ def __init__(self, service, threadgroup, status_factory=None): super(HeartBeatEmitter, self).__init__() + LOG.debug("designate/service_status.HeartBeatEmitter.__init__: %s", os.getpid()) self._service = service self._hostname = CONF.host @@ -70,6 +72,7 @@ """ Returns Status, Stats, Capabilities """ + LOG.debug("designate/service_status.HeartBeatEmitter._emit_heartbeat: %s", os.getpid()) if not self._running: return @@ -93,6 +96,7 @@ pass def start(self): + LOG.debug("designate/service_status.HeartBeatEmitter.start: %s", os.getpid()) self._running = True def stop(self):
そして、起動すると、以下のようなログになります。
2019-04-04 12:37:24.512 98588 DEBUG designate.service_status [-] designate/service_status.HeartBeatEmitter.__init__: 98588 __init__ /usr/lib/python2.7/site-packages/designate/service_status.py:53 2019-04-04 12:37:24.533 98598 DEBUG designate.service_status [-] designate/service_status.HeartBeatEmitter.start: 98598 start /usr/lib/python2.7/site-packages/designate/service_status.py:100 2019-04-04 12:37:24.533 98598 DEBUG designate.service_status [-] designate/service_status.HeartBeatEmitter.start: _running=True start /usr/lib/python2.7/site-packages/designate/service_status.py:102 2019-04-04 12:37:24.536 98599 DEBUG designate.service_status [-] designate/service_status.HeartBeatEmitter.start: 98599 start /usr/lib/python2.7/site-packages/designate/service_status.py:100 2019-04-04 12:37:24.536 98599 DEBUG designate.service_status [-] designate/service_status.HeartBeatEmitter.start: _running=True start /usr/lib/python2.7/site-packages/designate/service_status.py:102 2019-04-04 12:37:24.537 98600 DEBUG designate.service_status [-] designate/service_status.HeartBeatEmitter.start: 98600 start /usr/lib/python2.7/site-packages/designate/service_status.py:100 2019-04-04 12:37:24.538 98600 DEBUG designate.service_status [-] designate/service_status.HeartBeatEmitter.start: _running=True start /usr/lib/python2.7/site-packages/designate/service_status.py:102 2019-04-04 12:37:24.563 98601 DEBUG designate.service_status [-] designate/service_status.HeartBeatEmitter.start: 98601 start /usr/lib/python2.7/site-packages/designate/service_status.py:100 2019-04-04 12:37:24.563 98601 DEBUG designate.service_status [-] designate/service_status.HeartBeatEmitter.start: _running=True start /usr/lib/python2.7/site-packages/designate/service_status.py:102 2019-04-04 12:37:24.636 98588 DEBUG designate.service_status [-] designate/service_status.HeartBeatEmitter._emit_heartbeat: 98588 _emit_heartbeat /usr/lib/python2.7/site-packages/designate/service_status.py:75 2019-04-04 12:37:24.636 98588 DEBUG designate.service_status [-] designate/service_status.HeartBeatEmitter._emit_heartbeat: _running=False _emit_heartbeat /usr/lib/python2.7/site-packages/designate/service_status.py:76 2019-04-04 12:37:29.637 98588 DEBUG designate.service_status [-] designate/service_status.HeartBeatEmitter._emit_heartbeat: 98588 _emit_heartbeat /usr/lib/python2.7/site-packages/designate/service_status.py:75 2019-04-04 12:37:29.637 98588 DEBUG designate.service_status [-] designate/service_status.HeartBeatEmitter._emit_heartbeat: _running=False _emit_heartbeat /usr/lib/python2.7/site-packages/designate/service_status.py:76
__init__
と _emit_heartbeat
は process_idが 98588
で start
は 98598, 98599, 98600, 98601
になってます。
そして、各process idを確認すると、
# ps auxwwf | grep designate-api designa+ 98588 1.9 0.7 347336 62380 ? Ss 12:37 0:02 /usr/bin/python2 /usr/bin/designate-api --config-file /etc/designate/designate.conf --log-file /var/log/designate/api.log designa+ 98598 0.2 0.8 392348 69792 ? S 12:37 0:00 \_ /usr/bin/python2 /usr/bin/designate-api --config-file /etc/designate/designate.conf --log-file /var/log/designate/api.log designa+ 98599 0.2 0.8 392344 69784 ? S 12:37 0:00 \_ /usr/bin/python2 /usr/bin/designate-api --config-file /etc/designate/designate.conf --log-file /var/log/designate/api.log designa+ 98600 0.2 0.8 392344 69784 ? S 12:37 0:00 \_ /usr/bin/python2 /usr/bin/designate-api --config-file /etc/designate/designate.conf --log-file /var/log/designate/api.log designa+ 98601 0.2 0.8 392348 69784 ? S 12:37 0:00 \_ /usr/bin/python2 /usr/bin/designate-api --config-file /etc/designate/designate.conf --log-file /var/log/designate/api.log
となってるので、__init__
と_emit_heartbeat
は親プロセスでstartは子プロセスということがわかります。
start メソッドでは _running という値をTrue
に変更していて、これは defaultではFalse
なので実際に確認を行っている親プロセスでは、False
から変わることはありません。
def start(self): self._running = True
それが原因で_emit_heartbeat
の if not self._running
の判定は毎回 False
でreturnされるため、更新がされていません。
def _emit_heartbeat(self): """ Returns Status, Stats, Capabilities """ if not self._running: return status, stats, capabilities = self._get_status() service_status = objects.ServiceStatus( service_name=self._service, hostname=self._hostname, status=status, stats=stats, capabilities=capabilities, heartbeated_at=timeutils.utcnow() ) LOG.trace("Emitting %s", service_status) self._transmit(service_status)
だから親プロセスの _runningフラグを書き換える必要があります。
# diff -u /usr/lib/python2.7/site-packages/designate/cmd/api.py.org /usr/lib/python2.7/site-packages/designate/cmd/api.py --- /usr/lib/python2.7/site-packages/designate/cmd/api.py.org 2019-04-03 18:10:09.908918965 +0900 +++ /usr/lib/python2.7/site-packages/designate/cmd/api.py 2019-04-03 18:09:49.886038819 +0900 @@ -40,4 +40,5 @@ server = api_service.Service(threads=CONF['service:api'].threads) service.serve(server, workers=CONF['service:api'].workers) + server.heartbeat_emitter.start() service.wait()
こういう設定を他のサービス(central, pool_manager, zone_manager, mdns)にも修正を行うと、
以下のように止まってたservice_statusが更新されるようになります。
- before
MariaDB [designate]> select * from service_statuses where service_name='api'\G *************************** 1. row *************************** id: 4e24db8d45be496f8119fda6eb706d96 created_at: 2019-04-04 03:14:57 updated_at: 2019-04-04 03:37:18 service_name: api hostname: dns001.host heartbeated_at: 2019-04-04 03:37:18 status: UP stats: {} capabilities: {} 1 row in set (0.00 sec)
- after
MariaDB [designate]> select * from service_statuses where service_name='api'\G *************************** 1. row *************************** id: 4e24db8d45be496f8119fda6eb706d96 created_at: 2019-04-04 03:14:57 updated_at: 2019-04-04 03:42:23 service_name: api hostname: dns001.host heartbeated_at: 2019-04-04 03:42:23 status: UP stats: {} capabilities: {} 1 row in set (0.00 sec)
そしてlogも
2019-04-04 12:42:23.711 99088 DEBUG designate.service_status [req-8be27649-7819-4416-b936-9015b9515963 - - - - -] designate/service_status.HeartBeatEmitter._emit_heartbeat: 99088 _emit_heartbeat /usr/lib/python2.7/site-packages/designate/service_status.py:75 2019-04-04 12:42:23.712 99088 DEBUG designate.service_status [req-8be27649-7819-4416-b936-9015b9515963 - - - - -] designate/service_status.HeartBeatEmitter._emit_heartbeat: _running=True _emit_heartbeat /usr/lib/python2.7/site-packages/designate/service_status.py:76 2019-04-04 12:42:28.716 99088 DEBUG designate.service_status [req-286a4481-dad8-4fc5-802d-ad1d9e0b8d3b - - - - -] designate/service_status.HeartBeatEmitter._emit_heartbeat: 99088 _emit_heartbeat /usr/lib/python2.7/site-packages/designate/service_status.py:75 2019-04-04 12:42:28.717 99088 DEBUG designate.service_status [req-286a4481-dad8-4fc5-802d-ad1d9e0b8d3b - - - - -] designate/service_status.HeartBeatEmitter._emit_heartbeat: _running=True _emit_heartbeat /usr/lib/python2.7/site-packages/designate/service_status.py:76
と、しっかりTrue
になっていることが確認できます。
プルリクエスト
このイシューについてはアップストリームに反映してマージされました。