thirose’s blog

openstackやpythonなどなど

designateでworkerを2以上に増やすとservice_statusが更新されない原因と解決策

English version -> medium.com

designateの各サービスのworker数をデフォルトの1から2以上にした時、service_statusが更新されないのでその原因を調べた。

まず、worker数が1の時の起動の流れを確認 すると、designate/service_status.HeartBeatEmitter.startが実行されているのがわかります。

workerが1の場合

2019-04-04 12:14:52.670 96400 DEBUG designate.service_status [-] designate/service_status.HeartBeatEmitter.__init__: 96400 __init__ /usr/lib/python2.7/site-packages/designate/service_status.py:53
2019-04-04 12:14:52.730 96400 DEBUG designate.service_status [-] designate/service_status.HeartBeatEmitter._emit_heartbeat: 96400 _emit_heartbeat /usr/lib/python2.7/site-packages/designate/service_status.py:75
2019-04-04 12:14:52.730 96400 DEBUG designate.service_status [-] designate/service_status.HeartBeatEmitter._emit_heartbeat: _running=False _emit_heartbeat /usr/lib/python2.7/site-packages/designate/service_status.py:76
2019-04-04 12:14:52.731 96400 DEBUG designate.service_status [-] designate/service_status.HeartBeatEmitter.start: 96400 start /usr/lib/python2.7/site-packages/designate/service_status.py:100
2019-04-04 12:14:52.732 96400 DEBUG designate.service_status [-] designate/service_status.HeartBeatEmitter.start: _running=True start /usr/lib/python2.7/site-packages/designate/service_status.py:102
2019-04-04 12:14:57.735 96400 DEBUG designate.service_status [-] designate/service_status.HeartBeatEmitter._emit_heartbeat: 96400 _emit_heartbeat /usr/lib/python2.7/site-packages/designate/service_status.py:75
2019-04-04 12:14:57.735 96400 DEBUG designate.service_status [-] designate/service_status.HeartBeatEmitter._emit_heartbeat: _running=True _emit_heartbeat /usr/lib/python2.7/site-packages/designate/service_status.py:76
2019-04-04 12:15:02.732 96400 DEBUG designate.service_status [req-2b1269ed-5802-4787-a2f2-8ef583d01bd7 - - - - -] designate/service_status.HeartBeatEmitter._emit_heartbeat: 96400 _emit_heartbeat /usr/lib/python2.7/site-packages/designate/service_status.py:75
2019-04-04 12:15:02.733 96400 DEBUG designate.service_status [req-2b1269ed-5802-4787-a2f2-8ef583d01bd7 - - - - -] designate/service_status.HeartBeatEmitter._emit_heartbeat: _running=True _emit_heartbeat /usr/lib/python2.7/site-packages/designate/service_status.py:76
2019-04-04 12:15:07.731 96400 DEBUG designate.service_status [req-b3c89e7e-a321-4c4c-8a0b-d8bb880962bf - - - - -] designate/service_status.HeartBeatEmitter._emit_heartbeat: 96400 _emit_heartbeat /usr/lib/python2.7/site-packages/designate/service_status.py:75
2019-04-04 12:15:07.732 96400 DEBUG designate.service_status [req-b3c89e7e-a321-4c4c-8a0b-d8bb880962bf - - - - -] designate/service_status.HeartBeatEmitter._emit_heartbeat: _running=True _emit_heartbeat /usr/lib/python2.7/site-packages/designate/service_status.py:76
MariaDB [designate]> select * from service_statuses where service_name='api'\G
*************************** 1. row ***************************
            id: 4e24db8d45be496f8119fda6eb706d96
    created_at: 2019-04-04 03:14:57
    updated_at: 2019-04-04 03:37:18
  service_name: api
      hostname: dns001.host
heartbeated_at: 2019-04-04 03:37:18
        status: UP
         stats: {}
  capabilities: {}
1 row in set (0.00 sec)

ではworker数が2以上になった場合はどうなるのか。

わかりやすいように以下のようにデバック用のログが表示されるようにして起動します。

# diff -u /usr/lib/python2.7/site-packages/designate/service_status.py.org /usr/lib/python2.7/site-packages/designate/service_status.py
--- /usr/lib/python2.7/site-packages/designate/service_status.py.org    2019-04-03 17:48:46.892646687 +0900
+++ /usr/lib/python2.7/site-packages/designate/service_status.py        2019-04-03 17:47:35.032081139 +0900
@@ -12,6 +12,7 @@
 # License for the specific language governing permissions and limitations
 # under the License.
 import abc
+import os
 
 from oslo_config import cfg
 from oslo_log import log as logging
@@ -49,6 +50,7 @@
     def __init__(self, service, threadgroup, status_factory=None):
         super(HeartBeatEmitter, self).__init__()
 
+        LOG.debug("designate/service_status.HeartBeatEmitter.__init__: %s", os.getpid())
         self._service = service
         self._hostname = CONF.host
 
@@ -70,6 +72,7 @@
         """
         Returns Status, Stats, Capabilities
         """
+       LOG.debug("designate/service_status.HeartBeatEmitter._emit_heartbeat: %s", os.getpid())
         if not self._running:
             return
 
@@ -93,6 +96,7 @@
         pass
 
     def start(self):
+        LOG.debug("designate/service_status.HeartBeatEmitter.start: %s", os.getpid())
         self._running = True
 
     def stop(self):

そして、起動すると、以下のようなログになります。

2019-04-04 12:37:24.512 98588 DEBUG designate.service_status [-] designate/service_status.HeartBeatEmitter.__init__: 98588 __init__ /usr/lib/python2.7/site-packages/designate/service_status.py:53
2019-04-04 12:37:24.533 98598 DEBUG designate.service_status [-] designate/service_status.HeartBeatEmitter.start: 98598 start /usr/lib/python2.7/site-packages/designate/service_status.py:100
2019-04-04 12:37:24.533 98598 DEBUG designate.service_status [-] designate/service_status.HeartBeatEmitter.start: _running=True start /usr/lib/python2.7/site-packages/designate/service_status.py:102
2019-04-04 12:37:24.536 98599 DEBUG designate.service_status [-] designate/service_status.HeartBeatEmitter.start: 98599 start /usr/lib/python2.7/site-packages/designate/service_status.py:100
2019-04-04 12:37:24.536 98599 DEBUG designate.service_status [-] designate/service_status.HeartBeatEmitter.start: _running=True start /usr/lib/python2.7/site-packages/designate/service_status.py:102
2019-04-04 12:37:24.537 98600 DEBUG designate.service_status [-] designate/service_status.HeartBeatEmitter.start: 98600 start /usr/lib/python2.7/site-packages/designate/service_status.py:100
2019-04-04 12:37:24.538 98600 DEBUG designate.service_status [-] designate/service_status.HeartBeatEmitter.start: _running=True start /usr/lib/python2.7/site-packages/designate/service_status.py:102
2019-04-04 12:37:24.563 98601 DEBUG designate.service_status [-] designate/service_status.HeartBeatEmitter.start: 98601 start /usr/lib/python2.7/site-packages/designate/service_status.py:100
2019-04-04 12:37:24.563 98601 DEBUG designate.service_status [-] designate/service_status.HeartBeatEmitter.start: _running=True start /usr/lib/python2.7/site-packages/designate/service_status.py:102
2019-04-04 12:37:24.636 98588 DEBUG designate.service_status [-] designate/service_status.HeartBeatEmitter._emit_heartbeat: 98588 _emit_heartbeat /usr/lib/python2.7/site-packages/designate/service_status.py:75
2019-04-04 12:37:24.636 98588 DEBUG designate.service_status [-] designate/service_status.HeartBeatEmitter._emit_heartbeat: _running=False _emit_heartbeat /usr/lib/python2.7/site-packages/designate/service_status.py:76
2019-04-04 12:37:29.637 98588 DEBUG designate.service_status [-] designate/service_status.HeartBeatEmitter._emit_heartbeat: 98588 _emit_heartbeat /usr/lib/python2.7/site-packages/designate/service_status.py:75
2019-04-04 12:37:29.637 98588 DEBUG designate.service_status [-] designate/service_status.HeartBeatEmitter._emit_heartbeat: _running=False _emit_heartbeat /usr/lib/python2.7/site-packages/designate/service_status.py:76

__init___emit_heartbeat は process_idが 98588start98598, 98599, 98600, 98601になってます。
そして、各process idを確認すると、

# ps auxwwf | grep designate-api
designa+  98588  1.9  0.7 347336 62380 ?        Ss   12:37   0:02 /usr/bin/python2 /usr/bin/designate-api --config-file /etc/designate/designate.conf --log-file /var/log/designate/api.log
designa+  98598  0.2  0.8 392348 69792 ?        S    12:37   0:00  \_ /usr/bin/python2 /usr/bin/designate-api --config-file /etc/designate/designate.conf --log-file /var/log/designate/api.log
designa+  98599  0.2  0.8 392344 69784 ?        S    12:37   0:00  \_ /usr/bin/python2 /usr/bin/designate-api --config-file /etc/designate/designate.conf --log-file /var/log/designate/api.log
designa+  98600  0.2  0.8 392344 69784 ?        S    12:37   0:00  \_ /usr/bin/python2 /usr/bin/designate-api --config-file /etc/designate/designate.conf --log-file /var/log/designate/api.log
designa+  98601  0.2  0.8 392348 69784 ?        S    12:37   0:00  \_ /usr/bin/python2 /usr/bin/designate-api --config-file /etc/designate/designate.conf --log-file /var/log/designate/api.log

となってるので、__init___emit_heartbeatは親プロセスでstartは子プロセスということがわかります。
start メソッドでは _running という値をTrueに変更していて、これは defaultではFalseなので実際に確認を行っている親プロセスでは、False から変わることはありません。

def start(self):
    self._running = True

それが原因で_emit_heartbeatif not self._running の判定は毎回 Falseでreturnされるため、更新がされていません。

    def _emit_heartbeat(self):
        """
        Returns Status, Stats, Capabilities
        """
        if not self._running:
            return

        status, stats, capabilities = self._get_status()

        service_status = objects.ServiceStatus(
            service_name=self._service,
            hostname=self._hostname,
            status=status,
            stats=stats,
            capabilities=capabilities,
            heartbeated_at=timeutils.utcnow()
        )

        LOG.trace("Emitting %s", service_status)

        self._transmit(service_status)

だから親プロセスの _runningフラグを書き換える必要があります。

# diff -u /usr/lib/python2.7/site-packages/designate/cmd/api.py.org /usr/lib/python2.7/site-packages/designate/cmd/api.py
--- /usr/lib/python2.7/site-packages/designate/cmd/api.py.org   2019-04-03 18:10:09.908918965 +0900
+++ /usr/lib/python2.7/site-packages/designate/cmd/api.py       2019-04-03 18:09:49.886038819 +0900
@@ -40,4 +40,5 @@
 
     server = api_service.Service(threads=CONF['service:api'].threads)
     service.serve(server, workers=CONF['service:api'].workers)
+    server.heartbeat_emitter.start()
     service.wait()

こういう設定を他のサービス(central, pool_manager, zone_manager, mdns)にも修正を行うと、
以下のように止まってたservice_statusが更新されるようになります。

  • before
MariaDB [designate]> select * from service_statuses where service_name='api'\G
*************************** 1. row ***************************
            id: 4e24db8d45be496f8119fda6eb706d96
    created_at: 2019-04-04 03:14:57
    updated_at: 2019-04-04 03:37:18
  service_name: api
      hostname: dns001.host
heartbeated_at: 2019-04-04 03:37:18
        status: UP
         stats: {}
  capabilities: {}
1 row in set (0.00 sec)
  • after
MariaDB [designate]> select * from service_statuses where service_name='api'\G
*************************** 1. row ***************************
            id: 4e24db8d45be496f8119fda6eb706d96
    created_at: 2019-04-04 03:14:57
    updated_at: 2019-04-04 03:42:23
  service_name: api
      hostname: dns001.host
heartbeated_at: 2019-04-04 03:42:23
        status: UP
         stats: {}
  capabilities: {}
1 row in set (0.00 sec)

そしてlogも

2019-04-04 12:42:23.711 99088 DEBUG designate.service_status [req-8be27649-7819-4416-b936-9015b9515963 - - - - -] designate/service_status.HeartBeatEmitter._emit_heartbeat: 99088 _emit_heartbeat /usr/lib/python2.7/site-packages/designate/service_status.py:75
2019-04-04 12:42:23.712 99088 DEBUG designate.service_status [req-8be27649-7819-4416-b936-9015b9515963 - - - - -] designate/service_status.HeartBeatEmitter._emit_heartbeat: _running=True _emit_heartbeat /usr/lib/python2.7/site-packages/designate/service_status.py:76
2019-04-04 12:42:28.716 99088 DEBUG designate.service_status [req-286a4481-dad8-4fc5-802d-ad1d9e0b8d3b - - - - -] designate/service_status.HeartBeatEmitter._emit_heartbeat: 99088 _emit_heartbeat /usr/lib/python2.7/site-packages/designate/service_status.py:75
2019-04-04 12:42:28.717 99088 DEBUG designate.service_status [req-286a4481-dad8-4fc5-802d-ad1d9e0b8d3b - - - - -] designate/service_status.HeartBeatEmitter._emit_heartbeat: _running=True _emit_heartbeat /usr/lib/python2.7/site-packages/designate/service_status.py:76

と、しっかりTrueになっていることが確認できます。

プルリクエス

このイシューについてはアップストリームに反映してマージされました。

review.opendev.org