Why deadlocks occur in designate, and how to fix them

The English version of this post is here:

Designate got the DBDeadLock – Hirose Takahito – Medium

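While using designate, I noticed that deadlocks occurred from time to time, so I looked into the cause.
It turned out that the create/update/delete flow is structured in a way that makes them possible, so I have summarized it here.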

[Figure: simplified designate registration flow, with numbered steps]

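This is a simplified view of designate's registration flow. The points that matter for this deadlock are the initial write (create or update) to MySQL in steps 2 and 3, and the status update in steps 8 and 9.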

for i in {1..100} ; do openstack recordset create --records 192.168.0.1 --type A 26b12550-6d64-49eb-a69d-0427472b7da2 z$i ; done

I tested this with the simple shell loop above, and MySQL reported the following:

------------------------
LATEST DETECTED DEADLOCK
------------------------
190320 15:32:32
*** (1) TRANSACTION:
TRANSACTION 27B30A, ACTIVE 0 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 5 lock struct(s), heap size 1248, 3 row lock(s), undo log entries 2
MySQL thread id 243, OS thread handle 0x7f2ac428d700, query id 65726 10.127.163.56 designate Updating
UPDATE zones SET version=(zones.version + 1), updated_at='2019-03-20 06:32:32.527418', status='ACTIVE', action='NONE' WHERE zones.id = '26b125506d6449eba69d0427472b7da2' AND zones.deleted = '0'
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 653 page no 3 n bits 72 index `PRIMARY` of table `designate`.`zones` trx id 27B30A lock_mode X locks rec but not gap waiting
*** (2) TRANSACTION:
TRANSACTION 27B306, ACTIVE 0 sec starting index read
mysql tables in use 1, locked 1
6 lock struct(s), heap size 1248, 3 row lock(s), undo log entries 2
MySQL thread id 274, OS thread handle 0x7f2a70d71700, query id 65731 10.127.163.56 designate Updating
UPDATE records SET version=(records.version + 1), updated_at='2019-03-20 06:32:32.545283', data='ns1.example.com. domain.example.com. 1553063552 3562 600 86400 3600', hash='a4718b30220d6ff2d4b4cc3602654509', serial=1553063552 WHERE records.id = '78b667000767489a8fc821275d5fff0b'
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 653 page no 3 n bits 72 index `PRIMARY` of table `designate`.`zones` trx id 27B306 lock_mode X locks rec but not gap
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 649 page no 12 n bits 112 index `PRIMARY` of table `designate`.`records` trx id 27B306 lock_mode X locks rec but not gap waiting
*** WE ROLL BACK TRANSACTION (1)

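As expected, a deadlock occurred when the timing of the two updates overlapped. Transaction (1), the status update on zones, is waiting for the zones row lock held by transaction (2), while transaction (2) is waiting for a row lock on records, which transaction (1) must already hold for this to be a deadlock.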
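
The lock relationships in the report can also be pulled out mechanically. The following is a minimal sketch, not part of designate or MySQL: the function name `summarize_locks` and the regular expressions are my own assumptions, and it only handles the `RECORD LOCKS` lines in the format shown above.

```python
import re

# Lock-related lines copied from the deadlock report above.
REPORT = """\
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 653 page no 3 n bits 72 index `PRIMARY` of table `designate`.`zones` trx id 27B30A lock_mode X locks rec but not gap waiting
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 653 page no 3 n bits 72 index `PRIMARY` of table `designate`.`zones` trx id 27B306 lock_mode X locks rec but not gap
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 649 page no 12 n bits 112 index `PRIMARY` of table `designate`.`records` trx id 27B306 lock_mode X locks rec but not gap waiting
"""

# Section headers such as "*** (2) HOLDS THE LOCK(S):".
HEADER = re.compile(
    r"^\*\*\* \((\d+)\) (WAITING FOR THIS LOCK TO BE GRANTED|HOLDS THE LOCK\(S\)):"
)
# Lock lines; the capture group is the table name without the schema.
LOCK = re.compile(r"^RECORD LOCKS .* of table `[^`]+`\.`([^`]+)`")


def summarize_locks(report):
    """Return (transaction number, 'waits' or 'holds', table) tuples."""
    result = []
    current = None  # (transaction number, kind) of the last header seen
    for line in report.splitlines():
        header = HEADER.match(line)
        if header:
            kind = "waits" if header.group(2).startswith("WAITING") else "holds"
            current = (int(header.group(1)), kind)
            continue
        lock = LOCK.match(line)
        if lock and current:
            result.append((current[0], current[1], lock.group(1)))
            current = None
    return result


for txn, kind, table in summarize_locks(REPORT):
    print(f"transaction ({txn}) {kind} a row lock on {table}")
```

The report does not list what transaction (1) holds, so the sketch cannot show the full cycle by itself; it only makes the "waits for zones" versus "holds zones, waits for records" relationship easier to see.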

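So I changed the code as follows and submitted a patch. Reordering the two status updates, so that the zone is updated before its records, is all that is needed.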

$ git diff
diff --git a/designate/central/service.py b/designate/central/service.py
index ffd12c6..f6f734d 100644
--- a/designate/central/service.py
+++ b/designate/central/service.py
@@ -2317,8 +2317,8 @@ class Service(service.RPCService, service.Service):
         """
         # TODO(kiall): If the status is SUCCESS and the zone is already ACTIVE,
         #              we likely don't need to do anything.
-        self._update_record_status(context, zone_id, status, serial)
         zone = self._update_zone_status(context, zone_id, status, serial)
+        self._update_record_status(context, zone_id, status, serial)
         return zone
 
     def _update_zone_status(self, context, zone_id, status, serial):

The patch is available at the following URL: Gerrit Code Review
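
Why the ordering matters can be illustrated outside of designate with two plain Python locks. This is only a sketch under stated assumptions: `zone_lock` and `record_lock` stand in for the InnoDB row locks on `zones` and `records`, the worker names `api_update` and `status_update` are hypothetical labels, and a lock timeout stands in for MySQL's deadlock detection. It assumes, as transaction (2) in the report suggests, that the API-side update locks `zones` before `records`.

```python
import threading


def opposite_order_demo(timeout=0.2):
    """Two workers take the locks in opposite order; neither gets its second lock."""
    zone_lock, record_lock = threading.Lock(), threading.Lock()
    barrier = threading.Barrier(2)
    got_second = {}

    def worker(name, first, second):
        with first:
            barrier.wait()  # both workers now hold their first lock
            ok = second.acquire(timeout=timeout)
            got_second[name] = ok
            barrier.wait()  # keep `first` held until both attempts have finished
            if ok:
                second.release()

    threads = [
        # zones first, then records (the API-side update in this sketch)
        threading.Thread(target=worker, args=("api_update", zone_lock, record_lock)),
        # records first, then zones (the status update before the patch)
        threading.Thread(target=worker, args=("status_update", record_lock, zone_lock)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return got_second


def same_order_demo():
    """Both workers take zones first, then records; both always finish."""
    zone_lock, record_lock = threading.Lock(), threading.Lock()
    finished = []

    def worker(name):
        with zone_lock:
            with record_lock:
                finished.append(name)

    threads = [
        threading.Thread(target=worker, args=(name,))
        for name in ("api_update", "status_update")
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(finished)


print(opposite_order_demo())
print(same_order_demo())
```

In the opposite-order case each worker waits for a lock the other one holds, which is the same shape as the report above. Once both paths lock the zone first, the second worker simply waits for the first to finish.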
