pt-stalk 是 Percona Toolkit for MySQL 裡面的其中一項工具。它會連到 MySQL Server,監控指定的數值,當超過指定的 Threshold 時,收集當時資料庫執行的資訊(包含正在進行的 transaction、vmstat、lsof,甚至 gdb stack trace),方便分析解決問題。
pt-stalk 需要以 root 權限執行。執行的方法如下:
pt-stalk --daemonize --variable Threads_connected --threshold 400 -- --defaults-file=/etc/mysql/my.cnf
當同時有 400 以上的 Threads 時,便會收集資料,存到 /var/lib/pt-stalk
內。收集的資料範例:
2014_01_13_00_25_09-df 2014_01_13_00_25_09-lock-waits 2014_01_13_00_25_09-netstat 2014_01_13_00_25_09-ps 2014_01_13_00_25_09-disk-space 2014_01_13_00_25_09-log_error 2014_01_13_00_25_09-netstat_s 2014_01_13_00_25_09-slabinfo 2014_01_13_00_25_09-diskstats 2014_01_13_00_25_09-lsof 2014_01_13_00_25_09-opentables1 2014_01_13_00_25_09-sysctl 2014_01_13_00_25_09-hostname 2014_01_13_00_25_09-meminfo 2014_01_13_00_25_09-opentables2 2014_01_13_00_25_09-top 2014_01_13_00_25_09-innodbstatus1 2014_01_13_00_25_09-mpstat 2014_01_13_00_25_09-output 2014_01_13_00_25_09-transactions 2014_01_13_00_25_09-innodbstatus2 2014_01_13_00_25_09-mpstat-overall 2014_01_13_00_25_09-pmap 2014_01_13_00_25_09-trigger 2014_01_13_00_25_09-interrupts 2014_01_13_00_25_09-mutex-status1 2014_01_13_00_25_09-processlist 2014_01_13_00_25_09-variables 2014_01_13_00_25_09-iostat 2014_01_13_00_25_09-mutex-status2 2014_01_13_00_25_09-procstat 2014_01_13_00_25_09-vmstat 2014_01_13_00_25_09-iostat-overall 2014_01_13_00_25_09-mysqladmin 2014_01_13_00_25_09-procvmstat 2014_01_13_00_25_09-vmstat-overall
然後看 InnoDB status,可以看到很多 transaction 正在等 query cache lock:
MySQL thread id 688253701, OS thread handle 0x7f6a591c9700, query id
19004545452 10.1.1.94 pixblog Waiting for query cache lock
SELECT * FROM `blogarticle` WHERE (`blogarticle_blogid` = 3835626) AND
(`blogarticle_date` < 1388605724) AND (`blogarticle_status` IN (2,3,5,7))
ORDER BY `blogarticle_date` desc, `blogarticle_id` desc LIMIT 1
---TRANSACTION 144B7B32FC, not started
MySQL thread id 688253702, OS thread handle 0x7f6a787bc700, query id
19004544621 10.1.1.191 pixblog Waiting for query cache lock
SELECT * FROM `blog` WHERE `blog_id` = 3126963
---TRANSACTION 144B7B32FD, not started
MySQL thread id 688253706, OS thread handle 0x7f6a887e2700, query id
19004544620 10.1.1.166 pixblog Waiting for query cache lock
SELECT * FROM `bloglayout` WHERE `bloglayout_id` = 3063007
---TRANSACTION 144B7B33C7, not started starting index read
mysql tables in use 1, locked 0
發生問題的當時確實有一個跑統計報表的 Slow Query 正在執行,於是要求該 Query 不使用 Query Cache1 來解決問題。