5 Minute Linux Admin/DBA – What OS tools to use to monitor your database server

A very special 5 minute DBA post here: we are crossing over… sys admin & DBA, oh my! I tend to always look first at the OS, and then move over to looking at what is going on inside the database. So if you have five minutes to look at the OS, what do you look for? What tools do you use? What gotchas are there?

First, everyone should be familiar with top. It is a great tool and a great place to start.


top - 20:42:56 up 2 days, 6:36, 4 users, load average: 1.02, 1.08, 1.01

Tasks: 201 total, 1 running, 198 sleeping, 1 stopped, 1 zombie

Cpu(s): 8.8%us, 0.6%sy, 0.0%ni, 76.3%id, 14.2%wa, 0.0%hi, 0.1%si, 0.0%st

Mem: 8173772k total, 8123568k used, 50204k free, 130972k buffers

Swap: 6032368k total, 45172k used, 5987196k free, 6533228k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND

25025 mysql 20 0 1169m 901m 4876 S 31 11.3 17:02.47 mysqld

25606 root 20 0 12236 1580 908 S 3 0.0 0:40.25 mysql

25872 root 20 0 18992 1332 932 R 1 0.0 0:00.02 top

1 root 20 0 4020 888 600 S 0 0.0 0:01.06 init

2 root 15 -5 0 0 0 S 0 0.0 0:00.00 kthreadd

3 root RT -5 0 0 0 S 0 0.0 0:00.00 migration/0

4 root 15 -5 0 0 0 S 0 0.0 0:00.22 ksoftirqd/0

Top gives you a quick overview of what the system is doing and which processes are actively getting CPU. Let’s walk through this quickly:

LOAD AVERAGE: The average number of processes that are runnable or in uninterruptible sleep (usually waiting on disk IO), over the last 1, 5, and 15 minutes. The higher the load average, the more things are queued up and waiting on resources.
Cpu(s): These are the server’s CPU numbers. %us is user CPU, %sy is the system (kernel) CPU percentage, and %wa is the time spent waiting for IO that could have been used to actually process something. Add %sy and %us to get your true CPU utilization. %wa is a great starting point to see how busy your IO is.
Mem: This is your current system memory: how much is used and how much is free. Note that “used” here includes buffers and the file system cache, so a low free number on its own is not a problem.
Swap: Here you can find current swap usage information. Swapping is bad; avoid it.
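To make the arithmetic concrete, here is a minimal sketch that pulls the numbers out of the two summary lines from the top output above. The helper names (parse_load, parse_cpu) are made up for this example, and the parsing assumes the older top layout shown in this post; other versions of top format these lines differently.

```python
import re

# Summary lines copied from the top output above.
LOAD_LINE = "top - 20:42:56 up 2 days, 6:36, 4 users, load average: 1.02, 1.08, 1.01"
CPU_LINE = "Cpu(s): 8.8%us, 0.6%sy, 0.0%ni, 76.3%id, 14.2%wa, 0.0%hi, 0.1%si, 0.0%st"


def parse_load(line):
    """Return the 1, 5 and 15 minute load averages as floats."""
    values = line.split("load average:")[1]
    return [float(v) for v in values.split(",")]


def parse_cpu(line):
    """Return a dict such as {'us': 8.8, 'sy': 0.6, ...} from the Cpu(s) line."""
    return {name: float(pct) for pct, name in re.findall(r"([\d.]+)%(\w+)", line)}


load1, load5, load15 = parse_load(LOAD_LINE)
cpu = parse_cpu(CPU_LINE)

# "True" CPU utilization as described above: user plus system time.
busy = round(cpu["us"] + cpu["sy"], 1)

print(f"load (1 min): {load1}")
print(f"CPU busy (us+sy): {busy}%")
print(f"waiting on IO: {cpu['wa']}%")
```

On this box the CPU is doing real work less than 10% of the time while spending more than that waiting on IO, which is the hint to go look at the disks next.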

The next section is the list of processes. I would read it as follows: “The mysqld command is taking up 31% of one CPU and 11% of the total memory on the server.” And yes, %CPU can go over 100% on boxes with multiple cores and CPUs.

Next up is vmstat.


root@bigdbahead:/home/matt/QuickBenchMark/output# vmstat 5 5
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----

r b swpd free buff cache si so bi bo in cs us sy id wa

0 1 45172 52604 105776 6564100 0 0 495 364 20 16 1 0 94 5

0 1 45172 51700 105704 6565404 0 0 1810 18834 108 435 10 0 76 14

0 1 45172 51940 105240 6565644 0 0 3254 18436 166 613 14 0 75 10

0 1 45172 53276 104464 6565092 0 0 2475 20602 152 573 14 0 76 10

1 1 45172 50872 104432 6567548 0 0 2552 16244 134 528 11 0 75 13

Vmstat accepts the number of seconds to average data over and the number of iterations you want it to run for. It is useful for general system information: here you will find CPU information plus stats on IO, swapping, and memory. Keep in mind that the first line of output is the average since boot, not the current interval. I use it, but not as much as sar.
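As a rough illustration of how to read those columns, the sketch below parses the five sample rows from the vmstat run above, skips the first row (the since-boot average), and summarizes IO wait, swapping, and blocked processes. The variable names are invented for the example, and the column list assumes the vmstat layout shown in this post.

```python
# Data rows copied from the "vmstat 5 5" run above (header rows omitted).
VMSTAT_ROWS = """\
0 1 45172 52604 105776 6564100 0 0 495 364 20 16 1 0 94 5
0 1 45172 51700 105704 6565404 0 0 1810 18834 108 435 10 0 76 14
0 1 45172 51940 105240 6565644 0 0 3254 18436 166 613 14 0 75 10
0 1 45172 53276 104464 6565092 0 0 2475 20602 152 573 14 0 76 10
1 1 45172 50872 104432 6567548 0 0 2552 16244 134 528 11 0 75 13
"""

COLUMNS = "r b swpd free buff cache si so bi bo in cs us sy id wa".split()

samples = [
    dict(zip(COLUMNS, map(int, line.split())))
    for line in VMSTAT_ROWS.splitlines()
]

# The first row is the average since boot, so leave it out of the live picture.
live = samples[1:]

avg_wa = sum(s["wa"] for s in live) / len(live)   # average % of time waiting on IO
swapping = any(s["si"] or s["so"] for s in live)  # any pages swapped in or out?
max_blocked = max(s["b"] for s in live)           # processes blocked, usually on IO

print(f"average iowait: {avg_wa:.2f}%")
print(f"swapping during the run: {swapping}")
print(f"max blocked processes: {max_blocked}")
```

Here swpd is non-zero but si and so stay at 0, so the box has swapped at some point in the past but is not actively swapping now.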

On Linux there is a package called sysstat that has all kinds of goodies. The two most important ones are iostat and sar. Sar can be your best friend. It can be scheduled in cron to collect stats on the box, which means you get historical performance metrics with easy setup and minimal impact on the system. If you have it installed you can type sar. It defaults to showing you today’s CPU data:


root@bigdbahead:/home/matt/QuickBenchMark/output# sar
Linux 2.6.24-19-generic (bigdbahead.homelinux.com) 03/04/2009

12:00:01 AM CPU %user %nice %system %iowait %steal %idle

12:05:01 AM all 1.06 0.00 0.52 23.27 0.00 75.15

12:15:01 AM all 1.04 0.00 0.49 23.25 0.00 75.22

12:25:01 AM all 1.04 0.00 0.49 23.24 0.00 75.23

12:35:01 AM all 1.08 0.00 0.51 23.21 0.00 75.20

12:45:01 AM all 1.09 0.00 0.53 23.10 0.00 75.28

12:55:01 AM all 0.89 0.00 0.47 23.29 0.00 75.35

01:05:01 AM all 0.62 0.00 0.33 23.66 0.00 75.39

01:15:01 AM all 0.61 0.00 0.33 23.68 0.00 75.38

01:25:01 AM all 0.65 0.00 0.32 23.67 0.00 75.36

01:35:01 AM all 1.11 0.00 0.35 23.62 0.00 74.92

01:45:01 AM all 2.87 0.00 0.49 22.57 0.00 74.07

01:55:01 AM all 0.66 0.00 0.38 23.57 0.00 75.39

02:05:01 AM all 0.67 0.00 0.38 23.57 0.00 75.37

02:15:01 AM all 0.68 0.00 0.37 23.57 0.00 75.37

Finally you can see what is going on in the middle of the night! But sar is so much more than CPU. It collects everything. sar -A will give you more data than you could ever want, on pretty much everything your system is doing. The two most useful? ‘sar -d’ for disk and ‘sar -n DEV’ for network stats. Sar can also be called like vmstat: “sar 10 10”. Remember sar; it can save your life.
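If you want to find the spike without eyeballing every row, something like the following works as a starting point. It is only a sketch: it uses four rows copied from the sar output above, the parse_row helper is made up for the example, and it assumes the 12-hour timestamp format shown here (sar prints a different time format under other locales).

```python
# A few rows copied from the sar CPU output above.
SAR_ROWS = """\
01:25:01 AM all 0.65 0.00 0.32 23.67 0.00 75.36
01:35:01 AM all 1.11 0.00 0.35 23.62 0.00 74.92
01:45:01 AM all 2.87 0.00 0.49 22.57 0.00 74.07
01:55:01 AM all 0.66 0.00 0.38 23.57 0.00 75.39
"""


def parse_row(line):
    """Split one sar CPU row into a timestamp and the fields we care about."""
    time, ampm, _cpu, user, _nice, system, iowait, _steal, _idle = line.split()
    return {
        "time": f"{time} {ampm}",
        "user": float(user),
        "system": float(system),
        "iowait": float(iowait),
    }


rows = [parse_row(line) for line in SAR_ROWS.splitlines()]

# The interval with the most real CPU work (user + system).
peak = max(rows, key=lambda r: r["user"] + r["system"])
peak_busy = round(peak["user"] + peak["system"], 2)

print(f"busiest interval: {peak['time']} at {peak_busy}% user+system")
print(f"highest iowait: {max(r['iowait'] for r in rows)}%")
```

In this sample the CPU peak is tiny, while iowait sits above 22% in every interval, so the interesting story overnight is IO rather than CPU.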

Next up is iostat. Iostat is going to give you the nitty gritty details on the disk.


root@bigdbahead:/home/matt/QuickBenchMark/output# iostat 2 2
Linux 2.6.24-19-generic (bigdbahead.homelinux.com) 03/04/2009

avg-cpu: %user %nice %system %iowait %steal %idle

1.34 0.00 0.17 4.91 0.00 93.59

Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn

sda 2.07 12.76 904.40 2525634 178966792

sdb 2.24 307.43 401.38 60835955 79427456

sdc 37.08 1854.83 876.42 367044667 173430448

sdd 37.22 1853.05 882.34 366692303 174601592

sde 0.00 0.01 0.00 2955 0

md0 119.24 3707.85 1758.75 733730202 348032040

avg-cpu: %user %nice %system %iowait %steal %idle

8.47 0.00 0.97 14.77 0.00 75.79

Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn

sda 63.00 0.00 31048.00 0 62096

sdb 24.00 8476.00 0.00 16952 0

sdc 0.00 0.00 0.00 0 0

sdd 0.00 0.00 0.00 0 0

sde 0.00 0.00 0.00 0 0

md0 0.00 0.00 0.00 0 0

Iostat with the default parameters leaves a lot to be desired. Big deal, I have 63 tps hitting my disk… is that good or is that bad? Who knows, right? That’s why I always end up calling iostat -x…


root@bigdbahead:/home/matt/QuickBenchMark/output# iostat -x 2 2
Linux 2.6.24-19-generic (bigdbahead.homelinux.com) 03/04/2009

avg-cpu: %user %nice %system %iowait %steal %idle

1.34 0.00 0.17 4.91 0.00 93.57

Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util

sda 0.01 18.42 0.24 1.88 12.90 930.08 444.24 0.02 8.82 5.93 1.26

sdb 1.09 49.04 1.16 1.10 313.71 401.09 316.94 0.20 86.98 4.34 0.98

sdc 5.54 16.90 28.18 8.88 1853.45 875.76 73.65 0.40 10.78 4.37 16.20

sdd 5.56 16.91 28.19 9.01 1851.67 881.68 73.49 0.38 10.11 4.27 15.88

sde 0.01 0.00 0.00 0.00 0.01 0.00 11.45 0.00 0.57 0.40 0.00

md0 0.00 0.00 67.45 51.70 3705.08 1757.44 45.85 0.00 0.00 0.00 0.00

avg-cpu: %user %nice %system %iowait %steal %idle

7.10 0.00 0.36 16.49 0.00 76.05

Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util

sda 0.00 853.50 16.00 92.00 128.00 32144.00 298.81 0.83 7.67 6.11 66.00

sdb 27.00 0.00 18.00 0.00 7048.00 0.00 391.56 0.09 4.89 4.89 8.80

sdc 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

sdd 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

sde 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

md0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

Look at all the glorious stats! Oh boy! You can check out the reads per second (r/s), the writes per second (w/s), the disk queue (avgqu-sz), and most importantly disk latency (await, which already includes time spent in the queue plus the service time shown in svctm)… oh, fun stuff.
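Here is a small sketch of how those columns could be checked by a script instead of by eye. It uses rows from the second iostat -x report above; the 5 ms limit is the rule of thumb from this post, not a universal number, and parse_device is a name invented for the example. Other sysstat versions may print different columns, so treat the header as an assumption.

```python
# Header and rows copied from the second "iostat -x" report above.
HEADER = "Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util"
ROWS = """\
sda 0.00 853.50 16.00 92.00 128.00 32144.00 298.81 0.83 7.67 6.11 66.00
sdb 27.00 0.00 18.00 0.00 7048.00 0.00 391.56 0.09 4.89 4.89 8.80
sdc 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
"""

AWAIT_LIMIT_MS = 5.0  # rule-of-thumb latency limit used in this post

columns = HEADER.split()[1:]  # drop the leading "Device:" label


def parse_device(line):
    """Turn one device row into a dict keyed by the iostat column names."""
    name, *values = line.split()
    stats = dict(zip(columns, map(float, values)))
    stats["device"] = name
    return stats


devices = [parse_device(line) for line in ROWS.splitlines()]

busiest = max(devices, key=lambda d: d["%util"])
slow = [d["device"] for d in devices if d["await"] > AWAIT_LIMIT_MS]

print(f"busiest disk: {busiest['device']} at {busiest['%util']}% util")
print(f"r/s vs w/s on {busiest['device']}: {busiest['r/s']} vs {busiest['w/s']}")
print(f"disks over {AWAIT_LIMIT_MS} ms await: {slow}")
```

For this report sda is both the busiest disk and the only one over the latency limit, and it is almost entirely writes.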

So typically here is what I do when I log on to a server that is having perf issues:

1. Run top… give it a quick once-over… I check for:

    1. The amount of memory
    2. The load average
    3. Current CPU
    4. Whether other processes are stealing CPU time
    5. Whether the system is waiting on IO
    6. Whether it is swapping

2. Run iostat -x 10

    1. Look for the busy disk, and make sure that await (which already includes svctm) is < 5ms, or < 10ms if the disk stinks.
    2. Look at r/s versus w/s to get a general feel for what the box is doing.

3. uname -a

    1. What? Yep, I check to make sure the kernel on the server is up to date and that the client is running a 64-bit OS.

4. Run sar

    1. I run sar without any parameters to get today’s CPU, and look for spikes in sys+usr as well as periods of high IO.
    2. I run sar -n DEV and look to see how saturated the network pipe is.
    3. I run sar -f /var/log/sa[day] to get a previous day’s numbers so I can compare today with that other day.

5. Check the disk space

    1. df -h
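The uname and df checks in the list above are easy to script as well. This is a minimal sketch using only the Python standard library rather than the commands themselves; the list of 64-bit machine names is an assumption and not exhaustive, and the percentage is computed as used/total, which can differ slightly from the Use% column that df prints.

```python
import platform
import shutil


def disk_usage_percent(path="/"):
    """Return the percentage of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0
    return 100.0 * usage.used / usage.total


info = platform.uname()

# Assumed, non-exhaustive list of machine names that indicate a 64-bit OS.
is_64bit = info.machine.lower() in ("x86_64", "amd64", "aarch64")

print(f"kernel: {info.system} {info.release} ({info.machine})")
print(f"64 bit: {is_64bit}")
print(f"/ is {disk_usage_percent('/'):.1f}% full")
```

On a real database server you would loop over the data and log mount points rather than just /.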

That takes about 5 minutes… or less. Now I can start digging into MySQL, knowing whether I have a CPU problem, a disk problem, or a network problem. I do other things too, but typically they are a result of something I see in the steps above. Remember: just the basics, what you can do in 5 minutes.


3 Responses to 5 Minute Linux Admin/DBA – What OS tools to use to monitor your database server

  1. Gerry says:

    sar can be overwhelming, and I find it most useful with the “-s” option to specify the starting time (at the end of the day the list can be too long), given in 24-hour format as HH:MM:SS.
    The second point is that, as with the other sysstat package utilities, it can take an interval and a number of samples, so you can fire sar as “sar 10 10” and get 10 samples 10 seconds apart.

  2. Pingback: Log Buffer #138: A Carnival of the Vanities for DBAs

  3. Xaprb says:

    I really like one of the other tools that’s part of sysstat: mpstat. 5% iowait is not all that helpful to know, but knowing that exactly one of the CPUs is spending 40% of time in iowait and the others are not at all io bound is important.

    mpstat -P ALL 5
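
    The arithmetic behind that point is worth spelling out. The numbers below are hypothetical, picked to match the 5% and 40% figures in the comment: on an assumed 8-core box, one core stuck at 40% iowait averages out to just 5% across all CPUs.

```python
# Hypothetical 8-core box: one core stuck in iowait, the rest not IO bound.
per_cpu_iowait = [40.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

# What an all-CPU summary line would report.
average = sum(per_cpu_iowait) / len(per_cpu_iowait)

# What a per-CPU view would make obvious.
worst_cpu = max(range(len(per_cpu_iowait)), key=per_cpu_iowait.__getitem__)

print(f"all-CPU average iowait: {average:.1f}%")
print(f"worst core: CPU {worst_cpu} at {per_cpu_iowait[worst_cpu]:.1f}%")
```

    The average looks harmless, while the per-CPU view shows a single thread waiting on disk close to half the time.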