Wednesday, March 22, 2017

Troubleshooting performance issues in Linux

Performance problems are caused by bottlenecks in one or more subsystems, depending on the profile of resource usage on your system. Some elements to consider (a quick first-pass check is sketched right after this list):
  • buggy software
  • disk usage
  • memory usage
  • CPU cycles
  • network bandwidth
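
For a first pass over most of these subsystems at once, vmstat gives a one-line summary per interval (a minimal sketch; the columns named in the comments are standard vmstat output):

# One summary line every 5 seconds. High "wa" (IO wait) points at disk,
# non-zero "si"/"so" (swap in/out) points at memory pressure, and a
# consistently long "r" run queue points at the CPU.
root@box:~# vmstat 5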

The big picture

There is no golden rule for troubleshooting performance issues. There are many different causes of bottlenecks and no definitive mapping of "classes of systems" to "causes". In our experience, there are common causes of bottlenecks for given systems, but don't let that fool you into thinking this always applies.

Common performance bottlenecks

Usually, database systems are IO-bound and require a lot of RAM, anti-virus software uses many CPU cycles, and anti-spam software may stress both CPU and network (RBL and other distributed checks). Application servers (Java, PHP, Ruby, Python, etc.) may use a lot of processing power.
Also, sometimes a given system is just not scalable enough, even if the hardware is good enough. Maybe it forks too many new processes, opens too many file descriptors, or is just buggy. We've seen many programs doing long "sleep()" calls for no clear reason. In that case, resource usage will be minimal, but the system will still be sluggish.
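If you suspect a process is sleeping rather than working, here is a hedged sketch of how to check (the PID 1234 is a placeholder for the process you're investigating):

# "S" in the STAT column means interruptible sleep; "D" means
# uninterruptible sleep, usually a process waiting on disk IO.
root@box:~# ps -o pid,stat,wchan,cmd -p 1234
# Attach to the process and watch its system calls; repeated
# nanosleep()/select() calls with long timeouts confirm it is idling.
root@box:~# strace -p 1234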

Investigating performance issues

To troubleshoot performance issues, your strategy will depend on the nature of the problem. Is the system always slow, or is the problem intermittent, appearing as suddenly as it goes away?

Troubleshooting constant slowness

Constant problems are much easier to spot. In this case, it is advisable to have historical statistics for resource usage in your system.

Using the sysstat package to get historical resource usage information

What you should do in any case is gather resource usage information. Once you get the hang of it, most of the time you'll be able to spot the root cause of slowness very easily.
First of all, install sysstat on your server, so you'll get detailed statistics about CPU, memory, disk and other resource usage. On Debian-based systems:
[root@box ~]# apt-get install sysstat
On RedHat-based systems, use "yum install sysstat" instead.
To enable data collection on RedHat-based systems:
[root@box ~]# chkconfig sysstat on
[root@box ~]# chkconfig --list sysstat
sysstat 0:off 1:on 2:on 3:on 4:on 5:on 6:off
For Debian-based systems, including Ubuntu, edit the file /etc/default/sysstat and change the ENABLED variable to "true".
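One way to do that non-interactively (a sketch, assuming the file still contains the default ENABLED="false" line):

# Flip the flag that allows the sa1 data-collection cron job to run:
root@box:~# sed -i 's/^ENABLED="false"/ENABLED="true"/' /etc/default/sysstat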
Then start the sysstat daemon:
[root@box ~]# /etc/init.d/sysstat start
It starts collecting data immediately. Later you'll be able to run, for example, "sar -r" or "sar -b" to get memory or IO (disk) statistics, respectively. "sar -A" shows a full report.
Don't worry if these numbers are meaningless to you right now; they will make it much easier to analyze performance issues later. If you later open a support ticket regarding performance, please mention that sysstat is installed and collecting data. That would help us a lot.
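For example, to read the collected history rather than live data, point sar at a daily data file (kept under /var/log/sa on RedHat-style systems and /var/log/sysstat on Debian-style ones; the day number 24 below is just an example):

# Memory statistics recorded on the 24th of the month:
root@box:~# sar -r -f /var/log/sa/sa24
# Today's IO statistics, restricted to the 09:00-12:00 window:
root@box:~# sar -b -s 09:00:00 -e 12:00:00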

Analyzing disk usage with iostat

Sample usage:

root@box:~# iostat -x 5
Linux 2.6.16.29-xenU-rimu-20061021 (staff.rimuhosting.com)      03/24/07

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.01    0.00    0.00    0.05    0.11   99.83

Device:    rrqm/s wrqm/s   r/s   w/s  rsec/s  wsec/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
xvda9        0.00   0.00  0.00  0.00    0.00    0.00     0.00     0.00    12.80     0.00    5.60   5.60   0.00
xvda1        0.00   0.07  0.01  0.12    0.51    1.57     0.26     0.79    15.77     0.00   11.29   6.41   0.08
By contrast, this is the output of a system under heavy IO stress (note the high %iowait and the near-100 %util on hda):

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          31,67    0,00   24,30   44,02    0,00    0,00

Device:         rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
hda               1,39  6058,37 14,54 81,67   272,51 50165,74   524,22   131,41 1433,21  10,36  99,68
hdc               0,00     0,00  0,00  0,00     0,00     0,00     0,00     0,00    0,00   0,00   0,00
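
When iostat shows a saturated disk like this, the next question is which process is generating the IO. pidstat, also from the sysstat package, can answer that (a sketch; it needs a kernel with per-task IO accounting):

# Per-process read/write rates every 5 seconds; the worst offenders
# stand out in the kB_rd/s and kB_wr/s columns.
root@box:~# pidstat -d 5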

Memory usage

The easiest way to analyze the system memory usage is to use "free -m":

# free -m
             total       used       free     shared    buffers     cached
Mem:           320        314          6          0          5         93
-/+ buffers/cache:        215        104
Swap:          127        110         17
The most important number here is the free value in the "-/+ buffers/cache" row, which in this case is 104 MB (the 6 MB actually free plus the 5 MB of buffers and 93 MB of cache, all of which can be reclaimed). That's around 30% of physical memory (320 MB), a normal (but not excellent) figure, so this system's memory usage is healthy. If that number were much lower, it would probably mean the system needs more memory.
On this particular server, though, that figure looks healthy only because Linux has just OOM-killed a few processes, freeing a lot of RAM; so remember to check dmesg when analyzing memory usage on a system.
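A quick check for that (the OOM killer logs every kill to the kernel ring buffer):

# Any recent "Out of memory: Kill process ..." lines mean the
# kernel has been killing processes to reclaim RAM.
root@box:~# dmesg | grep -i "out of memory"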
