Troubleshooting performance issues in Linux
Performance problems are usually caused by bottlenecks in one or more subsystems, depending on how your system uses its resources. Some elements to consider:
- buggy software
- disk usage
- memory usage
- CPU cycles
- network bandwidth
The big picture
There is no golden rule for troubleshooting performance issues. Bottlenecks have many different causes, and there is no definitive mapping between "classes of systems" and "causes". In our experience, certain kinds of systems tend to hit certain bottlenecks, but don't let that fool you into thinking this always applies.
Common performance bottlenecks
Usually, database systems are I/O-bound and require a lot of RAM; anti-virus software uses many CPU cycles; anti-spam software may stress both CPU and network (RBL and other distributed checks). Application servers (Java, PHP, Ruby, Python, etc.) may use a lot of processing power.
Also, sometimes a given system is just not scalable enough, even if the hardware is good enough. Maybe it forks too many new processes, opens too many file descriptors, or is just buggy. We've seen many programs doing long "sleep()" calls for no clear reason. In this case, resource usage will be minimal, but the system will still be sluggish.
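To check whether a process is holding an unusual number of file descriptors, something like the following can help. This is only a minimal sketch: the function names count_fds and top_fd_users are made up for illustration, and it assumes a Linux /proc filesystem. Without root you can only read the descriptors of your own processes, so other processes are reported as 0.

```shell
# Count the file descriptors a process currently holds open.
# Usage: count_fds PID
count_fds() {
    if [ -r "/proc/$1/fd" ]; then
        ls "/proc/$1/fd" | wc -l | tr -d ' '
    else
        # Not readable (other user's process) or the process is gone.
        echo 0
    fi
}

# Print "count pid command" for the processes holding the most descriptors.
# Usage: top_fd_users [HOW_MANY]   (defaults to 5)
top_fd_users() {
    for dir in /proc/[0-9]*; do
        pid=${dir#/proc/}
        n=$(count_fds "$pid")
        name=$(cat "$dir/comm" 2>/dev/null || echo '?')
        echo "$n $pid $name"
    done | sort -rn | head -n "${1:-5}"
}
```

Running "top_fd_users 10" as root lists the ten heaviest descriptor users. What counts as "too many" depends on the application and on its limits (see "ulimit -n").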
Investigating performance issues
To troubleshoot performance issues, your strategy will depend on the nature of the problem. Is the system always slow, or is the problem irregular, appearing as suddenly as it goes away?
Troubleshooting constant slowness
Constant problems are much easier to spot. In this case, it is advisable to have historical statistics for resource usage in your system.
Using the sysstat package to get historical resource usage information
Whatever the case, you should gather resource usage information. Once you get the hang of it, most of the time you'll be able to spot the root cause of slowness very easily.
First of all, install sysstat on your server, so you'll get detailed statistics about CPU, memory, disk and other resource usage.
[root@box ~]# apt-get install sysstat
For RedHat-based systems, enable the service at boot:
[root@box ~]# chkconfig sysstat on
[root@box ~]# chkconfig --list sysstat
sysstat 0:off 1:on 2:on 3:on 4:on 5:on 6:off
For Debian-based systems, including Ubuntu, edit the file /etc/default/sysstat and change the ENABLED variable to "true".
Then start the sysstat daemon:
[root@box ~]# /etc/init.d/sysstat start
It starts collecting data right away. Later you'll be able to run, for example, "sar -r" or "sar -b" to get memory or I/O (disk) statistics, respectively. "sar -A" shows a full report.
Don't worry if these numbers are meaningless to you right now; we should be able to use them to analyze performance issues better. If you later open a support ticket regarding performance, please mention that "sysstat" is installed and collecting data. That would help us a lot.
Analyzing disk usage with iostat
Sample usage:
root@box:~# iostat -x 5
Linux 2.6.16.29-xenU-rimu-20061021 (staff.rimuhosting.com) 03/24/07
avg-cpu: %user %nice %system %iowait %steal %idle
0.01 0.00 0.00 0.05 0.11 99.83
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util
xvda9 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 12.80 0.00 5.60 5.60 0.00
xvda1 0.00 0.07 0.01 0.12 0.51 1.57 0.26 0.79 15.77 0.00 11.29 6.41 0.08
This is the output of a system under heavy stress:
avg-cpu: %user %nice %system %iowait %steal %idle
31,67 0,00 24,30 44,02 0,00 0,00
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
hda 1,39 6058,37 14,54 81,67 272,51 50165,74 524,22 131,41 1433,21 10,36 99,68
hdc 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00
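In the stressed sample, hda stands out: %util is close to 100 and await is above one second, while %iowait takes 44% of the CPU time. A rough way to pick out such devices automatically is sketched below. The function name busy_devices and the default threshold of 90 are our own assumptions, and the sketch relies on %util being the last column, as it is in the samples above.

```shell
# Read "iostat -x" output on stdin and print the devices whose %util
# (last column) is at or above a threshold.
# Usage: iostat -x 5 3 | busy_devices [THRESHOLD]   (defaults to 90)
busy_devices() {
    threshold=${1:-90}
    # Some locales print decimal commas (31,67); convert them to dots.
    tr ',' '.' | awk -v limit="$threshold" '
        /^Device/  { in_devices = 1; next }   # header of a device table
        /^avg-cpu/ { in_devices = 0; next }   # a CPU block ends the table
        NF == 0    { in_devices = 0; next }   # so does a blank line
        in_devices && ($NF + 0) >= (limit + 0) { print $1, $NF }
    '
}
```

Fed the stressed sample above, this prints "hda 99.68". Keep in mind that the first report iostat prints shows averages since boot, so the later reports are the ones that reflect current load.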
Memory usage
The easiest way to analyze the system memory usage is to use "free -m":
# free -m
total used free shared buffers cached
Mem: 320 314 6 0 5 93
-/+ buffers/cache: 215 104
Swap: 127 110 17
The most important number here is the free value in the row starting with "-/+ buffers/cache", which in this case is 104 MB, or roughly 32% of physical memory (320 MB). This is a normal (but not excellent) figure, so memory usage looks healthy. If that number were much lower, it would probably mean your system needs more memory.
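The same arithmetic can be scripted. The sketch below assumes the older "free" output format shown above, with a "-/+ buffers/cache" row. Newer versions of free drop that row and print an "available" column instead, so the parsing would need adjusting there. The function name free_pct is just an illustration.

```shell
# Read old-style "free -m" output on stdin and print the percentage of
# physical memory that is free once buffers and cache are discounted.
# Usage: free -m | free_pct
free_pct() {
    awk '
        $1 == "Mem:" { total = $2 }
        $1 == "-/+" && $2 == "buffers/cache:" { avail = $4 }
        END {
            if (total > 0 && avail != "")
                printf "%d\n", avail * 100 / total   # truncated to an integer
            else
                exit 1   # expected rows not found
        }
    '
}
```

For the output above it prints 32 (104 * 100 / 320, truncated).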
On this particular server, though, the figure looks healthy only because Linux has just OOM-killed a few processes, which freed a lot of RAM. So remember to check dmesg when analyzing memory usage on a system.
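A quick way to look for OOM-killer activity in the kernel log is to filter it for the usual phrases. This is a sketch only: the function name oom_lines is made up, and the search patterns are an assumption, since the exact wording of these messages varies between kernel versions.

```shell
# Filter kernel log text on stdin for lines that look like OOM-killer activity.
# Usage: dmesg | oom_lines
oom_lines() {
    grep -i -E 'out of memory|oom-killer|killed process'
}
```

If this prints anything, the free memory you see now may just be what the killed processes left behind. The kernel log buffer is limited in size, so older events may only be found in the system log files.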