Proper tool for the job
My grandfather used to say to me: “Use the proper tool for the job”. This is important to keep in mind when dealing with performance issues. When I run into performance problems in Oracle, I typically start with AWR reports or Enterprise Manager to get a high-level understanding of the workload. To drill down further, the next step is Oracle “10046 event” tracing. Cary Millsap created a methodology around event tracing called “Method-R”, which shows how to focus on the source of a performance problem by analyzing the components that contribute to response time. These are all fine places to start analyzing performance problems from the “user” or “application” point of view. But what happens if the OS is in peril?
If you are experiencing high system time or strange application behavior, it is likely time to drill deeper with OS-based tools. As I mentioned in my last post, “prstat” is the “top” equivalent for Solaris. “prstat” is the best place to start to see how processes are running on Solaris, but at some point you may need to drill down deeper to gain a better understanding of the problem.
On Linux, “oprofile” allows you to sample kernel and user code to build a profile of how the system and applications are behaving. This is an incredibly useful tool, but it doesn’t exist on Solaris. Luckily, there is something that is arguably better: Dtrace.
Solaris Dtrace(1m) / Linux “oprofile”
Dtrace was developed for the release of Solaris 10 by kernel engineers as a way to better debug and monitor Solaris. Unlike “oprofile”, Dtrace is really an environment: you write code in “D” to make use of the large amount of probe data available. Dtrace is really powerful, but it does require some heavy lifting to get started. This is where the “Dtrace Toolkit” comes in handy.
The “Dtrace Toolkit” is a set of scripts that serve as a starting point for those interested in using Dtrace. Also included in the “Dtrace Toolkit” are some really clever utilities. My two favorite utilities for Dtrace are the “hotkernel” and “hotuser” scripts. These scripts sample either the kernel or a user “PID” to show which routines are most used. This can be extremely useful when diagnosing performance problems that extend beyond the V$ tables or Oracle “10046 event trace” data.
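To make the idea behind a sampling profile concrete, here is a minimal Python sketch. It is not how the toolkit scripts are implemented; it only assumes that a profiler records which function was on-CPU at each sampling tick and then reports each function's count and share of all samples. The `profile` helper and the sample data are hypothetical.

```python
from collections import Counter

def profile(samples):
    """Turn sampled function names into (function, count, percent) rows.

    Rows are sorted by ascending count, so the hottest function comes last,
    which is the same ordering the hotkernel/hotuser output uses.
    """
    counts = Counter(samples)
    total = sum(counts.values())
    rows = [(func, n, 100.0 * n / total) for func, n in counts.items()]
    rows.sort(key=lambda row: row[1])
    return rows

# Hypothetical samples: each entry is the function seen on-CPU at one tick.
samples = ["kcbgtcr"] * 6 + ["memset"] * 3 + ["opiexe"] * 1

for func, n, pct in profile(samples):
    print(f"{func:<20} {n:>6} {pct:>5.1f}%")
```

The functions that show up in the most samples are the ones where the CPU spends most of its time, which is why the bottom of the report is the interesting part.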
To illustrate the use of these utilities, I have included output from a benchmark that shows how these might be used.
HOTKERNEL

root@apl5-1> ./hotkernel
Sampling... Hit Ctrl-C to end.
^C
FUNCTION                                COUNT   PCNT
nxge`nxge_check_10g_link                    1   0.0%
genunix`ioctl                               1   0.0%
...
...
genunix`fop_read                         5730   2.1%
genunix`kstrgetmsg                       6091   2.2%
unix`utl0                                7058   2.6%
FJSV,SPARC64-VII`cpu_halt_cpu            7220   2.6%
FJSV,SPARC64-VII`copyout                 9340   3.4%
ip`tcp_fuse_output                      12637   4.6%
unix`_resume_from_idle                  12922   4.7%
unix`disp_getwork                       18864   6.8%
unix`mutex_enter                        34033  12.3%

HOTUSER

root@apl5-1> ./hotuser -p 12626
Sampling... Hit Ctrl-C to end.
^C
FUNCTION                                COUNT   PCNT
oracle`kxsInitExecutionHeap                 1   0.0%
oracle`0x10b319ad0                          1   0.0%
oracle`kews_pls_jvm_event_resume_i          1   0.0%
oracle`0x10b319ac8                          1   0.0%
oracle`kghfrh                               1   0.0%
oracle`opiptc                               1   0.0%
...
...
oracle`qertbFetchByRowID                   91   1.0%
oracle`kghalf                              94   1.1%
libc_psr.so.1`memcpy                      102   1.2%
oracle`opikndf2                           105   1.2%
oracle`kpofcr                             113   1.3%
oracle`opiodr                             120   1.4%
oracle`kslwtectx                          120   1.4%
oracle`kslwt_update_stats_int             126   1.4%
oracle`opitsk                             126   1.4%
oracle`ksupucg                            151   1.7%
oracle`nsbasic_brc                        153   1.7%
oracle`kdxbrs1                            187   2.1%
oracle`kdxlrs2                            192   2.2%
oracle`kews_sqlcol_end                    194   2.2%
oracle`opifch2                            212   2.4%
oracle`opiexe                             250   2.8%
oracle`skgslcas                           265   3.0%
libc_psr.so.1`memset                      416   4.7%
oracle`kcbgtcr                            826   9.4%
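Output like this can get long, so it sometimes helps to roll it up by module (the part before the backtick) to see where the samples land overall. The following Python sketch is one way to do that. The function names `parse_hot_output` and `counts_by_module` are my own, and the parser assumes each data line has exactly three whitespace-separated fields in the form shown above. The sample text is a few rows copied from the hotkernel run.

```python
from collections import defaultdict

def parse_hot_output(text):
    """Parse 'module`function COUNT PCNT' lines into (module, function, count)."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Skip the header, the '...' elisions, and anything else that is not data.
        if len(parts) != 3 or "`" not in parts[0] or not parts[1].isdigit():
            continue
        module, function = parts[0].split("`", 1)
        rows.append((module, function, int(parts[1])))
    return rows

def counts_by_module(rows):
    """Sum the sample counts of all functions belonging to each module."""
    totals = defaultdict(int)
    for module, _function, count in rows:
        totals[module] += count
    return dict(totals)

sample = """\
FUNCTION                                COUNT   PCNT
genunix`fop_read                         5730   2.1%
genunix`kstrgetmsg                       6091   2.2%
unix`disp_getwork                       18864   6.8%
unix`mutex_enter                        34033  12.3%
"""

rows = parse_hot_output(sample)
for module, total in sorted(counts_by_module(rows).items(), key=lambda kv: kv[1]):
    print(f"{module:<10} {total}")
```

Because the sample only contains a handful of rows, the totals it prints cover just those rows, not the whole run.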
You can begin to see how Dtrace can be useful to see the effect of the workload on Solaris and profile the user application – in this case an Oracle shadow process. But this is just the beginning. If you are so inclined, Dtrace can be used to correlate all sorts of performance data both inside the kernel and application.