Posts Tagged 'Solaris'

Tuning is in the eye of the beholder… Memory is memory, right?

It is human nature to draw from experiences to make sense of our surroundings. This holds true in life and in performance tuning. A veteran systems administrator will typically tune a system differently than an Oracle DBA. This is fine, but often what is obvious to one is not to the other. It is sometimes necessary to take a step back and tune from another perspective.

I recently ran across a few cases where a customer was tuning “sorts” in the database by adding memory. Regardless of your perspective, everyone knows memory is faster than disk, and the goal of any good tuner is to do as much in memory as possible. So, when the systems administrator noticed that the “TEMP” disks for Oracle were doing a tremendous amount of IO, the answer was obvious, right?

RamDisk to the rescue

To solve this problem, the savvy systems administrator added a RAM disk to the database. Since it was only for “TEMP” space, this seemed reasonable.

ramdiskadm -a oratmp1 1024m
/dev/ramdisk/oratmp1

Indeed, user performance improved. There are some minor issues around recovery upon system reboot or failure that are annoying, but they are easily addressed with startup scripts. So SLAs were met, everyone was happy, and things were fine for a few years.
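For the curious, the boot-time recovery need not be fancy. A minimal, hypothetical sketch (the device name comes from above; the mount point, filesystem, and tempfile details are purely illustrative):

#!/bin/sh
# hypothetical boot script: RAM disk contents do not survive a reboot,
# so recreate the device, filesystem, and mount before the DB starts
ramdiskadm -a oratmp1 1024m
echo y | newfs /dev/rramdisk/oratmp1
mkdir -p /oratmp1
mount /dev/ramdisk/oratmp1 /oratmp1
# the DBA then re-adds the tempfile, for example:
#   ALTER TABLESPACE temp ADD TEMPFILE '/oratmp1/temp01.dbf' SIZE 900M REUSE;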

Double the HW means double the performance… right?

Fast forward a few years. The system was upgraded to keep up with demand by doubling the amount of memory and CPU resources. Everything should be faster, right? Well, not so fast. This action increased the NUMA ratio of the machine, and after doubling memory and CPU the average user response time doubled from ~1 second to ~2 seconds. Needless to say, this was not going to fly. Escalations mounted and the pressure to resolve the problem reached a boiling point. The systems administrator contacted the Solaris support team, and some of the best kernel engineers in the business began to dig into the problem, searching for ways to make the “ramdisk” respond faster in the face of an increased NUMA ratio.

A fresh set of eyes

Since I have worked with the Solaris support engineers on anything Oracle-performance-related for many years, they asked me to take a look. I took a peek at the system and noticed the ramdisk in use for TEMP. To me this seemed odd, but I continued to look at SQL performance. Things became clear once I saw that “sort_area_size” was at its default.

It turns out Oracle was attempting to do in-memory sorts, but with the default settings all users were spilling out to temp. With hundreds of users on the system, this became a problem real fast. I had the customer increase sort_area_size until the sorts occurred in memory without the added overhead of spilling out to disk (albeit fast disk). With this slight adjustment, the average user response time was better than it had ever been.
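For reference, the change itself is a one-liner. A hedged sketch (the value is purely illustrative, and this assumes the instance uses manual workarea sizing, where sort_area_size is honored):

sqlplus -s / as sysdba <<'EOF'
-- illustrative value only: size to your workload and user count;
-- DEFERRED applies the new value to sessions that connect afterwards
alter system set sort_area_size = 10485760 deferred;
EOF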

lessons learned

  • Memory is memory, but how you use it makes all the difference.
  • It never hurts to broaden your perspective and get a second opinion.

Solaris Eye for the Linux Guy… Part II (oprofile, DTrace, and Oracle Event Trace)

Proper tool for the job

My grandfather used to say to me: “Use the proper tool for the job”.  This is important to keep in mind when faced with performance issues.  When I am faced with performance problems in Oracle, I typically start with AWR reports or Enterprise Manager to get a high-level understanding of the workload.  To drill down further, the next step is Oracle “10046 event” tracing.  Cary Millsap created a methodology around event tracing called “Method-R”, which shows how to home in on the source of a performance problem by analyzing the components that contribute to response time.  These are all fine places to start analyzing performance problems from the “user” or “application” point of view.
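As a reminder of the mechanics, extended SQL trace is enabled per session. A minimal sketch (level 12 captures both bind values and wait events):

sqlplus -s / as sysdba <<'EOF'
-- run these in the session you want traced
alter session set events '10046 trace name context forever, level 12';
-- ... execute the workload of interest here ...
alter session set events '10046 trace name context off';
EOF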

But what happens if the OS is in peril?  If you are experiencing high system time or strange application behavior, it is likely time to drill deeper with OS-based tools.  I mentioned in my last post that “prstat” is the “top” equivalent for Solaris.  “prstat” is the best place to start to see how processes are running on Solaris, but at some point you may need to drill down deeper to gain a better understanding of the problem.

On Linux, “oprofile” allows you to sample kernel and user code to build a profile of how the system and applications are behaving.  It is an incredibly useful tool, but it doesn’t exist on Solaris.  Luckily, there is something that is arguably better: DTrace.

Solaris DTrace(1M) / Linux “oprofile”

DTrace was developed by kernel engineers for the release of Solaris 10 as a way to better debug and monitor Solaris.  Unlike “oprofile”, DTrace is really an environment that involves writing code in “D” to make use of the enormous number of probes that exist.  DTrace is really powerful, but it does require some heavy lifting to get started.  This is where the “DTrace Toolkit” comes in handy.

The “DTrace Toolkit” is a set of scripts that serve as a starting point for those interested in using DTrace.  Also included in the “DTrace Toolkit” are some really clever utilities.  My two favorites are the “hotkernel” and “hotuser” scripts.  These scripts analyze either the kernel or a user “PID” to show which routines are used the most.  This can be extremely useful when diagnosing performance problems that extend beyond the V$ tables or Oracle “10046 event trace” data.
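Under the hood, both scripts lean on the DTrace profile provider. A rough one-liner in the same spirit as hotkernel (sample the kernel program counter at 1001 Hz, exit after ten seconds, print counts per function):

# aggregate kernel function samples for ~10 seconds, then print counts
dtrace -n 'profile-1001 /arg0/ { @[func(arg0)] = count(); } tick-10sec { exit(0); }'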

To illustrate the use of these utilities, I have included output from a benchmark that shows how these might be used.

HOTKERNEL

root@apl5-1> ./hotkernel
Sampling... Hit Ctrl-C to end.
^C
FUNCTION                                                COUNT   PCNT
nxge`nxge_check_10g_link                                    1   0.0%
genunix`ioctl                                               1   0.0%
...
...
genunix`fop_read                                         5730   2.1%
genunix`kstrgetmsg                                       6091   2.2%
unix`utl0                                                7058   2.6%
FJSV,SPARC64-VII`cpu_halt_cpu                            7220   2.6%
FJSV,SPARC64-VII`copyout                                 9340   3.4%
ip`tcp_fuse_output                                      12637   4.6%
unix`_resume_from_idle                                  12922   4.7%
unix`disp_getwork                                       18864   6.8%
unix`mutex_enter                                        34033  12.3%

HOTUSER

root@apl5-1> ./hotuser -p 12626
Sampling... Hit Ctrl-C to end.
^C
FUNCTION                                                COUNT   PCNT
oracle`kxsInitExecutionHeap                                 1   0.0%
oracle`0x10b319ad0                                          1   0.0%
oracle`kews_pls_jvm_event_resume_i                          1   0.0%
oracle`0x10b319ac8                                          1   0.0%
oracle`kghfrh                                               1   0.0%
oracle`opiptc                                               1   0.0%
...
...
oracle`qertbFetchByRowID                                   91   1.0%
oracle`kghalf                                              94   1.1%
libc_psr.so.1`memcpy                                      102   1.2%
oracle`opikndf2                                           105   1.2%
oracle`kpofcr                                             113   1.3%
oracle`opiodr                                             120   1.4%
oracle`kslwtectx                                          120   1.4%
oracle`kslwt_update_stats_int                             126   1.4%
oracle`opitsk                                             126   1.4%
oracle`ksupucg                                            151   1.7%
oracle`nsbasic_brc                                        153   1.7%
oracle`kdxbrs1                                            187   2.1%
oracle`kdxlrs2                                            192   2.2%
oracle`kews_sqlcol_end                                    194   2.2%
oracle`opifch2                                            212   2.4%
oracle`opiexe                                             250   2.8%
oracle`skgslcas                                           265   3.0%
libc_psr.so.1`memset                                      416   4.7%
oracle`kcbgtcr                                            826   9.4%

You can begin to see how DTrace can be used to observe the effect of the workload on Solaris and to profile the user application – in this case an Oracle shadow process.  But this is just the beginning.  If you are so inclined, DTrace can be used to correlate all sorts of performance data, both inside the kernel and the application.

Solaris Eye for the Linux guy… or how I learned to stop worrying about Linux and Love Solaris (Part 1)

This entry goes out to my Oracle techie friends who have been in the Linux camp for some time now and are suddenly finding themselves needing to know more about Solaris… hmmmm… I wonder if this has anything to do with Solaris now being an available option with Exadata?  Or maybe the recent announcement that the SPARC T3 multiplier for T3-x servers is now 0.25.  Judging by my inbox recently, I suspect this renewed interest in Solaris will continue.

I have focused on Oracle database performance on Solaris for 14 years now.  In the last few years, I began to work on Exadata and found myself needing to learn the “Linux” way of performance analysis.  Here I will cover some basic tools needed for Oracle performance analysis with Solaris, as well as some special performance topics.  I am not a deep-dive kernel type of guy, so don’t expect esoteric DTrace scripts to impress your friends.  I am also not going to cover how to patch and maintain Solaris; that is out of scope.  With this in mind, let’s get started.

prstat(1M) … top on steroids!

Probably the first tool the typical Linux performance guy reaches for is top.  It is part of every Linux distribution that I know of but is sadly missing from Solaris… But Solaris has something much better: “prstat(1M)”.  I know the name is boring, but it simply means “process status”.  This is the first place to get an idea of how processes are performing on a system, and quite likely the most useful tool in general.

# prstat
PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/NLWP      
 5623 oracle     32G   32G cpu63    0    0   0:02:23 1.0% oracle/1
 5625 oracle     32G   32G cpu60    0    0   0:02:22 1.0% oracle/1
 5627 oracle     32G   32G sleep    0    0   0:02:18 1.0% oracle/1
 5629 oracle     32G   32G cpu38    0    0   0:02:16 1.0% oracle/1
 5609 oracle     43M   38M sleep    0    4   0:01:21 0.6% rwdoit/1
 5605 oracle     43M   38M sleep    0    4   0:01:18 0.6% rwdoit/1
 5607 oracle     43M   38M sleep    0    4   0:01:18 0.6% rwdoit/1
 5601 oracle     43M   38M sleep    0    4   0:01:17 0.6% rwdoit/1
...
...
Total: 106 processes, 447 lwps, load averages: 5.03, 7.11, 19.48

Top would show you:

# top
load averages:  5.06,  6.84, 18.81                                                                             14:50:13
109 processes: 100 sleeping, 1 stopped, 8 on cpu
CPU states: 92.7% idle,  4.0% user,  3.3% kernel,  0.0% iowait,  0.0% swap
Memory: 128G real, 60G free, 50G swap in use, 135G swap free

 PID USERNAME THR PRI NICE  SIZE   RES STATE   TIME    CPU COMMAND
 5623 oracle     1   0    0    0K    0K cpu57   3:08 66.41% oracle
 5625 oracle     1   0    0    0K    0K cpu33   3:06 66.02% oracle
 5627 oracle     1   0    0    0K    0K cpu14   3:02 64.45% oracle
 5629 oracle     1   0    0    0K    0K cpu41   2:59 63.28% oracle
 5609 oracle     1   0    4    0K    0K cpu31   1:46 37.70% rwdoit
 5605 oracle     1   0    4    0K    0K cpu55   1:43 36.33% rwdoit
 5607 oracle     1   0    4    0K    0K sleep   1:43 36.33% rwdoit
 5601 oracle     1   0    4    0K    0K cpu32   1:42 35.94% rwdoit

What is happening?  “top” shows details from a per-process point of view, whereas by default prstat(1M) shows the system aggregates.  To make prstat(1M) look more like top, you have to enable micro-state accounting with the “-m” option.  With “-m”, prstat(1M) shows CPU utilization of the various processes like top, but with a LOT more detail.  You have access to CPU time broken out by user and system, the percentage of time spent in traps and sleep, and the time spent waiting for CPU (“LAT”).  Finally, you can see the number of voluntary and involuntary context switches along with the number of threads per process.

# prstat -m
PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/NLWP 
 5623 oracle    48  18 0.0 0.0 0.0 0.0  29 4.3 38K   8 .3M   0 oracle/1
 5625 oracle    48  18 0.0 0.0 0.0 0.0  30 4.3 38K   9 .3M   0 oracle/1
 5627 oracle    43  21 0.0 0.0 0.0 0.0  30 5.5 32K   7 .3M   0 oracle/1
 5629 oracle    42  21 0.0 0.0 0.0 0.0  31 5.6 33K   6 .3M   0 oracle/1
 5609 oracle    17  21 0.0 0.0 0.0 0.0  57 5.7 33K   5 77K   0 rwdoit/1
 5605 oracle    20  17 0.0 0.0 0.0 0.0  59 4.5 38K   5 89K   0 rwdoit/1
 5607 oracle    17  19 0.0 0.0 0.0 0.0  58 5.4 32K   4 75K   0 rwdoit/1
 5601 oracle    20  16 0.0 0.0 0.0 0.0  59 4.5 38K   5 90K   0 rwdoit/1

There are a lot of options available to prstat(1M), so do take a look at the prstat(1M) man page.  Also look at the “scalingbits” blog for an excellent discussion of prstat(1M) monitoring.  Stephan goes into much more detail about the utility and how to monitor by “zones” or “projects”… very useful stuff.  A common invocation is shown below.
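For example, a handy form when hunting CPU starvation: per-thread microstates, scrolling output instead of screen repaints, the top 20 lines, twelve 5-second samples.

prstat -mLc -n 20 5 12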

lsof = pfiles(1M)… the proc(1) commands

pfiles(1M) mirrors what “lsof” does, but there is much more information available on a per-process basis.  The proc(1) man page shows the available process-related commands: “pstack(1M)”, “pldd(1M)”, and “pflags(1M)” to name a few.  These utilities, referred to as process introspection commands, are detailed nicely on the solarisinternals.com website.
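For instance, against one of the Oracle shadow processes from the prstat output above:

pfiles 5623    # open files and sockets, lsof-style
pstack 5623    # current user-level stack of the process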

vmstat(1M), mpstat(1M), iostat(1M), sar(1M)… basically the same!

It is good to know that some things are not all that different.  The above tools have minor differences but generally look the same.  Some options are different or expanded.  Take, for example, iostat.

iostat(1M) on Solaris has a “-z” option that suppresses devices that are not doing any IO and would otherwise report all zeros.  The biggest issue with most of these tools comes into play when there are missing options or the formatting is different; this breaks scripts that have been developed to aid analysis.  It is not too hard to fix… just something that is going to have to be worked out.
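A typical invocation on Solaris: extended statistics, descriptive device names, idle devices suppressed, 5-second samples.

iostat -xnz 5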

References

The best references for Solaris performance and analysis are the “Solaris Internals” and “Solaris Performance and Tools” books.  These books describe the architecture of Solaris and show how to analyze and diagnose performance issues… and you can get them on the Kindle 🙂  The books also have an accompanying website, “solarisinternals.com”, to continue the discussion.

That is all for now…

Kernel NFS fights back… Oracle throughput matches Direct NFS with latest Solaris improvements

After my recent series of postings, I was made aware of David Lutz’s blog on NFS client performance with Solaris.  It turns out that you can vastly improve the performance of NFS clients using a new parameter to adjust the number of client connections.

root@saemrmb9> grep rpcmod /etc/system
set rpcmod:clnt_max_conns=8

This parameter was introduced in a patch for various flavors of Solaris.  For details, see David Lutz’s recent blog entry on improving NFS client performance.  Soon it should be the default in Solaris, making out-of-the-box client performance scream.
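To check the live value on a running system (the /etc/system setting itself only takes effect at boot), something like the following should work; I am assuming the symbol resolves once the rpcmod module is loaded:

# print the current value of the tunable from the running kernel
echo "clnt_max_conns/D" | mdb -k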

DSS query throughput with Kernel NFS

I re-ran the DSS query referenced in my last entry, and now kNFS matches the throughput of dNFS with 10gigE.

Kernel NFS throughput with Solaris 10 Update 8 (set rpcmod:clnt_max_conns=8)

This is great news for customers not yet on Oracle 11g.  With this latest fix to Solaris, you can match the throughput of Direct NFS on older versions of Oracle.  In a future post, I will explore the CPU impact of dNFS and kNFS with OLTP style transactions.

Direct NFS vs Kernel NFS bake-off with Oracle 11g and Solaris… and the winner is

NOTE: Please see my next entry on kernel NFS performance and the improvements that come with the latest Solaris.

==============

After experimenting with dNFS, it was time to do a comparison with the “old” way.  I was a little surprised by the results, but I guess that really explains why Oracle decided to embed the NFS client into the database 🙂

bake-off with OLTP style transactions

This experiment was designed to load up a machine, a T5240, with OLTP-style transactions until no more CPU was available.  The dataset was big enough to push about 36,000 read IOPS and 1,500 write IOPS during peak throughput.  As you can see, dNFS performed well, which allowed the system to scale until the DB server CPU was fully utilized.  On the other hand, kernel NFS throttles after 32 users and is unable to use the available CPU to scale transactional throughput.

lower cpu overhead yields better throughput

A common measure for benchmarks is to figure out how many transactions per CPU are possible.  Below, I plotted the CPU cost of a particular transaction rate.  This chart shows the total measured CPU (user+system) for a given TPS rate.

dNFS vs kNFS (TPS/CPU)

As expected, the transaction rate per CPU is greater when using dNFS vs kNFS.  Please do note that this is a T5240 machine with 128 threads, or virtual CPUs.  I don’t want to go into the semantics of sockets, cores, pipelines, and threads, but thought it was at least worth noting.  Oracle sees a thread of a T5240 as a CPU, so that is what I used for this comparison.

silly little torture test

When doing the OLTP-style tests with a normal-sized SGA, I was not able to fully utilize the 10gigE interface or the Sun 7410 storage.  So, I decided to do a silly little micro-benchmark with a really small SGA.  This benchmark just does simple read-only queries that essentially result in a bunch of random 8k IO.  I have included the output from the Fishworks analytics below for both kNFS and dNFS.

Random IOPS with kNFS and Sun Open Storage

Random IOPS with dNFS and Sun 7410 open storage

I was able to hit ~90K IOPS with 729MB/sec of throughput with just one 10gigE interface connected to the Sun 7410 unified storage.  This is an excellent result with Oracle 11gR2 and dNFS for a random IO test… but there is still more bandwidth available.  So, I decided to run a quick DSS-style query to see if I could break the 1GB/sec barrier.

===dNFS===
SQL> select /*+ parallel(item,32) full(item) */ count(*) from item;
 COUNT(*)
----------
 40025111
Elapsed: 00:00:06.36

===kNFS===
SQL> select /*+ parallel(item,32) full(item) */ count(*) from item;
 COUNT(*)
----------
 40025111

Elapsed: 00:00:16.18

kNFS table scan

dNFS table scan

Excellent: with a simple scan I was able to do 1.14GB/sec with dNFS, more than doubling the throughput of kNFS.

configuration notes and basic tuning

I was running on a T5240 with Solaris 10 Update 8.

$ cat /etc/release
Solaris 10 10/09 s10s_u8wos_08a SPARC
Copyright 2009 Sun Microsystems, Inc.  All Rights Reserved.
Use is subject to license terms.
Assembled 16 September 2009

This machine has a built-in 10gigE interface which uses multiple threads to increase throughput.  Out of the box, there is very little to tune as long as you are on Solaris 10 Update 8.  I experimented with various settings, but found that only basic TCP settings were required.

ndd -set /dev/tcp tcp_recv_hiwat 400000
ndd -set /dev/tcp tcp_xmit_hiwat 400000
ndd -set /dev/tcp tcp_max_buf 2097152
ndd -set /dev/tcp tcp_cwnd_max 2097152
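It is worth capturing the defaults before changing anything; the same parameters can be read back with ndd -get:

ndd -get /dev/tcp tcp_recv_hiwat
ndd -get /dev/tcp tcp_xmit_hiwat
ndd -get /dev/tcp tcp_max_buf
ndd -get /dev/tcp tcp_cwnd_max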

Finally, on the storage front, I was using the Sun Storage 7410 Unified Storage server as the NFS server for this test.  This server was born out of the Fishworks project and is an excellent platform for deploying NFS-based databases… watch out, NetApp.

what does it all mean?

dNFS wins hands down.  Standard kernel NFS essentially allows only one connection per “mount” point, so eventually we see data queued up at the mount point.  This clips the throughput far too soon.  Direct NFS solves the problem by having each Oracle shadow process mount the device directly.  Also, with dNFS, all the usual tuning and mount-point options are unnecessary: Oracle knows what options are most efficient for transferring blocks of data and configures the connection properly.

When I began down this path of discovery, I was only using NFS-attached storage because nothing else was available in our lab… and IO was not initially a huge part of the project at hand.  Being a performance guy who benchmarks systems to squeeze out the last percentage point of performance, I was skeptical about NAS devices.  Traditionally, NAS was limited by slow networks and clumsy SW stacks.  But times change.  Fast 10gigE networks and Fishworks storage, combined with clever SW like Direct NFS, really showed this old dog a new trick.

Monitoring Direct NFS with Oracle 11g and Solaris… peeling back the layers of the onion

When I start a new project, I like to check performance from as many layers as possible.  This helps to verify things are working as expected and helps me understand how the pieces fit together.  In my recent work with dNFS and Oracle 11gR2, I started down the path of monitoring performance and was surprised to see that things are not always as they seem.  This post will explore the various ways to monitor and verify performance when using dNFS with Oracle 11gR2 and Sun Open Storage (“Fishworks”).

why is iostat lying to me?

“iostat(1M)” is one of the most common tools to monitor IO.  Normally, I can see activity on local devices as well as NFS mounts via iostat.  But with dNFS, my device seems idle in the middle of a performance run.

bash-3.0$ iostat -xcn 5
cpu
us sy wt id
8  5  0 87
extended device statistics
r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
0.0    6.2    0.0   45.2  0.0  0.0    0.0    0.4   0   0 c1t0d0
0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 toromondo.west:/export/glennf
cpu
us sy wt id
7  5  0 89
extended device statistics
r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
0.0   57.9    0.0  435.8  0.0  0.0    0.0    0.5   0   3 c1t0d0
0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 toromondo.west:/export/glennf

From the DB server perspective, I can’t see the IO.  I wonder what the array looks like.

what does fishworks analytics have to say about IO?

The analytics package available with Fishworks is the best way to verify performance with Sun Open Storage.  This package is easy to use, and indeed I was quickly able to verify activity on the array.

There are 48,987 NFSv3 operations/sec and ~403MB/sec going through the nge13 interface, so this array is cooking pretty good.  Let’s take a peek at the network on the DB host.

nicstat to the rescue

nicstat is a wonderful tool developed by Brendan Gregg at Sun to show network performance.  Nicstat really shows you the critical data for monitoring network speeds and feeds by displaying packet size, utilization, and rates for the various interfaces.

root@saemrmb9> nicstat 5
Time          Int   rKB/s   wKB/s   rPk/s   wPk/s    rAvs    wAvs %Util    Sat
15:32:11    nxge0    0.11    1.51    1.60    9.00   68.25   171.7  0.00   0.00
15:32:11    nxge1  392926 13382.1 95214.4 95161.8  4225.8   144.0  33.3   0.00

So, from the DB server point of view, we are transferring about 390MB/sec… which correlates to what we saw with the analytics from Fishworks.  Cool!

why not use DTrace?

OK, I wouldn’t be a good Sun employee if I didn’t use DTrace once in a while.  I was curious to see the Oracle calls for dNFS, so I broke out my favorite tool from the DTrace Toolkit.  The “hotuser” script shows which functions are being called the most.  For my purposes, I found an active Oracle shadow process and searched for NFS-related functions.

root@saemrmb9> hotuser -p 681 |grep nfs
^C
oracle`kgnfs_getmsg                                         1   0.2%
oracle`kgnfs_complete_read                                  1   0.2%
oracle`kgnfswat                                             1   0.2%
oracle`kgnfs_getpmsg                                        1   0.2%
oracle`kgnfs_getaprocdata                                   1   0.2%
oracle`kgnfs_processmsg                                     1   0.2%
oracle`kgnfs_find_channel                                   1   0.2%
libnfsodm11.so`odm_io                                       1   0.2%
oracle`kgnfsfreemem                                         2   0.4%
oracle`kgnfs_flushmsg                                       2   0.4%
oracle`kgnfsallocmem                                        2   0.4%
oracle`skgnfs_recvmsg                                       3   0.5%
oracle`kgnfs_serializesendmsg                               3   0.5%

So yes, it seems Direct NFS really is being used by Oracle 11g.

performance geeks love V$ tables

There is a set of V$ tables that allows you to sample the performance of dNFS as seen by Oracle.  I like V$ tables because I can write SQL scripts until I run out of Mt. Dew.  The following views are available to monitor activity with dNFS; a quick sanity check follows the list.

  • v$dnfs_servers: Shows a table of servers accessed using Direct NFS.
  • v$dnfs_files: Shows a table of files now open with Direct NFS.
  • v$dnfs_channels: Shows a table of open network paths (or channels) to servers for which Direct NFS is providing files.
  • v$dnfs_stats: Shows a table of performance statistics for Direct NFS.
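As a quick sanity check that dNFS is active, querying the server and file views is enough. A minimal example (assuming a local “/ as sysdba” connection):

sqlplus -s / as sysdba <<'EOF'
-- confirm which filer is serving files, and which files are open via dNFS
select svrname, dirname from v$dnfs_servers;
select filename from v$dnfs_files;
EOF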

With some simple scripting, I was able to monitor the NFS IOPS by sampling the v$dnfs_stats view: sample the nfs_read and nfs_write operations, pause for 5 seconds, then sample again to determine the rate.
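A minimal sketch of such a script (assuming sqlplus on the PATH and a local “/ as sysdba” connection; v$dnfs_stats exposes cumulative nfs_read and nfs_write counters per process):

#!/bin/sh
# dnfs_iops.sh : sketch that samples v$dnfs_stats twice, 5 seconds apart,
# and prints the NFS IOPS rate in timestmp|nfsiops form
INTERVAL=5
snap() {
sqlplus -s / as sysdba <<'EOF' | tr -d ' \n'
set heading off feedback off pagesize 0
select sum(nfs_read + nfs_write) from v$dnfs_stats;
exit
EOF
}
while :
do
  S1=`snap`
  sleep $INTERVAL
  S2=`snap`
  RATE=`expr \( $S2 - $S1 \) / $INTERVAL`
  echo "`date '+%H:%M:%S'`|$RATE"
done

Running it during the benchmark produced the rates shown below.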

timestmp|nfsiops
15:30:31|48162
15:30:36|48752
15:30:41|48313
15:30:46|48517.4
15:30:51|48478
15:30:56|48509
15:31:01|48123
15:31:06|48118.8

Excellent!  Oracle shows 48,000 NFS IOPS which agrees with the analytics from Fishworks.

what about the AWR?

Consulting the AWR shows “Physical reads” in agreement as well.

Load Profile              Per Second    Per Transaction   Per Exec   Per Call
~~~~~~~~~~~~         ---------------    --------------- ---------- ----------
      DB Time(s):               93.1            1,009.2       0.00       0.00
       DB CPU(s):               54.2              587.8       0.00       0.00
       Redo size:            4,340.3           47,036.8
   Logical reads:          385,809.7        4,181,152.4
   Block changes:                9.1               99.0
  Physical reads:           47,391.1          513,594.2
 Physical writes:                5.7               61.7
      User calls:           63,251.0          685,472.3
          Parses:                5.3               57.4
     Hard parses:                0.0                0.1
W/A MB processed:                0.1                1.1
          Logons:                0.1                0.7
        Executes:           45,637.8          494,593.0
       Rollbacks:                0.0                0.0
    Transactions:                0.1

so, why is iostat lying to me?

iostat(1M) monitors IO to devices and NFS mount points.  But with Oracle Direct NFS, the mount point is bypassed and each shadow process simply mounts files directly.  To monitor dNFS traffic, you have to use other methods, as described here.  Hopefully this post was instructive on how to peel back the layers in order to gain visibility into dNFS performance with Oracle and Sun Open Storage.

Direct NFS access to Sun Storage 7410 with Oracle 11g and Solaris… configuration and verification

During the course of experimentation with 11gR2, I was given some space on a Sun Storage 7410 NAS.  In the past, NAS meant using NFS with obscure mount options that seemed to vary from platform to platform.  So, at first I went scrambling for the “best practices” for running Oracle on NAS with Solaris.

There is a nice Metalink article, Note 359515.1, with the latest information for all platforms.  This note does include the “tcp” option, which is not necessary on Solaris.  So it boiled down to the following mount options for using Oracle data files on NAS devices with Solaris.

rw,bg,hard,nointr,rsize=32768,wsize=32768,noac,forcedirectio,vers=3,suid

But wait, what about the new 11g feature, Direct NFS (“dNFS”)?  More searching…

configuring dNFS on Solaris

This is a fairly simple process.  Although Oracle dNFS configuration is fairly well documented for Linux, I will post my interpretation and commentary to help other Solaris users who might want to configure dNFS.

First, mount the NFS share just as you would have in the past; Oracle still needs to see the file system from the OS point of view.  You don’t have to use the mount options of the past, but you might want them anyway since OS tools may access the mount.  You would most likely place these options in the “/etc/vfstab” file, but I will just show the mount command.

mount -o rw,bg,hard,nointr,rsize=32768,\
wsize=32768,noac,forcedirectio,vers=3,suid \
toromondo.west:/export/glennf /ar1

Second, you have to link the Direct NFS library in place of the ODM library.  This is a little clunky, but not terrible.

cd $ORACLE_HOME/lib
cp libodm11.so libodm11.so_stub
ln -s libnfsodm11.so libodm11.so
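Note that the symlink replaces the default ODM library, which is why the stub copy above matters.  Backing out dNFS later is just the reverse:

cd $ORACLE_HOME/lib
rm libodm11.so                    # remove the symlink to libnfsodm11.so
cp libodm11.so_stub libodm11.so   # restore the original ODM library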

Third, create the “$ORACLE_HOME/dbs/oranfstab” file.  This file defines the various details Oracle needs to directly access the NFS share.  You can configure multiple paths so that Oracle can multiplex access to the NFS share for redundancy and load balancing.  There is another Metalink article, ID 822481.1, that details how to configure dNFS with multiple paths across the same subnet and force the OS not to route packets.  This is a great feature, which I will try once I get some more network plumbing.  For now, I just did the simplest configuration, as shown below.

cat $ORACLE_HOME/dbs/oranfstab
server: toromondo.west
path: toromondo.west
export: /export/glennf mount:/ar1

Finally, you can see whether this takes effect by looking at the “alert.log” file.  When Oracle starts up, it places debug information in the alert.log, so we can see whether Oracle is using Direct NFS or not.

grep NFS alert_*.log
Oracle instance running with ODM: Oracle Direct NFS ODM Library Version 2.0
Direct NFS: attempting to mount /export/glennf on filer toromondo.west defined in oranfstab
Direct NFS: channel config is:
Direct NFS: mount complete dir /export/glennf on toromondo.west mntport 38844 nfsport 2049
Direct NFS: channel id [0] path [toromondo.west] to filer [toromondo.west] via local [] is UP
Direct NFS: channel id [1] path [toromondo.west] to filer [toromondo.west] via local [] is UP

That’s all there is to it.  Hopefully, you will find this useful.