Monitoring Direct NFS with Oracle 11g and Solaris… peeling back the layers of the onion.

When I start a new project, I like to check performance from as many layers as possible.  This helps verify that things are working as expected and helps me understand how the pieces fit together.  In my recent work with dNFS and Oracle 11gR2, I started down the path of monitoring performance and was surprised to see that things are not always as they seem.  This post explores the various ways to monitor and verify performance when using dNFS with Oracle 11gR2 and Sun Open Storage (Fishworks).

why is iostat lying to me?

iostat(1M) is one of the most common tools for monitoring IO.  Normally, I can see activity on local devices as well as NFS mounts via iostat.  But with dNFS, my NFS mount appears idle in the middle of a performance run.

bash-3.0$ iostat -xcn 5
cpu
us sy wt id
8  5  0 87
extended device statistics
r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
0.0    6.2    0.0   45.2  0.0  0.0    0.0    0.4   0   0 c1t0d0
0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 toromondo.west:/export/glennf
cpu
us sy wt id
7  5  0 89
extended device statistics
r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
0.0   57.9    0.0  435.8  0.0  0.0    0.0    0.5   0   3 c1t0d0
0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 toromondo.west:/export/glennf

From the DB server perspective, I can’t see the IO.  I wonder what the array looks like.

what does fishworks analytics have to say about IO?

The analytics package available with Fishworks is the best way to verify performance with Sun Open Storage.  This package is easy to use, and indeed I was quickly able to verify activity on the array.

There are 48,987 NFSv3 operations/sec and ~403MB/sec going through the nge13 interface, so this array is cooking along pretty well.  Let’s take a peek at the network on the DB host.
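As an aside, those two analytics numbers imply an average transfer size.  A quick back-of-the-envelope check (a sketch in Python, assuming the analytics figures are decimal MB):

```python
# Figures from the Fishworks analytics above.
ops_per_s = 48987        # NFSv3 operations/sec
bytes_per_s = 403e6      # ~403 MB/sec through nge13

# Average payload per NFS operation.
print(round(bytes_per_s / ops_per_s))  # 8227
```

Roughly 8KB per operation, which is consistent with single-block reads of an 8KB database block plus a little protocol overhead.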

nicstat to the rescue

nicstat is a wonderful tool developed by Brendan Gregg at Sun to show network performance.  Nicstat really shows you the critical data for monitoring network speeds and feeds by displaying packet size, utilization, and rates for the various interfaces.

root@saemrmb9> nicstat 5
Time          Int   rKB/s   wKB/s   rPk/s   wPk/s    rAvs    wAvs %Util    Sat
15:32:11    nxge0    0.11    1.51    1.60    9.00   68.25   171.7  0.00   0.00
15:32:11    nxge1  392926 13382.1 95214.4 95161.8  4225.8   144.0  33.3   0.00

So, from the DB server point of view, we are transferring about 390MB/sec… which correlates with what we saw in the Fishworks analytics.  Cool!
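The nicstat columns are also internally consistent: the byte rate should equal the packet rate times the average packet size.  Checking with the nxge1 numbers from the sample above (a sketch in Python):

```python
# nxge1 read-side figures from the nicstat sample above.
rkb_per_s = 392926.0   # rKB/s: read KB/s
rpk_per_s = 95214.4    # rPk/s: read packets/s
ravs = 4225.8          # rAvs: average read size in bytes

# Packet rate x average size should reproduce the byte rate.
bytes_per_s = rpk_per_s * ravs
print(f"{bytes_per_s / 1024:.0f}")   # 392927 -- matches rKB/s
print(f"{bytes_per_s / 1e6:.0f}")    # 402 -- MB/s, in line with the ~403MB/sec from analytics
print(f"{rkb_per_s / 1024:.0f}")     # 384 -- MiB/s, the "about 390MB/sec" noted above
```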

why not use DTrace?

Ok, I wouldn’t be a good Sun employee if I didn’t use DTrace once in a while.  I was curious to see the Oracle calls for dNFS, so I broke out my favorite tool from the DTrace Toolkit.  The “hotuser” tool shows which functions are being called the most.  For my purposes, I found an active Oracle shadow process and searched for NFS-related functions.

root@saemrmb9> hotuser -p 681 |grep nfs
^C
oracle`kgnfs_getmsg                                         1   0.2%
oracle`kgnfs_complete_read                                  1   0.2%
oracle`kgnfswat                                             1   0.2%
oracle`kgnfs_getpmsg                                        1   0.2%
oracle`kgnfs_getaprocdata                                   1   0.2%
oracle`kgnfs_processmsg                                     1   0.2%
oracle`kgnfs_find_channel                                   1   0.2%
libnfsodm11.so`odm_io                                       1   0.2%
oracle`kgnfsfreemem                                         2   0.4%
oracle`kgnfs_flushmsg                                       2   0.4%
oracle`kgnfsallocmem                                        2   0.4%
oracle`skgnfs_recvmsg                                       3   0.5%
oracle`kgnfs_serializesendmsg                               3   0.5%

So, yes, it seems Direct NFS really is being used by Oracle 11g.

performance geeks love V$ tables

There is a set of V$ tables that allows you to sample the performance of dNFS as seen by Oracle.  I like V$ tables because I can write SQL scripts until I run out of Mt. Dew.  The following views are available to monitor dNFS activity.

  • v$dnfs_servers: Shows a table of servers accessed using Direct NFS.
  • v$dnfs_files: Shows a table of files now open with Direct NFS.
  • v$dnfs_channels: Shows a table of open network paths (or channels) to servers for which Direct NFS is providing files.
  • v$dnfs_stats: Shows a table of performance statistics for Direct NFS.

With some simple scripting, I created a monitor for NFS IOPS by sampling the v$dnfs_stats view.  The script samples the nfs_read and nfs_write operation counts, pauses for 5 seconds, then samples again to compute the rate.

timestmp|nfsiops
15:30:31|48162
15:30:36|48752
15:30:41|48313
15:30:46|48517.4
15:30:51|48478
15:30:56|48509
15:31:01|48123
15:31:06|48118.8

Excellent!  Oracle shows 48,000 NFS IOPS which agrees with the analytics from Fishworks.

what about the AWR?

Consulting the AWR shows “Physical reads” in agreement as well.

Load Profile              Per Second    Per Transaction   Per Exec   Per Call
~~~~~~~~~~~~         ---------------    --------------- ---------- ----------
      DB Time(s):               93.1            1,009.2       0.00       0.00
       DB CPU(s):               54.2              587.8       0.00       0.00
       Redo size:            4,340.3           47,036.8
   Logical reads:          385,809.7        4,181,152.4
   Block changes:                9.1               99.0
  Physical reads:           47,391.1          513,594.2
 Physical writes:                5.7               61.7
      User calls:           63,251.0          685,472.3
          Parses:                5.3               57.4
     Hard parses:                0.0                0.1
W/A MB processed:                0.1                1.1
          Logons:                0.1                0.7
        Executes:           45,637.8          494,593.0
       Rollbacks:                0.0                0.0
    Transactions:                0.1

so, why is iostat lying to me?

iostat(1M) monitors IO to devices and NFS mount points.  But with Oracle Direct NFS, the kernel mount point is bypassed: each Oracle shadow process opens its own NFS connections to the server and accesses the files directly.  To monitor dNFS traffic, you have to use other methods as described here.  Hopefully, this post was instructive on how to peel back the layers in order to gain visibility into dNFS performance with Oracle and Sun Open Storage.


10 Responses to “Monitoring Direct NFS with Oracle 11g and Solaris… peeling back the layers of the onion.”


  1. Konrad November 26, 2009 at 3:04 am

    That makes sense. Direct NFS wouldn’t be touching any OS counters for NFS or RPC, as it doesn’t need to use that codepath anymore. I’d assume that libnfsodm11.so establishes its own TCP (or does it use UDP?) socket out to the NFS server.

    You should be able to see the access via Wireshark as well – along with the associated datafiles, if filehandle-to-filename snooping is enabled – that would be nice.

    One of the issues we used to have with OS NFS and NetApp is that when one of our RAC nodes was evicted and/or hard rebooted, the filer would need to be instructed to release its locks, or we would have to wait for the locks to time out before those files could be written to again.

    i.e.
    “ORA-27086: unable to lock file – already in use
    SVR4 Error: 11: Resource temporarily unavailable
    Additional information: 8
    Additional information: 25723”
    This appeared when the instance was entering mount state and opening a control file.

    Does Direct NFS still make lock requests to files before writing?

    One thing I didn’t like when I read the doco (at least the Grid Infrastructure Installation Guide for Linux) is that Direct NFS will drop down into “compatibility” mode and try to use OS NFS if it fails to establish connectivity when an instance is opened.

    “If Oracle Database cannot open an NFS server using Direct NFS, then Oracle Database uses the platform operating system kernel NFS client.”

    Which would suggest that if it can allow one node to use Direct NFS and another node to use OS NFS, then it had better be locking files to be safe.

    CREATE OR REPLACE TRIGGER call_storage_ops AFTER SERVERERROR ON DATABASE
    WHEN (SYS.SERVER_ERROR(1) = 27086)
    BEGIN
      sys.dbms_system.ksdwrt(2, 'ORA-27086: Hello Ops. This is your friendly database asking for help again.');
      sys.dbms_system.ksdwrt(2, 'ORA-27086: Can you please ring up Storage Ops and ask them to release those blasted file locks again. ;)');
      sys.dbms_system.ksdwrt(2, 'ORA-27086: Then try and start me again please.');
    END;
    /

    • Konrad November 26, 2009 at 4:28 am

      Hmmm, now that’s silly. How is an instance going to read the data dictionary to execute a trigger when it can’t even mount the db!?

      oh well – i tried. :-)

  2. glennfawcett November 26, 2009 at 7:48 am

    Thanks for your comments Konrad.

    I would not think Oracle would need to make a lock request before writing to a file, since it is in charge of consistency, but I am not 100% positive. I will do some checking and comment if lock requests are used.

  3. Scott February 18, 2010 at 12:26 pm

    Glenn,
    Any chance you could send me the script that you wrote for the v$dnfs views. I could probably reproduce it, but since you’ve already done the work it would save some time.

    Thanks,
    Scott

  4. nm December 31, 2013 at 3:37 am

    Hi Glenn, when you reference the AWR report for IOPS, you look at the Physical Reads Per Second statistic. I’m a bit confused; I always thought that to measure IOPS one needed to total Physical Read Total IO Requests + Physical Write Total IO Requests. I got this from the following document (slide 13):

    http://www.oracle.com/technetwork/database/back.pdf

    Now if I compare the Physical Reads statistic of an AWR report to the above, i.e. only Physical Read Total IO Requests, they never match. So which one is correct?


