Direct NFS vs Kernel NFS bake-off with Oracle 11g and Solaris… and the winner is

NOTE: Please see my next entry on Kernel NFS performance and the improvements that come with the latest Solaris.


After experimenting with dNFS it was time to do a comparison with the “old” way.  I was a little surprised by the results, but I guess that really explains why Oracle decided to embed the NFS client into the database :)

bake-off with OLTP style transactions

This experiment was designed to load up a machine, a T5240, with OLTP style transactions until no more CPU was available.  The dataset was big enough to push about 36,000 IOPS read and 1,500 IOPS write during peak throughput.  As you can see, dNFS performed well, which allowed the system to scale until the DB server CPU was fully utilized.  On the other hand, Kernel NFS throttles after 32 users and is unable to use the available CPU to scale transactional throughput.

lower cpu overhead yields better throughput

A common measure for benchmarks is to figure out how many transactions per CPU are possible.  Below, I plotted the CPU consumption needed for a particular transaction rate.  This chart shows the total measured CPU (user+system) for a given TPS rate.


As expected, the transaction rate per CPU is greater when using dNFS vs kNFS.  Please note that this is a T5240 machine that has 128 threads or virtual CPUs.  I don’t want to go into the semantics of sockets, cores, pipelines, and threads, but thought it was at least worth noting.  Oracle sees a thread of a T5240 as a CPU, so that is what I used for this comparison.

silly little torture test

When doing the OLTP style tests with a normal sized SGA, I was not able to fully utilize the 10gigE interface or the Sun 7410 storage.   So, I decided to do a silly little micro benchmark with a real small SGA.  This benchmark just does simple read-only queries that essentially result in a bunch of random 8k IO.  I have included the output from the Fishworks analytics below for both kNFS and dNFS.

Random IOPS with kNFS and Sun Open Storage

Random IOPS with dNFS and Sun 7410 open storage

I was able to hit ~90K IOPS with 729MB/sec of throughput with just one 10gigE interface connected to Sun 7410 unified storage.  This is an excellent result with Oracle 11gR2 and dNFS for a random IO test… but there is still more bandwidth available.  So, I decided to do a quick DSS style query to see if I could break the 1GB/sec barrier.
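
As a rough sanity check on those figures: the micro benchmark issues random 8k reads, so IOPS multiplied by block size should land near the reported throughput. The sketch below assumes exactly 90,000 IOPS and 8,192-byte blocks (both rounded from the text) and ignores NFS and TCP protocol overhead, so it is an estimate rather than a measurement.

```shell
# Assumed inputs, rounded from the text: ~90K IOPS of random 8k reads.
iops=90000
block_bytes=8192

# Payload moved per second, in bytes.
bytes_per_sec=$(( iops * block_bytes ))

# The same figure in decimal megabytes and in binary mebibytes.
mb_per_sec=$(( bytes_per_sec / 1000000 ))
mib_per_sec=$(( bytes_per_sec / 1048576 ))

echo "payload: ${mb_per_sec} MB/sec (${mib_per_sec} MiB/sec)"
```

The reported 729MB/sec sits between the two results, which is what you would expect from an IOPS figure quoted as approximate.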

SQL> select /*+ parallel(item,32) full(item) */ count(*) from item;
Elapsed: 00:00:06.36

SQL> select /*+ parallel(item,32) full(item) */ count(*) from item;

Elapsed: 00:00:16.18

kNFS table scan

dNFS table scan

Excellent! With a simple scan I was able to do 1.14GB/sec with dNFS, more than doubling the throughput of kNFS.
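
The two elapsed times above give the speedup directly. The listing does not label which run is which; the sketch below assumes the 6.36 second run is dNFS and the 16.18 second run is kNFS, which is the only reading consistent with dNFS being the faster of the two.

```shell
# Assumption: 6.36 s is the dNFS run and 16.18 s is the kNFS run.
dnfs_secs=6.36
knfs_secs=16.18

# Same table and same query, so the throughput ratio is the inverse
# of the elapsed-time ratio.
speedup=$(awk -v k="$knfs_secs" -v d="$dnfs_secs" 'BEGIN { printf "%.2f", k / d }')

echo "dNFS scan speedup over kNFS: ${speedup}x"
```

A ratio of roughly 2.5 agrees with the statement that dNFS more than doubled the scan throughput.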

configuration notes and basic tuning

I was running on a T5240 with Solaris 10 Update 8.

$ cat /etc/release
Solaris 10 10/09 s10s_u8wos_08a SPARC
Copyright 2009 Sun Microsystems, Inc.  All Rights Reserved.
Use is subject to license terms.
Assembled 16 September 2009

This machine has a built-in 10gigE interface which uses multiple threads to increase throughput.  Out of the box, there is very little to tune as long as you are on Solaris 10 Update 8.  I experimented with various settings, but found that only basic TCP settings were required.

ndd -set /dev/tcp tcp_recv_hiwat 400000
ndd -set /dev/tcp tcp_xmit_hiwat 400000
ndd -set /dev/tcp tcp_max_buf 2097152
ndd -set /dev/tcp tcp_cwnd_max 2097152
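
One possible reading of these values, offered as an assumption rather than something measured in this test: a single TCP connection can have at most one window of data in flight per round trip, so dividing the window size by the link speed gives the largest round-trip time at which one connection could still fill the link. The 10 Gbit/sec figure is the nominal 10gigE line rate; actual round-trip times were not recorded here.

```shell
# Window size taken from the tcp_recv_hiwat / tcp_xmit_hiwat settings above.
window_bytes=400000
# Assumption: nominal 10gigE line rate in bits per second.
link_bits_per_sec=10000000000

# Largest round-trip time, in microseconds, at which one full window
# per round trip still equals line rate: window_bits / link_speed.
max_rtt_us=$(( window_bytes * 8 * 1000000 / link_bits_per_sec ))

echo "one connection can fill the link if RTT <= ${max_rtt_us} us"
```

Keep in mind that values set with ndd do not survive a reboot, so they need to be reapplied from a startup script.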

Finally, on the storage front, I was using the Sun Storage 7410 Unified Storage server as the NFS server for this test.  This server was born out of the Fishworks project and is an excellent platform for deploying NFS based databases… watch out NetApp.

what does it all mean?

dNFS wins hands down.  Standard kernel NFS essentially allows only one client per “mount” point, so eventually we see data queued to a mount point.  This clips the throughput far too soon.  Direct NFS solves this problem by having each Oracle shadow process mount the device directly.  Also, with dNFS, the usual tuning and mount point options are not necessary.  Oracle knows what options are most efficient for transferring blocks of data and configures the connection properly.
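
For readers who want to try this, here is a minimal sketch of how Direct NFS is typically switched on in Oracle 11gR2. It is not the exact procedure used for these tests. The server name, IP address, export, and mount point in the oranfstab entry are hypothetical placeholders, and the relink step is normally run with the instance shut down.

```shell
# Relink the Oracle binary with the Direct NFS ODM library (11gR2 make target).
# Run this with the database instance shut down.
cd "$ORACLE_HOME/rdbms/lib" && make -f ins_rdbms.mk dnfs_on

# Optional: describe the filer in oranfstab.
# Hypothetical values: server name, IP address, export, and local mount point.
cat > "$ORACLE_HOME/dbs/oranfstab" <<'EOF'
server: nas7410
path: 192.168.10.20
export: /export/oradata mount: /oradata
EOF

# After restarting the instance and opening the datafiles,
# list the NFS servers that Direct NFS is actually talking to.
sqlplus -s "/ as sysdba" <<'EOF'
select svrname, dirname from v$dnfs_servers;
EOF
```

If the query returns no rows, the database is most likely still going through the kernel NFS client.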

When I began down this path of discovery, I was only using NFS attached storage because nothing else was available in our lab… and IO was not initially a huge part of the project at hand.  Being a performance guy who benchmarks systems to squeeze out the last percentage point of performance, I was skeptical about NAS devices.  Traditionally, NAS was limited by slow networks and clumsy SW stacks.   But times change.   Fast 10gigE networks and Fishworks storage combined with clever SW like Direct NFS really showed this old dog a new trick.

25 Responses to “Direct NFS vs Kernel NFS bake-off with Oracle 11g and Solaris… and the winner is”

  1. 1 Peter Galvin December 14, 2009 at 2:59 pm

    Hi Glenn, very good post and information. But what is the history / state of d(irect)NFS? Is it only a feature of Oracle as a client-side function, or is there server-side support as well? Thanks.

  2. 2 glennfawcett December 14, 2009 at 4:12 pm

    Thanks for your interest.

    Direct NFS was introduced in Oracle 11gR1, so it has been around for a couple of years. It is only a client-side feature, but that is all that is needed. NAS storage devices like the Sun Storage 7410 are already optimized on the server side. This was really the missing piece.

  3. 3 Ruben December 14, 2009 at 11:46 pm

    Very good article. Do you know anything about NFS v4? As far as I know, everything so far has been done for NFS v3.

  4. 5 Martin December 15, 2009 at 4:31 am

    Hi Glenn,

    Good post, thanks for sharing! Has there been any testing done with a setup such as:

    - FC Storage Array
    - serving storage to an intermediate host (say openfiler)
    - which in turn exports the storage via NFS
    - to be mounted via direct NFS

    I appreciate that this isn’t the way to do it but I’m curious what additional latency we’d see.



    • 6 glennfawcett December 15, 2009 at 3:17 pm

      No I did not test this with any other type of storage… and actually, this was not my initial intent :) As part of our TPC-C effort the Comstar server was used to front flash. The latency was ~400 microseconds if I am not mistaken.

  5. 7 Alex Gorbachev December 15, 2009 at 8:10 am

    Thanks for good info Glenn.
    Quick question – what are you using to pump the traffic to the Oracle database?

    • 8 glennfawcett December 15, 2009 at 12:45 pm

      An internal test kit that I can’t really talk about :)

      • 9 Alex Gorbachev December 15, 2009 at 1:01 pm

        It’s like… “Oh I’m playing with such a nice toy but I can’t say anything about it… oh, it’s so cool!”

      • 10 glennfawcett December 15, 2009 at 3:13 pm

        Well… this is really a simple OLTP style test kit. You can get a similar test with something like SwingBench. The silly little micro benchmark just does simple single value look ups on an indexed table: a read-only version of the benchmark with an absurdly small SGA (200MB).

      • 11 kevinclosson December 17, 2009 at 10:51 am

        “just does simple single value look ups on an indexed table”

        …nah, Glenn, you’re being mean to the workload :-)

        The query mix chosen for this test is, in fact, 4 tables with 4 indexes and 50% of the queries are split between 4-table and 2-table joins. There are 5.6 LIO per User call. That said, the point of the test was to force PIO and when executed with a very small SGA you’ll get just that.

  6. 12 Martin Berger December 15, 2009 at 11:49 pm

    Can you explain (or at least imagine) why the txn/sec at 24, 32 and even 48 users are below the expected curve?
    Have you over-utilized your CPUs, and if yes, were txn/sec constant or declining again?
    Thank you for sharing this information!

    • 13 glennfawcett December 16, 2009 at 10:56 am

      I have not worked this out fully, but I speculate it has to do with the CMT servers. CPU’s were not fully utilized and it recovers. If I get some time, I am going to do some runs on another architecture to see if it could be the workload that is creating the peculiarity.

  7. 14 James Morle December 16, 2009 at 4:36 am

    Hey Glenn,

    Nice post, it’s good to get some solid numbers for this. I’ve been a big DNFS proponent for a couple of years now since I ran an I/O benchmark for a now-defunct hardware company. I’ve not used it at all on Solaris, but it certainly makes a big difference on Linux platforms, especially in terms of side-stepping CPU-bound rpciod daemons in favour of many user-space processes.
    I’ve recently started being more bullish in my customer I/O recommendations too: Oracle storage platforms should all be DirectNFS or Exadata, your choice. Doesn’t really help HP, IBM or EMC, but 10GbE and DNFS (especially with multipathing) have just made FC SANs redundant for Oracle platforms. NFS can easily support all but the biggest platform, and that’s where eyes should go towards Exadata.
    Just my 2c, but it seems logical for 99% of cases.



  8. 16 Jason December 16, 2009 at 8:26 pm

    I’m curious what’d happen if the # of nfs servers was increased — IIRC the default on Solaris has _always_ (and still is) notoriously low and is usually one of the first things that is recommended you bump up _significantly_ (by a factor of 10 at least).

    • 17 glennfawcett December 17, 2009 at 9:39 am

      NFS servers is already high and doesn’t affect the client side. Stay tuned, however… I am about to post how to get performance without dNFS.

  9. 18 Jon March 9, 2011 at 2:04 pm

    Nice work on Solaris.
    Do you have any info, configuration, etc using Redhat linux ?

    • 19 glennfawcett May 4, 2011 at 6:29 am

      No. But it would be interesting to test this with more recent Linux versions. The Oracle dNFS client was created due to the deficiencies in Linux NFS client, but time has passed and things do change. Unfortunately, I do not have access to HW like I did in my previous job.

  10. 20 Christophe Lesbats September 7, 2011 at 5:54 am

    Nice work Glenn, I appreciate it. I am currently investigating rman behavior when using kernel NFS and dNFS from an Exadata node (linux) to a 7420 appliance. Looking at the NFS ops in analytics, I have noticed that when using kernel NFS the appliance was receiving a commit for each write, whatever the mount options are (noac set or not).
    When using dNFS there is no commit at all, so of course in that particular case dNFS is much faster, but kernel NFS would probably be much faster without the sync writes.

    In your testing, did you have a look at the NFS ops received by the appliance and check whether the number of commits was the same in both cases?


