Exadata V2 IOPS are massive… but predictable latency will keep your applications humming and help you sleep at night.

After talking with customers over the years, I find that performance is necessary… but predictability is what really matters. Today’s data centers are complex. Applications are very dynamic, often changing on a daily basis, and new applications are being added to the mix as well. This constant morphing is enough to scare most DBAs. But the scariest thing of all is that the resource requirements are often not well understood… chaos indeed.

the dreaded 1:00AM call

How many DBAs and system admins have gotten the dreaded 1:00AM call, only to trace the problem to an “application” update applied just before? Perhaps the patch added some “new” functionality or some additional batch processing. This usually amounts to an increase in IOPS, and unless you are well provisioned, application response time will increase as well. How do you provide predictable user response time in this environment?

response time with latency-sensitive applications

I was just reading through Kevin’s most recent post on the Sun Oracle Database Machine. He does a great job of describing the subtleties of a consolidated environment and the benefit that Sun FlashFire technology provides. I particularly like the following quote:

What if you have a database that doesn’t require extreme IOPS but requires very low latency I/O served from a data set of, say, 1 TB. Imagine further that this latency-sensitive database isn’t the only database you have.

In typical ERP/CRM environments, end-user response time is often directly tied to IO response time. The recent Sun/Oracle TPC-C announcements with FlashFire give an idea of what I mean: that result showed a 10x drop in response time compared with the next-highest result. TPC-C is very sensitive to IO latency, so it is no wonder that flash was a win in this environment.

For another example, consider this AWR excerpt from a customer application…

                                 Reads               CPU   Elapsed
Physical Reads  Executions    per Exec   %Total Time (s)  Time (s)    SQL Id
-------------- ----------- ------------- ------ -------- --------- -------------
667,627             16,020          41.7   45.3  2456.29  20703.06 ggyyzqn066f38
Module: java@myhost (TNS V1-V3)
BEGIN PKGCUST.SHOW_CUST (:1, :2); END;

The statement above averages 41.7 physical reads per execution (667,627 reads / 16,020 executions) with a response time of about 1.3 seconds (20,703 s elapsed / 16,020 executions). The average IO latency in this example is 7ms, so 41.7 reads take about 292ms. But these are 15,000 rpm drives, and at a 7ms response time they are getting pretty hot. This environment experienced a 4x increase in IOPS due to new functionality and some rogue applications, and that increase caused the response time to jump from 1.3 to 2.2 seconds.
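To make the arithmetic above easy to re-check, here is a small Python sketch that derives the per-execution figures from the AWR row. The numbers are copied from the excerpt; the variable names are illustrative, and charging every read the average 7ms (with no overlap between reads) is a simplifying assumption.

```python
# Values copied from the AWR "SQL ordered by Reads" row above.
physical_reads = 667_627
executions = 16_020
elapsed_s = 20_703.06
cpu_s = 2_456.29
avg_io_latency_ms = 7.0  # average read latency quoted in the post

# Per-execution averages.
reads_per_exec = physical_reads / executions           # ~41.7 reads
elapsed_per_exec_s = elapsed_s / executions            # ~1.29 s
cpu_per_exec_s = cpu_s / executions                    # ~0.15 s

# Assumes reads are serial, each taking the average latency.
io_time_per_exec_ms = reads_per_exec * avg_io_latency_ms  # ~292 ms

print(f"reads/exec   : {reads_per_exec:.1f}")
print(f"elapsed/exec : {elapsed_per_exec_s:.2f} s")
print(f"cpu/exec     : {cpu_per_exec_s:.2f} s")
print(f"est. IO time : {io_time_per_exec_ms:.0f} ms")
```

In other words, roughly a quarter of each 1.3 second call is spent waiting on reads at 7ms apiece.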

Now, if this workload were running on Exadata V2, things would look different: the sudden spike in IOPS would not affect IO response time. With Exadata V2, the 41.7 reads would complete in about 21ms (41.7 × 0.5ms), and that would not change with a 4x increase in IOPS. We have measured the IO response time, and it remains flat at about 0.5ms whether you are doing 100 IOPS or a million.
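The same back-of-the-envelope estimate, repeated at the ~0.5ms flash latency quoted above, gives the 21ms figure. This sketch again assumes reads are issued one after another, so total IO time is simply reads × average latency; `io_time_ms` is a hypothetical helper, not an Oracle API.

```python
READS_PER_EXEC = 41.7  # from the AWR excerpt above


def io_time_ms(reads: float, latency_ms: float) -> float:
    """Estimated IO time for one execution, assuming serial reads."""
    return reads * latency_ms


disk_ms = io_time_ms(READS_PER_EXEC, 7.0)   # 15K rpm disks in the example
flash_ms = io_time_ms(READS_PER_EXEC, 0.5)  # flat ~0.5 ms quoted for Exadata V2

print(f"disk : {disk_ms:.0f} ms")                       # 292 ms
print(f"flash: {flash_ms:.0f} ms")                      # 21 ms
print(f"saved: {disk_ms - flash_ms:.0f} ms per execution")  # 271 ms
```

Because the flash latency is stated to stay flat as IOPS grow, the 21ms estimate does not need to be recomputed after the 4x increase, whereas the disk figure would rise along with the per-read latency.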

sleep well at night with FlashCache

Exadata V2 solves the problem of predictable IO latency by providing more IOPS than applications can actually use. If a workload increases or new functionality is added, you don’t have to worry about IO latency. I like the way Kevin puts it in his blog post:

No, I’m not as excited about the IOPS capability of the Database Machine as much as the FLASH storage-provisioning model it offers. In spite of all the hype, the Database Machine IOPS story is every bit as much a story of elegance as it is brute force.
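Why does spare IOPS capacity keep latency flat? One common way to reason about it (a modeling choice made here, not something the post specifies) is basic queueing theory: if each device behaves like an M/M/1 queue with mean service time S and utilization U, the mean response time is S / (1 - U). The device counts and service times below are hypothetical, picked only to contrast a tier running close to capacity with one that has plenty of headroom.

```python
def mm1_response_ms(service_ms: float, offered_iops: float, devices: int) -> float:
    """Mean response time per IO when load is spread evenly over `devices`
    identical M/M/1 queues, each with the given mean service time."""
    per_device_iops = offered_iops / devices
    utilization = per_device_iops * service_ms / 1000.0  # IO/s * s per IO
    if utilization >= 1.0:
        raise ValueError(f"saturated: utilization {utilization:.2f} >= 1")
    return service_ms / (1.0 - utilization)


# Hypothetical tiers: (mean service time in ms, number of devices)
TIERS = {
    "disk-like (24 x 4 ms)": (4.0, 24),
    "flash-like (64 x 0.5 ms)": (0.5, 64),
}

for name, (service_ms, devices) in TIERS.items():
    for offered in (1_000, 4_000):  # a 4x jump in offered IOPS
        r = mm1_response_ms(service_ms, offered, devices)
        print(f"{name:26s} {offered:>5,} IOPS -> {r:5.2f} ms")
```

With these made-up numbers, a 4x jump in offered load takes the disk-like tier from about 4.80ms to 12.00ms per IO, while the flash-like tier barely moves (about 0.50ms to 0.52ms). Real storage is not a set of M/M/1 queues, so read this as intuition for the headroom argument, not a prediction.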

FlashCache puts the IOPS where they are needed most.  If you run multiple instances in a dynamic environment, Exadata V2 with FlashCache might just help you avoid the dreaded 1:00AM call.


7 Responses to “Exadata V2 IOPS are massive… but predictable latency will keep your applications humming and help you sleep at night.”


  1. karlarao November 7, 2009 at 6:47 pm

    Glenn,

    And it will make the query run in just minutes.. :)

  2. glennfawcett November 9, 2009 at 8:40 am

    I suppose if it took hours before, then yes just minutes. But, if I had a query that took hours, I might tune it a bit :)

  3. Henry November 12, 2009 at 8:12 pm

    Glenn,

    Just wondering why you are saying that “at 7ms response time, these disks are getting pretty hot.” My thought would be that for 15,000 rpm drives, the service time would be ~7ms, but you would only see that with no queueing and low utilization. Unless, of course, you have a whole bunch of disks, in which case the service time is effectively parallelized across the disks if you have enough IO going on. Sort of like the example discussed in http://www.perfdynamics.blogspot.com/2009/10/parallelism-in-pdq.html

    I’m still working on getting a consistent interpretation of all of this stuff, so I was wondering where my view differs from yours.

    Henry

  4. glennfawcett November 16, 2009 at 9:54 am

    Let’s see if this helps…

    For 15K disks, the service time should be 60/15000 or 4ms without queuing. The fact that Oracle was seeing 7ms tells me there is likely some queuing. For optimal response time, you really don’t want to have much in the way of queuing. Now on the other hand, you will have to queue in order to max out the IOPS available. In the case of Exadata V2, you don’t have to worry about that because you really never queue due to the huge amount of IOPS available.

    • karlarao December 10, 2009 at 4:29 am

      Hi Glenn,

      Interesting.. I just want to make this thing clear..

      What is the unit of measurement for 60 and 15000?

      And why is it 60/15000?

      • glennfawcett December 10, 2009 at 11:54 am

        60 seconds in a minute and 15,000 revolutions per minute… so the physics behind the latency is 60/15000 or 4ms. There is more involved with caching and such, but clearly 7ms means there is not much left.

  5. bdrouvot November 29, 2012 at 2:11 am

    Hello,

    I just want to let you know that I developed a script to extract Exadata real-time metric information based on cumulative metrics.

    The main idea is that cumulative, instantaneous, rate and transition Exadata metrics are not enough to answer all the basic questions.

    That’s why the script was created: it provides a better understanding of what is going on in the cells right now.

    More details (on how and why) here: http://bdrouvot.wordpress.com/2012/11/27/exadata-real-time-metrics-extracted-from-cumulative-metrics/

    Please don’t hesitate to give your opinion and report any issue you may find with it.

    Thx
    Bertrand
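The rotational arithmetic discussed in the comments above can be sketched in a few lines. Note that 60/15000 seconds is the time for one full platter revolution; a real disk service time also includes seek and transfer time, and the average rotational delay is half a revolution. Treating 4ms as the whole service time and inverting the M/M/1 formula (U = 1 - S/R) are simplifying assumptions made here, meant only to show why 7ms observed on such a drive suggests some queueing.

```python
def ms_per_revolution(rpm: float) -> float:
    """Time for one full platter revolution in ms: 60,000 ms per minute / rpm."""
    return 60_000.0 / rpm


def implied_utilization(service_ms: float, observed_ms: float) -> float:
    """Invert the M/M/1 mean response time R = S / (1 - U) to get U = 1 - S / R."""
    return 1.0 - service_ms / observed_ms


rev_ms = ms_per_revolution(15_000)    # 4.0 ms, the figure used in the comments
u = implied_utilization(rev_ms, 7.0)  # assumes 4 ms were the entire service time

print(f"one revolution at 15K rpm: {rev_ms:.1f} ms")        # 4.0 ms
print(f"implied utilization at 7 ms observed: {u:.0%}")     # 43%
```

Under these assumptions a 15K rpm drive answering in 7ms on average would already be busy a little under half the time; because the true service time is larger than one revolution, the real utilization would be lower than this rough figure.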

