After talking with customers over the years, I find that performance is necessary… but predictability is what really matters. Today’s data centers are complex. Applications are highly dynamic, often changing on a daily basis, and new applications are constantly being added to the mix. The constant application morphing is enough to scare most DBAs. But the scariest thing of all is that the resource requirements are often not well understood… chaos indeed.
the dreaded 1:00AM call
How many DBAs and system admins have gotten the dreaded 1:00AM call, only to track the problem down to an “application” update that was applied just prior? So the patch added some “new” functionality or some additional batch processing. This typically amounts to an increase in IOPS, and unless you are well provisioned, application response time will typically increase as well. How do you provide predictable user response time in this environment?
response time with latency-sensitive applications
I was just reading through Kevin’s most recent post on the Sun Oracle database machine. He does a great job of describing the subtleties of a consolidated environment, and the benefit that Sun FlashFire technology provides. I particularly like the following quote:
What if you have a database that doesn’t require extreme IOPS but requires very low latency I/O served from a data set of, say, 1 TB. Imagine further that this latency-sensitive database isn’t the only database you have.
In typical ERP/CRM environments, end-user response time is often directly tied to IO response time. If you look at the recent Sun/Oracle TPC-C announcements with FlashFire, you will get an idea of what I mean. This Sun/Oracle TPC-C result showed a 10x drop in response time over the next highest result. TPC-C is very sensitive to IO latency, so it is no wonder that Flash was a win in this environment.
For another example, consider this AWR excerpt from a customer application…
```
                                 Reads              CPU     Elapsed
Physical Reads  Executions    per Exec  %Total  Time (s)   Time (s)  SQL Id
--------------  ----------  ----------  ------  --------  ---------  -------------
       667,627      16,020        41.7    45.3   2456.29   20703.06  ggyyzqn066f38
Module: java@myhost (TNS V1-V3)
BEGIN PKGCUST.SHOW_CUST (:1, :2); END;
```
The above query averages 41.7 reads per execution, with a query response time of about 1.3 seconds. The average IO latency in this example is 7ms, so 41.7 reads take about 292ms. But these are 15,000 rpm drives, and at 7ms response time these disks are running pretty hot. This environment then experienced a 4x increase in IOPS due to new functionality and some rogue applications, and that increase caused the query response time to jump from 1.3 to 2.2 seconds.
Now, if this query were running on Exadata V2, things would look different. A sudden spike in IOPS would not affect the IO response time. With Exadata V2, the 41.7 reads would complete in about 21ms, and this would not change with a 4x increase in IOPS. We have measured the flash read response time and it remains flat at about 0.5ms whether you are doing 100 IOPS or a million.
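The arithmetic above is simple enough to sketch. This is just a back-of-envelope illustration using the AWR numbers from the excerpt; the 7ms and 0.5ms latencies are the per-read figures quoted in this post, not measurements of any particular system.

```python
# Back-of-envelope math: IO contribution to one execution's response time.
# reads_per_exec (41.7) comes from the AWR excerpt above; the latencies
# are the per-read figures quoted in the post (disk vs. Exadata V2 flash).

reads_per_exec = 41.7

def io_time_ms(latency_ms):
    """IO portion of a single execution, in milliseconds."""
    return reads_per_exec * latency_ms

disk_ms = io_time_ms(7.0)    # 15,000 rpm disks at ~7 ms per read
flash_ms = io_time_ms(0.5)   # Exadata V2 flash at ~0.5 ms per read

print(f"disk:  ~{disk_ms:.0f} ms of IO per execution")   # ~292 ms
print(f"flash: ~{flash_ms:.0f} ms of IO per execution")  # ~21 ms
```

With disk, IO alone accounts for roughly 292ms of the 1.3-second response time, and that contribution grows as the disks saturate; with flash it drops to about 21ms and stays there.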
sleep well at night with FlashCache
Exadata V2 delivers predictable IO latency by providing more IOPS than applications can actually use. If a workload increases or new functionality is added, you don’t have to worry about IO latency. I like the way Kevin puts it in his blog post:
No, I’m not as excited about the IOPS capability of the Database Machine as much as the FLASH storage-provisioning model it offers. In spite of all the hype, the Database Machine IOPS story is every bit as much a story of elegance as it is brute force.
FlashCache puts the IOPS where they are needed most. If you run multiple instances in a dynamic environment, Exadata V2 with FlashCache might just help you avoid the dreaded 1:00AM call.