It is human nature to draw from experience to make sense of our surroundings. This holds true in life and in performance tuning. A veteran systems administrator will typically tune a system differently from an Oracle DBA. This is fine, but often what is obvious to one is not obvious to the other. It is sometimes necessary to take a step back and tune from another perspective.
I recently ran across a few cases where a customer was tuning “Sorts” in the database by adding memory. Regardless of your perspective, everyone knows memory is faster than disk, and the goal of any good tuner is to do as much in memory as possible. So, when the systems administrator noticed that the “TEMP” disks for Oracle were doing a tremendous amount of IO, the answer was obvious, right?
RamDisk to the rescue
To solve this problem, the savvy systems administrator added a RAM disk to the database. Since it was only for “TEMP” space, this seemed reasonable.
ramdiskadm -a oratmp1 1024m
/dev/ramdisk/oratmp1
Indeed, user performance improved. There are some minor issues around recovery after a system reboot or failure that are annoying, but they are easily addressed with startup scripts. So, SLAs were met and everyone was happy. And so things were fine for a few years.
Double the HW means double the performance… right?
Fast forward a few years. The system was upgraded to keep up with demand by doubling the amount of memory and CPU resources. Everything should be faster, right? Well, not so fast. This action increased the NUMA ratio of the machine (the cost of a remote memory access relative to a local one). After doubling memory and CPU, the average user response time doubled from ~1 second to 2 seconds. Needless to say, this was not going to fly. Escalations were mounted and the pressure to resolve this problem reached a boiling point. The Solaris support team was contacted by the systems administrator. Some of the best kernel engineers in the business began to dig into the problem, searching for ways to make the “ramdisk” respond faster in the face of an increased NUMA ratio.
A fresh set of eyes
Since I have worked with the Solaris support engineers on anything Oracle performance related for many years, they asked me to take a look. I took a peek at the system and noticed the ramdisk in use for TEMP. To me this seemed odd, but I continued to look at SQL performance. Things became clear once I saw that “sort_area_size” was still at its default.
It turns out Oracle was attempting to do in-memory sorts, but with the default settings all users were spilling out to temp. With hundreds of users on the system, this became a problem real fast. I had the customer increase the sort_area_size until the sorts occurred in memory, without the added overhead of spilling out to disk (albeit fast disk). With this slight adjustment, the average user response time was better than it had ever been.
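To make the trade-off concrete, here is a rough back-of-the-envelope sketch in Python. It is only an illustrative model: the function names, the session count, and every size below are assumptions made up for the example, not figures from the customer system. It treats a sort as "in memory" when its data fits in the per-session sort area, and estimates how much memory the sort areas could take if every session ran one sort at the same time.

```python
# Illustrative model only. All names and numbers here are assumptions,
# not measurements from the system described in this post.

KB = 1024
MB = 1024 * KB


def spills_to_temp(sort_bytes: int, sort_area_size: int) -> bool:
    """Return True when the sort data does not fit in the per-session sort area."""
    return sort_bytes > sort_area_size


def concurrent_sort_memory(sessions: int, sort_area_size: int) -> int:
    """Memory used if every session holds one full sort area at the same time."""
    return sessions * sort_area_size


sessions = 300             # hypothetical stand-in for "hundreds of users"
typical_sort = 512 * KB    # hypothetical amount of data sorted per query
small_area = 64 * KB       # assumed small, untuned sort area
larger_area = 1 * MB       # hypothetical tuned sort area

for area in (small_area, larger_area):
    where = "spills to TEMP" if spills_to_temp(typical_sort, area) else "stays in memory"
    total_mb = concurrent_sort_memory(sessions, area) / MB
    print(f"sort area {area // KB} KB: {where}, up to {total_mb:.1f} MB across {sessions} sessions")
```

Under these made-up numbers, the larger sort area keeps every sort in memory, and the total it could consume is worth comparing against the 1024m that was handed to the ramdisk. Real sessions can run more than one sort at a time, so treat the total as a rough guide rather than a hard limit.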
- Memory is memory, but how you use it makes all the difference.
- It never hurts to broaden your perspective and get a second opinion.