Open Storage S7000 with Exadata… a good fit for ETL/ELT operations.

I have worked on Exadata V2 performance projects with Kevin Closson for nearly a year now and have had the opportunity to evaluate several methods of loading data into a data warehouse. The most common, and by far the fastest, method involves the use of “External Tables”. External tables allow the user to define a table object over flat text files that live on a file system. Once defined, standard SQL parallel query operations can be used to load the data into permanent database tables:

SQL> alter session enable parallel dml ;
SQL> insert /*+ APPEND */ into mytable select * from ext_tab ;
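
For reference, here is a minimal sketch of what such an external table definition might look like; every name below (directory path, table, columns, data files) is a hypothetical placeholder rather than the actual objects used in these tests:

-- All names are hypothetical; the directory must point at a path that is
-- visible from every database node (e.g., a DBFS or NFS mount point).
create or replace directory stage_dir as '/stage';

create table ext_tab (
  id       number,
  payload  varchar2(200)
)
organization external (
  type oracle_loader
  default directory stage_dir
  access parameters (
    records delimited by newline
    fields terminated by ','
  )
  location ('file_001.dat', 'file_002.dat')
)
reject limit unlimited
parallel;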

With the size and power of Exadata, businesses are creating larger and larger data warehouses. Often dozens of machines collect and stage data for ingest by the data warehouse, which means the staging area for these flat files must be large, fast, and accessible from multiple networks.

What options are available for staging input files?

With Exadata V2, or any RAC environment, flat-file data has to be accessible from all nodes in order to fully utilize parallel query. The natural first choice on Exadata V2 is DBFS, the Oracle Database File System.

DBFS comes with Exadata and allows for easy clustering across all Exadata database nodes. The actual data store for DBFS is a set of database tables residing in a tablespace within the database machine. The DBFS client program is then used to mount the DBFS filesystem so that it appears to the Linux user as just another file system. This allows file system data to be managed just like any other database data while using the full power of Exadata. DBFS is quite fast and works well for housing external tables, but it does cut into the storage available for the data warehouse. Also, since DBFS is simply a client on an Exadata database node, it uses CPU resources on the database machine to transfer or create the flat files.
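
For context, a DBFS filesystem is created with a script that ships in the database home; a minimal sketch, assuming a dedicated tablespace already exists (the tablespace and filesystem names here are hypothetical):

SQL> -- 'dbfs_ts' and 'staging_fs' are hypothetical names; run as the DBFS owner.
SQL> @?/rdbms/admin/dbfs_create_filesystem.sql dbfs_ts staging_fs

Each database node then mounts the filesystem with the dbfs_client utility, after which it looks like any other mount point.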

Open Storage S7000: a natural staging area for Exadata

If you want to extend the amount of storage available to stage data for your warehouse, the S7000 is an excellent choice. The S7000 can stage files off traditional networks using 1GigE and 10GigE connections, which allows multiple machines to connect seamlessly to the S7000 in order to stage data for ingest. This activity has no effect on Exadata users, since the S7000 is a self-contained storage server; unlike DBFS, it does not use CPU cycles from the database grid to manage and store the flat-file data.

Once the data is on the S7000, we can use IPoIB (IP over InfiniBand) to connect directly into the high-bandwidth InfiniBand network that is part of Exadata V2. This positions the S7000 neatly between Exadata and the traditional GigE networks.
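
From the database nodes, the S7000 share is then just an NFS mount over the IPoIB interface. A minimal sketch of the mount, where the hostname and export path are hypothetical and the options are common choices for large sequential transfers rather than a tuned recommendation:

# 's7410-ib' and '/export/stage' are hypothetical names.
mount -t nfs -o rw,bg,hard,tcp,vers=3,rsize=32768,wsize=32768 \
      s7410-ib:/export/stage /stage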

what about performance?

As part of a larger project, I was able to run a quick test with the following setup:

  • S7410 with 12 drives
  • 128 x 1GB files on a share
  • 8 active database nodes (Sun Fire X4170) with the share mounted on all nodes.

I created an external table across all the files and performed two tests:

  1. Select count(*):
    SQL> select count(*) from ext_tab;
  2. Insert as select (“IAS”); a quick parallelism check follows below:
    SQL> alter session enable parallel dml ;
    SQL> insert /*+ APPEND */ into mytable select * from ext_tab;
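
To confirm that the insert actually ran in parallel, one quick sanity check (one option among several) is to query v$pq_sesstat from the loading session after the statement completes:

SQL> select statistic, last_query from v$pq_sesstat where statistic in ('Queries Parallelized', 'DML Parallelized');

If the insert was parallelized, “DML Parallelized” shows a non-zero LAST_QUERY value.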

Both when querying and when loading data with “IAS”, I was able to sustain 1.2GB/sec of throughput, the same rate I saw in my earlier tests with the S7000 over 10GigE. That works out to over 4TB/hr (1.2GB/sec x 3600 sec ≈ 4.3TB/hr) with just one head node on the S7410. With a clustered configuration and multiple mount points, the load rate could be increased even further.

summary

The Sun Open Storage S7410 server is an excellent choice for managing file system data.  With the ability to connect to multiple networks, it is a perfect fit to stage data for Exadata environments as well.


5 Responses to “Open Storage S7000 with Exadata… a good fit for ETL/ELT operations.”


  1. Ben Prusinski, June 9, 2010 at 11:57 pm

    Incredible performance, 4TB per hour!

  2. Jeff Wasilko, June 10, 2010 at 8:30 pm

    We use the 7000 with our RAC clusters (SPARC and Exadata V2) for archivelog storage. The built-in compression gets us over 4x space savings without having to run gzip/bzip, and it also allows us to easily replicate the logs to our DR datacenter. Great product!

  3. Luis Campos, March 6, 2011 at 3:36 am

    Jeff:
    Do you use storage sync for Archive Logs for Exadata?

    Thanks,

    LMC

    • glennfawcett, May 4, 2011 at 6:25 am

      Archive logs for Exadata are usually housed within the Exadata frame in the FRA area. I have not placed archive logs outside of the frame, but you would definitely need to take extra care to ensure availability.


