Oracle on ZFS

A first go at running an Oracle database on Sun's ZFS.

Oracle on SPARC is a very common sight.  Most people who run SPARC run Solaris.  Sun Microsystems' (now Oracle's) SPARC and Solaris make for a well-performing and stable platform for the Oracle Database Management System (DBMS).  Solaris 10, which has been out for five years now, has introduced many great features.  ZFS has been available for quite a while (though not since Solaris 10's initial release), and it too has seen tremendous growth in features, stability, and performance.

For years, I resisted putting Oracle on ZFS.  The main reason was that ZFS was still new.  Storage, above all, must be 100% correct.  ZFS is designed with this in mind, but despite the best efforts of the ZFS team, things do go wrong, and recovery tools are not generally available.  Storage folks are a conservative lot, and "bleeding edge" makes many of us nervous.  A close second was the Adaptive Replacement Cache (ARC) being stuck on: the ability to bypass the ARC on a per-file-system basis was only introduced in Update 8 of Solaris 10.  The third reason is that ZFS is a copy-on-write file system.  Every time a block is written, a new block is allocated and linked in, and the previous block is unlinked.  ZFS does this to ensure your write makes it to disk and that the file system metadata is always intact.  Fair enough, but I imagined a database fragmenting badly and quickly.  Knowing all of this, I decided it was time to put Oracle on ZFS anyway (I'll come back to fragmentation in a minute), because it could help greatly with a problem I needed to solve.
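
For the curious, the per-file-system knobs amount to something like the sketch below.  It is only a rough illustration: the dataset name (orapool/oradata) is made up, and the exact caching and block size choices should follow your own DBA's guidance.  The primarycache property is the per-file-system ARC control mentioned above, and recordsize is the "block size change" that gets matched to an 8 KB Oracle db_block_size.

    #!/usr/bin/env python
    # Rough sketch: tune a ZFS file system for Oracle data files.
    # The dataset name (orapool/oradata) is made up for illustration.
    import subprocess

    def zfs_set(prop, dataset):
        """Run 'zfs set <property>=<value> <dataset>' and fail loudly."""
        subprocess.check_call(["zfs", "set", prop, dataset])

    # Keep only ZFS metadata in the ARC; Oracle's own buffer cache (the SGA)
    # already caches data blocks, so caching them twice wastes memory.
    zfs_set("primarycache=metadata", "orapool/oradata")

    # Match the ZFS record size to an 8 KB Oracle db_block_size so the file
    # system and the database read and write blocks of the same size.
    zfs_set("recordsize=8k", "orapool/oradata")

Note that recordsize only applies to files written after it is set, so it has to be in place before the data files land on the file system.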

The problem is this: the organization uses an Oracle standby of the production database to create a reporting datamart.  They did this by stopping the standby, exporting custom tables from the reporting database, copying the standby database into the reporting database directories (over 300GB copied), modifying the copy to be a reporting database, loading the custom tables back in, and starting the standby again.  Wow.  The DBAs have their reasons for the database portion of it, but copying 300GB of data means the SAN behind it (and the server!) has to move 600GB of data (300GB read plus 300GB written).  Oh, and this was happening every night and taking hours.

ZFS supports persistent snapshots and clones (writable snapshots).  As an added benefit, it also has privilege delegation, so tasks like taking snapshots can be handed to non-root users.  With the ARC bypass firmly in place (along with block size changes), I was able to work with a DBA to put the standby database on a ZFS file system.  Instead of the four-hour copy, an almost instantaneous snapshot and clone are made.  The whole process (extraction of the custom tables from reporting, snap, clone, transform, and load) now runs in less than 30 minutes, and the DBAs have complete control over it.
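
Here is a rough sketch of what the ZFS portion of the nightly refresh boils down to.  The dataset names (orapool/standby for the standby's data files, orapool/report for the reporting clone) are made up for illustration, and the one-time delegation is shown as a comment because it only needs to be run once by root.

    #!/usr/bin/env python
    # Rough sketch of the snapshot/clone step in the nightly refresh.
    # Dataset names are made up for illustration.
    import subprocess

    SNAP = "orapool/standby@refresh"    # snapshot of the quiesced standby
    CLONE = "orapool/report"            # writable clone the datamart uses

    def zfs(*args):
        """Run a zfs(1M) subcommand and raise if it exits non-zero."""
        subprocess.check_call(("zfs",) + args)

    # One-time setup (as root): hand snapshot/clone rights to the DBAs, e.g.
    #   zfs allow oracle snapshot,clone,create,mount,destroy orapool

    # Drop last night's clone, then its origin snapshot (the clone has to go
    # first because it depends on the snapshot).  Errors are ignored for the
    # very first run, when neither exists yet.
    for dataset in (CLONE, SNAP):
        subprocess.call(["zfs", "destroy", dataset])

    # Near-instant replacement for the old 300GB copy: snapshot the standby
    # and present a writable clone of it to the reporting database.
    zfs("snapshot", SNAP)
    zfs("clone", SNAP, CLONE)

The extract, transform, and load steps around the snapshot stay in the DBAs' hands; the delegation is what lets them run the ZFS pieces without involving a system administrator.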

This setup has been in place for several months now, bearing the full load from users.  So far, so good.  The SAN isn't moving a massive amount of data every night, less storage is allocated since two full copies are no longer needed, and the system can spend its CPU cycles processing other datamart activities.  If fragmentation ever becomes a problem, we can make a copy of the database one file at a time, which should lay the data back down more contiguously; after all, this is the way the organization used to do it.  Again, so far, no complaints have been made about the new setup.

ZFS is still evolving.  Features like deduplication are coming.  I am hoping that block pointer rewrite becomes a reality, since it would open the door to defragmentation.  I'm sure many Solaris users will be happy to make the switch from the more traditional setup of Veritas Volume Manager with VxFS or UFS (Oracle ASM aside), myself included.