So, you’re thinking about a SAN… (part 3)

Now that the transport has been established, the next decision is what storage system(s) to use to provide the data.  For most, this decision has the most impact on performance.  Controller performance, either in throughput or in Input/Output Operations Per Second (IOPS), number of spindles (physical disks), type of disk (FC, SATA, SAS, SSD, etc.), and the layout of those disks all come into play.  Most storage arrays support a mix of disk types, so ensure that the controllers can keep up with the IOPS load in the storage configuration you want to provide (probably the biggest “set in stone” decision to make).  Some arrays are more flexible: you can add more controllers later on if you find out that the controllers are being stressed somewhere (cache, CPU, back-end IOPS, etc.).

The type of data you plan to store and the user expectation for how it will perform determine what disks are used in the array.  SATA is great for low performance, high capacity storage.  SATA disks do well with low IOPS and low contention (seek times are generally higher than on other disk types, and other features are less advanced or missing), so they serve archive or small workgroup storage well.  SAS and FC do well for high IOPS loads (high-volume e-mail, OLTP databases, etc.) but tend to have a high cost.
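
To put rough numbers on the seek time point, here is a minimal sketch of the usual back-of-the-envelope estimate for a single spindle: random IOPS is about one over the average seek time plus half a rotation.  The function names and the seek figures in the usage note are illustrative assumptions, not vendor specifications, and the estimate ignores controller cache and queuing entirely.

```python
import math


def service_time_ms(rpm, avg_seek_ms):
    """Average time for one random I/O: seek plus half a rotation."""
    rotational_latency_ms = 0.5 * 60_000.0 / rpm
    return avg_seek_ms + rotational_latency_ms


def disk_iops(rpm, avg_seek_ms):
    """Rough random IOPS a single spindle can sustain."""
    return 1000.0 / service_time_ms(rpm, avg_seek_ms)


def spindles_needed(target_iops, rpm, avg_seek_ms):
    """Spindles required to reach target_iops, ignoring cache and RAID overhead."""
    return math.ceil(target_iops * service_time_ms(rpm, avg_seek_ms) / 1000.0)
```

With an assumed 8.5 ms seek on a 7,200 RPM disk and 3.5 ms on a 15,000 RPM disk, spindles_needed(1000, 7200, 8.5) and spindles_needed(1000, 15000, 3.5) show how many more of the slower disks it takes to carry the same random load.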

Once you pick a disk type for a particular workload, how do you choose a volume format?  DBAs will always tell their SAN admin not to use RAID-5 because of the overhead to calculate parity for each write operation.  Well, maybe.  The reality comes down to user expectation.  If your array will deliver your cruelest database loads in adequate time running on RAID-5, what’s the problem?  If RAID-5 suffers, then RAID-1+0 it is.  With database workloads, it’s often not as simple as that.  Given the cost of SAN storage, making the most of it counts for a lot.  If a particular part of the database could benefit from faster storage, it may be beneficial to put things like redo logs on the fastest configuration you can build, put the database on something a bit slower but more space efficient, and put the archive logs (depending on how long you want to keep them and how fast they get created) on something more cost efficient.  Database systems like Oracle have excellent tools for determining what part of the database needs help.  I can also assure you, DBAs have a knack for letting the storage folks know when they need something.
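
The RAID-5 versus RAID-1+0 trade-off can be sketched with the commonly quoted write penalties: a small random write costs about four back-end I/Os on RAID-5 (read data, read parity, write data, write parity) and two on RAID-1+0 (one write per mirror side).  This is a rough model only; it assumes no help from controller cache or full-stripe writes, and the function and dictionary names are made up for the example.

```python
# Back-end I/Os generated by one small random host write (rule-of-thumb values).
WRITE_PENALTY = {"raid10": 2, "raid5": 4}


def backend_iops(host_iops, write_pct, level):
    """Back-end disk IOPS produced by a host load with write_pct percent writes."""
    reads = host_iops * (100 - write_pct) / 100
    writes = host_iops * write_pct / 100
    return reads + writes * WRITE_PENALTY[level]


def usable_fraction(level, disks):
    """Fraction of raw capacity left for data in a single RAID group."""
    if level == "raid10":
        return 0.5
    if level == "raid5":
        return (disks - 1) / disks
    raise ValueError(f"unknown RAID level: {level}")
```

For a 1,000 IOPS host load with 30% writes, backend_iops gives 1,900 for RAID-5 and 1,300 for RAID-1+0, while usable_fraction("raid5", 5) is 0.8 against 0.5 for the mirror.  Whether the extra back-end load matters is exactly the "user expectation" question above.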

When it comes down to allocating the storage out to the hosts, I try to allocate what is needed plus a small overhead.  Some operating systems are okay with the idea of growing a volume/LUN (Windows) and others are not (Solaris).  For those that are not, I recommend the use of volume management software that will allow you to add additional LUNs and grow the file system (Solaris does allow for this, for example).  Some storage arrays provide the notion of Thin Provisioning to work around this concept.  Personally, I oppose the very idea.  Thin Provisioning is a lie.  The array tells the host the volume is bigger than the amount of actual storage backing it up.  So long as the host doesn’t want to use all that storage, all is good.  (Does this remind you a bit of some financial scandals?)  If, however, the host decides that it needs that space, it is either allocated on demand or if there isn’t enough to be allocated, the write fails.  A SCSI write failure is bad because it means your file system is now corrupt.  The hope is that storage use will grow slowly enough that the SAN admins, who are hopefully paying attention, can add more storage to the array before the situation becomes critical.  There are a lot of “ifs” in this whole concept.  This whole thing relies upon the idea that 1. The SAN admin can add storage in a crunch (which means money from somewhere) 2. Everyone who is using the storage remembers that the numbers they see are lies and they don’t really have that much space, so plan accordingly and don’t do anything stupid, and finally 3. Nothing goes wrong.  By the third point, I mean that some job runs away dropping core files or a database autoextends itself blissfully because no one put an upper cap on it, or someone isn’t told the numbers aren’t right and they use the space “temporarily”.  Once TP space is allocated, it’s allocated.  
The storage array has no way to know that a block has been freed since “zeroing” blocks isn’t generally done (we’ll get to this same idea with SSDs in the next section).  There is also an issue of resource consumption and human nature.  When people know where the hard wall is, they’ll respect it (as do file systems in that they leave themselves enough room to work and not get corrupted).  If people know that there is a limit out there somewhere but aren’t sure where (and who knows, because there might be other TP users on the same array about which they have no idea), they might be tempted to gamble.  In any case, TP can put the SAN admins in a tough spot.  TP is sold by marketing folks as a storage saver (wait, you mean to tell me marketing people selling storage are trying to keep me from buying storage?  Something’s not right.  Oh, that’s right; they know if you lie to your users, that lie will catch up to you and you’ll come running to them at the last minute for more disks which will be a guaranteed sale!), but it comes down to this: whether SAN admins keep space free on the array to cover possible TP growth or the space sits free and unused inside file systems, that space is still free (and wasted).  Smart planning and flexibility from volume managers on the host side can mean that you don’t need TP and you’re not wasting a lot of space either.  It will also help the SAN admins sleep better at night.
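
To make the size of the “lie” concrete, here is a small sketch that adds up what a thin pool has promised against what it can actually back.  The ThinVolume class, the pool_report function, and all the numbers are hypothetical; real arrays report these figures through their own management tools.

```python
from dataclasses import dataclass


@dataclass
class ThinVolume:
    name: str
    advertised_gb: int  # size the host believes it has
    allocated_gb: int   # space the pool has actually handed out so far


def pool_report(volumes, pool_capacity_gb):
    """Summarize how far a thin pool is oversubscribed."""
    advertised = sum(v.advertised_gb for v in volumes)
    allocated = sum(v.allocated_gb for v in volumes)
    return {
        "oversubscription": advertised / pool_capacity_gb,
        "pool_used_fraction": allocated / pool_capacity_gb,
        # Storage that would have to be bought if every host used what it was told it has.
        "shortfall_gb": max(0, advertised - pool_capacity_gb),
    }


volumes = [
    ThinVolume("db", 2000, 600),
    ThinVolume("mail", 1000, 300),
    ThinVolume("home", 1000, 100),
]
report = pool_report(volumes, pool_capacity_gb=2000)
```

In this made-up pool the hosts have been promised twice what exists, and the pool is already half full.  The shortfall figure is the purchase the SAN admin is betting will never be needed all at once.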

Now that we’re past all that, the last thing to remember with allocation is to have your application, be it a file system, DBMS, mail system, whatever, use the storage wisely.  By that, I don’t just mean space; I really mean understanding what’s underneath.  Issue reads and writes in multiples of the stripe width (just the data portion, not parity, if any).  There is no magic bullet for this (as it depends entirely on how the admin configured the volume) and controllers in arrays can make up for some of this.  Make no mistake: misalignment can cause a lot of undue stress on your storage.  For example, it used to be the case that the default start of an EFI disk in Solaris was misaligned if the LUN came off hardware RAID.  Ugh.  Pay attention.
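
A quick way to check alignment is plain arithmetic on the partition's starting offset.  The sketch below assumes 512-byte sectors, a 64 KiB stripe unit, and a 4+1 RAID-5 group; those values and the helper names are illustrative, so substitute whatever your array is actually configured with.

```python
def full_stripe_bytes(stripe_unit_kib, data_disks):
    """Size of one full stripe, counting data disks only (no parity)."""
    return stripe_unit_kib * 1024 * data_disks


def is_aligned(offset_bytes, boundary_bytes):
    """True if the offset falls exactly on a stripe boundary."""
    return offset_bytes % boundary_bytes == 0


def next_aligned_sector(start_sector, boundary_bytes, sector_bytes=512):
    """Round a starting sector up to the next stripe boundary."""
    sectors_per_boundary = boundary_bytes // sector_bytes
    # Ceiling division using integer arithmetic only.
    return -(-start_sector // sectors_per_boundary) * sectors_per_boundary
```

For example, a partition that starts at sector 34 sits at byte 17,408, which is not a multiple of a 64 KiB stripe unit; next_aligned_sector(34, 64 * 1024) moves it to sector 128.  With a full stripe of full_stripe_bytes(64, 4) = 256 KiB, I/O issued in 256 KiB multiples from an aligned start touches each data disk once per stripe.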

In conclusion, the decisions made come down to cost, both in equipment and in skilled people or support contracts, and to workload/application.  There are examples of successful deployments for all the technologies mentioned in the previous posts.  Personally, where cost is no issue, I’m biased towards Fibre Channel.  That is not to say that I would pick it every time; I always try to choose the right tool, even though it may not be my favorite.  For example, there’s no point in moving an ant hill with a bulldozer.  If you want to create a low-load two-node cluster, running an FC SAN is serious overkill because a couple of SCSI JBODs will do just fine.

Looking towards the future, the enterprise storage array is going to have some serious competition.  There are lots of exciting changes in storage going on currently.  There is the notion of object based storage (leaving behind the traditional general file system) and solid state disks (SSDs) maturing.  Object based storage is a completely different concept, but SSDs are not.  They’re still block-level storage devices, just faster.  SSDs are already starting to see adoption in storage systems (Sun is doing this) and are being made available to the general OS (Sun is also doing this with ZFS in Solaris/OpenSolaris).  SSDs in ZFS are configured as a cache device, thus creating a hybrid pool of storage.  It’s a fantastic use of the technology.  When the SSD cost per GB reaches “good enough” and things like TRIM (the ability for the OS to mark a cell as unused so it can be erased; long story short, over time, SSDs themselves lose track of what’s really used and what’s not, which matters for true write size and wear-leveling) become universally implemented, storage may look a bit different.  Nonetheless, the present notion of a SAN and storage arrays still has a lot of great applications and many organizations can benefit from it.

Author: Barry Freese

Topic(s): Comparison

Published: November 8, 2009 21:07





So, you’re thinking about a SAN… (part 2)

Once you’ve decided that networked storage is the way to go, which option do you choose?  There are pros and cons to all of them, and picking the right tool for the job can make a world of difference.  Of course, that assumes money is no object.  For small organizations, especially in this economy, making do with what you have might be the only option.

NAS is a good choice for problems that span large areas, need to be shared, and where limited flexibility is acceptable.  For example: a college that needs a system to share out home directories is a great use for a NAS.  Students, faculty, and staff all need access to their files across the public network, and just about every OS on a desktop can use CIFS, NFS, or both.  NAS can also be a great option for low-cost shared storage or storage that needs to be provided over an unstable network.  The reason I say low-cost is that a NAS system can be a “roll your own” setup.  For example: Sun Solaris 10, Sun Cluster, and JBOD trays that support multiple initiators.  Highly available NFS and CIFS (via Samba) is ready to go.  The same can be constructed with Linux.  NFS systems can also be paired to backup systems through NDMP and spare the backup server some work.  NAS systems are also available from commercial vendors as a great option for those looking for something more turn-key.

That said, NAS has limitations.  You can’t be as creative with NAS as with block-level SANs.  A block-level storage device coupled with basic volume management software in the host OS provides a fair amount of flexibility.  Example: when migrating from one disk array to another (the old array had reached end-of-service-life), a new disk array was brought on-line and added to the hosts, the storage was mirrored using the host-side volume management, the old storage was detached, and the old array was scrubbed, shut down, and rolled out the door.  The impact to the users of the ERP and e-mail system running on the old array?  Nada.  No one noticed.  Again, SANs provide block-level storage.  You can build whatever file system you want, use Oracle ASM, or whatever else can make use of it; you’re not tied to a protocol or specific set of implemented features of a protocol (“Is that NFSv3 or NFSv4?”).

When considering the networking for block-level storage devices, keep in mind that block-level storage generally does not fail gracefully.  Storage, above all, has to be correct 100% of the time, so having a reliable network between host and storage is critical.  Performance is second to correctness.  There are lots of options for block-level networking, and they vary greatly both in cost and performance.  Some of the choice here depends on the storage array and what it can provide.  For arrays that support iSCSI (which is a means of moving the SCSI protocol over IP networks), this can be a cheap block-level network setup.  Solaris has an iSCSI target capability built-in, and that married to ZFS is great to get your feet wet with iSCSI (but there’s no redundancy, so I wouldn’t trust it for a production system).  Like NAS, this option can be implemented over your existing IP network (although I don’t recommend it) and cheaply since many operating systems have software iSCSI initiators.  There are hardware HBAs available too.  However, even with a dedicated back end network, it’s still an IP network which has shortcomings when it comes to storage.

Fibre Channel is a very widely deployed answer to the block-level network problem.  This option is not exactly cheap.  Fibre Channel is a network, first and foremost, that was designed with storage in mind.  Concepts like flow control, “routing”, and loop detection are different.  Getting into Fibre Channel can be a bit pricey in that FC requires special switches and HBAs in the servers (not to mention cabling).  FC is usually implemented in duplicate (creating two different, disconnected networks called fabrics) so that, under certain conditions, even a fabric failure is a non-event to the servers.  FC also comes in a variety of speeds and, so far, has maintained backwards compatibility.  FC comes in 1Gb/s, 2Gb/s, 4Gb/s, and 8Gb/s.  Ethernet comes in 1Gb/s and 10Gb/s.  In my experience, 10Gb/s to the host is too much and 1Gb/s is too little.  Sure, things like port channels can work around some of this, but single link speed flexibility is a nice thing to have.  There is a lot of intelligence built into the FC switches, which is the true beauty of FC.  One bit of wisdom with FC switches: don’t mix switch vendors.  Even vendors like McData and Brocade (Brocade bought McData several years ago), who have developed firmware to make their switch platforms (M-EOS and FOS) play nicely with each other, still only offer a limited set of features in compatibility mode.  Homogeneous fabrics make life easier and allow you to take advantage of features that would otherwise be disabled or limited.

Another option for storage networking is very new and being pushed by Cisco and Brocade: the idea, called Data Center Ethernet (DCE), merges Ethernet and FC into one network.  It’s a neat idea…on paper.  The advantage for organizations that deploy large FC networks is reducing cabling and switching costs because both types of data move over 10Gb/s Ethernet links.  The downside is that it requires special switches (you still have to have a fabric controller!) and special HBAs.  Unlike with FC, you can’t use your older, slower FC HBAs as part of a rolling upgrade (well, sort of), and the Ethernet port in your server won’t cut it either.  To be clearer, you can combine the DCE and FC systems together, but in order to move to DCE completely, you’ll need to buy all new HBAs.  I cannot say that I have any first hand experience with this technology.  It has been proposed to me by sales folks, and aside from the cost of adoption, it’s also a very new technology.  Going back to one of my first points, storage has to be right.  Bleeding edge and storage is a match I try to avoid.  I was also told that DCE is the future, to which my reply was “I was told five years ago that FC was dead.”  I assure you, FC isn’t dead.  Storage folks, in order to “keep it right”, like to keep it simple.  It’s nice to have a separate set of switches that do one thing and one thing well.  Cisco and Brocade may be on to something, but SAN architects are a very conservative lot so I suspect adoption will be slow.

There are other options, from the “experimental” (like ATAoE) to the niche (InfiniBand with Lustre).  NAS with either NFS or CIFS, iSCSI, and FC seem to be the leaders thus far in terms of install base.

Now that you’ve decided on what transport will meet your needs, our next installment will help you figure out what should dish out your data.

Author: Barry Freese

Topic(s): Comparison

Published: November 1, 2009 21:14





So, you’re thinking about a SAN… (part 1)

In most situations I’ve seen, moving to enterprise storage comes about as an evolutionary development.  This begins with systems administrators, who have lived with Just a Bunch of Disks (JBODs) scattered across a variety of servers, looking to change how their storage works.  How does an organization decide to move into centralized storage, especially in today’s world of cloud computing?

What can a Storage Area Network (SAN) do for me?

  • Storage is usually the slowest part of any computer system, big or small.  Getting storage right can improve performance in a variety of applications.  Using a storage array allows servers to get the benefit of many spindles when they need it, and also allows intelligent controllers to be shared by multiple servers.
  • Speaking of multiple servers, storage arrays almost always support multi-host initiated virtual disks.  While this can be done with JBODs, the scale that is possible doesn’t compare.  Creating shared storage for clusters is easy.
  • One-stop-shop for data replication.  There are many data replication options out there.  Some applications will replicate their own data, some volume managers or file systems will do it.  Both of these options have big ifs and aren’t universal.  Having a storage array, or an intermediate device, replicate the data means the OS, file system, application, etc. doesn’t matter; the same replication strategy works for Windows, Linux, Solaris, AIX, and any other OS you can hook to the array.
  • Less waste.  You can allocate the storage you need where you need it and that’s it.
  • Just like with JBODs, you still have your hands on your data.

“If you build it…” but how?

There are many different options for building the storage communication network (or utilizing an existing network) for sharing storage.  In our next installment, we’ll examine how to choose between a Network Attached Storage (NAS) solution (NFS or CIFS) and a block-level SAN (iSCSI, Fibre Channel, Fibre Channel over Ethernet, etc.).

Author: Barry Freese

Topic(s): Comparison

Published: October 28, 2009 21:43