Now that the transport has been established, the next decision is which storage system(s) will provide the data. For most deployments, this decision has the greatest impact on performance. Controller performance (measured either in throughput or in Input/Output Operations Per Second, IOPS), the number of spindles (physical disks), the type of disk (FC, SATA, SAS, SSD, etc.), and the layout of those disks all come into play. Most storage arrays support a mix of disk types, so ensure that the controllers can keep up with the IOPS load of the storage configuration you want to provide; the controller choice is probably the biggest “set in stone” decision you will make. Some arrays are more flexible: you can add more controllers later if you find that the existing ones are being stressed somewhere (cache, CPU, back-end IOPS, etc.).
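A rough way to sanity-check whether the controllers can keep up with the disks behind them is to add up what the spindles could generate and compare that with the controller rating. The sketch below does only that arithmetic. The per-disk IOPS figures, the `layout` dictionary, and the controller rating are placeholder assumptions for illustration, not vendor numbers; substitute the values from your own array's documentation.

```python
# Rough check: can the controllers keep up with the disks behind them?
# Every figure below is a placeholder assumption; substitute real numbers.

# Hypothetical random IOPS per spindle for each disk type.
PER_DISK_IOPS = {"SATA": 80, "SAS": 180, "FC": 180, "SSD": 5000}


def aggregate_disk_iops(layout):
    """Sum the random IOPS the disks could generate, by disk type.

    layout maps a disk type (a key of PER_DISK_IOPS) to a disk count.
    """
    return sum(PER_DISK_IOPS[kind] * count for kind, count in layout.items())


def controller_headroom(layout, controller_iops):
    """Positive: the controllers have room. Negative: they are the bottleneck."""
    return controller_iops - aggregate_disk_iops(layout)


if __name__ == "__main__":
    layout = {"SATA": 48, "FC": 32}             # hypothetical mixed array
    print(aggregate_disk_iops(layout))          # 9600
    print(controller_headroom(layout, 20000))   # 10400
```

A negative headroom figure would suggest that adding faster disks buys nothing until the controllers are upgraded, which is the situation the "set in stone" warning is about.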
The type of data you plan to store and the users' expectations for how it will perform determine which disks are used in the array. SATA is great for low-performance, high-capacity storage. SATA drives do well with low IOPS and low contention (their seek times are generally higher than those of other disk types, and other features are less advanced or missing), so they serve archive or small workgroup storage well. SAS and FC do well under high IOPS loads (high-volume e-mail, OLTP databases, etc.) but tend to cost more.
Once you pick a disk type for a particular workload, how do you choose a volume format? DBAs will always tell their SAN admin not to use RAID-5 because of the overhead of calculating parity on each write operation. Well, maybe. The reality comes down to user expectation. If your array will deliver your cruelest database loads in adequate time running on RAID-5, what’s the problem? If RAID-5 suffers, then RAID-1+0 it is. With database workloads, though, it’s often not as simple as that. Given the cost of SAN storage, making the most of it counts for a lot. If a particular part of the database could benefit from faster storage, it may pay to put things like redo logs on the fastest configuration you can build, put the database itself on something a bit slower but more space-efficient, and put the archive logs (depending on how long you want to keep them and how fast they are created) on something more cost-efficient. Database systems like Oracle have excellent tools for determining which part of the database needs help. I can also assure you, DBAs have a knack for letting the storage folks know when they need something.
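To put rough numbers on the RAID-5 versus RAID-1+0 trade-off, here is a minimal sketch using the commonly quoted small-random-write penalties (four back-end I/Os per host write for RAID-5, two for RAID-1+0). The function names and the sample workload are made up for illustration, and the model ignores controller cache and full-stripe writes, both of which can make RAID-5 look considerably better in practice.

```python
# Back-of-the-envelope RAID comparison. The workload numbers are made up,
# and controller cache and full-stripe writes are deliberately ignored.

# Commonly quoted back-end I/Os per small random host write.
WRITE_PENALTY = {"RAID-5": 4, "RAID-1+0": 2}


def backend_iops(host_iops, write_fraction, level):
    """Back-end disk IOPS needed to serve host_iops at the given write mix."""
    reads = host_iops * (1.0 - write_fraction)
    writes = host_iops * write_fraction
    return reads + writes * WRITE_PENALTY[level]


def usable_fraction(level, disks):
    """Fraction of raw capacity left for data in a single RAID group."""
    if level == "RAID-5":
        return (disks - 1) / disks   # one disk's worth of parity
    if level == "RAID-1+0":
        return 0.5                   # everything is mirrored
    raise ValueError(f"unknown RAID level: {level}")


if __name__ == "__main__":
    # Hypothetical OLTP-style load: 2000 host IOPS, 25% writes.
    for level in ("RAID-5", "RAID-1+0"):
        print(level, backend_iops(2000, 0.25, level), usable_fraction(level, 8))
    # RAID-5 3500.0 0.875
    # RAID-1+0 2500.0 0.5
```

In this made-up case RAID-5 asks the disks for 40% more back-end I/O but leaves 87.5% of the raw capacity usable instead of 50%. Whether that trade is acceptable is exactly the user-expectation question above.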
When it comes down to allocating the storage out to the hosts, I try to allocate what is needed plus a small overhead. Some operating systems are okay with the idea of growing a volume/LUN (Windows) and others are not (Solaris). For those that are not, I recommend using volume management software that allows you to add LUNs and grow the file system (Solaris does allow for this, for example).
Some storage arrays offer Thin Provisioning (TP) to work around this problem. Personally, I oppose the very idea. Thin Provisioning is a lie. The array tells the host the volume is bigger than the amount of actual storage backing it. So long as the host doesn’t want to use all that storage, all is good. (Does this remind you a bit of some financial scandals?) If, however, the host decides that it needs that space, the space is either allocated on demand or, if there isn’t enough left to allocate, the write fails. A SCSI write failure is bad because it means your file system may now be corrupt. The hope is that storage use will grow slowly enough that the SAN admins, who are hopefully paying attention, can add more storage to the array before the situation becomes critical.
There are a lot of “ifs” in this whole concept. It relies on the idea that (1) the SAN admin can add storage in a crunch (which means money from somewhere); (2) everyone using the storage remembers that the numbers they see are lies and that they don’t really have that much space, so they plan accordingly and don’t do anything stupid; and (3) nothing goes wrong. By the third point, I mean that no job runs away dropping core files, no database autoextends itself blissfully because no one put an upper cap on it, and no one who wasn’t told the numbers aren’t right uses the space “temporarily”. Once TP space is allocated, it’s allocated. The storage array has no way to know that a block has been freed, since “zeroing” blocks isn’t generally done (we’ll get to this same idea with SSDs in the next section).
There is also an issue of resource consumption and human nature. When people know where the hard wall is, they’ll respect it (as do file systems, in that they leave themselves enough room to work and not get corrupted). If people know that there is a limit out there somewhere but are not sure where (and who knows, because there might be other TP users on the same array about which they have no idea), they might be tempted to gamble. In any case, TP can put the SAN admins in a tough spot. TP is sold by marketing folks as a storage saver (wait, you mean to tell me marketing people selling storage are trying to keep me from buying storage? Something’s not right. Oh, that’s right; they know that if you lie to your users, that lie will catch up to you and you’ll come running to them at the last minute for more disks, which will be a guaranteed sale!), but the arithmetic is simple: whether SAN admins keep space free on the array to cover possible TP growth, or that space sits unused inside file systems, it is still free (and wasted) space. Smart planning and flexibility from volume managers on the host side can mean that you don’t need TP and you’re not wasting a lot of space either. It will also help the SAN admins sleep better at night.
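For anyone who does end up running thin provisioning, the size of the "lie" is easy to quantify. The following is a minimal sketch with hypothetical names and made-up pool sizes; it assumes, as described above, that allocated TP space is never returned to the pool, so consumption only grows.

```python
import math

# Quantifying thin-provisioning exposure. All sizes are in GB and are
# made-up examples; the function names are hypothetical.


def thin_pool_report(physical_gb, allocated_gb, presented_gb):
    """Summarize how far a thin pool is oversubscribed.

    physical_gb  -- real capacity backing the pool
    allocated_gb -- capacity the hosts have actually written to so far
    presented_gb -- list of the sizes each host *believes* its LUN has
    """
    presented_total = sum(presented_gb)
    return {
        "oversubscription": presented_total / physical_gb,
        "unbacked_gb": max(0, presented_total - physical_gb),
        "free_gb": physical_gb - allocated_gb,
    }


def days_until_full(free_gb, growth_gb_per_day):
    """Days before writes start failing, assuming steady growth."""
    if growth_gb_per_day <= 0:
        return math.inf
    return free_gb / growth_gb_per_day


if __name__ == "__main__":
    report = thin_pool_report(10000, 7000, [4000, 4000, 6000, 6000])
    print(report)
    # {'oversubscription': 2.0, 'unbacked_gb': 10000, 'free_gb': 3000}
    print(days_until_full(report["free_gb"], 50))   # 60.0
```

The steady-growth assumption is the optimistic case. A runaway job or an uncapped autoextending database makes the real number of days much smaller, which is the "nothing goes wrong" condition in the list above.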
Now that we’re past all that, the last thing to remember with allocation is to have your application, be it a file system, DBMS, mail system, whatever, use the storage wisely. By that, I don’t just mean space; I really mean understanding what’s underneath. Issue reads and writes in multiples of the stripe width (counting just the data portion, not parity, if any). There is no magic bullet for this (it depends entirely on how the admin configured the volume), and the controllers in arrays can make up for some of it. Make no mistake, though: a misalignment can cause a lot of undue stress on your storage. For example, it used to be the case that the default start of an EFI disk in Solaris was misaligned if the LUN came off hardware RAID. Ugh. Pay attention.
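The alignment check itself is simple arithmetic once you know how the volume was built. Below is a minimal sketch, assuming a hypothetical RAID-5 4+1 group with a 64 KiB stripe unit per disk. The function names are illustrative, and sector 34 is used only as an example of a misaligned start (it is the first usable sector on a standard GPT layout with 512-byte sectors). Get the real stripe unit and data-disk count from your SAN admin.

```python
# Stripe-alignment arithmetic. The geometry (4 data disks, 64 KiB stripe
# unit) is an assumed example; ask your SAN admin for the real values.

SECTOR_BYTES = 512


def full_stripe_bytes(stripe_unit_bytes, data_disks):
    """Size of one full stripe, counting data disks only (no parity)."""
    return stripe_unit_bytes * data_disks


def is_aligned(start_offset_bytes, stripe_unit_bytes, data_disks):
    """True if a partition or I/O starts on a full-stripe boundary."""
    stripe = full_stripe_bytes(stripe_unit_bytes, data_disks)
    return start_offset_bytes % stripe == 0


def next_aligned_offset(start_offset_bytes, stripe_unit_bytes, data_disks):
    """Smallest full-stripe boundary at or after the given offset."""
    stripe = full_stripe_bytes(stripe_unit_bytes, data_disks)
    return -(-start_offset_bytes // stripe) * stripe   # ceiling division


def io_is_full_stripe(io_bytes, stripe_unit_bytes, data_disks):
    """True if an I/O size is a whole multiple of the full stripe."""
    stripe = full_stripe_bytes(stripe_unit_bytes, data_disks)
    return io_bytes > 0 and io_bytes % stripe == 0


if __name__ == "__main__":
    unit, disks = 64 * 1024, 4                  # RAID-5 4+1, 64 KiB unit
    start = 34 * SECTOR_BYTES                   # example misaligned start
    print(full_stripe_bytes(unit, disks))       # 262144
    print(is_aligned(start, unit, disks))       # False
    print(next_aligned_offset(start, unit, disks) // SECTOR_BYTES)  # 512
    print(io_is_full_stripe(1024 * 1024, unit, disks))              # True
```

With this assumed geometry, moving the partition start from sector 34 to sector 512 and issuing I/O in multiples of 256 KiB would keep each request on whole stripes rather than straddling two.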
In conclusion, the decisions come down to cost, both in equipment and in skilled people or support contracts, and to workload/application. There are examples of successful deployments for all the technologies mentioned in the previous posts. Personally, where cost is no issue, I’m biased towards Fibre Channel. That is not to say that I would pick it every time; I always try to choose the right tool, even though it may not be my favorite. There’s no point in moving an ant hill with a bulldozer. If you want to create a low-load two-node cluster, running an FC SAN is serious overkill because a couple of SCSI JBODs will do just fine.
Looking towards the future, the enterprise storage array is going to have some serious competition. There are lots of exciting changes going on in storage right now: the notion of object-based storage (leaving behind the traditional general-purpose file system) and the maturing of solid state disks (SSDs). Object-based storage is a completely different concept, but SSDs are not. They’re still block-level storage devices, just faster. SSDs are already seeing adoption in storage systems (Sun is doing this) and are being made available to the general OS (Sun is also doing this with ZFS in Solaris/OpenSolaris). SSDs in ZFS are configured as a cache device, thus creating a hybrid pool of storage. It’s a fantastic use of the technology. When the SSD cost per GB reaches “good enough” and things like TRIM become universally implemented, storage may look a bit different. (TRIM is the ability for the OS to mark a cell as unused so it can be erased; long story short, over time SSDs lose track of what is really in use and what is not, which affects true write size and wear-leveling.) Nonetheless, the present notion of a SAN built on storage arrays still has a lot of great applications, and many organizations can benefit from it.