We wrote an article a while back for one of our favourite technical forums.
Here’s a reposting of the article with a few edits.
“Our performance is as flat as a pancake – we need more storage!”
It’s a common cry, and it is frequently preceded by a timeline that looks something like this:
- 24 months ago: your SAN had 10 disks and it was slow.
- 18 months ago: you upgraded to 20 disks and told management the performance boost would last for 3 years.
- Fast forward to today, and you’re on the phone to your VAR saying “The users are screaming about slow performance – we need to upgrade the SAN to 40 disks!”
Your friendly VAR or reseller won’t argue, and the conversation flows something like this: “How many Terabytes do you need?” and “When do you want it by?”, along with mutterings about a discount if you can raise a Purchase Order before the end of the month. What your VAR probably didn’t ask are questions like “How many IOPs do you need?”, “Are your IOPs random or sequential?” and “Which applications are consuming the most IOPs?” An IOPs figure normally means the number of input/output operations per second measured against a particular traffic pattern, but they probably won’t get into that either (more on this later in the article).
If you find yourself in this situation – Just Stop. (Unless you are working for a company where Revenues, Profits and Staff size have also doubled or tripled in the last 18-24 months, in which case there might be good reasons for needing more capacity and IOPs, and probably a sizeable IT budget to go with it.)
It’s been my experience that VARs rarely ask you about IOPs for two reasons:
- It’s not an area they are entirely comfortable with: at the end of the day, it’s your environment that’s generating the IOPs, and they’re just supplying the goods.
- Even if they were to ask, very few customers have taken the time to fully analyse their IOP requirements; most simply add 50% more disk than they had last time. So even if the reseller did want to have an IOP conversation, the background knowledge to take it further might not be there.
The key rule is: you should try to buy storage by considering IOPs first, and then let the space and capacity requirements drop out of the IOP consideration (though you may need a few iterations to balance cost, capacity, power, rack space, etc.).
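As a minimal sketch of that rule, assuming you already have a target IOPs figure and a capacity figure, the disk count is whichever of the two requirements needs more spindles. The `DiskType` class, the `size_array` helper and the disk figures are illustrative assumptions: the IOPs values borrow the rules of thumb given later in the article, the 600GB SAS size is hypothetical, and RAID overhead is ignored here.

```python
import math
from dataclasses import dataclass


@dataclass
class DiskType:
    name: str
    iops: float         # random 4KB IOPs per disk (rule-of-thumb figure, assumed)
    capacity_tb: float  # raw capacity per disk


def size_array(required_iops: float, required_tb: float, disk: DiskType) -> tuple[int, str]:
    """Return (disk count, limiting factor), sizing for IOPs first.

    RAID overhead is deliberately ignored in this sketch.
    """
    by_iops = math.ceil(required_iops / disk.iops)
    by_capacity = math.ceil(required_tb / disk.capacity_tb)
    driver = "IOPs" if by_iops >= by_capacity else "capacity"
    return max(by_iops, by_capacity), driver


# Hypothetical requirement: 2,000 IOPs and 4TB of data.
candidates = [DiskType("7.2K SATA 1TB", 85, 1.0), DiskType("15K SAS 600GB", 175, 0.6)]
for disk in candidates:
    count, driver = size_array(2000, 4.0, disk)
    print(f"{disk.name}: {count} disks, driven by {driver}")
```

In both cases the IOPs requirement, not the 4TB, sets the disk count (24 SATA or 12 SAS disks). Trying each candidate disk type in turn is one simple way to run the cost, power and rack-space iterations the rule mentions.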
How do you do this?
Storage, in most general commercial businesses, supports a fairly standard mix of email, databases, file servers, and the usual collection of infrastructure servers (DCs, etc.). The number of IOPs you will obtain from a given fixed disk or SAN configuration depends on 3 main parameters:
- The size of the ‘Traffic Packet’ each application reads from or writes to the storage (typically 4KB for Microsoft Exchange, 16KB for SQL)
- Whether subsequent application I/O requests will hit a neighbouring part of the disk (known as sequential access) or some other far-flung part of the disk platter (think of this as random access)
- Whether we are reading from the disk or writing to it
(There are a few other factors, such as the number of application transactions occurring simultaneously.)
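To see why random access is the expensive case, a common back-of-the-envelope model treats each random I/O on a spinning disk as one average seek plus, on average, half a revolution of rotational latency, ignoring the transfer time of a small request. The `random_iops` helper is illustrative, and the seek times are assumed datasheet-style values rather than measurements; with them, the model lands inside the per-disk rules of thumb given further down the article.

```python
def random_iops(rpm: float, avg_seek_ms: float) -> float:
    """Approximate random IOPs for one spinning disk.

    Each random I/O pays an average seek plus half a revolution of
    rotational latency. Transfer time for a small (e.g. 4KB) request
    is ignored, which is a simplification.
    """
    rotational_latency_ms = 0.5 * 60_000 / rpm  # half a revolution, in ms
    service_time_ms = avg_seek_ms + rotational_latency_ms
    return 1000 / service_time_ms


# Assumed average seek times: ~8.0 ms for 7.2K SATA, ~3.8 ms for 15K.
print(f"7.2K RPM: {random_iops(7200, 8.0):.0f} IOPs")
print(f"15K RPM:  {random_iops(15000, 3.8):.0f} IOPs")
```

Sequential access largely avoids the seek and rotational wait, which is why sequential IOPs figures are so much higher and why the random case is the usual yardstick.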
So, back to “what is an IOP?” Unless someone is specifically referring to “sequential IOPs”, 64KB IOPs or similar, an IOPs figure typically means the number of input/output operations per second that can be performed against the following traffic pattern:
- 4KB Traffic Packet – typically representing Microsoft Exchange of old
- 100% Random traffic – representing numerous requests from numerous applications, causing the disk heads to move randomly to any location on the disk
- 70% Read, 30% Write activity – as most businesses read more data than they create
(If you’re doing video editing, or 99% of your business is just files, then the above won’t apply.)
Here’s the next rough rule of thumb:
- A modern 7.2K RPM SATA Disk will deliver 80-90 IOPs as a standalone disk
- A modern 15K RPM Disk will deliver 170-180 IOPs as a standalone disk
- RAID 0, 1, 10 or 5 will have no effect on Read performance (Read performance is mostly an effect of the number of disks you have and the rotational speed of those Disks)
- RAID 10 will halve your write performance
- RAID 5 (for simplicity) will quarter your write performance (yes it is far more complicated than that – but if you are currently not considering any form of IOP calculation in your storage consideration, then at least this rough rule of thumb will be a starting point)
- RAID 0 we won’t consider as it offers no resiliency from a disk failure, and RAID 6 is a topic for another day
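The RAID rules of thumb above can be folded into one formula: each front-end read costs one disk I/O, while each front-end write costs two (RAID 1/10) or four (RAID 5) back-end I/Os. A minimal sketch, assuming the 70/30 read/write mix from the standard pattern and these rule-of-thumb write penalties rather than vendor-specific figures; `array_iops` is an illustrative helper:

```python
# Back-end disk I/Os per front-end write (rule-of-thumb write penalties).
WRITE_PENALTY = {"RAID 1": 2, "RAID 10": 2, "RAID 5": 4}


def array_iops(disks: int, iops_per_disk: float, raid: str, read_fraction: float = 0.7) -> float:
    """Front-end IOPs an array can sustain for a given read/write mix."""
    raw = disks * iops_per_disk  # what the spindles deliver in total
    write_fraction = 1 - read_fraction
    # Reads cost 1 back-end I/O each; writes cost WRITE_PENALTY[raid] each.
    return raw / (read_fraction + write_fraction * WRITE_PENALTY[raid])


# 20 x 15K disks at ~175 IOPs each, 70% read / 30% write.
for level in ("RAID 10", "RAID 5"):
    print(f"{level}: ~{array_iops(20, 175, level):.0f} IOPs")
```

The same 3,500 raw IOPs come out at roughly 2,692 under RAID 10 and 1,842 under RAID 5. With `read_fraction=1.0` both levels return the full 3,500, matching the rule that the RAID level has no effect on reads.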
Let’s look at some typical applications:
- Exchange 2003:
  - A Light user (emails: 20 sent/50 received per day) might generate 0.5 IOPs
  - An Average user (emails: 30 sent/75 received per day) might generate 0.75 IOPs
  - A Heavy user (emails: 40 sent/100 received per day) might generate 1 IOP
  - A Large user (emails: 60 sent/150 received per day) might generate 1.5 IOPs
These are the user profiles Microsoft uses, and they can be found across its websites.
If we add BlackBerry Enterprise Server (5.5) into the mix, we should multiply the above figures by a factor of 3 for those specific users carrying a BlackBerry.
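Those per-user figures turn directly into an Exchange 2003 estimate. A sketch, assuming the Microsoft profile figures and the factor-of-3 BlackBerry multiplier quoted above; the `exchange_iops` helper and the mailbox population are hypothetical:

```python
# Per-user IOPs for the Exchange 2003 profiles listed above.
PROFILE_IOPS = {"light": 0.5, "average": 0.75, "heavy": 1.0, "large": 1.5}
BLACKBERRY_FACTOR = 3  # multiplier for users carrying a BlackBerry


def exchange_iops(users: list[tuple[str, int, int]]) -> float:
    """users: (profile, total users, of whom carry a BlackBerry)."""
    total = 0.0
    for profile, count, blackberry in users:
        per_user = PROFILE_IOPS[profile]
        total += (count - blackberry) * per_user                   # desk-only users
        total += blackberry * per_user * BLACKBERRY_FACTOR         # BlackBerry users
    return total


# Hypothetical population: 200 average users (50 with BlackBerry),
# 40 heavy users (20 with BlackBerry).
population = [("average", 200, 50), ("heavy", 40, 20)]
print(f"{exchange_iops(population)} IOPs")
```

Here 80 BlackBerry users out of 240 account for 172.5 of the 305 IOPs, which is exactly the kind of hot spot worth knowing about before buying disks.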
Various guides around the web can help you understand SQL and file server IOPs, but you can also guesstimate these figures: work out how many user interactions per second go into your SQL applications, and how many SQL read/write requests each interaction is likely to generate.
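That SQL guesstimate is simple multiplication, but keeping reads and writes separate pays off because writes are the side the RAID rules penalise. A sketch with hypothetical inputs (12 interactions per second, 8 reads and 3 writes per interaction); the `sql_iops` name and the figures are illustrative, not taken from any guide:

```python
def sql_iops(interactions_per_sec: float, reads_per_interaction: float,
             writes_per_interaction: float) -> tuple[float, float]:
    """Estimate (read IOPs, write IOPs) generated by a SQL application."""
    reads = interactions_per_sec * reads_per_interaction
    writes = interactions_per_sec * writes_per_interaction
    return reads, writes


reads, writes = sql_iops(12, 8, 3)
total = reads + writes
print(f"{total} IOPs, {reads / total:.0%} read")
```

With these inputs the application generates 132 IOPs at roughly a 73/27 read/write split, close to the standard 70/30 pattern.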
The main point is to build up a feel for your IOPs from an end-user perspective. Which departments in your business are the greatest consumers of storage IOPs? Which applications consume the most IOPs? And even within those applications, which elements are the IOP hogs? For example, log files typically consume significant IOPs, while an application’s associated database might be lightly hit.
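One way to build that feel is to collect an IOPs figure per application or volume, however you measure it, and rank them with a running total so the hogs stand out. A sketch; the `rank_consumers` helper, the consumer names and the numbers are hypothetical:

```python
def rank_consumers(iops_by_consumer: dict[str, float]) -> list[tuple[str, float, float, float]]:
    """Return (name, IOPs, share of total, cumulative share), busiest first."""
    total = sum(iops_by_consumer.values())
    ranked = sorted(iops_by_consumer.items(), key=lambda kv: kv[1], reverse=True)
    rows, cumulative = [], 0.0
    for name, iops in ranked:
        cumulative += iops
        rows.append((name, iops, iops / total, cumulative / total))
    return rows


measured = {"Exchange logs": 900, "SQL logs": 600, "SQL data": 250,
            "File server": 150, "Infrastructure": 100}
for name, iops, share, cum in rank_consumers(measured):
    print(f"{name:15} {iops:5} {share:6.1%} {cum:6.1%}")
```

In this made-up example the two log workloads alone account for 75% of all IOPs.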
So, back to growing from 10 disks to 20 disks and then 40, and along the way potentially facing an upgrade of your SAN.
One organisation I interact with has been through the SAN upgrade process 3 times over 4 years, ending up with an expensive SAN with 12 or so 1TB SATA disks, and 36 high speed 15K RPM SAS disks. In total, that’s about 7,000 IOPs of storage processing power.
Yet, no matter which way their applications were sliced and diced, no more than 2,000 IOPs of storage processing was required, with some limited spikes to 3,000 IOPs.
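A quick check of that gap, using the midpoints of the per-disk rules of thumb (85 and 175 IOPs, an assumption) before any RAID write penalty is applied; `raw_iops` is an illustrative helper:

```python
def raw_iops(disk_groups: list[tuple[int, float]]) -> float:
    """Sum raw random IOPs over (disk count, IOPs per disk) groups."""
    return sum(count * per_disk for count, per_disk in disk_groups)


installed = raw_iops([(12, 85),    # 12 x 1TB 7.2K SATA
                      (36, 175)])  # 36 x 15K SAS
peak_demand = 3000
print(f"Installed: {installed} IOPs, {installed / peak_demand:.1f}x the observed peak")
```

That gives 7,320 raw IOPs, about 2.4 times the 3,000 IOPs peak. Even applying the RAID 5 rule of thumb with a 70/30 mix (7,320 / 1.9) leaves roughly 3,850 IOPs, still above the peak.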
It turns out:
- Their iSCSI network and switches had been incorrectly set up, with iSCSI traffic passing through a stacked switch arrangement that had limited bandwidth between the switches
- The RAID configurations were mainly RAID 5, and databases and logs had been placed on the same RAID 5 volumes, rather than the busy log files being separated onto faster RAID 10 volumes
- VMware had given them untold flexibility, but had also let them move their applications and data with such ease that data ended up residing on parts of their storage to which it was entirely unsuited
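To see why busy logs hurt on RAID 5, compare the back-end load a write-heavy volume generates on each RAID level. A sketch, assuming a hypothetical log volume doing 600 front-end IOPs at 90% writes and the rule-of-thumb write penalties; `backend_iops` is an illustrative helper:

```python
# Back-end disk I/Os per front-end write (rule-of-thumb write penalties).
WRITE_PENALTY = {"RAID 10": 2, "RAID 5": 4}


def backend_iops(front_end_iops: float, write_fraction: float, raid: str) -> float:
    """Disk I/Os the spindles must perform for a given front-end load."""
    writes = front_end_iops * write_fraction
    reads = front_end_iops - writes
    return reads + writes * WRITE_PENALTY[raid]


for level in ("RAID 5", "RAID 10"):
    print(f"{level}: {backend_iops(600, 0.9, level):.0f} back-end IOPs")
```

The same log load costs about 2,220 back-end IOPs on RAID 5 versus 1,140 on RAID 10, and on a shared RAID 5 volume those extra I/Os compete directly with the database sitting alongside.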
Eventually, this configuration was optimised, and they now run off 16 SATA drives (about 1,300 IOPs), with a carefully placed 200GB of SSD taking the high-IOP strain of a few log files from Exchange and their key SQL server.
This was a classic case of 50% of the IOPs being generated by 5% of the physical data. Contrast that with the 50 or so drives, most of them 15K SAS, that they had believed they needed.
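Finding that small, hot slice of data amounts to ranking data sets by IOPs per GB. The sketch below uses a simple greedy heuristic (not an optimal placement), with a hypothetical `pick_hot_data` helper and made-up sizes and IOPs figures; it takes the densest data sets until they cover a target share of total IOPs:

```python
def pick_hot_data(datasets: list[tuple[str, float, float]], target_share: float):
    """datasets: (name, size_gb, iops) tuples.

    Greedily take data sets in descending IOPs-per-GB order until they
    account for at least target_share of total IOPs.
    Returns (chosen names, chosen size in GB, share of IOPs covered).
    """
    total_iops = sum(iops for _, _, iops in datasets)
    by_density = sorted(datasets, key=lambda d: d[2] / d[1], reverse=True)
    chosen, covered, size_gb = [], 0, 0
    for name, gb, iops in by_density:
        if covered >= target_share * total_iops:
            break
        chosen.append(name)
        covered += iops
        size_gb += gb
    return chosen, size_gb, covered / total_iops


datasets = [("Exchange DB", 1200, 250), ("Exchange logs", 60, 700),
            ("SQL data", 900, 400), ("SQL logs", 40, 500), ("File shares", 1800, 150)]
names, gb, share = pick_hot_data(datasets, 0.5)
total_gb = sum(d[1] for d in datasets)
print(f"{names}: {gb} GB of {total_gb} GB ({gb / total_gb:.1%}) carries {share:.0%} of IOPs")
```

Here 100GB of logs (2.5% of the data) carries 60% of the IOPs; data sets picked out this way are the natural candidates for a small SSD tier like the one described above.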
Their savings have been considerable, but more importantly, they have RESET or RELEVELED their baseline understanding of their IOP needs and broken the cycle of designing their next Storage upgrade by the “old method” of just adding 50-100% more disk in a SAN than they had last time.
Is all of the above entirely accurate? Not to an infinite level of detail: there are dozens of parameters that can flex and flow, and in the good old days pre-Lehman there often wasn’t time to work this “stuff” out, so it was easier to throw money at the SAN.
Today, the businesses I deal with are asking themselves whether they have got their storage right – and they now have the time to sit down and think about their application/storage ‘profile.’ In the end, they get to keep more of their IT budget, which goes a long way to buying more pancakes rather than unnecessary IOPs.