Listen, we need to talk

SAN Administrators and DBAs… Developers and DBAs… Project Managers and DBAs… The… the… the business users and DBAs… Oh, my… It seems like everywhere we look there is some group that we DBA types seem to have a problem with. I know not all of us do… But it’s one of those relationship issues we are -supposed- to have. You know, like mothers-in-law? There’s a saying out there that goes something like, “If you seem to have a problem with everyone else – maybe everyone else isn’t the problem.”

I think we suffer from that as DBAs. Sometimes we are the problem, even when we aren’t. Why? Because we don’t chat, we don’t explain things, we just expect perfection out of everyone. We are normally understaffed when compared to the numbers of developers, project managers, users, etc. who depend on us. We’re often understaffed compared to the legions of IT operational staff, too. I’ve been there. I see it all the time with my clients.

[Image: Clint Eastwood Talks to Your Developers – “I’m sorry, developer, I can’t do that to myself…”]

Part of the solution is expressing ourselves appropriately and effectively. Let’s chit chat about that here. I’ll start off by saying I am not the one-stop answer shop here, and I would love to see your thoughts in the comments on how we get this wrong. Please – discuss. I’ll get the conversation going.

Improper Communication Leads To

We can choose to just talk to the other DBAs and our management and whine about the lack of understanding we see. We can vent to friends at User Group meetings or the SQL PASS Summit. That won’t solve anything. Improper communication has led to a lot of bad in the world. At NASA and some of its contractors you could say it was at least part of what led to the Challenger and Columbia tragedies – avoidable tragedies based on information the teams had before either one. We could blame a fair number of our armed conflicts on this, as well as some of the tragic ways certain conflicts were handled when politicians stuck in Washington failed to effectively communicate with the military tacticians in the field. It has been the cause of several avoidable aviation disasters – so much so that a simple – but effective – training process, Crew Resource Management, was created in the aviation industry. And in our workplaces – it leads to downtime, increased IT expenditures, project delays, and tension. I’ve witnessed the effects of this failure with my clients repeatedly… One example –

Just recently I was at a shop that told me their disk layout was rated for high performance and would scale fine, so we wouldn’t really need to test it like I wanted. I asked them how they knew this. Their response was basically, “Oh… We told the off-site SAN team that this was to be high performance and we cared most about performance for the logs and the tempdb LUNs.” Sounded good, but I asked if we could have a call with the SAN team to get a few things answered about how their particular SAN was laid out. We had that call to ask about SAN cache, RAID types, LUN layout, etc.

During the call I asked how the LUNs for this particular cluster were set up… The answer back was, “Oh… they are all slices off of different 5 disk RAID 5 groups.” I asked what else was on them, and they said not much yet, but they were the last RAID groups available and would get the most allocations since the SAN was full until/unless more trays were purchased. I also said, “Now, the team here said they told you that the LUNs were all performance sensitive and that the TempDB and Log file LUNs needed to be high performance. Those are also on slices of 4+1 RAID 5 groups?” The answer back was, “Yeah… That’s how we do all drives unless the team specifically asks for a specific RAID type or different setup… Everyone says their stuff is ‘High Performance’ but if all they tell us is the size, they get a slice off of a RAID 5 for the space they need.” I thanked them for their time and then had a conversation with the local DBA team about communicating with the SAN team…

If we use this as our example and add in a few basics from crew resource management, we can come up with at least a few simple rules –

Effective Communication…

  1. Doesn’t leave room for assumption – The DBA team assumed High Performance meant dedicated RAID 10 in this case. It didn’t mean anything, actually. Airliners have literally crashed because each crew member assumed the other was watching fuel or that fuel was fine.
  2. Is Precise – “High Performance” is a bad request… It will almost always mean something different to the asker and to the asked… Spell out what you need. When you are complaining to a developer, instead of saying (pardon me…), “your code sucks, I won’t implement this” – try to tell them precisely what’s wrong and how they can improve it. They’ll learn something and you’ll have an ally in development as a result.
  3. Seeks clarification – “High Performance” may be a bad request. But to see that and then just deliver the standard is intellectually dishonest and, well, lazy. If someone asks for something and there is room for assumption – clarify it. I talk about this in my “If you see something, say something” post.
  4. Is Free of Attitude – I know… Developers are supposed to hate DBAs and DBAs are supposed to be grumpy to developers. It ought not be that way, though. Listen, understand where the other person is coming from and help them out.
  5. Is Concise – Yeah… I’m up to 1,000 words already. I’ve been doing this for 13 years and I’m still working on it. It helps, though. If your request is lost in a novel you won’t see a response. Especially in this age. Be concise, Mike…
  6. Seeks Confirmation – In crew resource management there are a few steps to communication. The process is something to the effect of 1.) Get their attention, 2.) State your concern/intention, 3.) Explain the problem/reason from your view, 4.) Propose a solution or approach and 5.) Seek agreement/confirmation… Close the deal! “Hey SAN Team – these drives are housing our critical app that will have a lot of transactions and runs our entire operation. Performance is going to be visible and important here. I think we should go dedicated RAID 10 with 6 disks for TempDB and a dedicated mirror for logs because this is what we’re expecting for load. Do you have the spindles for that and does this approach make sense to you?” This is clear, concise and gives them an opportunity to counter, seek clarification, disagree or agree. If they agree and don’t deliver, you now have something to fall back on, and you found out about the issues up front instead of the week of the go-live deployment.
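If it helps to see the five steps from rule 6 as a checklist, here is a minimal Python sketch. The `Request` class and its field names are made up for illustration; they are not part of crew resource management or of any library. All it does is refuse to build a message while one of the five steps is still blank, which is roughly what the “high performance” request in the example was missing.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """A request organized by the five steps (hypothetical field names)."""
    recipient: str     # 1. get their attention
    concern: str       # 2. state your concern/intention
    reason: str        # 3. explain the problem/reason from your view
    proposal: str      # 4. propose a solution or approach
    confirmation: str  # 5. seek agreement/confirmation

    def missing_steps(self) -> list:
        """Return the names of any steps that were left blank."""
        return [name for name, value in vars(self).items() if not value.strip()]

    def render(self) -> str:
        """Build the message, refusing to send an incomplete request."""
        missing = self.missing_steps()
        if missing:
            raise ValueError("request is missing steps: " + ", ".join(missing))
        return (
            f"Hey {self.recipient} - {self.concern} {self.reason} "
            f"{self.proposal} {self.confirmation}"
        )


# The request from the post, split into its five parts.
san_request = Request(
    recipient="SAN Team",
    concern="these drives are housing our critical app that runs our entire operation.",
    reason="It will have a lot of transactions, so performance is going to be visible and important here.",
    proposal="I think we should go dedicated RAID 10 with 6 disks for TempDB and a dedicated mirror for logs.",
    confirmation="Do you have the spindles for that, and does this approach make sense to you?",
)
print(san_request.render())

# The "high performance" version: a concern with nothing behind it.
vague = Request("SAN Team", "we need high performance LUNs.", "", "", "")
print(vague.missing_steps())  # ['reason', 'proposal', 'confirmation']
```

Nobody needs a class to write an email, of course. The point is only that a request with a blank reason, proposal or confirmation leaves the other team to fill in the gaps with their own defaults.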

Speaking of being concise, I’ll stop here. What are your tips and tricks to avoid “What we got here… is a failure to communicate” syndrome? How do you get along with your SAN team? What irks you about the way your DBAs talk to you? What could politicians learn from us? Share your thoughts in the comments.

7 thoughts on “Listen, we need to talk”

  1. I’ve learned over time to stop trying to specify what type of RAID I want and how many drives I want to a SAN team, since they almost always have a standard way of doing things and what I am asking for I’m not going to get. Instead, I have boiled it down to the amount of Read and Write I/O per second (IOPS) as well as Read and Write Bytes/Sec I need the SAN to be able to provide for each of the files that I plan to place (user DB, tempDB, logs). I gathered that information from 5 or 6 of our biggest customers, multiplied it by 3 and gave them those numbers. Problem solved. SAN guys seem to like that approach. Hey, as long as I get the performance I need out of the disk, who really cares what type of RAID it is anyway? Using RAID 10 will get you better performance, but there’s always more than one way to solve the problem. I strive for best practice, but have to live in reality and make it work even when best practice isn’t followed.

    • Great Point Mindy – And actually a better approach, especially with the newer storage units out today. In a lot of shops, gone are the days of having specific spindle and RAID group discussions, and it is more about quantifying and classifying your performance needs. This is part of that communication I am talking about here. This particular customer still talked a lot about RAID and RAID groups and spindles and keeping things separate. They were on a Clariion and had a SAN team that cared a lot about spindles and RAID groups and who was on what. In a lot of shops it is much more a conversation about the expected workload, the expected pattern of IO (more sequential reads, more random writes, what size the writes are, etc.) and the desired throughput and IOPS. But saying “high performance” doesn’t cut it – unless you have that conversation ahead of time and set up tiers and you both know what you mean when you say “this server is in the high performance tier.”

  2. I would go even a step further with this post and suggest that this guidance should apply to communication with other DBAs, even those in your team. Yes, DBAs tend to talk to each other more than they talk to other teams, but there is just as much danger of miscommunications when speaking with other DBAs as well. Really, the guidance you suggest is useful for communication with anyone where one of you wants something from someone else.

    • Absolutely Mike – Great point. In the case of the aviation industry the miscommunications typically were among folks on the “same team” and in the same cubicle – as most of the communication issues were pilot to pilot issues in the cockpit.

  3. I would say that telling your SAN admin/team that you need dedicated RAID 10 LUNs is the wrong way to request what you need.

    People seem to equate RAID level with being the deciding factor of whether or not your disk system is high performance. There are lots of factors that need to be considered. In most SAN-attached systems today, I see the systems bottlenecking on throughput long before they even get close to the limits of the disks themselves.

    When I talk to a SAN admin/team, I talk in terms of IOPS. I need to get xxx IOPS for this application. I don’t care what they tweak to reach that performance metric. I don’t care if it’s RAID 5 as long as it delivers the performance I require.

    • Robert – Again, as I replied to Mindy – this was probably not the best example to use. Don’t get lost in the RAID details so much as the perception the team had when they said one thing, “High Performance,” and the delivery they got, “same as everyone else,” with no real look towards IOPS or throughput. You are right, as long as the performance is fine then who cares about the RAID level nowadays, especially in the days where tiered storage and thin provisioning rule the day in most data centers… In this case, no thought at all was given to IOPS or throughput or latency by the team doing the requesting or the team doing the allocating.

  4. #4 under letter ‘F’ is key! If possible, never raise a problem without proposing a thought-out solution as well. This prevents the classic ‘lobbing it over the fence’ that always seems to happen between DBAs and Development.

    Approach people with humility, an open mind and an open heart and your words will travel much further.
