SQL Server HA/DR Webinar

on November 29, 2022

On Tuesday, 12/13/22, we presented a webinar about HA/DR (high availability and disaster recovery). Join Team Straight Path as we discuss what HA is, what DR is, and how they are different. We talked about the ways to achieve HA and DR. We’ll talk about the things we’ve learned about HA and DR along the way and pass those tips on to you.

This topic is part of our monthly SQL Server webinar series. Webinar registrants get access to the entire recording, which includes the office hours Q&A. If you would like to be notified about upcoming webinars, be sure to subscribe below:l. But the Office Hours and Q&A will only be available for webinar registrants. It’s free to join us; register here:

Sign Up for Updates

HA/DR Panel Discussion

Transcript

00:00:09:13 – 00:00:30:24
Mike Walsh
Well, welcome to the Straight Pass Solutions kind of monthly webinar we’re doing on a different topic. And if you’ve been watching us so far, we’ve been choosing kind of a theme each month to talk about. So last month we talked about upgrades and migrations. This month we talked about HADR next month actually. Jeff, you want to talk what we’re talking about next month, I’ll put you on the spot.

00:00:30:24 – 00:00:31:13
Mike Walsh
You’re on mute.

00:00:32:15 – 00:00:35:09
Jeff Iannucci
On mute myself. First we’re talking about our use of community tools.

00:00:36:11 – 00:00:40:15
Mike Walsh
What’s your favorite, by the way? And we’ll just preview yours chart really like.

00:00:41:08 – 00:00:44:23
Jeff Iannucci
Really like the different ways that you can use Adam Machanic’s who is active.

00:00:45:02 – 00:00:59:15
Mike Walsh
Cool. Yeah we use that a lot. So we’ll talk about that. And I believe in February, we’re talking about indexing. We’re just we have a bunch of topics. So just follow the blog and you’ll see what we’re talking about. But but today is all about HADR, and we’re not doing this with slideshows, we’re doing this with a panel discussion.

00:00:59:15 – 00:01:20:16
Mike Walsh
So I have a few a few panelists here. I guess I’m going to have to start with a person with a Christmas sweater on. So David, Seis is on our support team, he’s one of our DBAs is and he actually is really, really strong in info security and data security. So we lean on him a lot there. And he also has great fashion taste.

00:01:20:16 – 00:01:54:00
Mike Walsh
Obviously Sandra Delaney is here and she is I guess probably nobody in the call would dispute. She is our AG guru. So when anybody has a question with AG Windows clustering, failover clustering and SQL settings, timeouts, weirdness, we we always hope that Sandra’s around. We’re all really good at it. But Sandra is is Yeah Mike does not it think absolutely Mike Clark you can wave everybody they tell who you are anyway Mike is on our leadership team.

00:01:54:00 – 00:02:15:11
Mike Walsh
He’s one of our senior DBAs and Mike has been an enterprise DBA most of his career, so he’s really strong in HADR a lot of your clients that you’re a lead DBA for our and availability groups. Tara Tara Kaiser is here. I know why she had an HADR talk, isn’t she just you’re typecast as performance and parameter sniffing, but you’re actually really good with stuff.

00:02:16:06 – 00:02:17:18
Tara Kizer
Only typecast here though.

00:02:17:22 – 00:02:18:09
Mike Walsh
That’s right.

00:02:18:09 – 00:02:20:11
Tara Kizer
Lots and lots of background in HADR.

00:02:20:12 – 00:02:38:00
Mike Walsh
Yeah, tons of it. So we actually she does a lot with availability groups really go with log shipping. And if I have a client that calls up and says, Hey, can you support mirroring, I just I do not it. And so, hey, Tara, let’s talk to this client. So since she know mirroring is still is still there, it’s deprecated but it’s still there.

00:02:38:10 – 00:02:54:01
Mike Walsh
Jeff Iannucci is is one of the newest but you’re you’ve been here a while now but he is our newest senior DBA and he’s an all around great DBA. Actually, you know talk about your book because your book is getting some pretty cool traction, so this has nothing to do with HADR, but I’d love to love to let people know about your book.

00:02:54:16 – 00:03:10:02
Jeff Iannucci
Turning into a promo here. That’s right. So I’m in the middle of writing a book learning SQL in a month of launches. That’s the language SQL and SQL Server, which you can purchase an advance copy now and start reading the first six chapters in many publications.

00:03:10:21 – 00:03:31:01
Mike Walsh
So pretty, pretty cool. And Jeff knows a lot about HADR as well. So really we just want have a panel discussion and just sort of chat back and forth a little bit about HADR and then mostly answer questions as well. And the format here is pretty simple. This is being recorded, the HADR discussions recorded. There are blog posts on her blog this month.

00:03:31:01 – 00:03:50:24
Mike Walsh
If you go StraightPathSQL.com about each of these, I think David’s post is up and Jeff’s post is up might come up next week. Mike Lynn has a post as well about SQL Server Standard Insider Static Keeper, which is what we use a lot to help keep people on standard. Actually, Jeff had a great post last month about staying on SQL Server standard when it comes to upgrades and migration.

00:03:50:24 – 00:04:12:18
Mike Walsh
So those kind of tie in to each other. I think yeah, when we’re done with this part, we will stop the recording, do an ask us anything. If somebody has random SQL Server questions about HDR or really anything and we’ll do our best to answer and we won’t record that to protect the guilty if you’re blaming developers or SAN admins or problems.

00:04:13:14 – 00:04:33:00
Mike Walsh
So you know, basically I’ll just talk about a few things upfront. Actually I have, I have a let me share this link I have to find on YouTube again. Could I just close my browser? But I have a post pass presentation that a couple of years ago about an intro to HDR and that’s up on the web some place.

00:04:33:00 – 00:05:00:02
Mike Walsh
I’ll find it eventually. Here and share it when I’ve been talking, somebody else was talking, but it’s a great intro to what is high availability, what is disaster recovery and what things should you be thinking of? But I want to talk about a few of the things here and then kind of get into just stories because sometimes, you know, the gestalt method of talking through stories and experience, you actually learn a lot about what to do and we’ll teach some things what not to do as well.

00:05:00:02 – 00:05:19:14
Mike Walsh
I think so a few definitions that will help us. One is kind of HA and DR. Anybody want to give a real succinct answer on what the difference between the two are? I mean, obviously it’s high availability, disaster recovery, but anybody want to I have a crude way of talking about my presentations, but I don’t use the crude words.

00:05:19:14 – 00:05:25:20
Mike Walsh
But anybody have a good if not, I’ll just say it.

00:05:26:23 – 00:05:34:08
Tara Kizer
I don’t know about the crude way, but the I mean, the biggest thing is data loss. You know, HA is supposed to be no data loss of data is going to have some level of data loss.

00:05:34:13 – 00:05:59:13
Mike Walsh
Yeah, absolutely. Yeah. My my way is kind of I say is sort of for the oh no situations as well. So H.A. is really for a terrorist point. There’s no data loss. But typically when I’m developing an issue, the our solution is what we want to allow to have a sort of patching. You want to build a patch on a passive or fail over, you want to be able to support a local failure.

00:05:59:13 – 00:06:21:11
Mike Walsh
So something inside the data center goes bad for D.R. So, so sort of for the oh, no moments and DR for the, you know, the oh, bleep moments where you lose a site, you lose a SAN. You lose something core and you have to fail over to a different site. You know, some clients can have just DR, some can have just a some have both.

00:06:21:11 – 00:06:45:00
Mike Walsh
So that’s important. And there are some boring terms talked about too, like RPO and RTI, recovery point objective and recovery time objective. So Recovery point is really not just we’ll talk about low availability in a second, but RPO, RTO are not just about high availability. It’s really how much data can you lose that that’s recovery point objective. So to what point can you recover a recovery?

00:06:45:03 – 00:07:18:14
Mike Walsh
Recovery time objective is how long are you down for? And there are different many different ways to to meet HADR requirements. One thing that we have to do is DBAs is we cannot just sort of decide in a vacuum what HADR solution we’re doing. We have to talk to the business, so we have to be able to talk to the people and hey, hey Jeff, if we ask a customer, ask a customer sort of management team, the not that nontechnical management team, how much data can you lose and how long can you be down for?

00:07:18:14 – 00:07:23:13
Mike Walsh
What’s the answer? We’re going to get?

00:07:23:13 – 00:07:26:09
Jeff Iannucci
What do you think? I mean, that’s literally the answer. What do you.

00:07:26:09 – 00:07:28:07
Mike Walsh
Think? Yeah. Or none, right?

00:07:28:17 – 00:07:30:09
Jeff Iannucci
Yeah. Or none. Right.

00:07:30:09 – 00:07:35:09
Mike Walsh
And is that possible? Tara, you’re drinking?

00:07:35:09 – 00:07:40:10
Tara Kizer
Sorry. There’s a production issue at one of our clients that I was dealing with.

00:07:40:15 – 00:07:45:24
Mike Walsh
Oh, nice. Oh, you’re really drinking, then? That’s okay.

00:07:47:01 – 00:07:47:21
Tara Kizer
What’s the question?

00:07:48:00 – 00:07:56:22
Mike Walsh
The good question is, if we ask a client or a client sort of business leadership team, you know what? What’s your RPO and RTO, the answer is obviously zero and zero. Yeah. Is that doable?

00:07:57:18 – 00:08:26:02
Tara Kizer
It’s doable with enough money. Lots. Lots of zeros at the end there. Yeah. You know, clients think that they that their systems are highly available or, you know, that they think that they’re probably hitting near 100%. But when you look at their how they have job setup, how their hardware setup, how their HDR technologies are set up, in reality, they probably have more like 4 hours or more a RPO or DR you know, because of things like index maintenance, index rebuild.

00:08:26:02 – 00:08:33:21
Tara Kizer
So your log shipping or async AGs will get very, very latent when you hit a large index rebuild and the business just isn’t aware of that.

00:08:34:08 – 00:09:00:01
Mike Walsh
Absolutely. You know that that’s that’s perfect and it’s it’s very true so so you can do any RPO and RTO you want it just we you know it’s not only the more cost but it’s just harder to manage. You know, it’s harder set up, harder to manage that There is it’s it’s more tenuous. So we have to be you know the DBAs are we’re the geeks at the company who are unafraid of talking to people.

00:09:00:11 – 00:09:25:19
Mike Walsh
We have to be able to kind of communicate and sort of soothe-say what what the client’s asking for or what your leadership’s asking for, and then talk about the pricing and how you can do it. So more more nines mean more zeros is a great way of thinking about SLA, that that’s the service level agreement. That’s just the sort of the contract, for lack of a better word, that that we use to to sort of enforce the RPO.

00:09:25:19 – 00:09:59:10
Mike Walsh
and RTO. and some SLAs have teeth some SLAs don’t have teeth. It depends on who you are. I want to talk a little bit about what you might call low availability first, because high availability is really important, but if we can’t do the basics first, we’re kind of in trouble. So. So David, I want to talk to you about kind of backup recovery because if you’re here and you’re not doing what we might call a proper backup, I wouldn’t worry about HADR yet.

00:09:59:10 – 00:10:21:06
Mike Walsh
I would just worry about your generic availability. You know, you do a lot of health checks for clients. David, You’re in our support organization, so you see a lot of backup failures and you’re also care about security, and ransomware is a real threat. So so talk talk us through kind of backing up and recovery when when is a backup good and kind of how does that work?

00:10:21:06 – 00:10:23:06
David Seis
Sure. So we are.

00:10:24:09 – 00:10:53:12
David Seis
We use an OLA and an adapted OLA framework here for all of our clients, which ensures up to, you know, 15 minutes point in time recovery for, you know, day to day, you know, and even, you know, having high availability if something happens, somebody accidentally truncates a table because they thought they’re in their DEV environment, they may have up to 15 minutes of data loss maximum using that.

00:10:53:16 – 00:11:25:15
David Seis
And we do that through, you know, full backups, weekly diff backups daily in log backups every 15 minutes for the most critical databases. The way we verify that there’s there’s built in verification to make sure the backup isn’t corrupt through OLA procedures. And then we also recommend a few other options. A lot of clients have their backups sent to a network share, which simplifies management of the space and the backups.

00:11:25:15 – 00:11:50:20
David Seis
In that sense, we also recommend an offsite backup, you know, maybe copying the backups from that share to an Azure or S3 bucket so that whenever ransomware hits, hopefully it doesn’t. But if it does, the first thing they look for are those backup files and then they’ll encrypt them or delete them. And if you’re lucky, they’ll be able to encrypt them if you did pay.

00:11:51:01 – 00:12:29:05
David Seis
So the the first step is to get those backups so that in a disaster you can get your your data back and then we’ll just check my note real quick. Yeah. So the offline for offsite backups. And then the second part of the process, we recommend is always doing restore testing. Yeah. So how long does it take should a disaster occur to get those backups from our cloud buckets wherever they are and get production back up and running so that we can say this process is good enough for what we have.

00:12:29:05 – 00:12:55:07
David Seis
Like 5 hours is enough time and we’re okay with that or it’s not enough. And how do we fix it? What what other stopgaps do we need? Do we need, you know, different jump boxes? Do we need, you know, static storage somewhere else? So those are the questions you can answer by doing backups regularly, restore testing, having ransomware protected backups by having them offsite and not accessible on your network, etc..

00:12:56:00 – 00:13:27:15
Mike Walsh
I love that. And I mean, I think everybody in this panel has unfortunately, there have been a part of or been called into a situation where a client wasn’t pining for ADR. They were just pining for good backups. I mean, we’ve all done it right. We’ve all been there. And that’s not a fun conversation. And with ransomware spreading, we’re seeing it more and more so there are people who will watch this video or hear that that may not have good backups and that that’s it’s such an easy thing to do.

00:13:27:15 – 00:13:45:06
Mike Walsh
And in the framework Dave was talking about was the OLA maintenance solution, which is what we prefer, and I’m assuming we will talk about that in next month’s webinar and, and blog post about our our favorite community tools. Maybe somebody can just post a link to to all us site there just for folks who are following along at home.

00:13:45:21 – 00:14:07:22
Mike Walsh
But yeah that that’s that’s great. Thanks David And the really important thing there is when it comes to backups, the only reason we do backups is for restore. So I don’t like to think of backups. I like to think restores and that changes your your kind of your outlook on how you how you approach it. You know, there are many, many, many ways to accomplish or achieve disaster recovery.

00:14:09:01 – 00:14:51:11
Mike Walsh
There’s log shipping, which has been around for a long time, and that’s really good for D.R. Not really HA, obviously, because it doesn’t automatically fail over. It doesn’t sort of detect a problem. It’s not a quick fail over you don’t change your just not seems to users. But it’s great for DR. It’s great for migrations, too. As we talked about last month, failover cluster instances, that’s that is great for HA And with sales data keeper like Mike Lynn blogged about last week, you can do DR with that because you can you can achieve shared storage across the country with storage replication availability groups that that is sort of what Microsoft morphed mirroring into, if

00:14:51:11 – 00:15:16:09
Mike Walsh
you will, availability groups get the ability to have HA and DR and you can bring multiple pairs and you can read off your secondaries. SQL Server 2022 gave us a lot of new features there. That’s that’s an important technology. And DR But a lot of people think all AGs are there. I’ve actually heard we had a conversation with a customer maybe a month ago and they said, I thought failover clusters were deprecated.

00:15:16:09 – 00:15:43:00
Mike Walsh
No, they’re not deprecated. AGs are not the end all. Be all there. Great when you need them and we’ll talk about the difference in a little bit here. Then you can do it through VMware or Hyper-V. There’s different technology behind the scenes. Problem of those is they don’t protect your C drive. So if you have a hardware failure with VMware, that’s this is, this is the extent of types of VM reach unless you have a question in the chat.

00:15:43:08 – 00:16:07:20
Mike Walsh
But if you have a hardware failure with VM so a host dies. Congratulations, you’re good because your VM is going to fail over if your operating system goes corrupt. Because a bad Microsoft patch came out, which, you know, knock on bald guy’s head, it does happen. And we’ve seen it more and more lately as as the technology has increased and the uses and the use cases for testing increased, we’re seeing maybe what feels like more oopses and patches.

00:16:08:11 – 00:16:43:01
Mike Walsh
And if you corrupt your operating system, VMware is just going to take your corrupt operating system and move it over to it. And now there are ways like virtual tools to do snapshots and potentially lose 15 minutes of data. But VM or HA will not protect you from what you might call c drive protection failure or operating system failure. And then if you’re in platform as a service offerings up at up in Azure or a AWS or Google Cloud, there are myriad technologies which are based on the ones we’re talking about here that do some of that for you.

00:16:43:01 – 00:17:02:19
Mike Walsh
You know, I’ll just kind of go down the list of some of the team members and talk about certain certain technologies. So so Sandra, when it comes to availability groups, just give us kind of a brief overview of them and pros and cons and maybe your favorite thing about them and your least favorite thing about them. If you have a least favorite, well.

00:17:04:01 – 00:17:27:10
Sandra Delaney
They are good for HDR, but if you have if you have clients that insist on having two nodes where one is on one side and one is in a completely different site, the bad part about that is if you put it to synchronous, you’ll have latency. If you put it to async, then you have issues with, you know, failover.

00:17:27:10 – 00:17:55:17
Sandra Delaney
You can’t fail over. If you have a node that’s an asynchronous, you can’t you have to force it because you have to force with data loss. If they are in sync in their two separate locations, you can have something called where within the cluster it sees it as What does that word I’m thinking of? Oh, where they were, they both think that they’re both primary.

00:17:56:10 – 00:17:57:00
Mike Walsh
Split, brain.

00:17:57:09 – 00:17:58:07
Sandra Delaney
Split brain. You have a.

00:17:58:07 – 00:17:59:00
Mike Walsh
Small brain right now.

00:17:59:03 – 00:18:22:18
Sandra Delaney
Yeah, yeah, yeah. Absolutely. So and that’s bad because then you’re going to have to force one to be primary and hoping that it it can cause problems. So that can be an issue. Ideally, you like to have two nodes that are in the same location that are synchronous. And then if you’re going to have DR move that somewhere else as async.

00:18:22:18 – 00:18:53:10
Sandra Delaney
So and then when you do have them synchronous, set them to automatic so that they can fail over like you want them to. However, your application has to be AG aware and they also, the client needs to understand that Always-On is not always on. When you fail over, you get disconnects and you reconnects. And if your application is AG aware it’s automatic, I have a client where it’s not AG aware and they have to recycle the application for their the connections to connect.

00:18:53:10 – 00:19:13:12
Sandra Delaney
And so you have to kind of talk to your clients about all of those situations and that they need to be aware. And then I also have had clients where it has to be on a specific node. So when you’re doing patching or whatever that they do, you have to make sure that you’re failing over to the node that they want it.

00:19:13:12 – 00:19:15:21
Sandra Delaney
So there’s pros and cons to it.

00:19:17:10 – 00:19:41:19
Mike Walsh
Absolutely. And you know, it is a great technology where it makes sense. I like it’s creation migrations. It’s great to add a node in quickly and easily depending on your environment. Now, Tara has a few clients that are that are that are in her sort of list of clients that she is the web for that are you know we had one yesterday with the 27 terabyte database that those don’t come in and out of a third node in AG pretty easy Jeff you want to talk about failoverclusters a little bit?

00:19:43:00 – 00:20:05:17
Jeff Iannucci
Sure. They’re not as glamorous as availability groups. Everybody. Everybody loves availability groups because you get the readable secondary ubiquity of the synchronous and the asynchronous. You get a lot more options than you do with a failover cluster out of the box. Your failover cluster instance has that shared storage, which can become a single point of failure if you if you don’t have it configured well.

00:20:05:19 – 00:20:31:17
Jeff Iannucci
I mean, just because it’s shared storage. But one of the big advantages of failover cluster instances is with standard edition. And I think as you alluded to earlier, our colleague Mike Lynn wrote a pretty good blog post that’s out there right now about using silos, which instead of sharing the storage, replicates the storage across to two different servers, which don’t necessarily have to be in the same data center.

00:20:32:07 – 00:20:56:04
Jeff Iannucci
And you get that that high availability aspect. Well, while still being able to use a failover cluster instance, the benefits that he alluded to in that blog post specifically for standard edition, you can use it standard edition in availability groups as well. It’s basic availability groups, right? You get one database per availability group. You don’t get readable secondaries unless you’re doing snapshots.

00:20:57:03 – 00:21:07:18
Jeff Iannucci
So there’s there are limitations for standard edition. But with the failover cluster instance, really the limit, the main limitation is if you get two nodes, which most people use for most failover cluster instances anyway.

00:21:08:01 – 00:21:24:18
Mike Walsh
Yeah, absolutely. And again with the data keeper, know you can, you can do DR. So if you want to have DR only you can do that because we’ve had some clients who don’t really want HA. they just don’t want to lose data. They want to say, Hey, listen, if something bad happens, we want to know that we have data someplace else and we can get it online quickly.

00:21:24:18 – 00:21:42:13
Mike Walsh
We don’t care if it’s not as performant. We don’t care if it’s far away. So that’s a great candidate for DR. We can be down for an hour or two. We don’t want to lose data and we want to be able to survive a site failure that’s there. So yeah, and we really enjoy putting people into SQL Server standard when they’re coming from Enterprise.

00:21:43:00 – 00:21:57:13
Mike Walsh
I tell my clients all the time, I would much rather you spend money on consulting than on SQL Server licensing. If we can help you with that. And since we’re going like in order of glamor here, I didn’t mean to, but available the groups of you all call the most glamorous available cluster is a little bit less glamorous.

00:21:57:13 – 00:22:05:15
Mike Walsh
We’ll we’ll talk to Tara about log shipping which which is kind of the the least glamorous DR and HA technology out there. But I actually like it a lot.

00:22:06:14 – 00:22:10:03
Tara Kizer
And I’m probably the worst one to talk about it because I only have bad things to say about log shipping.

00:22:10:03 – 00:22:18:02
Mike Walsh
So. Well, it’s good. It’s good to talk about the cons. So So yeah, well I’m going to forces. Is something good about like in debate school or you have you have to take the other side.

00:22:18:02 – 00:22:28:04
Tara Kizer
So yeah I mean certainly it’s a rock solid technology. It’s been around since SQL 2000 days. You know, back before 2000 we had to roll our own logic memory. Pretty sure 2000.

00:22:28:04 – 00:22:28:15
Tara Kizer
Was the first.

00:22:28:15 – 00:22:57:17
Tara Kizer
Log ship conversion. I’ve been using it since SQL 2000. One of the companies I worked for in the past, we did Disaster Recovery failovers on purpose to our disaster recovery site twice a year, and we ran production out of there for two or three weeks at a time to prove that the hardware in a whole setup was good. And back then we just had log shipping and the amount of work that you have to do to get log shipping set up, you know, it’s the same for AGs and database mirroring because you have to do a, you know, restores and all that.

00:22:58:01 – 00:23:24:15
Tara Kizer
But the, you know, the steps required to do that failover. You know, it’s not as glamorous as, you know, FCI failover and AG fell over database mirroring failover. You know, there’s additional work there could be scripted and, you know, it could be fast. And, you know, one of the issues that we had is that some of these systems that we were failing over had a lot of transaction and a lot of transactions mean bigger transaction log backups, which means you have to restore those, copy those over and restore them.

00:23:25:05 – 00:23:49:18
Tara Kizer
Same thing can happen with AGS and database mirroring but at least those are constantly flowing. Whereas with log shipping it’s a backup and a restore event that has to occur. So there’s more latency there. So we had scripts that we would run prior to the actual failover to get those caught up as much as possible and then do those failovers and, you know, testing would occur and bring the sites online and then you got to set it up in the other direction and that is exhausting.

00:23:49:18 – 00:24:16:23
Tara Kizer
Exhausting for the amount of servers we had to do. Yeah, it is a great technology, especially for migrations to great tech technology for disaster recovery. Some people like to use it for reporting. I don’t like it for reporting because of the latency involved. But if it you know, if you’re okay without latency then then that that’s great. It is a good technology to do reporting on If you’re okay with like 8 hours of latency where you skip those restores four or your business hours.

00:24:16:23 – 00:24:18:08
Mike Walsh
Load and restored every day.

00:24:18:10 – 00:24:34:23
Tara Kizer
You can’t do that to reports. I mean, the users don’t like that. So you have to build in a you know, into the job schedule that you’re not going to do restores during the day. No, you’ll catch up at night. But that just increases your RTO. Doesn’t really increase your RPO because the files are still being copied, but the restores aren’t happening.

00:24:34:23 – 00:24:47:07
Tara Kizer
So there are RTO increases. So I like log shipping, but I don’t like it as far as having to use it as compared to AG’s database mirroring or FCIs don’t apply to disaster recovery.

00:24:47:07 – 00:24:47:13
Mike Walsh
Yeah.

00:24:47:16 – 00:24:48:04
Tara Kizer
Not usually at.

00:24:48:04 – 00:25:06:08
Mike Walsh
Least. Yeah. Yeah. And a great SQL Server standard thing to do if you need HA and DR is you do two nodes in a failover cluster instance, SQL Server Standard and then you do log shipping someplace else for the R and you know that that’s it’s really good for that. When you start talking, we’re reporting I’m going to tell you I don’t like that.

00:25:06:08 – 00:25:11:10
Mike Walsh
I start cringing like it’s just just back up your darn database once and I don’t use logs.

00:25:11:10 – 00:25:14:00
Tara Kizer
Yeah, exactly. Yes. There’s a different solution that could be easier.

00:25:14:00 – 00:25:39:06
Mike Walsh
Yeah. And don’t or don’t do it. There’s the you know. Yeah. So that’s those are the main technologies that are not missing. Any replication is kind of one, you know I, I talk about my presentation sometimes it’s Microsoft still lists replication as a solution to our solution. I don’t know I, I would much rather use log shipping than that replication.

00:25:40:03 – 00:25:46:11
Tara Kizer
If you attended any office hours from Brent Ozar Unlimited. And that topic comes up, you always hear me saying, Wow, that’s a bad idea.

00:25:46:11 – 00:25:47:19
Mike Walsh
That’s a terrible idea. Yeah.

00:25:48:12 – 00:25:51:00
Tara Kizer
Replication is not a HADR solution.

00:25:51:03 – 00:26:19:05
Mike Walsh
To replication is really good for reporting. You know, transactional replication is good for reporting. And if your application really needs merge replication, God bless you and good luck. We’ll help you. We have the replication people here. But you know, replication is wonderful for for, for reporting I think, and for getting some data out to other customers, etc.. You know, so let’s talk about I like horror stories.

00:26:20:16 – 00:26:50:11
Mike Walsh
So I’m curious if anybody here has made a mistake relating to HADR that costs downtime and I’ll go first. I’ve done this and some of our clients have done this and have a blog post about it and, and I feel really bad when our clients do this because I’ve been there, done that one day, many, many moons ago I was working at a Windows 2003 SQL 2000 2005 failover Cluster instance and I was a new DB newish DBA.

00:26:50:12 – 00:27:07:17
Mike Walsh
I had just learned about clustering. We had a lot of storage that was being held up in the in the the Dell compellant SAN the storage admin because back, back then we had storage admins used to give me a hard time and call me a disk sponge because I was a DBA and we made them give us all the spindles and our own dedicated raid groups and metal ones.

00:27:07:17 – 00:27:24:11
Mike Walsh
And he hated us for that. But I gave him a terabyte of storage back, which back then was a big deal and I was all proud and want to make Mike the storage admin happy. So I made sure it was done. I looked, looked away, looked looked away, said, Yep, this desk is completely empty. Not a single database file on it.

00:27:24:18 – 00:27:37:24
Mike Walsh
I’m done with it. So I went in and I offline the disk. And anybody in this panel know what happened when I did that? Sandra, What did I do?

00:27:37:24 – 00:27:39:04
Sandra Delaney
Sorry, I was working on a client.

00:27:39:12 – 00:28:02:03
Mike Walsh
Oh, that’s right. See, we’re so busy. Always be billing. Yeah, So I remove a disk and I took the data off at first, but something bad happened in the cluster. Anybody know what happened? Nobody. It failed over. Then it fell back, then it failed over. Then it stopped because it hit the magic three times. What did I forget to do?

00:28:02:03 – 00:28:04:02
Sandra Delaney
And you’re in a in a G or FCI.

00:28:04:07 – 00:28:05:07
Mike Walsh
Failover cluster since.

00:28:06:09 – 00:28:09:00
Sandra Delaney
Oh, you didn’t move it out of the cluster.

00:28:09:05 – 00:28:25:03
Mike Walsh
As a dependency. Yeah, right. I did not go into the dependencies and tell them tell SQL Server not to use this drive. So I went from being a hero to the storage admin to being as zero with the CIO. And that that’s a you know, if you’re if you’re dealing with office politics, that’s the wrong way to go.

00:28:26:03 – 00:28:42:05
Mike Walsh
She came running down there because it was an online nursing app that that 24 seven an insurance company’s nurses were doing health care coaching for clients. So it was not it was not a good day. It was not a fine moment. Anybody else make any big mistakes or of change? That was a really proud hand raise.

00:28:42:20 – 00:29:04:23
Sandra Delaney
Well, it wasn’t really my mistake. So we had an environment where they were. VM motioning because they were moving the, the, the VMs from one node to another where it’s faster disks because it couldn’t keep up. So they turned off one node and was copying it over which it could talk to the witness and it could talk to one of the nodes.

00:29:04:23 – 00:29:32:23
Sandra Delaney
So it was happy. Then they decided to ve motion the second one, the Witness, which then brought it down to one node. So they brought the whole sucker down and they couldn’t turn on the other node because it was still V motioning. So I had we actually had a R server that I was able to throw back in to get the nodes up while while it was v motion because they were, they were huge VMs, huge VMs.

00:29:32:23 – 00:29:53:01
Sandra Delaney
So it wasn’t really technically my fault, but I had to fix it. So I learned that time to make sure you always have two, that they can talk because once it brings it back down to one, unless you’re going to force quorum, you’re in a bad spot. And then I heard from that was my one of my old jobs.

00:29:53:10 – 00:30:14:16
Sandra Delaney
But it was again, it wasn’t my issue. I just heard about it later where the SAN guys, they had two nodes or they had yeah, they had two nodes and they were sharing a LUN on the SAN. This that LUN went down, which brought both of them down. So any time I talk to people about AGs, I’m like, do not share LUNs.

00:30:14:16 – 00:30:26:12
Sandra Delaney
If you have two nodes, they have to be separate because if one goes down and you’re sharing it, you’re done, you are done. So I make sure I make a point to say that when I talk to people about that, because.

00:30:26:15 – 00:30:31:22
Mike Walsh
That’s one of the big benefits of an AG, right, is you can have everything separate. But if you’re sharing.

00:30:32:04 – 00:30:51:21
Sandra Delaney
Yeah, you’re sharing a LUN. Yeah, it is separate, but you’re not separate and a lot of times we don’t know what they do on the back end. So I always bring that up and say, You need to figure this out because I don’t, I don’t I can’t see what you have on the back end. But I’m telling you, if you have that and that goes down you’re down you’re.

00:30:51:21 – 00:31:18:00
Mike Walsh
done. Yeah, that’s absolutely and that’s ah yeah. Because failover clusters require shared storage and we’re so as data keeper you can replicate share storage. So that’s good but if you’re on a SAN anybody know what like a famous refrain SAN say SAN salespeople say but also said and sans can never anybody starts with an F fail Have you guys heard that before.

00:31:18:00 – 00:31:40:11
Mike Walsh
Oh SANs can’t fail. There’s all these redundancies built in and but but yeah they do fail so so AGs are great for that and we all share any horror stories that that things you’ve done or we can expand that we can talk about things that we’ve we’ve seen our customers do or past coworkers. If you want we won’t name our customers who do this.

00:31:40:24 – 00:32:02:09
Jeff Iannucci
But I’ll talk about one quick thing, and this isn’t really nearly as disastrous as what you’ve talked about, but when setting up an AG, if you choose auto seeding, that’s great with with smaller databases. But at one point I remember adding a five terabyte database to an availability group, and then the estimated auto seeding completion date was somewhere like months in the future.

00:32:03:04 – 00:32:22:23
Jeff Iannucci
So that was going to be a while. There’s a if you’re going to go that route, there is a trace flag. I think it’s nine, five, six, seven. That’s really handy to to enable compression for that which can auto seat a heck of a lot faster. Yeah. For your data basis and and availability group just a just a tip if you ever add one of those large terabyte or more databases.

00:32:23:03 – 00:32:50:07
Mike Walsh
I love that a hippie tip I just today I learned that phrase a hep tip a good thing there too. Like we’re big fans of patching and you know, being up to date. Over the years, auto seeding has has gotten better and in various SKUs. So you know, if you’re still running, say, Super server 2017, RTM, release to manufacture, you are not going to have the same auto seeding experience as someone on the latest CU as 2017 or 2019 or 2022.

00:32:50:14 – 00:33:15:01
Mike Walsh
So AGs especially are good reason to sort of stay ahead of the curve with upgrades. Don’t you know, go back and watch last month’s webinar and videos and don’t don’t get behind the curve and fall back, you know, 12 versions behind because it’s comfortable, because you will have a different experience with AGs. And when you call Microsoft, if you’re a couple of versions back, their answer is, Well, I’m sorry about that.

00:33:15:09 – 00:33:20:23
Mike Walsh
I hope I hope things work out for you more or less, even if it’s under support. Technically speaking.

00:33:21:09 – 00:33:42:24
Tara Kizer
On the auto seeding topic, one of our clients recently had an issue with it. This cluster was set up at least a year ago. auto seeding was was used for, you know, to set up the the group and all the smaller databases got sync and then we manually did the the bigger databases. But the auto seeding config was still in place up until about two or three weeks ago.

00:33:43:20 – 00:34:06:21
Tara Kizer
auto seeding was now clogging up the error log with repeated messages that basically that there was nothing to auto seed. And when I say clogging it up, I mean many gigabytes per day of error log junk from auto seeding and just had to to turn the auto seeding config change off, you know, and that prevents you from auto seeding new databases.

00:34:07:17 – 00:34:25:11
Tara Kizer
But I’m not sure why it was so verbose in 2019 and it wasn’t a problem until, you know, this got in in November and have been like this for a while. But, you know, it’s it’s a you know, you want your air logs be clean and 2019 can be very verbose about a lot of things. And one of them is all the seating apparently.

00:34:25:20 – 00:34:37:08
Mike Walsh
Yeah absolutely. Yeah I’ve noticed that too. You can tell when a client just recently upgraded because it’s like what is going on the air. Like it was like a blip. It was a 2019 and I think they even got rid of it. Maybe some cuz use of it, but.

00:34:38:00 – 00:34:53:22
Tara Kizer
This is on the latest to you. Okay. This client has a lot of databases. So I checked the DMV where the auto seeding I think it’s logging is and you could see it just going through 70 databases trying to see if anyone needs to be out of seat. It’s like no this those have been sync for a year.

00:34:54:05 – 00:34:55:08
Tara Kizer
I don’t need it here.

00:34:57:00 – 00:35:07:14
Mike Walsh
As happens with consulting, Sandra had a jump off to help with a client emergency so that that’s that happens you might see some aliases sometimes in future webinars here you know it is what it is.

00:35:08:10 – 00:35:10:04
Tara Kizer
And we talked about yesterday’s client issue.

00:35:10:05 – 00:35:16:15
Mike Walsh
Yeah, let’s talk about that. You want to talk about that. So I’ll just say let’s talk about kind of issues we’ve seen and help troubleshoot that you should be aware of.

00:35:16:15 – 00:35:19:08
Tara Kizer
I just wasn’t sure if we talked about it since I also was working on projects.

00:35:19:08 – 00:35:24:17
Mike Walsh
Yeah, yeah, I get it all. The one doesn’t really work a straight path. So that’s the job of a CEO.

00:35:25:14 – 00:35:43:19
Tara Kizer
Nothing. Yes. So I’m the I’m the on call DBA and got to got a call from our answering service that we had a client that had a hardware outage. And so Mike and I joined the call and you know, got logged in and pulled up their AG config as they were. They were asking about failing over to the other server.

00:35:44:01 – 00:36:14:09
Tara Kizer
And in a proper HA environment, either FCI, FCI, AG’s database, mirroring, whatever you’re technology is, that failover would have already occurred so you know automatically and so I knew to check the ag config this is an AG and they’re asking for a manual failover to occur. And notice that the not only was it set to manual failover for the other two secondaries in this in this group, but they were both set to asynchronous so now they have a hardware failure.

00:36:14:09 – 00:36:35:24
Tara Kizer
That server is down hard in some kind of hardware loop. Later learned that they need to replace the RAID controller and they’ve been they’d already been down for an hour when they reached out to us for help. And so they were in a bit of a panic. And, you know, you can certainly do a failover to it, but the command to do that, you literally says in the command force failover data loss.

00:36:36:07 – 00:36:53:06
Tara Kizer
So I copy that command in sharing my screen. I showed them, you know, this is the risk here. And we don’t know how much data loss there is going to be that the one secondary that we’re going fail over to is in the same data center. I think they said it was in the same. I don’t don’t even know if it’s the same rack but it was nearby the server that was down.

00:36:53:13 – 00:37:15:23
Tara Kizer
So data loss should be minimal. But we can’t quantify that. We we can assume that it would be less and 5 minutes because we have modeling tools that’ll tell us if it’s more than 5 minutes less. We’ve changed that that default threshold. In likelihood it’d probably be a couple of seconds. You know, that’s generally what you’ll see if you look at the ag latency DMVs for that.

00:37:16:06 – 00:37:33:08
Tara Kizer
But you know, it could be small, it could be larger. You know, we can’t quantify that. That’s something that you have to take a look at, hopefully have some kind of table that if your system’s busy enough, you’d be able to see the gap from the the last, say, insert of some busy table two after the failover occurred and your system’s back up and running.

00:37:33:08 – 00:37:49:13
Tara Kizer
So data loss occurs there. You know, maybe not if your system’s not busy. Of course, they lost is not going occur. But this is a sound like this is a busy system and they need to get their system up and running. So eventually we did do that forced failover data loss and that that their hardware is still not online.

00:37:49:13 – 00:38:09:24
Tara Kizer
So waiting for the client to tell us about the hardware issue been resolved and then we can do the get it back in sync, get that node back in sync. So, you know, I think that the client thought that they had high availability in place and they had a past employee who set this up and told us that this is how the config should be.

00:38:09:24 – 00:38:25:00
Tara Kizer
That employee no longer exists. And I don’t think that the people on the call were aware that they really did not have HA in place for this. And so, you know, it’s not a good situation to think that you have HA and they don’t really because of a config a config issue.

00:38:25:07 – 00:38:25:22
Mike Walsh
Absolutely.

00:38:26:13 – 00:38:28:17
Tara Kizer
We were told at the time not to touch it, you know. Yeah. They said.

00:38:28:17 – 00:38:29:04
Mike Walsh
It read a lot.

00:38:29:05 – 00:38:29:17
Tara Kizer
On us.

00:38:29:22 – 00:38:30:04
Jeff Iannucci
You know.

00:38:30:15 – 00:38:48:20
Mike Walsh
They, they had some you know and what it was was basically just you know somebody, somebody said that said that this probably doesn’t work in the application and nobody you know and the client in check it so that basically hey this doesn’t work and failover they had some bug so they said let’s neverfailover automatically because something doesn’t work.

00:38:48:20 – 00:39:04:17
Mike Walsh
And turns out it was actually I’ll deal with the more thing. So what it would have been a difference a lot of things in it I’m not going to tell the story about the the roast pan but I’ll blog about some time. But a lot of things happened because it’s always been that way and it’s not actually what happens.

00:39:06:05 – 00:39:32:00
Mike Walsh
So the another take on what happened with that client yesterday, Tara, was we had a client, I don’t know, a couple of months ago or a month ago, and they basically had a situation where where they had three node AG, two nodes in the same data center, one node across the country. That was supposed to be a DR node and the HA nodes stay in sync synchronous mode, which is good.

00:39:32:00 – 00:39:59:11
Mike Walsh
So they had a fail over, no problem. But they had a new DBA that was helping and the DBAs are that nodes not asynchronous. We should make it say synchronous because in a in an availability group, I mean this is fine. This this is just this is why this is why you partner with people who do this. When you look at an AG on the database list, if a database is in a synchronous commit mode and everything’s working fine, the database says synchronized.

00:39:59:13 – 00:40:26:22
Mike Walsh
Synchronized, you know, as in ED in an asynchronous AG or an AG that’s unhealthy. Either way, it always says synchronizing. And if you’re a person who likes everything looking the same, synchronized, synchronized, synchronized, synchronized, synchronizing, Oh, no, there’s a problem. Well, one sure way to make all the databases synchronized is to take that third node across the country with a low latency connection and put it into synchronous mode.

00:40:27:07 – 00:40:45:21
Mike Walsh
I guarantee you that your AG will become synchronized in that node. But. But to do that, you’re going to introduce a lot of waits called HADR sync/commit waits in everything you do now on the primary will be slow waiting for it to go all the way across a country, commit there and then have a notification, come all the way back and say, Hey, you’re committed.

00:40:46:23 – 00:41:07:02
Mike Walsh
So they had a bunch of performance problems and we we just said, Hey, let’s go back to asynchronous SDR and you already have and poof, all the performance problems went away. So, you know, it’s the same it’s the same problem on two different sides of the spectrum. And really, you know, I’m we’re a consulting company, so so I am fully biased in saying this.

00:41:07:02 – 00:41:31:15
Mike Walsh
And I fully understand that if you’re doing a new HDR implementation, you’ve never had HDR or you’re doing a migration or an upgrade and you might get to do one upgrade every ten years at your company, every five years hopefully working with a consulting company, whether it’s ours or any of the other great ones out there in the SQL family working with a consulting company who’s done this and do this all the time.

00:41:31:20 – 00:42:01:09
Mike Walsh
We, you know, I jokingly mocked farmers insurance model here in America. It’s we know a thing or two because we’ve seen a thing or two. So working with a consultant partner for just 20 or 40 hours, you’re going to get all of that knowledge and not make that kind of mistake in your setup. And we get clients all the time that come in with health checks saying we have an AG and everything’s slow and that that HA and DR description I just gave you, that’s their current status or the one Tara gave you, that’s their current status and they’re not able to have a fail over.

00:42:02:10 – 00:42:26:01
Tara Kizer
So I’ll admit it to, to a production issue that I caused, which is related to what you’re discussing, that that company that I worked for for a lot of years and did those fail overs to DR Twice a year, one of the systems that was fully scripted, I mean I don’t remember the details of it, but after the failover a few days later, the customer support people were saying you’re getting a lot of calls that performance is bad.

00:42:26:20 – 00:42:43:01
Tara Kizer
And you know, I didn’t find any issues at the time. This was probably ten, 15 years ago. And you know, wait stats analysis is what I do these days. And I have been doing it for a lot of years. But back then I did not know to look at the wait stats and had I look, I would have seen HADR sync commit waits.

00:42:43:01 – 00:43:07:20
Tara Kizer
One of the this wouldn’t have been AG, this had been database mirroring one of the the secondary partner in the database, mirroring pair was set to synchronous commit and I left it that way. It should have been async after the testing was done you know synchronous to allow the failover to occur and then switch it to async. And I missed that step to switch it to async mode that we were going stay stay in.

00:43:08:02 – 00:43:27:22
Tara Kizer
This is from San Diego. Las Vegas stay in Las Vegas as a disaster recovery server and think the issue got reported to me on a monday. And it wasn’t until Wednesday that I finally realized what the issue was. Again, this was many years ago. This is not a mistake that would be made today, but it is a difference of a you know, at the time I probably would have been an intermediate DBA, and I’m well past that at this point.

00:43:27:22 – 00:43:48:21
Tara Kizer
But, you know, that is what happens. You know, you learn from this mistakes that you made. And, you know, it was just performance, you know, and performance is important. But this was kind of an outage because there weren’t queries timing out. It was it was a big deal. So you just got to make sure that, you know, when you’re done with the task that you’re doing, that everything has been completed.

00:43:49:05 – 00:44:05:15
Tara Kizer
You know, this is one system, this one database got got missed switching it to async because the FCI is for FCI, it’s for HA And then it was asynchronous database mirroring for DR, But in order to do that failover, you have to switch it over to synchronous commits that you don’t have that any data.

00:44:05:15 – 00:44:28:04
Mike Walsh
Loss Yup. Been there done that I’m notorious even even now even after owning consulting company being a consultant for people. I’ll admit it, I’ve done this a couple of times, actually. One of your clients, Mike, before you were there, their lead DBA doing some work, like maybe reseeding replication or reseeding an availability group. And I don’t want the log backups running.

00:44:28:04 – 00:44:40:23
Mike Walsh
All I’m doing. It’s trying to set up a log backup job and fix my work and then I go to bed satisfied. They do the work and then they wake up saying, Why is a log so big of that database? Oh, because there’s no log backups. Thankfully, we alert for that, whether they’re monitoring tool.

00:44:42:18 – 00:45:01:22
Tara Kizer
So that’s a mistake that we’ve all made in the past. It’s not a mistake I make anymore because I just don’t disable. Yeah. Your approach over there is a stored procedure you can run that allow you to just get all those log backups restored that have occurred. The backups have ran, get them restored. SB Database restore or restore?

00:45:01:22 – 00:45:15:16
Tara Kizer
Jane, there’s there’s plenty of other stored procs out there that do this work. There’s there’s no reason in this day and age that you need to disable that log backup job. You may need to run it a couple of times the stored procs to get it in sync, because while it’s getting caught up, another log backup may have been completed.

00:45:16:00 – 00:45:32:22
Tara Kizer
But because I made that mistake so many times, I’m just happy that there’s a stored proc that I can run that allowed me not to have to disable that job anymore. And if I do disable it, let’s say if I did, let’s say I’m going to go ahead and do that. I’m going to add it to my calendar for, you know, 2 hours from now to disable it or enable it.

00:45:33:07 – 00:45:40:18
Mike Walsh
That’s why the best part of being a founder and CEO is hiring people who are better at these things than you.

00:45:41:07 – 00:45:55:08
Tara Kizer
I think it’s more I think it’s really important as a consultant to not make those kind of mistakes, whereas when you’re a full time employee, certainly your job is on the line. Yeah, you know, if you make a mistake like that. But the reality is you’re likely not to get fired as a result of it, of an issue like that.

00:45:55:14 – 00:46:02:04
Tara Kizer
As a consulting company, though, you know, your client may decide decide that we’re done with what this consulting company.

00:46:02:08 – 00:46:07:14
Mike Walsh
Or they called the insurance company or. Yeah that doesn’t happen with us. It was a minor thing there so.

00:46:08:02 – 00:46:12:02
Tara Kizer
I’m more cautious as a consultant that I am as a as a FTE.

00:46:12:15 – 00:46:23:24
Mike Walsh
Yeah, you have to be. Yeah. Yeah. Maybe we can have like July be consulting months. We talk about consulting that the things I’ve learned in consulting and you’ve all learned and things I’ve learned kind of running a consultancy because it is a like.

00:46:24:11 – 00:46:35:10
Tara Kizer
You’ll remember when we had a client who was telling me to truncate this, I had a two terabyte table and I remember I won’t be doing that. And he’s the CEO telling me to hit F5 on that command. I’m like, Nope.

00:46:36:04 – 00:46:57:10
Mike Walsh
Yes, it was. It was. I didn’t do it. Our customer CEO, straight path CEO. And I said, Tara, just do it. It’s fine. And she said, All right, but you told me to because definitely scary. Yeah. And well, and that’s, you know, as another blog post or a long time ago, call them a paranoid control freak. Have I got a job for you?

00:46:57:10 – 00:47:00:06
Mike Walsh
And in order to be a DBA, you have to be a paranoid control freak.

00:47:00:06 – 00:47:18:12
Tara Kizer
And especially you get to the senior senior level. I realize that, you know, you don’t want juniors in there needs to be making mistakes. But honestly, I mean, I’m not kidding. A lot of my knowledge is from mistakes or some kind of production issue. And then you’re learning as you’re going and, you know, as you know, we have people that are non seniors at our company.

00:47:18:12 – 00:47:27:04
Tara Kizer
We don’t want them making mistakes, but that is where a lot of us learned. And you’re learning on the fly at times and you just become a better person.

00:47:27:23 – 00:47:29:06
Mike Clark
Mark Twain has got a.

00:47:29:06 – 00:47:36:12
Mike Clark
Quote very similar to that. It goes something like Good decisions come from experiences, experience comes from making bad decisions.

00:47:36:12 – 00:47:39:04
Mike Clark
Yeah, exactly. Although a bit like that, right?

00:47:39:05 – 00:47:39:14
Mike Clark
It’s like.

00:47:39:22 – 00:47:45:09
Mike Walsh
Yeah, that is a paradox of being consultants, you know, because you don’t have runway for mistakes.

00:47:45:09 – 00:48:05:22
Tara Kizer
I have this, I have this a blog post out there that talks about my big mistakes in production. And the one big mistake that’s related to our discussion here because it pertains to log shipping. So at that company where I did feel I can mention the company because it had it’s not a client. This is I worked for Qualcomm for 12 years and we did disaster recovery failovers twice a year.

00:48:06:08 – 00:48:29:11
Tara Kizer
And you know, after that testing is done, you have to set up log shipping in the other direction. So, you know, as part of that, you are, you know, grabbing your full backups, restoring and well, I restored over the new data. You know, the Las Vegas D.R. is typically a D.R. site, but now it’s primary and just so used to the Las Vegas site being the disaster recovery server that I overactive production database.

00:48:29:11 – 00:48:48:10
Tara Kizer
And unfortunately, that system did not have a lot of activity. So, you know, normally if you have a system where you are about to make a mistake entering a restore over production database, usually there’s connections that will prevent you from doing that restore. And, you know, thank goodness for that. Usually, unfortunately, this was a system that did not have any connections when I went to do the Restore.

00:48:48:10 – 00:49:05:23
Tara Kizer
So it ended up not being a huge deal. But this is a you know, this is something that you learn from. And if I’m doing a restore in production or dropping a database in production, you can bet I am spending probably a minute staring at the screen, double checking I’m on the right server, checking my connection, maybe even doing select @@ server name.

00:49:05:23 – 00:49:30:00
Tara Kizer
I am like triple, you know, 100 times checking. I am going to do the right thing here In a lot of our systems, you we have access to prod and test through the same management studio window, you know, or at least you know and so you could make a big mistake like that. So if you think you’re in a test environment, dropping a database, you know, spend a little bit extra time to check.

00:49:30:21 – 00:49:51:15
Mike Walsh
If you’re if you watch that video I shared, I admit to my very early career days before as a DBA of working up the night audit guy at a hotel, and I used to be a grill chef and dishwasher, and I had the keys to the walk in refrigerator. And there’s no you don’t get to include a meal overnight because everything shut down.

00:49:51:15 – 00:50:12:00
Mike Walsh
But if you know your way around the rock and refrigerator, you can get whatever you want, but you never want to lock the padlock in the walk in refrigerator with the keys inside the walk in refrigerator. So I now will never lock a door unless I see the keys. Look away, see the keys, look away, see the keys, and then lock the padlock.

00:50:12:00 – 00:50:30:15
Mike Walsh
Because now I know I have my keys and I’ll hold my key while locking it just to remind myself it’s there. Because that was an awkward conversation the next morning asking somebody to let me into the walk in refrigeration. I get the keys back. Why was it in there anyway? Yeah. So. So thank you all for coming. We will go into our office hours in Q&A.

00:50:30:15 – 00:50:48:03
Mike Walsh
Time now and stop the recording. If you have any questions or a SQL Server related anything at all, even consulting related if you want hear about more mistakes or if you one of our clients and you feel horrible about the mistakes, we just admitted to you that then you can also ask questions in that and we’ll stop recording now.

00:50:48:10 – 00:50:56:13
Mike Walsh
Thank you all. Thanks for the panelists for being here and we will see what folks have have to ask.

Best Practices, data recovery, HADR, high availability

Article by Mike Walsh

Mike loves mentoring clients on the right Systems or High Availability architectures because he enjoys those lightbulb moments and loves watching the right design and setup come together for a client. He loves the architecture talks about the cloud - and he's enjoying building a Managed SQL Server DBA practice that is growing while maintaining values and culture. He started Straight Path in 2010 when he decided that after over a decade working with SQL Server in various roles, it was time to try and take his experience, passion, and knowledge to help clients of all shapes and sizes. Mike is a husband, and father to four great children and lives in the middle of nowhere NH.

Sign Up for Updates

HA/DR Panel Discussion

Transcript

Subscribe for Updates

Leave a Comment Cancel reply