This article is part of our SQL Server Upgrades Made Easy series:
- Why Upgrade?
- Before You Upgrade
- During Your Upgrade
- After Your Upgrade
- Recap Chat
Upgrading SQL Server – Week 3 “During Your Upgrade SQL Server”
You can watch week one here on the blog or directly on youtube. You can also check out week 2 on youtube or blog. Please consider subscribing to the channel as we’ll make it a rule that all webinar content from the free webinar series will be shared on youtube and sometimes here on the blog also.
This is the third video from the series from the webinar last week. Here we chat about the things to consider DURING your SQL Server upgrade. We discuss questions you should be asking yourself before any SQL upgrade project starts. We talk about rollback plans, CNAMEs, checking health before, etc.
Sign up for the to get the full cuts from Webinar Ninja with any Q&A each week including the replays of the past webinars. Subscribe to the youtube channel to be alerted of the live chat hangouts and to see the upcoming talks in our continuing webinar series. Towards the end of March, I’ll start with a multi-part series on High Availability/Disaster Recovery options for SQL Server. If you have questions, you may not need consulting, just some help! Check out the series, attend live and ask your questions or come to the Q&As. I’ll try and answer most questions left in the posts also. Yes, I own a consultancy, and if you need in-depth help, I think we can help – and I trust you can figure out how to reach us. But the point of this content and the webinar series is to help encourage you and show you how can you do more than you realize! See you on Thursday 2/28 for Week 4 – “After you upgrade”. Then on 3/7 we’ll do a live hangout on Youtube with the Straight Path technical team answering your questions.
Downloads from the Series
(The upgrade checklist download will come out in a separate blog post by itself next week. And after the Live Hangout on Youtube, we’ll take all the videos, all the content and make a single page with all the content)
- Download our free Self-Assessment health checklist – (this link is a direct download of the PDF).
- Our (Hot off the presses) SQL Server Installation and Configuration best practices checklist (this link is a direct download of the PDF)
(I’ve used a transcript service for this and I reviewed most of it. It’s a work in progress and there may still be some errors!)
So today, we’re going to talk about doing the upgrade. Today we’re going to talk about … we know what to do before, we know why. What are the steps we should do to actually have a successful upgrade? And what are some pitfalls we can avoid? I need to get over here because my slides are being controlled differently, apologize. Let’s see if I can do this right. Perfect, I can. So, again, this series, we’re really … Oops, yeah, I did it too fast.
We’re in week three, and here in week three … let me see if I can move my mouse right on top. There we go. Here in week three, we’re talking about doing the upgrade. So, some best practices for the new SQL install. How can you minimize downtime, right? How can we get the environment upgraded with having our users waiting nervously for less time? How can we do it with success? What are some ways to test to know we got all the data?
Next week we’ll talk about after the upgrade. Really, next week has nothing to do with upgrades. It does because we should be looking at these things after we upgrade, but if you don’t upgrade your SQL server and if you’re not planning on an upgrade, next week’s going to be packed with good content for you to just be a better DBA and to watch your environment better. So who am I? I’ve said this each week. My name is Mike. You can find out more about me on my blog, StraightPathSQL.com. I run a company called Straight Path. We do SQL server consulting, manage services, we help people upgrade projects. We do all sorts of SQL work.
I have a lot of upgrade scars. Being a consultant, we get to do a lot of upgrades, right? So when I was a full-time DBA, I could probably count on one or two hands the number of times I had to upgrade a SQL server. The only reason I have two hands is because I was a job hopper early on. I was always a consultant, I just didn’t realize it. Most people just don’t do a lot of upgrades, but we do a lot of them as consultants because we have a lot of different clients at different phases of the life cycle. So we’ve seen the problems, and we’ve gotten called into the upgrades that went, oops, or we’ve had the people who didn’t heed advice and tried to upgrade a path that maybe we don’t like, and got burned by it.
So where are we? Again, I already kind of gave the recap. These are the things we should know though. After the earlier two webinars, we should know why we want to upgrade. Obviously, that’s a pretty easy question, but we should know some of the reasons why we want to and some of the reasons why we can … the reasons that maybe our management wants to hear. So, hey dear leader, here’s why you should want me to upgrade. So, that little reverse psychology. Then we’re going to … We already talked about where we want to go, so we should have an idea of that. Again, you need to do the upgrade analysis with the upgrade advisor. We talked last week about the DEA tool, the database experiment analysis tool. We can talk about doing a SQL server database tools last week and building towards your target. The point is, we should know roughly where we want to be.
We should have an idea of what our environment looks like. We talked last week about running a health check and I have you guys … talked about link, and that link for the download is actually on the blog post from week two’s video already. We can run our health check PDF, or you can go and download the blit scripts, the Glenn Berrys scripts, or any scripts. I don’t care. That’s well trusted by the community, and see where you are. We should have a performance baseline, so we make sure that the new environment we build will have the right performance.
We should have the rough idea of which approaches we want to use to test. How do we know that we can support the new capability mode, because going to the new version is really cool, but it’s really important to also be able to work in the latest compatibility mode if we can so we can take advantage of the new features. We can use [inaudible 00:03:50] functions for example. Then we should have an idea … We talked last week about role back and sort of go/no go criteria. I said, it’s not a go go. It’s just go/no go. Most companies don’t have a no go. They just have a go and we’re going to go whatever you say. But we need to be the pesky annoying DBAs who say, yeah but, and explain the problems. So, that’s kind of there we should.
So where are we going to go? Well, today is going to be about how. So we’re going to zoom in on the how. How should we install SQL? We’re not going to go into in depth on that because that’s a whole webinar by itself, but we have a download we’ll have in the blog post about this for that. Like I said, I can’t give you the links to webinar ninja. The people watching the videos won’t be able to get them anyway, so we’re going to put them on the blog post. What’s going to happen is we’ll do a webinar on Thursday, and then by Tuesday of the next week … so three business days later, there’ll be a blog post with a video and a transcripts and links we talked about, and any downloads that we say we’ll make for you.
Normally this downloadable content, we’ll probably use a lead magnet on our website and people can come and give us their email address and all that. But for people coming to the webinar, I just want it to be a free thing for you, so you’ll just have a direct download link. Get it. I don’t want your email address. Just take it and use it. So we’ll have a checklist that we use to help guide us on SQL sort of best practices. That checklist alone could probably be a three part webinar series, but we’ll give you that.
How should you organize your own checklist for the upgrade? We’re going to talk about that today quite a lot. Then again, next week on the Tuesday, we’ll have a sample checklist that you can use and fill in. We’re going to talk about how to best execute an in place upgrade versus a migration based upgrade. How do we minimize downtime? How do we do a test ahead of time? How do we do a test after? These are the things we’re going to talk about this week.
So let’s first … I just want to talk about in place upgrades first, get this out of the way. I have a pretty simple in place upgrade process. My first step for in place upgrade is you have to attain the SQL server installation media. So go to MSDN, go to your volume licensing portal, wherever you get it. You can even get the old DVD. So this is SQL server 2012. At the [inaudible 00:06:03] they gave us a pretty cool sign book. The DVD’s not in there. It’s just an empty package with all the signatures of the program team.
It’s kind of neat. I keep it up here. It’s an old version of SQL and I don’t encourage you to use old versions of SQL, but I just like this little … I don’t know, it’s nice. It’s nice. I like nice stuff. It’s just touching. Mass produced, but still touching. So then you run set up. So you get your install media, you get it up there, you run set up executable, and then you stop. Seriously, you stop, throw it away. Don’t do an in place upgrade. If you were here hanging on the edge, just like oh this is great … I understand there are situations in life where you have no choice. You’re on a physical server. The management says, this is how we’re going to do our upgrade and we’re not buying a new server and we have no swing server, deal with it.
Yes, there are situations where an in place upgrade is your only choice. I’ve done them. I’ve helped people do them. They are so risky. I don’t like in place upgrades. As far as it’s up to you, put your foot down and say, we will never do an in place upgrade. Here’s the deal … if you’re on a SQL server 2008, when was it last built? If you happen to buy brand new SQL server hardware last year and you just moved your 10 year old SQL server 2008 to it, well yeah it’s new hardware. Maybe they don’t want to spend the money, and I understand that. But most of us are in virtual. Many of us are in the cloud. There’s other ways to do it.
We had one person ask about in place upgrade and we said, we don’t recommend it. Here’s all the reasons why, even though it’s simple. They went ahead and tried it and they said, nope don’t worry, we’re going to do it. We said okay, we’re support, you’re going to need it. 99%, maybe 95%, maybe 90, whatever, some percentage of the time. Not as good as 99.9, but some percentage of the time, they work flawlessly. When they go bad though, they go really bad. When they go bad in the middle of your upgrade, you have no rollback. Your rollback is to start mashing buttons to hope the old version works to try the uninstaller, and to do a bunch of stuff to your production SQL server that’s been your bread and butter database server for how many years. So I’m not of it. It’s pretty clear.
Again, they’re not that bad. They have to happen sometimes. If you want to see more, I’ll have a link for this. But if you just do a google search with DBA.stackexchange.com and you search for in place upgrade, you’ll find this answer I gave. It’s well uploaded. I have 91 votes on it. This is just one page. You’d have to press the page down button like four times. Anybody who’s in this webinar whose worked with me as an employer, as an employee, as a customer, you’ve sadly probably seen my long emails. I’m working on it, but some of my stack overflow answers can be long too.
My short answer is no, don’t do in place. If you have a question about in place upgrade, throw it in the comments and we can talk about it. We can talk about the live webinar, but we [inaudible 00:09:15] the Q and A in a couple weeks.
So the first thing we do is we’re going to install SQL server on the new. Hopefully you’ve actually done this in test already a couple times. We talked about this last week. You want to test, test your migration, test your compatibility mode, have your applications point to it, get a test version of your … Ideally, the ideal thing to do is for you, before you do your upgrade, to do an upgrade advisor, maybe to do the DEA tool, then maybe take SQL server database tools and run a build against it. Now that’s for your database. If it’s your vendor database, it might be as easy as talking to your vendor and ask them what do they support, and just cross your fingers, pray, meditate. Whatever you need to do, hope that your vendor is going to be compliant. Oh yeah, of course, we’ve already tested it and we certified it. You’re good it’s easy.
Sadly, a lot of our vendors just don’t play that game for us. Actually, I shouldn’t say a lot. Half do, half don’t. So it’s 50/50. But if it’s your own application, you want to actually run a build against it, against the target for the version you want to go and just see what breaks. But all of those tests alone are not as good as deploying SQL server X. So let’s say you’re on 2008 today and you want to go to 2017, upgrade to 2017 and test, deploy your database, get it all good there, and then point your test suite to it. Point your applications to it and do a full regression test and see what breaks. There is no better testing than a full regression test to see what breaks.
Once you’re done, build a new server. Build it right. So remember, you have an awesome opportunity here. You have a chance to take all the knowledge you’ve gained over the past 10 years and apply it to this new server. You can [inaudible 00:11:00], you can make sure power saving is enabled, you can finally have 64K drive allocation size when you format your drives. You can, if you’re on a VM, you can do the right data stores and para virtual scuzzy card connectors, and you can do all the best practices that you keep reading about in the SQL server community, but you say well I don’t want to touch it because it’s already here and we don’t want to break it. We want to wait for the next version. The next version is here.
You have no choice, by July, if you’re on SLQ server 2008 and you don’t want to spend a bunch of money on Microsoft and move to Azure and get their secret slug worth deal, or whatever his name is from Willy Wonka. But you have a chance now to build your SQL server to best practices. Run a health check on it, even though there’s nothing on it yet. Run a health check and see what settings you got wrong. Add maintenance. If you don’t have all the [inaudible 00:11:53] maintenance scripts or some trusted maintenance solution, install that in the new server and experiment with it. So when you migrate over there, you don’t have to say oh yeah … because here’s the deal, tomorrow is a big enemy of us. It’s an enemy of me. Tomorrow is the worst word in our career.
How many times have migration happened, and it finally gets done, and something goes wrong, a little bit minor but it’s not big. You get it solved at 1:00 in the morning and you say, alright, in the morning I’ll fix the backups. In the morning I’ll fix the index maintenance. In the morning I’ll fix this. Why do that? Set up your maintenance the right way today and have it scheduled automatically. It’ll be really quick right now when no user database is on it for the two weeks you’re waiting to move it migrate, but set it up head of time and then it’s just good to go. Set it up automatic is good. Setting up ahead of time is good.
Don’t recreate security woes. Does the whole entire world have SA access to your SQL server? If they do, fix it. Now, there’s two schools of thought there, I get that. Some people will say, well yeah, but we’re doing a migration. We want to break as few things as possible. I get that, sure. But if you can start over and have the best practices in mind, do that. Again, if we have a test first mentality, we’ll test this and we’ll see what breaks. Does your vendor really need SA access to your consolidated SQL server? Probably not. Will your compliance people like to know they do? Probably … Not even probably. They don’t, period, end of story. They don’t even like that you have SA access.
Again, you want to pick the right resource. So we talked last week about grabbing some perf data and doing the baseline analysis. If we do the baseline analysis and we know what we need, and we know what we’re using, we can actually build the new server to spec, and build it with the right amount of resources. Yeah, so Mark just mentioned a great comment. Mark says, “I have nothing but praise for Ola Hellengren, but make sure that’s his default are appropriate in your environment.” Yeah, absolutely.
We just had a conversation the other day with somebody on my team about this. Not somebody on my team, but somebody off my team. The defaults are bad. If you just take the default in the next maintenance job and run it, you’re doing the next maintenance. If you schedule it every night, like you just want to update your status, every night you’re doing an index rebuild and you’re not even updating your stats. So don’t just … True of any script and run it. You want to actually go to Ola.Hellengren.com, or whatever his website is. We just do a google search for Ola SQL and you’ll find him, but you can say Ola Hallengren, H-A double E-E-N-G-R-E-N and SQL, and you’ll find his blog, or his maintenance [inaudible 00:14:39] website really.
If you just click in any of the links for index optimize or backup, you’ll see pages of options. He even gives examples, so there’s really no excuse to run the defaults. Do not do it. Thanks Mark, that was a great point. Don’t forget to enable the maintenance clean up. So you have a new server. Keep your MSDB happy from the beginning. Do your MSDB cleanup history right away. Do your job agent history cleanup right away. Keep these things pruned and well oiled from the beginning. Recycle your air log on a regular basis. We’ll talk next week about some best practices in the after and kind of talk about some of the things that we find that health checks commonly, and getting your monitoring set up again. We’ll talk more about that next week.
Here’s a big thing. I’m a big fan of CNAMEs and aliases and not connecting to server names. If I have 10 applications on a consolidated SQL server, I might be as crazy as asking IT for 10 CNAMEs. Now you don’t have to go that far if you don’t want to, but I don’t like people connecting to the SQL by its actual host name. I want them to use a CNAME always. There are schools of thought. Is it silly to have CNAME if it’s going to an AG listener, or if you have a cluster incidence, because that’s kind of a C name going to a CNAME. Yeah, maybe, but I like them still. This is because I’m thinking my future upgrade, right?
SQL server is now coming out every couple years. Their servicing model is different. Staring 2019, there’ll be no more service packs. We’ll just have CUs and slip stream CU updates. So SQL updates come fast and furious now. You don’t have to take the train every time it comes by your stop, but you shouldn’t be four trains behind, like many people are today, or five. So, if we’re going to start upgrading more frequently, we want to make our upgrades easier. If we take the pain today and make a CNAME and migrate to it, that does two things for us. We’ll talk in a second when I talk about operation honey pot and how that can help us, but it also makes future migrations easier.
So you’re making today’s self mad. This is stupid. But in two years, your future self will say thank you. I could sit here for hours and talk about all the ways that old Mike screwed over today’s Mike and I hate it. Right here at the farm, we’re running out of our fire wood. We probably burned five cords this year. It’s been so cold a couple times and we’re not using oil. I cut my wood way too late, so it’s way too green and I’m probably going to buy a cord of wood this weekend and it’s going to be way too expensive, but I’m stubborn and I don’t want to burn oil.
I’m mad at the Mike who was sitting on the couch last spring or two springs ago instead of cutting down woods here at the farm. Take the time today to make your future self like you. It will be amazing. So make the CNAME. When we talk about operation honey pot in just a few slides, it will give you another tool to find those non-compliant, frustrating developers and vendors. So we’re not going to read all the slides here. You can get the slides on the blog post when I share it next Tuesday, but it starts with a plan, right?
So you’ve got to have an upgrade checklist. I love checklists. I don’t use checklists nearly as much as I should. I’m on the fire department. We have some checklists there for truck checks, and we have checklists in the ambulance for certain cardiac emergencies. Checklists are a really important tool. Like I said last week, if they’re good enough for surgeons, if they’re good enough for pilots, even big restaurants, chefs use them. If they’re good enough for all these other industries, I think they’re okay for us to use too.
So all these are the kind of things that go on a checklist. Again, I’m not going to talk about these, but think about the things that we want to do before. So there’s kind of three components to a checklist, just like the three components in my webinar. There’s before, there’s during, and there’s after. There’s kind of flavors of before and flavors of during. There’s kind of during the night of, and there’s during the week before. But for the before side of the checklist, these are the things. What do you use? This burns people so much.
Do the full inventory and understand what you use. Understand what tools and components you’re using. Understand if you’re using encryption. You don’t want to get to cut overnight and say oh I forgot about this. We talked last week, DEA tools at IO is really great. It doesn’t have all the answers for you, but if you look at all the power cell commands that are there, you look at that and, if you kind of use that as almost interrogate yourself … I’m going to actually mention that in the checklist that we give next Tuesday. If you use that as a Q and A for yourself, almost interrogate yourself and look at all the commands and DEA tools and say, do I use that, do I use that, do I use that, and check. Do I use that, and check.
You’ll do kind of three things. Number one, you’ll get annoyed. Number two, you’ll see what you use and you’ll know what features are in use. Number three, you might learn a few things. You might take an extra 20 minutes to have to look at your server and say, am I using this? What is this? If it’s a what is this question, hopefully you’re not using it. If you are using it, then you better learn about it and you better start looking at it. But if you’re not using it, file it away and just ask yourself what is that. This could be … Your upgrade can be a way for your, not just to make your environment better, make your users happier, make your management happier, save you money and all these other peace and harmony things, but it can be a way for you to learn more about what’s in SQL server, what’s changed since we put the SQL server 2008 cover or layer of dust in place.
Get your phone numbers, your phone trees, your vendor support numbers. I’m not going to read through all this, but one thing that’s really important I think is … again, I have a whole webinar or whole presentation just about … a professional development talk about learning from aviation disasters. One of the things that gets engineers in so much trouble and has gotten a lot of pilots in trouble and their passengers in trouble sadly in plane crashes is you can get stuck trouble shooting.
When I was a kid … I just turned 40, so I’m not old, not young. I’m kind of in the middle. I was a big fan of this game called Civilization. It was like Civ One and I could start playing that game as a kid, on my old computer, at night. After school, 2:00, I maybe grab my dinner. [inaudible 00:21:18] mom was working overnight shift as a nurse. Exactly Josh, just one more turn. I’m just going to have one more turn. Finally, you get done and you get up to go to the bathroom or stand up and you look around and the sun’s coming out. You’re like, what just happened. So it’s easy to get stuck. When you’re fixing a problem, it’s so easy for that part of our mind, that engineer part of our mind, to sit here and say I can fix this. I’m going to fix this, and I’m not going to stop until I fix this.
Well, many airplanes, especially before crew resource management, are in pieces in the ground in warehouses after accidents because of that, because people get stuck troubleshooting. There was nobody to help pop the stack, if you will, and say whoa, we’ve spent too much time troubleshooting, we have to stop. It’s a big thing for us in the ambulance too. We can spend a lot of time on scene, packaging a patient after a car accident just so right. This bone … My camera is low. This arm has to be set just right because it was bent a little bit. This finger is hurt, and this is hurt, and that’s hurt. Meanwhile, they have a punctured lung and they have gross bleeding, and their pulse is weak and thready because they have a head injury. You’re doing all this time fixing the things that you have control over while ignoring the things you don’t have control over, and nobody’s made that decision to exit and drive to the hospital, or exit and fly the plane.
In an upgrade case, we need to identify ahead of time how much time to we have. Who’s going to be our time keeper? In a plane, who’s going to be our watch the sky keeper and our watch the fuel keeper, and fly the plane? Somebody during an upgrade has to be the project manager. I know us techys have this knee jerk reaction against project managers, rightfully … not rightfully so, I’m just kidding. Maybe. But we need to have a project manager of the upgrade, so who’s going to do that, and where are the rules, and where are we going to say stop, we’re rolling back? By the way, go back a few slides when I teased you with my in place upgrade tips, if you have an in place upgrade, you have to have less time. You can’t have as much time to troubleshoot because your rollback is so easy if you’re doing a CNAME change new server.
You make a new server. You take all your database offline and we’ll talk about that in a second. You migrate to it. You point your connection strings to it, or you just change your CNAME, whatever you’re doing. If you have a problem, you say oh this is too bad, let’s troubleshoot it. Finally, once the timekeeper says, hey listen, you have 30 minutes left. We have to either roll back in 30 minutes or say we’re good, so I need to have an answer. Once that person says that, you can go right up to 29 and a half minutes fixing, yeah I got it, I got it at the last second. Or you can roll back, because all you have to do is turn off, turn back on. As long as you haven’t given access to anybody, you can just roll back. Change your connections strings back, bring your databases back online in the old server, everything’s good.
With an in place upgrade, yeah, you have to have that happen an hour, an hour and a half ago so you can uninstall and fix what the installer broke, maybe restore, blah, blah, blah. But you need to have that in your checklist ahead of time. At every phase, where are we going to stop, who’s going to be our timekeeper, who’s going to watch this, because this is so important. Scope creep happens. So during … and again, I say during. It’s really, kind of like I said, two phases of during. There’s sort of the during after we’ve determined we’re going to upgrade.
So before, to me, is we need to upgrade, where are we going, how are we going to do it, why, what version, will it work. During, to me, is kind of the sort of like, there’s during one, which is we’re going to build a new server. There’s during two, which is we’re cutting over. So I consider all that lumped into during. Maybe on my checklist, if I tweak it because it’s still a work in progress. If I tweak it once more, I might have a second name for that phase. If you have a name for that phase, put it in the chat and I’ll use it. But we’re going to provision new server. We’re going to stall some of the best practices. We’re going to run a health check on it. So do the health check and save that health check.
Imagine how that looks to management and to your auditors that you have this run book of what your environment looked like and how it checked on your health check. Do your road counts the night of go live. We’ll talk about that in a second. Right before and right after, so you can then have that document, even if nobody’s asked for it. Just do it. Be the nerdy, geeky person that has all this documentation filed away in nice little folders on your computer, and don’t wait for them to ask you for it. Share it. Say hey, we did the upgrade. Here’s what the configuration looked like before. Here’s all the features we’re using. Here’s our checklist of features that we’re using after. Here’s all the database names. Here’s all the road counts. Here’s our health check. Here’s our maintenance schedule. Here you go. I’m taking next Wednesday off. Because you have to be there Monday and Tuesday to watch things, but I’m taking Wednesday off. Please pat me on my back.
It’s good for your resume and it’s not just the reason to do it. It’s really good because it shows that you care. As DBAs … I say database advocate, is what it stands for. Some people say don’t bother asking and all those jokes, but we are the … maybe this is wrong again. Maybe some other people, other roles will yell at me, but I think the DBA, or the person who has to be the DBA … so whether accidental or on purpose, but the DBA is the only person in the company in my mind who cares about the data.
Now I know that’s a stretch. I know, but work with me here. That should be the attitude we have. In many companies, when you walk through some of the developers and SIS admins and management and vendors do, it might be true. I don’t know. So, that’s how that works. Again, we talked about the CNAME already. We’re going to go a little bit longer, so if you have to jump off at 12:30. I said 30, 40 minutes, so we’re at 12:30 now. We have probably another 10 or 15. If you have to drop, this is recorded, but if you want to stay and ask questions, stay and ask questions.
So then, for your plan, we have to have the after as well. What are we going to do after? Remember Ron Popeil’s rotisserie chicken? If you remember civilization, you probably remember infomercials. If you remember infomercials, you probably remember Ron Popeil. If you remember him, you probably remember the showtime rotisserie chicken. Anybody remember, and type in the chat if you do. There’s a lot of you here, but you’re shy. Set it and forget it. Exactly John. John’s not shy. Set it and forget it. That is exactly what people think SQL server is. You just press next a bunch of times. Then you press finish once. Then it’s good to go.
That’s not true. You have to monitor it. You have to do maintenance. We talked about maintenance a little bit ago in Mark’s comment about the OLO scripts. We should do a daily review, whether we have a tool doing that, whether we have a junior DBA doing that, whether we have a power cell tool like DBA checks that’s telling us each day. We have to know each day that we’re good because entropy happens. You can build a server today and walk away and you will have problems eventually. It might take a year. It might take ten years for some people. We get phone calls all the time from people and the conversation goes like this. Hey, we have SQL server and I don’t really know what’s going on. It’s been working fine for seven years and it just started getting slower. Well that happens all the time. Set up alerts.
So you built a security … you fixed security on the way over to this new environment and you built a new security model. Well keep your security posture. I say no. That should be our favorite word as a DBA. Then I had a blog post this week, and I just want to have a quick little offline thought. This wasn’t in the deck originally, but this came in. There’s a blog post from me on, I think Wednesday or Tuesday this week, and I talk about an email hack that happened at VF email. It’s really sad what happened. A hacker came in. They sort of … The got into their credential store and they got into their password management. Somehow they managed to get all credentials to all different environments all online and they deleted all their backups and all their systems. They had no offline backups and they just lost three years of data.
They happened to have data from 2016 because they had an old backup for some other purpose. Not for security or for management, but they had the backup just sitting around just in case. I’ll give you a link here. I think I can grab it in my post, but I’ll but it in the link if I can. Here it goes. Here’s the deal. I got a little nervous about this because a lot of companies nowadays are saying, I back up to the cloud. We’re backing up to a blob storage or we’re backing up to some sort of online tape library, and it’s in two locations. [inaudible 00:30:06]. All these backup approaches, even doing multi-cloud like we’re in Azure S3 … I’m sorry, blob, and we’re in AWS S3. That’s a great backup strategy. That’s a really, really, really fault tolerant backup strategy.
You have multi-cloud protected. If your data center dies, you’re okay. The one thing I just didn’t really think of when I look at these solutions, and until this hack, is that’s great. You’re not going to have any problems from data loss. You’re not going to have any problems from corruption, as long as your doing check DBs and all that. You’re not going to have any problem created by a bad system. You’ll lose … Azure had a lot of business in all their data centers all across the country could blow up and you’re fine. Still a mystery, but it’s online and, if somebody can get into online, no matter how many … you could have 75 thousand layers of protection. So you have all these layers of protection, but there’s still one point of failure in your security schemes. There’s always going to be one point of failure in your security schemes, and it’s really you and me.
So offline backups need to fit into your model. If you don’t do offline backups today, look at that post and start doing offline backups. I noticed in my link … that’s great. As I see it in the chat, I said destroys instead of destroys. I can’t fix it because it’s already live. The URLs wrong, but it’s the right URL. Go ahead and go on that. Anyway, make sure you think of these things. Don’t just say I did my upgrade and I’m good to go. Yeah, that’s a good point Andy. Andy reminds me I can do redirects. I should have fixed that. I’ll fix it and redirect. Thank you.
So, that’s that. Now, what about a minimal downtime cutover? I keep talking about these things. So it’s log shipping, right? Or log shipping. I don’t care if you actually use log shipping. In fact, for a lot of our small, easy, simple migration upgrades, we don’t actually use log shipping. We have a couple clients where we are taking like 500 databases from one data center to another, or going up to AWS or Azure, Iaz. In that case, we’ll actually build some table driven log shipping. I don’t do that. Jack and my team … I’m a good DBA. I’m not a good table driving kind of a guy anymore. I’m getting too old for that. I don’t know. Maybe I’m not too old, but I’m not as good as my people. They’re a lot better.
It doesn’t have to be official log shipping, but it’s the same principle as log shipping. So just one example. Now this example could be whatever, but let’s say on cutover minus three days. So let’s say we’re going live on Saturday. So Friday, Thursday, Wednesday. So Wednesday maybe, if it’s big environment. If it’s small, we can do this all Friday night. But we’re going to cutover minus something. We’re going to take a full database back up of your source databases, and then stop your full back up job because we don’t want to have a full backup job going and mess up our differential chain or anything like that.
So take a full back up, stop your full backups, but keep your diffs running. A lot of people sometimes get confused. They think we’re migrating, we don’t need to worry. You still need to worry because your environments not fully caught up yet. On maybe a day before cutover, take that day’s differential. For large databases the day before it makes sense. Maybe for small databases, you can just do it on cutover day a few hours before, but take a differential, store it with no recovery. On cutover day, maybe take that final diff if you want. You don’t need to. If you didn’t do it before, do it now. Restore with no recovery. Then start taking log backups.
If it’s a simple environment with not a lot of change, maybe you’ll change your log back ups. If it’s a hourly or half an hour if you’re doing it manually, but you can script it. You can have log backups running every minute and just script the restore, so it’s not going to be hard on you. Then, at cutover time, confirm it’s down, wait, confirm it’s down, wait, confirm it’s down. So I used to work at a hotel, and I worked in the kitchen for a long time. In the walk-in, there’s stuff that we would eat in the kitchen. It was okay to eat in the kitchen because we’re kitchen employees and we could eat this stuff. But those front desk people couldn’t eat it.
Well, I upgraded myself, or downgraded depending on how you look at it, to the front desk. That’s where I met my wife. Long story, but that’s how I met my wife, so it was actually an upgrade. I was her manager, her assistant manager. I asked to be put on her schedule when I first saw her come in. Anyway, I’m probably turning red. When I was working as a front desk guy, we had an overnight auditor quit. I was the assistant manager, so I had an overnight auditor for awhile, because when you’re in charge, you do the jobs that nobody else is doing, or you should anyway, and you should do all the jobs by the way. That’s a whole other topic of leadership and I’ll not deviate.
I would go into the kitchen and get some of the food that the kitchen people are allowed. So I wouldn’t go and take lobster or the roast beef for the special tomorrow, but I’d take the food the kitchen people are allowed to kind of munch on throughout the day. One time … and it’s padlocked. The walk-in refrigerator is padlocked, but on our key chain we had all the keys. Anyway, one time I left the padlock in there, or I left the keys in there on the rack and I padlocked the door. That was an interesting morning. I didn’t get in trouble because it was one of those things. The bar tender came in the next morning to do his inventory. He let me in. But ever since then, I literally don’t have the keys here to show you, but whenever I lock my door in my car, whenever I lock the door in the house, even though I have to use my key to do it, I still look at my hand. I look to make sure I have the key and I say, do I have the key, I have the key. Do I really have the key? I really have the key. Now I’ll lock it.
I don’t know why I do that, but ever since that moment when I was like 17, 19 years old, whatever I was, I do that. Do that with your cutovers. Am I really down? Make your web people hate you. Are we sure? How do I know you’re sure? Okay, I’m going to turn off SQL server and turn it back on again and see if I have any connections come in. I do that sometimes because I’m paranoid. I don’t want to have to deal with 10 transactions or 4000 if you’re a really busy environment coming in and that quick moment because my last log back up and turning the databases offline. I want to make sure that we’ve shut down as much as we can. Changing network names … if you’re doing CNAMES, well just have your network team change the CNAME.
So this is another reason to use a CNAME because you can just have them turn the CNAME off. Point the CNAME to null, or whatever they can do in the networking side, but make it so the CNAME does not come to the SQL server. Be double sure, and then take your count. Take your log backup, take your count, take your database offline. I like to log backup first because, if somebody came in before my count, I want it to be log backup, last log back up. So all that double checks I talked about. Last log backup, count offline. I don’t turn my server offline typically. Now there’s caviates. I like to keep my server online. We’ll talk about why in a second with operation honeypot.
If I keep the server online, I can go there and say oh crap this isn’t working. What log in was changed or what happened. I can go and do those things afterwards. By the way, what’s not here is all the other stuff. You have to use DBS tools, like we talked about last week, to move your log ins, move your jobs. All those things you should be moving ahead of time, and either tightly controlling … Hopefully you’re in an environment you don’t have just random people creating log ins. If you do, that’s a whole other problem and we’ll have to have a security series of webinars. But hopefully you know what’s happening, and it’s a tightly locked down environment.
You go into lock down mode a week or two before. Code freeze, environment freeze, config freeze, we’re not changing anything. Then you can do a lot of your config ahead of time, but makes sure you have it. Do some due diligence, but keep the old server around because things happen. We’ll talk about that in a couple slides, but I think things happen. So confirm the role counts and then do the reverse. Restore the last final recovery, confirm the role counts again, make sure they’re good. If they’re good, now go forward the rest of your steps. There’s other options to other approaches. There’s database snapshots. There’s VM snapshots.
You’re also, I won’t say one thing, is detach and attach. I hate detach for the same reason I look at my keys two times when I lock something. I hate detach for the same reason I use copy instead of cut. Detach is final. Once you detach a database with SQL, you can accidentally do something stupid to that file. I’ve built the whole entire resume of doing stupid stuff, especially a late night, and especially with everybody watching, and especially being busy. So you want to protect yourself from yourself. So don’t do detach and attach. Do backups. Then, by doing backups, you can do this in five minutes of downtime. That’s pretty cool.
A lot of people say, I don’t want that because I can’t be down all weekend. I say, what? All weekend? Get out of here. Even for a 15 thousand terabit database, that’d be whatever. You don’t have to be down all weekend. You can be down for a real short amount of time. Maybe 15 minutes if you’re really busy transactioning and transaction [inaudible 00:39:13] are huge every five seconds. But for most of us normal people, we can do this with fairly low downtime. So we’re live. We’re live, but not them. So when you’re live, don’t make it live for your users yet. Make it live for your smoke testers. Your smoke testers are in your checklist. You have their information. They’ve been told to be here ahead of time. They’re ready to go.
If you’re allowed to and if you can afford to downtime, have a window. You’ve got a five, ten minute script. Just run these automated scripts, run these tests. Make sure it’s good. Makes sure that the tests don’t change data. You don’t want to change the live production data because you have to role back. Now you have to deal with that. How do we deal with that? So the goal here is to say you’ve already tested three weeks ago. Does everything work the way it did in test, because it doesn’t always work the same in production. Everybody here, especially my developer friends, you know it worked fine in dev. It’s the same that’s perfectly valid. It’s actually perfectly invalid, but it’s perfectly valid. It did work fine in dev.
Now you’re into production and something was a little bit different. Production is different sometimes, and you’ve not yet tested in production. You’ve tested on the new server. You’ve tested on the new database on the new server. You’ve actually done a very comparable test and we’ll talk about that in one of the blog posts. You’ve really done a production test if you’re do it on migration, but you’ve not tested the real web server. Or maybe you have and you have multiple, but you’ve not done the full test. You’re doing your first full test right now. Then have a sign off. Don’t just have the sign off be you. Have the team sign off. Get management to sign off, and have decisions for rollback and not rollback. Once you’re done from that point, and when you’re ready to pull the big switch and push people online and do your redirects, and set up your CNAMEs and all that, it’s done. Your rollback is not as easy as rolling back anymore. You have to now do backups and recourse.
It’s still easier than an in place because you have the old server kicking around. That old server, by the way, is kicking around for what I call operation honeypot. Keep that old server up. Keep it up for as long as you can license wise and server wise. Keep it up for a couple weeks anyway. If you took my advice and you’re a CNAME changing fool like I am, and you use this as a chance to change your CNAME, what’s going to happen now is anybody who’s connecting to prod SQL 08 for your 2008 prod SQL, when they connect to that server and they try to log into their database, their databases are offline because you’ve left the databases offline.
Now, in the air log, you have a memorial of somebody connecting to your old server. Now you can go and find those old stragglers because there will be stragglers. I can probably count on one or two hands time I have not seen little oops afterwards. They happen because something changed, somebody didn’t explain something. Even with all the due diligence, there’s a couple things that just didn’t get caught. So catch them and look for them. Now you’re going to do good.
Not next week as in next week’s webinar, but the next week after you upgrade, be paranoid. Don’t just go yes, yes, and then go home and say I’m taking a week off, I upgraded to the SQL server beast, we’re done, do not call me. No, be there Monday, be there Tuesday, be there Wednesday. Watch how the users interact. Watch performance. We talked last week about legacy kernel estimation. I’m a fan lately of always using legacy kernel estimation mode on new databases that migrate up. That way I can take advantage of new compatibility mode, but let the queries run the way they would on the old CE. I don’t want to use the new CE until I’ve had time to re-performance test and regress all the environment. If you’re not doing that, I’d rather just go there.
If you didn’t want to do that, you better be watching, watching with century one, watching with a monitoring tool, watching with SP who is active, watching with DBA checks or one of those power cell modules. You need to be watching for issues, watching the query store. Watch for people having log in failures in the environment because some of the databases that they were expecting to be there isn’t’ there because you forgot a database even though you had a checklist. Don’t relax. Celebrate after … I say a good day, but celebrate after you have a good week of production. Don’t forget that month end will happen, and don’t forget the quarter end will happen, and year end will happen.
I’ve seen environments that upgrade to a new version, and a year later finally when year end runs, they say ohhh we forgot about that. So think about these things. I think … did I have something in here? I thought I had something about the server. I don’t. I’m going to say it now. Maybe I skipped over the slide. Don’t just think about SQL server, unless your in a platform as a service environment. I know I have a slide there for that. Whatever.
Unless you’re in a platform as a service option, like Amazon RDS or Azure SQL, there’s going to probably be something in that local server that’s yours. There’s going to be something on the C drive or the D drive, some config file, some local SSIS package that used to run from a file instead of a SSIS store. Andy [inaudible 00:44:23], he can correct me. You know what I’m trying to say, the SSIS catalog. I think he’ll tell me if I’m wrong in a second, but you’re going to have something out there. Maybe it’s a file share that’s being used. Maybe you write to some path locally or you write to some weird path, nothing too perfect yet. We’re getting older Andy, right?
You need to check, and you need to look and see what else is being used because that’s what you’re going to have weird problems with. So keep the old server around if you can so you can copy and replicate. They your recap and homework is really get the checklist. Again, the checklist is a work in progress. It’ll be there Tuesday in the blog post. Maybe Monday. I’ve been doing … I had the transcript service running over the weekend, or Friday night, so I might get the blog post up over the weekend to go live Monday.
But when that goes live there’ll be a couple downloads. One of the downloads will be our health checks. Our health cheeks are already in last week’s download. This week we’ll have the … On Tuesday, we’ll have the upgrade checklist and the server build checklist. So how Straighpath builds our SQL server, and then a sample upgrade step checklist. Play with DBA tools. Again, do a heath check with your server. Again, we talk about this is last week’s homework, so our PDF is there in last week’s post, but do a health check run before. Read my rant on stack overflow on DBA.stackexchange about the in place upgrades. Send me questions.
So next week we’re talking about after. Then, in two weeks we’re going to the whole team’s Straightpath. We’ll have a live Q and A on Google Hangouts. We’ll see how that works. Whoever wants to on the team. I’m not going to make everybody do it. On the Google Hangout, and in that Google Hangout, we’ll answer your questions. If you have a question now, send me an email or ask in the chat. I’ll stick around for questions for a few minutes. I won’t click stop, because apparently if I click stop you can’t question me anymore. I just won’t put that into the YouTube. So I’ll stop the YouTube right about here, so thanks for attending.
Get working on your plan. Send me a question. Any questions you want to have discussed about your situation, we’ll keep it anonymous in the chat next week, or two weeks from now really. We’re happy to do it. Look at the blog post for the downloads. Then again, next week we’re going to talk about the after. So what do we do after. Again, next week is really good if you’re upgrading or not. It’s really just DBA 101. I call it … people say adulting. It’s like DBAing. So oops, we’re DBAing instead of adulting. So anyway, thank you, thank you so much for coming to the webinar again.