I want to commend the team at GitLab.com. They had an issue that took down a key database and, effectively, everything else with it, and when they came back, some of the data did not come back with them. This happens. It happens in “Traditional” 1.0 IT companies (ask my usual airline, Delta; they’ll tell you two stories, one from this week and one from about 6 months ago), in Web 2.0 companies (ask any of them), and in the new breed of companies that are a mixture and use all sorts of services and microservices with fun names. GitLab worked out a plan and were incredibly transparent about their failure, their recovery and their next steps. I applaud and commend them for this. It’s a great model. Transparency is always the right default, but in 2017 I think it’s the only model. I scrolled through their recovery steps and notes today and a thought occurred to me: “Sometimes, you need more ops focus on the DevOps teams.”
What is DevOps?
Like any good consultant’s answer to any question, this is one I can truly answer with “It Depends”. There are a few descriptions out there. The basic premise is a fusion of development engineers and operations engineers. Sometimes QA. It’s sort of an offshoot of agile development, with agile deployments and operations. It can look different on different teams. It doesn’t mean “No operations”, but it’s much more team oriented, with a flattened structure of developers and operations folks working together, often “operationally minded developers”. It’s continuous integration. One of my former clients, a large, well-known web property, was doing DevOps before DevOps had a name. To them it was summed up in a slogan they had everywhere: “Speed Wins”. I grimaced and laughed a bit at that, but I understand the point. Let’s not spend our lives in analysis paralysis and Gantt charts. Let’s not worry about every single detail: let’s build software, get it to market quickly and improve as we go. My friend Brent Ozar recently reminded me of a Steve Jobs phrase I’ve often heard: “Artists Ship”.
DevOps is good. It’s a fusion that is missing in traditional models, where the ops and the dev teams are far too split. They don’t get along. They don’t understand each other’s needs. Development often moves at the speed of NOW. Some developers would say that DBAs and infrastructure types move at the speed of NO.
So DevOps bridges this gap. You can read what Amazon Web Services thinks DevOps is here. You’ll note they start with “Speed” and “Rapid Delivery” as benefits of a DevOps model. Speed Wins.
What’s Wrong with DevOps?
Absolutely nothing. (Now a Seinfeld episode is stuck in my mind). Seriously, though, there isn’t a problem with DevOps, per se. A lot of benefits come out of DevOps teams slinging code and shipping. Always Be Shipping. I get it. I like it. This is a new, fast, online, in-your-face, transparent, understanding economy. Get stuff done. Even this blog post is already too long for 2017 (or 2009, when I started blogging, for that matter). I’m bringing clients to the cloud in various flavors with entirely DevOps-oriented teams. It’s a methodology with a lot of merits.
So I’m not railing against DevOps here. But I’ve been a SQL Server DBA or consultant type for 17 years, and that experience means my personality has the tiniest curmudgeonly grumble here. I fear that sometimes there isn’t enough “Ops” in DevOps. I fear that sometimes it becomes a code word for “damn the torpedoes, full speed ahead” and is all dev, only dev, with ops being an afterthought.
Without a Solid Foundation – DevOps is DevOops
(Credit to my friend Tim Mitchell for using that DevOops word in a Facebook comment I just read).
What I mean is this: if the focus is way too heavy on a bunch of really smart developers who want to Always Be Shipping, with checklists of features and cool things to show off, and there is no solid Operations component, then mistakes will be made. Now I understand DevOps allows for mistakes, but downtime, lost data, bad press? Who wants that? I’ve yet to see a DevOps team that is extra rich on the Ops side, but I’ve seen plenty of teams a bit too lean there.
Here’s the risk. From their very public release, here’s what GitLab experienced in their own words:
- ±6 hours of data loss
- 4613 regular projects, 74 forks, and 350 imports are lost (roughly); 5037 projects in total. Since Git repositories are NOT lost, we can recreate all of the projects whose user/group existed before the data loss, but we cannot restore any of these projects’ issues, etc.
- ±4979 (so ±5000) comments lost
- 707 users lost potentially, hard to tell for certain from the Kibana logs
- Webhooks created before Jan 31st 17:20 were restored, those created after this time are lost
I’m not 100% sure all that loss is permanent but they’ve been transparent and it is where they are as of now. As you can see from their post, a few things led up to this incident. Looks like some missing ops, quick troubleshooting in a hurry and a presumption of a working backup that wasn’t.
Here’s a truth – these things happen all.the.time. To large enterprises in that traditional model, to DevOps teams, etc. It happens. But it can be minimized some with a few steps, a few attitudes, a few of the philosophies from the “Always Say No (first)” traditional operational DBA types.
What Can DevOps Teams Do?
Look to your organization’s DevOps team and make sure that it isn’t 95% Dev and 5% ops. Like I said, I like DevOps, and believe in the philosophy – but you need to be balanced. You can’t eliminate the worry and healthy paranoia that comes from the Ops side.
Speed does win. In a highly competitive landscape, your investors and your CxOs all want to be out first. And if you aren’t first, who cares if you lose data? It’s all for naught. There’s fast, there’s faster and there is ludicrous speed. You very rarely need to be at ludicrous speed, though! And if you absolutely have to be there or else? It’s probably too late.
GitLab will be posting what happened here, but some folks lost code. They lost work. The hope is that there are local copies and the repositories are all there, but changes and updates are gone, users are gone. It could have happened to any provider out there. It happens to online backup vendors. It happens to healthcare systems. It’s probably happened to banks. I fear, though, it will happen more frequently where DevOps teams slide closer to Dev without Ops.
DevOps works because it is a blend of two key roles. It assumes Dev engineers but also assumes Operations engineers and attention being paid to details and future proofing. If it becomes a code word for “DevShips”, it’s not DevOps anymore. – Me, Today.
Three Specific Actions For DevOps Teams
- Plan to fail – One of my favorite blog posts to write. A two-part series. You need to get in the “we will fail” mindset and stop and think about what that means across roles and perspectives on your teams. Plan to fail so you don’t.
- Verify backups – All the time. Every time. A backup you’ve never restored from is a crapshoot. Test your restores, understand your needs, make things break in staging or test and try to recover from production backups. In that blog post, I suggested that you should stop thinking about backups. Think about restores instead.
- Secure Your Environment (just a bit) – I understand a tenet of DevOps is getting things done, but sometimes we need to block the ability for our 11 PM selves to hurt our projects. Sure, you at 2 PM will never run rm -rf, but what about at 2 AM when you think you are on the replica or a staging server and are panicking a bit because things are down and all eyes are on you? If you give permission to do something, that something can happen. If you are in the habit of running rm -rf on production systems? You can blow away key components of your solution. Prevent it. Secure it. Set up some policies.
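To make that last point a little more concrete, here is a minimal Python sketch of one kind of guardrail: before anything destructive runs, the operator has to type the name of the host they believe they are on, and the action is refused if it doesn’t match the machine they are actually on. This is my own illustration, not something from GitLab’s notes. The function names `confirm_target` and `run_destructive` and the hostnames in the usage note are hypothetical, and a check like this is a speed bump for your 2 AM self, not a replacement for real permissions and policies.

```python
import socket
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def confirm_target(typed_name: str, actual_name: Optional[str] = None) -> bool:
    """Return True only if the operator typed the name of the host they are on.

    Hypothetical helper for illustration; the comparison ignores case and
    surrounding whitespace.
    """
    if actual_name is None:
        # Default to whatever this machine reports as its hostname.
        actual_name = socket.gethostname()
    return typed_name.strip().lower() == actual_name.strip().lower()


def run_destructive(
    action: Callable[[], T],
    typed_name: str,
    actual_name: Optional[str] = None,
) -> T:
    """Run `action` only after the hostname check passes; otherwise refuse."""
    if not confirm_target(typed_name, actual_name):
        raise PermissionError(
            "Refusing to run: the hostname you typed does not match this machine."
        )
    return action()
```

In a real script you might call `run_destructive(cleanup, input("Type this server's hostname to continue: "))`, where `cleanup` is whatever risky step you are wrapping. If you think you are on a staging box (say, the made-up `db2.example.com`) but you are really on `db1.example.com`, the call raises `PermissionError` instead of deleting anything.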
There are so many other tips. But these three would have made a difference here and they will for you. DevOps works because it is a blend of two key roles. It assumes Dev engineers but also assumes Operations engineers and attention being paid to details and future proofing. If it becomes a code word for “DevShips”, it’s not DevOps anymore.
You should also go and read Brent Ozar’s excellent post for some very specific SQL Server tie-ins and suggestions. Great post.