What is a troubleshooting methodology?
A lot of the technical problems I encounter with clients or in various roles are not really the fault of the technology at hand. They are the result of rushed work, apathetic work or botched troubleshooting.
Before talking about what troubleshooting is, let’s talk about what it is not. It is not:
- Hail Mary attempts at solving an issue, hoping for that final connection
- Hitting your favorite search engine and trying everything you find without understanding it
- Trying something you knew worked in some completely unrelated situation
- Taking a hammer and crowbar into the server room
Far too often, I find some of the above approaches being attempted. Sometimes they even work to solve the issue, but a lot of time is wasted, a lot of risk is taken and no one involved seems to understand the how or why behind the solution. Especially in the production database world, hacking is never a first resort, and it is questionable even as a true last resort.
When I am interviewing someone for a role (developer or DBA, it doesn’t matter) I like to ask questions that gauge the candidate’s troubleshooting skills. I would happily take a person who is methodical and applies common sense but lacks extensive certifications or years of experience over someone who is experienced but lacks common sense and the ability to troubleshoot. You can teach someone the necessary skills, but it’s tough to teach common sense.
So what are some common traits among true troubleshooters?
- Calm under pressure… They understand the importance of a deadline but will not rush in a half-thought-out solution just to look busy.
- Uses available tools wisely… There is nothing wrong with searching the web for an answer, but is it the right answer? Is it vetted? Do you understand why it might work, what the risks are, how it works and how to undo it if it goes south quickly? If possible, did you test the process first?
- Knows when to ask for assistance… Your company probably pays for a support agreement. What’s wrong with using it, especially if you can’t figure the issue out easily, don’t understand the solutions you find or don’t feel comfortable with the issue? No one will think less of you if you ask for help and solve it. They will if you blow away a production database and have to revert to an old backup.
- Looks for the simple first… If a printer isn’t printing, what will you do first? You probably won’t go looking for a screwdriver and soldering iron. Apply that same logic when troubleshooting anything. Look for the simple solutions first and increase scope and complexity only as needed.
- Follows up when complete!!! If there is one step I often see missed, it’s this one. So the issue is resolved… Game on, right? Wrong. We all make fun of the “lessons learned” or “post mortem” process, but it works. Understand why the issue happened, how it can be prevented, what solved it, what you learned about your disaster recovery strategy, what you learned about your response plan, where you can make immediate improvements and when you will make the longer-term improvements.
These same troubleshooting skills can be applied in almost any line of work. Understand what you are doing, learn from your mistakes and apply a consistent, calm methodology to escaping trouble wherever it hits you. Then vow never to repeat that category of mistake again.