Just uncovered an already reported and fixed (in SQL Server 2008) bug with Analysis Services job steps reporting success even if your MDX or XMLA in the step failed.

The troubleshooting that worked up to the issue also reiterates points I made when discussing troubleshooting methodologies and empirical evidence.

The Problem

Some changes were made by developers to a cube in a development environment. The goal is to improve processing and reporting time. They were ready to look at the changes in some reports so they kicked off the processing job. Looking at the reports it appeared as though several of the dimensions didn’t process. They processed manually and yeah, the data is there. They looked at the job status, it was a success. They had assumed that things were the same as production and test environments so they were at a loss.

Developer came to me knowing that I had worked on some update statistics jobs that weren’t yet created in this environment. Figured that maybe the statistics were out of date on the relational database source and that could be why the automatic process didn’t pick it up…

The Real Troubleshooting

Knowing that out of whack statistics, even if they were, should not affect a queries results, only the performance, I discounted that theory and suggest that we look at this from the start with the simple things first..

  1. I confirmed that the manual update did definitely update the data and the automatic update didn’t. I wanted to make sure I understood the problem.
  2. I said, alright let’s look at the job in a little more detail. We looked through the job to confirm a few things…
    1. The necessary job step is in fact there and the code is right, correct?
    2. The job really did succeed? Well there are no errors, let’s look at the output from the step.. Bingo.. The step reported success but the details very clearly reported error. A login permission problem.
  3. Fixed the permissions problem, re-ran and we were fine.
  4. Also noticed in the troubleshooting that the assumption made that Dev and other environments were close was not quite right. The jobs were very different.

The Analysis Services Step Problem

As reported on the Connect notes this is a bug, it was fixed in SQL Server 2008. Basically, if you have an MDX/XMLA step in SQL Server Agent as an Analysis Services Command the step will report success even if there was a failure in the processing of the job.

Some workarounds suggested in the forums were to use ASCMD which shipped with SP2 for SQL Server 2005 or use SSIS to do the steps which will trap and handle the error.

We are looking to move into SSIS as a project already so I will be working with ASCMD to fix this issue for the time being. This can be found in the samples directory of your installation. See this BOL article about using it: http://msdn.microsoft.com/en-us/library/ms365187(SQL.90).aspx. This is installed with the SP2 samples found at codeplex.

The Reminders From The Experience

Troubleshooting should not be a tough experience for anyone. A few reminders crept up during the discussion of today’s problem:

  • Don’t add complexity – Don’t go off looking for a magic problem and solution. Focus on the simple things first and dig through the problem in an ordered fashion, increasing complexity and scope as you go. Here when we backtracked and applied this approach the problem was solved in 5 minutes.
  • Are you really thoroughly checking your environment?? This one hit me between the eyes. I have been ignoring my own advice about doing as much proactive work as possible up front in this newish role. So I have not yet looked at all of the error handling (Availability, Reliability fails). When I found the bug my first reaction was, “I wonder how many nightly jobs have failed while reporting success?!” These are the things that should keep a DBA up at night. I will now be addressing a more thorough look through our jobs and forcing proactive time (or making proactive time.. otherwise what good am I doing?)
  • Connect is a great site – When you are bumping into strange problems and the forums don’t have much info for you, you should be considering contacting Microsoft support and you should check out the connect website. This is where folks report bugs, enhancements, issues and people at Microsoft do reply to some. You can also find other people reporting a similar bug and add comments, increase the number of reporters. This will help get it on someone’s list to fix if it is affecting more customers.
  • Assumptions – Everything you’ve heard about them is still true 🙂

[Update: 3/7/09] Tim Laqua [Blog] just posted a very helpful comment. He also just blogged about this same topic on his blog. He brings up a great point and a solution I failed to mention. This solution did occur to me (through one of the obscure search results when searching for something like “Analysis Services Job reported Success but job failed” and the suggestion of a coworker to just search the logs). The solutions I posted are more complex and require more frustration and testing (For instance the ASCMD.EXE is not just a readily available .EXE, you have to go through several steps to download and compile the .EXE and then create a command exec step and play with the simple syntax). His solution is to basically interrogate the msdb tables that handle job execution and look for your exception or error message.

Check out his post and look at that as a good stop gap until you upgrade to 2008. For where I am, I think the entire nightly process has many steps that should be converted to SSIS so we may change the longer term approach. For the short term I will not be doing ASCMD and will be having my monitoring setup look for the error text.

Tim’s comment was definitely a score, got a great point and just discovered a blog that I should be following. Check out his posts on SSAS, SSIS, SSRS,  Grammar (He caught me, I am an abuser of the ellipsis… like that) and heart rate monitor battery replacement.

Thanks Tim!

 

Share This