A Simple SQL Server Health Checklist

A Simple SQL Server Health Checklist

Being part of a consulting team has given me experience with many environments, and part of the work I do every week is to check for the various best practices and monitor server health across over 20 environments. This post condenses the tools that can help and the configurations that are most likely to cause you trouble. The goal is to help you stop and take a look at your environment make your system a bit more aligned to industry best practices and improve overall instance health.

This checklist is aimed down the middle and is the starting point when we are looking at a new or old server for alignment to best practices. Each environment is different, and you should always work through your own change control and testing before making any changes in yours, as well as working with an expert to decide if these options are correct for you.

If you prefer to download a copy of this checklist, click on the link below:

1. Make sure the instance name and server name are aligned

This is to help reduce confusion on the administrative and configuration side.

2. Make sure instant file initialization is enabled

This is a boost to performance, this can be a security issue but is a net benefit.

3. Make sure as many jobs and databases are owned by sa as possible

If a user leaves the company or has their account disabled for any reason it can cause jobs to fail. For databases, we would recommend making the owner sa and giving the previous owner ‘db_owner’, if necessary.

4. Make TempTB have the number of files equal to the CPU up to 8, make sure they are sized the same.

This provides a small performance benefit

5. Check for server configurations

  1. Backup checksum default to 1 (this is for corruption verification)
  2. Backup compression default to 1 (this is to conserve disk space)
  3. Max degree of parallelism to ½ the number of CPUs up to 8 (this is to reduce the risk of one query using all of the server’s processing power)
  4. Cost threshold for parallelism to 50 (this is to increase the threshold for parallelism so that less simple queries are going parallel)
  5. Optimize for ad-hoc to 1 (this increases performance if you have any one-off queries being run with any frequency)
  6. Remote admin connections to 1 (this is helpful in case of SQL Server being locked up; more explanation here)
  7. Max memory (we recommend following Jonathan Kehayias’s recommendation for most environments)

6. Trace Flag 3226 enabled (and 1117 and 1118 for pre-2016 environments.)

3226 cleans up the error log by suppressing successful log backup messages. 1117 and 1118 are corruption-related and are built-in 2016 onward.

7. We recommend enabling alerts for severity 16-25 errors and errors 823-825

These are considered advanced errors and help with corruption detection and remediation. We recommend finding an expert immediately to help if you start getting these errors.

8. Maintenance (come up with a plan that works for your business, but this is our default.)

  1. Are full backups being taken for user databases weekly? +4+5 The full weekly, diff daily, and log every 15 minutes combined with Ola’s 2-week cleanup times provides an easy point in time recovery.
  2. Are full backups being taken for system databases daily? System databases can fail or be corrupted, having a backup makes an already painful system database restore a bit easier to manage because you won’t be starting from scratch.
  3. Are diff backups being taken for user databases in full recovery daily? See above.
  4. Are log backups being taken for user databases in full recovery every 15 minutes? See above.
  5. Are indexes being rebuilt and stats being updated weekly? Index and statistics maintenance ensures performance for existing indexes and data does not degrade over time as data is inserted and deleted.
  6. Are integrity checks being run weekly (for user and system databases)? Integrity checks are an important part of corruption detection and remediation. Combined with a strong backup and recovery plan this minimize the impact corruption could have.

9. Update tools (Dbatools, Ola, and Brent Ozar’s First Responder toolkit, whoisactive)

These tools are used across the industry. Updating them helps bring in new functionality.

10. Verify that database file autogrowth is not set to %

% autogrowth can cause performance and availability issues, especially with larger databases. We recommend a fixed autogrowth amount based on database size.

11. Make sure no databases have auto-shrink or auto-close enabled

Auto shrink can cause fragmentation problems, and auto close can also cause performance issues.

12. Make sure page verification is set to CHECKSUM for all databases.

Page-verification is a corruption checking tool, checksum is the best of the three options (none and torn-page being the other two).

13. Make sure the recovery target interval is set to 60 for 2016 onward.

The new industry standard for 2016 onwards is that the recovery target should be set to 60 seconds.

14. As much as possible, make sure C:\ is not being used and that data, log, tempdb, and backups are all going to different drives.

Seeing environments where everything is on the C:\ drive, or TempDB shares a disk with data or log and backups can be concerning. The adage of not putting all of your eggs into one basket applies here. Not to mention that performance would increase if these loads were separated.

Conclusion

This seems somewhat like a best practices list, and it is, sort of. Without these settings, your server’s ‘health’ can degrade. Consider it a preventative checkup! I have included a script you can run to check for most of these settings:

/*====================================================== SERVER CONFIGURATIONS ======================================================*/


/*~~~ change SQL Server name if computername changed ~~~*/
	SELECT  HOST_NAME() AS 'host_name()',
	@@servername AS 'ServerName\InstanceName',
	SERVERPROPERTY('servername') AS 'ServerName',
	SERVERPROPERTY('machinename') AS 'Windows_Name',
	SERVERPROPERTY('ComputerNamePhysicalNetBIOS') AS 'NetBIOS_Name',
	SERVERPROPERTY('instanceName') AS 'InstanceName',
	SERVERPROPERTY('IsClustered') AS 'IsClustered'

/*~~~ Instant File Initialization ~~~*/
  	IF RIGHT(@@version, LEN(@@version) - 3 - CHARINDEX(' ON ', @@VERSION)) NOT LIKE 'Windows%' 
		BEGIN 
			SELECT  SERVERPROPERTY('ServerName') AS [Server Name] , 
					RIGHT(@@version, 
						  LEN(@@version) - 3 - CHARINDEX(' ON ', @@VERSION)) AS [OS Info] , 
					LEFT(@@VERSION, CHARINDEX('-', @@VERSION) - 2) + ' ' 
					+ CAST(SERVERPROPERTY('ProductVersion') AS NVARCHAR(300)) AS [SQL Server Version] , 
					'N/A' AS [service_account] , 
					'N/A' AS [instant_file_initialization_enabled] 
		END 
	ELSE 
		BEGIN 
			IF EXISTS ( SELECT  0 
						FROM    sys.all_objects AO 
								INNER JOIN sys.all_columns AC ON AC.object_id = AO.object_id 
						WHERE   AO.name LIKE '%dm_server_services%' 
								AND AC.name = 'instant_file_initialization_enabled' ) 
				BEGIN 
					EXEC('   SELECT  SERVERPROPERTY(''ServerName'') AS [Server Name] , 
					RIGHT(@@version, LEN(@@version) - 3 - CHARINDEX('' ON '', @@VERSION)) AS [OS Info] , 
					LEFT(@@VERSION, CHARINDEX(''-'', @@VERSION) - 2)  + '' '' +  CAST(SERVERPROPERTY(''ProductVersion'') AS NVARCHAR(300) ) AS [SQL Server Version], 
					service_account , 
					instant_file_initialization_enabled 
					FROM    sys.dm_server_services 
					WHERE   servicename LIKE ''SQL Server (%''') 
				END 
			ELSE 
				BEGIN 
					SELECT  SERVERPROPERTY('ServerName') AS [Server Name] , 
							RIGHT(@@version, 
								  LEN(@@version) - 3 - CHARINDEX(' ON ', @@VERSION)) AS [OS Info] , 
							LEFT(@@VERSION, CHARINDEX('-', @@VERSION) - 2) + ' ' 
							+ CAST(SERVERPROPERTY('ProductVersion') AS NVARCHAR(300)) AS [SQL Server Version] , 
							service_account AS [service_account] , 
							'N/A' AS [instant_file_initialization_enabled] 
					FROM    sys.dm_server_services 
					WHERE   servicename LIKE 'SQL Server (%' 
				END   
		END 



		
/*~~~ Check any jobs not owned by SA ~~~*/
	SELECT j.name as [NON-SA-JOB], suser_sname(owner_sid) as [OWNER]
	FROM msdb.dbo.sysjobs AS j
	INNER JOIN msdb.dbo.syscategories AS C ON j.category_id = C.category_id
	WHERE j.owner_sid <> 0x01 and C.name <> 'Report Server';
	GO

    	Declare 
		@Job_OWNERSCMD nVarchar(Max)
		Set @Job_OWNERSCMD = 'Use master;

		-- Jobs with NON-SA-OWNERS
			'
			Select @Job_OWNERSCMD  = @Job_OWNERSCMD + 
			'USE msdb; EXEC msdb.dbo.sp_update_job @job_name = ' + QUOTENAME(j.name) + ', @owner_login_name = ''sa'';' +  CHAR(10)
			FROM msdb.dbo.sysjobs AS j
	        INNER JOIN msdb.dbo.syscategories AS C ON j.category_id = C.category_id
	        WHERE j.owner_sid <> 0x01 and C.name <> 'Report Server';  			
			 
		Print @Job_OWNERSCMD
		--Exec (@Job_OWNERSCMD)
		
	
/*~~~ Number of CPUs ~~~*/
	SELECT cpu_count
	FROM [sys].[dm_os_sys_info]	

/*~~~ Check number of TempDB files ~~~*/
	-- = CPU up to 8
	use tempdb
	SELECT file_id, type_desc, name, physical_name, state_desc
	FROM sys.database_files
	GO
	
/*~~~ Configuration Options ~~~*/
	Select  
		name,  
		value,  
		value_in_use,  
		is_dynamic,  
		is_advanced 
	from  
		sys.configurations  
	where  
		name in ('cost threshold for parallelism', 'backup checksum default', 'max degree of parallelism', 'max server memory (MB)', 'optimize for ad hoc workloads', 'remote admin connections', 'show advanced options', 'backup compression default')  
	order by  
		name; 

/*~~~ Alerts/ Traces ~~~*/ 
	dbcc tracestatus() 
		/* looking for trace 3226 1117, and 1118 */
		--DBCC TRACEON (3226, -1);
		--DBCC TRACEON (1117, -1);
		--DBCC TRACEON (1118, -1);

		-- Needs to be added as a startup parameter in SQL configuration manager (-T3226)

	SELECT [name] 
	FROM [dbo].[sysalerts] 
	where name not like 'Replication%' and (name not Like 'deadlock%') 
	GO 
		/* Looking for errors 16-25 and 823,824, and 825 */
		
/*===================================================== DATABASE CONFIGURATIONS =====================================================*/

/*~~~ Find Current Location of Data and Log File of All the Database ~~~*/
	SELECT @@servername, TYPE_DESC, name, physical_name AS current_file_location, state_desc /*comment out here if the query fails*/, (size*8)/1024 SizeMB, (growth*8)/1024 GrowthSizeMB
	FROM sys.master_files order by name;
		
/*~~~ Check for % autogrowth ~~~*/
	SELECT 
	DB_name(S.database_id) AS [Database Name] 
	,	S.[name] AS [Logical Name]
	,	S.[physical_name] AS [File Name] 
	,	CONVERT(VARCHAR(10),S.growth) +'%' AS [Growth] 
	FROM sys.master_files AS S 
	WHERE S.is_percent_growth = 1;
	

/*~~~ Auto Close, Auto Shrink, Page Verify, Recovery Model, DB owner, Compat Level~~~*/
	SELECT 
		name as [Database Name]
		,	is_auto_close_on as [Auto Close]
		,	is_auto_shrink_on as [Auto Shrink]
		,   page_verify_option_desc as [Should be Checksum]
		,	recovery_model_desc as [Recovery Model]
		,	suser_sname(owner_sid) as [owner name]
		,	target_recovery_time_in_seconds as [Recovery Target (should be 60)]
	from sys.databases
	where state_desc = 'ONLINE'
	order by name
	
/*~~~ Autogrowth changes NEEDED ~~~*/
	/* Dynamic Auto Grow for Data Files
	*/
		
		Declare 
		@auto_growCMD0 nVarchar(Max)
		Set @auto_growCMD0 = 'Use master;
		-- Data Files with % Growth
			
			'
			Select @auto_growCMD0  = @auto_growCMD0 + 
			'ALTER DATABASE ['+DB_NAME(database_id)+'] MODIFY FILE ( NAME = N'''+name+''', FILEGROWTH = 128MB );' + CHAR(10)
			from sys.master_files
			Where type = 0 and state_desc = 'ONLINE' and is_percent_growth = 1  ; 
			SET @auto_growCMD0  = @auto_growCMD0 + '
			
		-- Log Files with % Growth
			'
			Select @auto_growCMD0  = @auto_growCMD0 + 
			'ALTER DATABASE ['+DB_NAME(database_id)+'] MODIFY FILE ( NAME = N'''+name+''', FILEGROWTH = 64MB );' +  CHAR(10)
			from sys.master_files
			Where type = 1 and state_desc = 'ONLINE' and is_percent_growth = 1 ;
		
			 
		Print @auto_growCMD0
		--Exec (@auto_growCMD0)
		
/*~~~~~ Verify Last good CheckDB~~~~~~*/
    SET NOCOUNT ON
    CREATE TABLE #DBInfo_LastKnownGoodCheckDB
        (
            ParentObject varchar(1000) NULL,
            Object varchar(1000) NULL,
            Field varchar(1000) NULL,
            Value varchar(1000) NULL,
            DatabaseName varchar(1000) NULL
        
        )
    DECLARE csrDatabases CURSOR FAST_FORWARD LOCAL FOR
    -- Excludes tempdb and any offline databases
    SELECT name FROM sys.databases WHERE name NOT IN ('tempdb') and state_desc = 'ONLINE'
    OPEN csrDatabases
    DECLARE 
        @DatabaseName varchar(1000),
        @SQL varchar(8000)
    FETCH NEXT FROM csrDatabases INTO @DatabaseName
    WHILE @@FETCH_STATUS = 0
    BEGIN
        --Create dynamic SQL to be inserted into temp table
        SET @SQL = 'DBCC DBINFO (' + CHAR(39) + @DatabaseName + CHAR(39) + ') WITH TABLERESULTS'
        --Insert the results of the DBCC DBINFO command into the temp table
        INSERT INTO #DBInfo_LastKnownGoodCheckDB
        (ParentObject, Object, Field, Value) EXEC(@SQL)
        --Set the database name where it has yet to be set
        UPDATE #DBInfo_LastKnownGoodCheckDB
        SET DatabaseName = @DatabaseName
        WHERE DatabaseName IS NULL
    FETCH NEXT FROM csrDatabases INTO @DatabaseName
    END
    --Get rid of the rows that I don't care about
    DELETE FROM #DBInfo_LastKnownGoodCheckDB
    WHERE Field <> 'dbi_dbccLastKnownGood'
    SELECT 
        DatabaseName, 
        CAST(Value AS datetime) AS LastGoodCheckDB,
        DATEDIFF(dd, CAST(Value AS datetime), GetDate()) AS DaysSinceGoodCheckDB,
        DATEDIFF(hh, CAST(Value AS datetime), GetDate()) AS HoursSinceGoodCheckDB
    FROM #DBInfo_LastKnownGoodCheckDB
    ORDER BY DaysSinceGoodCheckDB DESC

    DROP TABLE #DBInfo_LastKnownGoodCheckDB

/*~~~~~ Verify FUll/DIFF frequency over the past 14 days ~~~~~*/
    SELECT distinct
    b.server_name as 'Source_Server',
    b.database_name
    ,b.is_copy_only,
    b.backup_start_date, --b.backup_start_date as [time],
    b.backup_finish_date, --b.backup_finish_date as [time],
	convert(decimal, (b.backup_size/1024/1024)) as [Size in MB], --Backup Size in MB
	convert(decimal, (b.backup_size/1024/1024/1024)) as [Size in GB], --Backup Size in Gb
    CASE WHEN b.type = 'D' then 'Full'
    WHEN b.type = 'I' then 'Differential'
    ELSE b.type
    END as 'backup_type',
    datediff(Mi, b.backup_start_date,b.backup_finish_date) as 'duration_min',
    b.recovery_model,
    f.physical_device_name,
    b.user_name
    FROM msdb.dbo.backupmediafamily f
    INNER JOIN msdb.dbo.backupset b ON f.media_set_id = b.media_set_id
    WHERE b.type in ('D') and database_name not in ('master','MSDB','Model')
    and (b.backup_start_date >=CONVERT(VARCHAR(10),dateadd(day,-14,getdate()),120))and is_copy_only = 0 -- and b.backup_start_date < CONVERT(VARCHAR(10),GETDATE(),120))
    --and database_name ='DataMart_master'--'DM_1007_NestleUSAllOpcos'
    Order by b.database_name, backup_start_date desc,backup_type

    SELECT distinct
    b.server_name as 'Source_Server',
    b.database_name
    ,b.is_copy_only,
    b.backup_start_date, --b.backup_start_date as [time],
    b.backup_finish_date, --b.backup_finish_date as [time],
	convert(decimal, (b.backup_size/1024/1024)) as [Size in MB], --Backup Size in MB
	convert(decimal, (b.backup_size/1024/1024/1024)) as [Size in GB], --Backup Size in Gb
    CASE WHEN b.type = 'D' then 'Full'
    WHEN b.type = 'I' then 'Differential'
    ELSE b.type
    END as 'backup_type',
    datediff(Mi, b.backup_start_date,b.backup_finish_date) as 'duration_min',
    b.recovery_model,
    f.physical_device_name,
    b.user_name
    FROM msdb.dbo.backupmediafamily f
    INNER JOIN msdb.dbo.backupset b ON f.media_set_id = b.media_set_id
    WHERE b.type in ('I') and database_name not in ('master','MSDB','Model')
    and (b.backup_start_date >=CONVERT(VARCHAR(10),dateadd(day,-14,getdate()),120))and is_copy_only = 0 -- and b.backup_start_date < CONVERT(VARCHAR(10),GETDATE(),120))
    --and database_name ='DataMart_master'--'DM_1007_NestleUSAllOpcos'
    Order by b.database_name, backup_start_date desc,backup_type

/*~~~~~ Verify LOG frequency over the past 1 days ~~~~~*/
    SELECT distinct
    b.server_name as 'Source_Server',
    b.database_name
    ,b.is_copy_only,
    b.backup_start_date, --b.backup_start_date as [time],
    b.backup_finish_date, --b.backup_finish_date as [time],
	convert(decimal, (b.backup_size/1024/1024)) as [Size in MB], --Backup Size in MB
	convert(decimal, (b.backup_size/1024/1024/1024)) as [Size in GB], --Backup Size in Gb
    CASE WHEN b.type = 'D' then 'Full'
    WHEN b.type = 'I' then 'Differential'
    ELSE b.type
    END as 'backup_type',
    datediff(Mi, b.backup_start_date,b.backup_finish_date) as 'duration_min',
    b.recovery_model,
    f.physical_device_name,
    b.user_name
    FROM msdb.dbo.backupmediafamily f
    INNER JOIN msdb.dbo.backupset b ON f.media_set_id = b.media_set_id
    WHERE b.type in ('L') and database_name not in ('master','MSDB','Model')
    and (b.backup_start_date >=CONVERT(VARCHAR(10),dateadd(day,-1,getdate()),120))and is_copy_only = 0 -- and b.backup_start_date < CONVERT(VARCHAR(10),GETDATE(),120))
    --and database_name ='DataMart_master'--'DM_1007_NestleUSAllOpcos'
    Order by b.database_name, backup_start_date desc,backup_type
    

/*~~~~~ Verify FUll backup frequency for system databases over the past 14 days ~~~~~*/
    SELECT distinct
    b.server_name as 'Source_Server',
    b.database_name
    ,b.is_copy_only,
    b.backup_start_date, --b.backup_start_date as [time],
    b.backup_finish_date, --b.backup_finish_date as [time],
	convert(decimal, (b.backup_size/1024/1024)) as [Size in MB], --Backup Size in MB
	convert(decimal, (b.backup_size/1024/1024/1024)) as [Size in GB], --Backup Size in Gb
    CASE WHEN b.type = 'D' then 'Full'
    WHEN b.type = 'I' then 'Differential'
    ELSE b.type
    END as 'backup_type',
    datediff(Mi, b.backup_start_date,b.backup_finish_date) as 'duration_min',
    b.recovery_model,
    f.physical_device_name,
    b.user_name
    FROM msdb.dbo.backupmediafamily f
    INNER JOIN msdb.dbo.backupset b ON f.media_set_id = b.media_set_id
    WHERE b.type in ('D') and database_name in ('master','MSDB','Model')
    and (b.backup_start_date >=CONVERT(VARCHAR(10),dateadd(day,-7,getdate()),120)) and is_copy_only = 0 -- and b.backup_start_date < CONVERT(VARCHAR(10),GETDATE(),120))
    --and database_name ='DataMart_master'--'DM_1007_NestleUSAllOpcos'
    Order by b.database_name, backup_start_date desc,backup_type

Subscribe for Updates

Name

2 thoughts on “A Simple SQL Server Health Checklist”

    • We tend to do lockdowns in other ways per the rest of Andreas’ post – security matters a lot and we’ll be covering that this month as a matter of fact. We love the SA account being locked down or disabled with another account acting as SA – and you can still use it. Thanks for the comment!

      Reply

Leave a Comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Share This