That being said, let’s look at a couple of reasons why transaction logs grow and what you can do about it proactively. We’ll break this down into a couple of posts.
Through answering forum questions and some recent conversations at a Microsoft SQL tuning class, it strikes me that there are some misconceptions about how transactions work, when one needs to commit or roll back, and what the recovery models actually do to your database. Hopefully we can rake through some of that today.
What is a transaction?
This is probably a review, so let’s just hit the quick points:
- An atomic unit of work that either succeeds or fails together as one.
- The classic example is a credit/debit transfer (note: this doesn’t work with government budgets; think of your checking account).
- You are online with your bank and initiate a transfer from your checking account to your loan account. This transaction includes a withdrawal and a deposit. What happens if there is a failure between the two operations? That depends on whether the work was done in a transaction:
- In a Transaction – The withdrawal that happened is undone (rollback) and everything is fine.
- No Transaction – The withdrawal completes (commit) and the deposit never happens. You are out $300 and have it sitting in limbo land someplace.
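The transfer above can be sketched in T-SQL. The table and column names here (`Accounts`, `Balance`, `AccountId`) are hypothetical, purely for illustration; the point is that both statements succeed or fail as one unit:

```sql
DECLARE @CheckingId int = 1, @LoanId int = 2;  -- hypothetical account IDs

BEGIN TRY
    BEGIN TRANSACTION;

    -- Withdrawal from checking
    UPDATE Accounts SET Balance = Balance - 300 WHERE AccountId = @CheckingId;

    -- Deposit to the loan account
    UPDATE Accounts SET Balance = Balance + 300 WHERE AccountId = @LoanId;

    COMMIT TRANSACTION;  -- both changes become permanent together
END TRY
BEGIN CATCH
    -- A failure between the two statements undoes the withdrawal too
    IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION;
END CATCH;
```

Without the transaction, each `UPDATE` autocommits on its own, and a failure between them leaves your $300 in limbo.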
SQL Server can operate in three modes with regard to transactions: autocommit mode, explicit transactions, and implicit transactions. Let’s explore those:
- Autocommit Mode – This is default behavior in SQL Server. This is also where a lot of people can get tripped up in their thinking (more on that in a minute…). Basically here, if you were to issue a SQL query that modifies data, once that statement completes (without error) the work you did is automatically committed.
- Explicit Transaction – Here you declare a transaction yourself and specify when to commit or roll back a unit of work using the BEGIN TRANSACTION, COMMIT TRANSACTION, and ROLLBACK TRANSACTION commands. You declare your transaction and then, through proper error handling, decide when to roll back or commit based on your business rules. This way you can have multiple statements inside the same transaction, as opposed to what happens with autocommit.
- Implicit Transaction Mode – This is off by default, but you can enable it within a session with this command: SET IMPLICIT_TRANSACTIONS ON. You can read more about it in Books Online (here). Basically, any time you issue a command that can modify data, SQL Server issues a behind-the-scenes BEGIN TRANSACTION for you. Every statement issued from that same connection is part of that one transaction, and you have to manually issue a COMMIT or ROLLBACK to complete it. Forget to do so and now you have one big open transaction, which has its drawbacks (see below).
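A quick sketch of implicit transaction mode in action (the table name is made up for the example); notice that a transaction is open even though we never said BEGIN TRAN:

```sql
SET IMPLICIT_TRANSACTIONS ON;

-- This statement implicitly begins a transaction
UPDATE dbo.SomeTable SET SomeColumn = 1 WHERE Id = 42;

SELECT @@TRANCOUNT;  -- 1: an open transaction we never explicitly began

COMMIT;  -- until this runs, locks are held and that log space cannot be reused

SET IMPLICIT_TRANSACTIONS OFF;  -- back to normal autocommit behavior
```

Every subsequent data-modifying statement on the connection opens a new implicit transaction the same way, so forgetting the COMMIT is an easy trap.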
My Log is still growing! I didn’t do a BEGIN TRAN?!
So you are doing ETL, running a large delete operation, or otherwise modifying a lot of data at once. You are in the Simple recovery model (the transaction log is marked for reuse at certain points; more on recovery models here) because you don’t need to restore to a point in time. Since you haven’t declared a transaction, why on earth is SQL Server’s transaction log growing out of control?
The answer is simple: because it cares. It doesn’t matter which recovery model you have chosen; SQL Server always writes to the transaction log. To maintain consistency in the database, nothing is considered complete until it has been written to the transaction log. The change will eventually be written to the data files, but it has to be in the log before your transaction is complete (see why it’s so important to put your transaction log on a separate drive with great I/O?). If something was being written to the physical .mdf file but wasn’t complete when someone pulled the power plug on your server, we still have a reasonable expectation that SQL Server will come back up in a consistent state. This is called recovery. Any transactions marked as committed in the log file are rolled forward into the data files; any that were started but not finished are rolled back.
The recovery models just tell SQL Server how certain operations get logged and when the log file is marked for reuse. In the Full recovery model, log space is not reusable until you take a log backup. In the Simple recovery model, it becomes reusable at checkpoint, essentially once the data has been written to the data files and is no longer needed for recovery.
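To see or change a database’s recovery model (the database name here is just an example):

```sql
-- See the current recovery model for every database on the instance
SELECT name, recovery_model_desc
FROM sys.databases;

-- Switch a database to the Simple recovery model (or FULL / BULK_LOGGED)
ALTER DATABASE MyDatabase SET RECOVERY SIMPLE;
```

Remember that switching models has backup implications, discussed below.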
That consistency and recovery behavior means something to us when combined with the default mode SQL Server operates in:
- Every statement is a transaction whether you tell SQL it is or isn’t. This is a great thing for us and our data quality.
So if you think about that statement, it makes sense that one big update or delete statement would cause your log file to grow “out of control” (it’s actually very much in control, just not the result you expected 🙂 ).
Try and think about different methods for handling what you are doing:
- Can you batch your work into smaller units? Issuing several updates, inserts, or deletes that each touch a smaller number of rows can help manage log growth. (The log can only be reused up to the point of the oldest active transaction; if you have one big query, it’s all one transaction.)
- When deleting, are you deleting most of the rows? If fewer rows will remain than you are deleting, maybe consider truncating the table and repopulating it with the rows you want to keep instead.
- Consider changing recovery models – Going to the Simple or Bulk Logged recovery model won’t help you if it is all being done in one operation. It can help with log reuse if you batch the work into manageable chunks. If you change your recovery model (look it up in Books Online), you will most likely have to perform a full backup afterward to restart your log chain and maintain restorability!
- Staying in the Full recovery model? Perform frequent log backups. Again, this won’t help with log growth during one large operation, but it does help when the work is split into multiple batches.
- Look at using SSIS for ETL work; various options within SSIS can help manage this better and do your loads more efficiently.
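The batching idea above can be sketched as a loop that deletes a chunk at a time instead of issuing one huge statement. The table name and purge criteria here are hypothetical; each pass through the loop is its own transaction, so the log can be reused between batches (after a checkpoint in Simple, or a log backup in Full):

```sql
-- Delete in batches of 10,000 rows rather than all at once,
-- so no single transaction has to hold the entire delete in the log.
WHILE 1 = 1
BEGIN
    DELETE TOP (10000) FROM dbo.BigTable
    WHERE CreatedDate < '2008-01-01';   -- hypothetical purge criteria

    IF @@ROWCOUNT = 0 BREAK;            -- nothing left to delete, we're done
END;
```

One big `DELETE FROM dbo.BigTable WHERE ...` would be a single transaction, and the log would have to hold every row until it finished.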
While I have your attention –
I mentioned above one good reason to keep your transaction log separate. This falls into the “widespread news” category, but in my experience it isn’t so widely followed. Separate your transaction logs onto a separate drive when you can! Log write performance is incredibly important. If you are going to use RAID on it at all, use RAID 10 or RAID 1: performance matters (hence no RAID 5) and redundancy matters (hence no RAID 0, which is striping only).
While we are at it, do you put your log backups (are you taking any???) on a separate drive as well? If not, what happens when the drive holding your log files crashes and you want to restore a log backup? Oops. Separate your drives physically; partitions on the same disk don’t count. Yes, budgets come into play, but there are some basics you should try to follow where possible…
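Taking a log backup to a separate physical drive is a one-liner (the database name and path are just examples):

```sql
-- Back up the transaction log to a drive other than the one holding the .ldf
BACKUP LOG MyDatabase
TO DISK = 'E:\LogBackups\MyDatabase_log.bak';
```

Schedule this frequently in the Full recovery model; it is what marks log space reusable and what gives you point-in-time restore.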