SQL Server is a complicated piece of software, so it’s understandable that some of us have some misconceptions about what the transaction log is and what it does. This post is designed to not just tell you about these myths, but also to show you a few ways how to prove (or rather disprove) some of the more stubborn beliefs some of us hold on to.
Oh, and please don’t feel bad if you believe any of these myths now. Just please stop believing them after you read the post. Seriously, it’s never too late to learn, or in this case, un-learn myths you may believe. OK, here comes the truth.
Myth #1: Inserting or updating 10 MB of data will also use 10 MB of log space
As convenient as it sounds, there’s no easy correlation between data pages and log usage. In reality, the size of the log space used depends largely on all sorts of things, like if you are working with a single transaction or a batch of statements, if your table has indexes or dependencies, how many locks get taken, growth events that occur during your transaction, and more.
In my experience, this myth often grossly underestimates how much log space is actually required for a large update, which is why might want to consider batching your data manipulation and, if your database is in Full recovery model, making more frequent backups to clear the log.
Here’s a script for you to disprove this myth, which will create a “Myths” database and show you exactly how much data and log space is used before and after an insert of a million rows occurs.
CREATE DATABASE Myths; GO ALTER DATABASE Myths SET RECOVERY FULL; GO SET NOCOUNT ON; GO USE [Myths] GO SELECT RTRIM(name) AS [FileName] , CAST(size/128.0 AS DECIMAL(10,2)) AS FileSizeInMB , CAST(FILEPROPERTY(name, 'SpaceUsed')/128.0 AS DECIMAL(10,2)) AS SpaceUsedInMB FROM sysfiles ORDER BY groupid DESC GO -- create a base table CREATE TABLE OneMillionDates (Id int PRIMARY KEY CLUSTERED, UpdateDate datetime2); GO -- populate that table with 1,000,000 rows DECLARE @id int =1 BEGIN TRANSACTION; WHILE @id <= 1000000 BEGIN INSERT OneMillionDates (Id, UpdateDate) SELECT @id, getdate() SET @id += 1 END; COMMIT; GO SELECT RTRIM(name) AS [FileName] , CAST(size/128.0 AS DECIMAL(10,2)) AS FileSizeInMB , CAST(FILEPROPERTY(name, 'SpaceUsed')/128.0 AS DECIMAL(10,2)) AS SpaceUsedInMB FROM sysfiles ORDER BY groupid DESC GO
Your mileage may vary, but when I run this, I see that after the transaction, the data space used grows by about 20 MB, while the transaction log space used grows by about 120 MB. Not exactly a 1-to-1 ratio, is it?
Myth #2: TRUNCATE TABLE is not logged
I’m not one of those mean folks who ask a lot of “pop quiz” type questions in job interviews, but I will ask about this one since if you know the answer, it shows you’ve dug a little deeper into SQL Server. If you get it wrong, well, I’ve actually found lots of interviewees think TRUNCATE TABLE isn’t a logged operation.
Interestingly, this is one of those things that takes about 60 seconds to prove as a myth, so let’s use that table we created in disproving the last myth to show that TRUNCATE TABLE is, in fact a logged transaction. As you can see in the script, the table is truncated in a transaction and returns 0 rows…until the transaction is rolled back, and the table still has 1,000,000 rows again.
BEGIN TRANSACTION; TRUNCATE TABLE OneMillionDates SELECT COUNT(*) AS BeforeRollback FROM OneMillionDates; ROLLBACK; SELECT COUNT(*) AS AfterRollback FROM OneMillionDates;
This is only possible because TRUNCATE TABLE is logged in SQL Server. The thing is, it’s so fast because it is logged differently than if you removed all rows with a DELETE. DELETE will have representation in the log for every row deleted (and if enough rows are deleted, a large amount of log growth!) TRUNCATE TABLE instead notes in the transaction log the deallocation of the pages in the table, which takes a lot less time and space in the log.
Myth #3: DROP TABLE is not logged
I don’t know where these “is not logged” myths originate. Perhaps it’s because some commands execute so quickly. Regardless, this is another one I’ve heard folks say, and it’s just as easy to disprove as the last myth. Let’s do the same thing with a transaction as we did previously.
First, let’s drop the table so it’s gone.
BEGIN TRANSACTION; DROP TABLE OneMillionDates SELECT COUNT(*) AS BeforeRollback FROM OneMillionDates; Running this should give you an “invalid object name” error since the table is gone. Well, it’s gone until we roll back the transaction and the table magically appears again. ROLLBACK; SELECT COUNT(*) AS AfterRollback FROM OneMillionDates;
Like the last myth, DROP TABLE is clearly a logged operation. Although there is at least one operation that isn’t logged that some folks think is.
Myth #4: If the log file is out of space, the database can’t be switched from Full to Simple
If you believe this, it probably means you are old. I say that because this myth was, in fact true in earlier versions of SQL Server. And by earlier, I mean SQL Server 2005 and before.
I don’t have a test for this since I don’t want you or me to run out of drive space. Just trust me when I say that if you need to, you can switch the recovery model – and its model, not mode – whenever you want in any version since SQL Server 2008, as this is no longer a logged operation.
Now, just because you can switch recovery models doesn’t mean you should, since switching from Full to Simple breaks your transaction log backup chain and your ability to have point-in-time recovery. If you think changing the recovery model is your best course of action, be sure to run a full backup after your recovery model shenanigans are completed since you’re now one disaster away from a possible resume-generating event.
And speaking of things to do when your log file runs out of drive space, that brings me to another myth.
Myth #5: Additional log files improve performance
Maybe this one got started because folks believe log files can be used in parallel, like data files. Regardless, additional log files certainly won’t help database performance. They likely won’t hurt either, but having more than one log file is completely unnecessary since SQL Server will move sequentially through a log file and never use more than one at a time.
The only reason to add a second log file is if the drive containing the first log file runs out of space, in which case you will need to add a second log file on another drive to allow current transactions to keep executing. Even still, once you have that transaction completed and can resize the original log file or add space to the drive where it resides, you should remove the second log file as soon as you can. Leaving extra log files is just plain messy, and if you don’t remove them, someone else who has read this post will see it and silently judge you.
Alas, I don’t have any test for this either since it would involve running two transactions that may or may not run at the same speed, depending on what else was happening on the server. Hopefully, I’ve earned your trust by now. If not, I’ve got some more myths for you.
Myth #6: Full backups include log backups
So, this one is a bit complicated. Yes, full backups can include part of the transaction log, especially the first time you run them, but:
- If you have a backup chain started with log backups, full backups won’t back up log transactions and break the log backup chain.
- Full backups don’t ever allow for point-in-time recovery.
- Full backups don’t ever mark the log for reuse.
That first bullet point is the main one. I say this because I see lots of clients who run full backups and never back up their transaction log, even though their databases are in Full recovery model. I also say this because I still hear folks saying full backups will break the log backup chain. Please try the example below and stop saying that.
Let me show you what I mean. To disprove this myth, we will use fn_dblog, which is a built-in function that allows us to read the transaction log.
First, let’s start a backup chain with full and log backups to our Myths database, which should already be in Full recovery model. We will run all these backups to NUL, which writes our backups to nowhere, so we don’t need any additional drive space.
BACKUP DATABASE Myths TO DISK = N'NUL'; GO BACKUP LOG Myths TO DISK = N'NUL'; GO Next, we’ll run a transaction to generate some log activity. This may look slightly familiar. TRUNCATE TABLE OneMillionDates; GO DECLARE @id int =1 BEGIN TRANSACTION; WHILE @id <= 1000000 BEGIN INSERT OneMillionDates (Id, UpdateDate) SELECT @id, getdate() SET @id += 1 END; COMMIT; GO
Now, let’s count how many log rows we have, then run a full backup to NUL, then again count the number of log rows after the backup.
SELECT COUNT([Current LSN]) as 'LogRows' FROM fn_dblog(null,null); GO BACKUP DATABASE Myths TO DISK = N'NUL'; GO SELECT COUNT([Current LSN]) as 'LogRows' FROM fn_dblog(null,null); GO
You may notice you now have more, not less, log rows. If that doesn’t prove full backups don’t backup the log and break the log chain, I don’t know what more to tell you.
Moving on, let’s do the same thing, but with a log backup instead of a full backup.
SELECT COUNT([Current LSN]) as 'LogRows' FROM fn_dblog(null,null); GO BACKUP LOG Myths TO DISK = N'NUL'; GO SELECT COUNT([Current LSN]) as 'LogRows' FROM fn_dblog(null,null); GO
Now we can see that nearly all the rows in the log have been backed up by the log backup, just as log backups are supposed to do.
While we’re using fn_dblog, let’s look at one final myth.
Myth #7: SELECT statements don’t use the transaction log
You may be thinking, “Why would a SELECT statement involve the transaction log?” That’s a fair question, and the answer has to do with what exactly a SELECT query does at the server level. Your query doesn’t exist in a vacuum, but instead, queries need to be able to execute while other SQL statements are also occurring in your database. SQL Server has mechanisms to manage the data integrity while multiple queries are executing, like locks and latches and transaction COMMITs (implicit and explicit), and the use of those mechanisms is also logged.
If you’ve ever had a query blocked by a SELECT, then you know what I’m talking about. If not, let’s use fn_dblog to prove it.
SELECT COUNT([Current LSN]) as 'LogRows' FROM fn_dblog(null,null); GO SELECT TOP 1 UpdateDate FROM OneMillionDates ORDER BY UpdateDate DESC; GO SELECT COUNT([Current LSN]) as 'LogRows' FROM fn_dblog(null,null); GO
See how the number of log rows increased? And all we did was SELECT. You can dig deeper into the transaction log with fn_dblog if you want, but that’s outside of the scope of this post, and quite frankly, I’ve already taken enough of your time.
If you’ve ever repeated any of these myths to someone else, please don’t feel embarrassed. Honestly, at one time, even I thought a few of these myths were true. The important thing is now you know the truth, and with this knowledge, you’ll be better at using SQL Server and the transaction log.