Adam Machanic, who blogs at SQLBlog.com (lots of great sql bloggers over there) started a neat tradition: T-SQL Tuesday. Read his post here to see the rules and information but it’s basically a weekly (2nd Tuesday of each month) blog topic where we can all have a post about the same topic. I like this because it’s a great way to go in depth on a topic of interest to the community at large. I am not sure if the focus needs to remain on T-SQL or if we can bring up DBA topics, etc but I’ll find out if I get a chance to host the “event”.
I also really Like the numbering convention Adam used (“00n”), it shows he really wants to make this continue for awhile.
Dates and Times 101
This month’s topic is about dates and times. From Adam’s original post: “the topic for this month is Date/Time Tricks. Write a blog post that talks about dates and times–this can be based around T-SQL programming, data modeling, ETL, reporting, or whatever else you’re using dates and times for (and who isn’t?)”
Check back at his initial post to see all of the responses that come in on Tuesday. I will be visiting and expecting to learn a lot. I wanted to keep my response simple here so some of the true T-SQL gurus can have more of your time (and my time) when reading the posts.
So I am going to cover a couple basic points that have been on my mind for some reason or another recently…
We Are The World
Working for a global company with developers on three continents writing procedures and processes for users on at least 5 continents, dates and times can sometimes be confused. A lot of our databases are centrally hosted here in the states and tend to be in the default collation. So we don’t have a lot of language/special character issues but there are times when Dates can be confused.
Today is 8/12/2009. No, Today is 12/8/2009.
Who is right? Well really both answers are right depending on where you are and where you are from. To someone in the United States the second answer seems more right (Unless you are in the military and when using your pen, ball-point, black; you may be used to writing a date on a report as 08December2009). To someone in a lot of Europe the first may seem more right.
Keep this in mind when querying with dates.
The following date format will always be interpreted correctly by SQL Server regardless of your regional settings and collation. It is the ISO format for dates: YYYYMMDD. Today, by that standard, is 20091208. When looking at the Cast/Convert books online topic and the chart, you’ll see this is format 112.
Format 126 (ISO 8601) is also recognized globally and it includes time. Check out that books online article and get to know and love these date formats, especially if you are working with folks across the globe.
You Don’t Have to Cluster the Primary Key!
This is something that should be common knowledge but through interviewing folks, working with developers and seeing newsgroup/forum questions, it still isn’t. The default behavior in SQL Server is to make all Primary Keys the clustered index. There is some merit there but it isn’t a requirement.
Sometimes a date column, even if the date is not very unique (perhaps not tracking times) or even in some cases where the “date” is really just an integer of YYYYMM. Can be a good clustered index in my experience.
Where? Mostly in data warehousing situations in my experience. Especially if folks are typically querying for a range of time, for transactions on a particular day or wanting to sort results by date. I find that in Fact tables a clustered index on something to do with the date can be helpful.
Why? How often do you query a Fact Table by a surrogate key? Yes some could argue well indexed dimensions and non clustered indexes on all the dimension keys in the fact table (perhaps on some commonly used measures) is enough. I find that the dates help for the reasons describe above on querying the data. It also helps when loading the warehouse if the dates come in in date order. You still have much of the benefit of an ever increasing clustered key reducing insert overhead from page splits, and having to look all over. Yeah there will likely be some dupes, so there is a 4byte uniqueifier overhead in the clustered index. I would also imagine that when partitioning by date this clustered index strategy would make life a bit easier for you.
Of course, you should test in your environment and use what works for you. Empirical Evidence speaks volumes when trying scenarios out. Experiment with your indexes and see what works best for your work load, your insert behavior, your query behavior, etc.
Off To Read The Other Posts
Ok well I’m not really off to read them at this moment. I am writing this post on 20091204 and will, on 20091208, go and read other posts to learn a lot about dates and times. Check the original post for all the trackbacks of interesting date/time information.