2

I have a list of records with datetimes over more than 1 year. My goal is to compare the count of today's records with the count of records from last year, where the week # and day of the week from last year's search match today's week # and day of the week.

For example: if today is 07-26-2016, then the week number is 31, and the day is Tuesday (3). Then I would be searching for week number 31, Tuesday (3) of 2015. This day is 07-28-2015.

My problem is that, if I use a WHERE clause with the date in form '07-28-2015', the query completes in ~30 seconds. But if I use a WHERE clause with DATEPARTs for year, week, and day of week, my query takes > 10 minutes.

Here are the two queries to compare:

first, with the full date in the WHERE clause. This takes 30 seconds

SELECT COUNT(*)
FROM my_table
WHERE CAST(my_datetime AS DATE) = '2015-07-28'

now, with the DATEPARTs. This takes 10+ minutes

SELECT COUNT(*)
FROM my_table
WHERE DATEPART(yy,CAST(my_datetime AS DATE)) = 2015
  AND DATEPART(wk,CAST(my_datetime AS DATE)) = 31
  AND DATEPART(dw,CAST(my_datetime AS DATE)) = 3

I have also tried the second query without casting my_datetime as DATE, and it is just as slow. They DO work, and I have compared the results from the two, but it is necessary that I use the second query so that I can replace the static values (2015, 31, 3) with

DATEPART(yy,GETDATE())-1 / DATEPART(yy,CAST(GETDATE() AS DATE))-1
DATEPART(wk,GETDATE()) / DATEPART(wk,CAST(GETDATE() AS DATE))
DATEPART(dw,GETDATE()) / DATEPART(dw,CAST(GETDATE() AS DATE))

That way, the query will always give me results for last year's corresponding date.

How do I optimize this query to be faster? Or am I approaching this incorrectly?

ac927
  • 23
  • 1
  • 3
  • Anytime you cast a column in the where clause you lose the ability to use the index. If the column is datetime, use a where clause like this: where my_datetime >= '7/15/2015' and my_datetime < '7/16/2015'. If that column is not indexed, then add one. Query should return rather fast. – HunterX3 Jul 26 '16 at 18:53
  • What about if you try WHERE my_datetime >= '2015-07-28' AND my_datetime < '2015-07-29'? With an index on (my_datetime) it should be as fast as can be. Probably a few milliseconds. – ypercubeᵀᴹ Jul 26 '16 at 18:55
  • In your version, even the WHERE CAST(my_datetime AS DATE) = '2015-07-28' should be fast and nowhere near 30 seconds. Unless your table is really huge or you have no indexes. – ypercubeᵀᴹ Jul 26 '16 at 19:00
  • @ypercubeᵀᴹ this is what I have tried. It works very quickly, and it is the correct value, but it does not solve my issue of having to figure out WHAT date I am searching for... edit: the query with >= and < only takes a few seconds, for clarification. – ac927 Jul 26 '16 at 19:00
  • @ypercubeᵀᴹ and yes, the table is huge – ac927 Jul 26 '16 at 19:01
  • Is there an index on the datetime column though? And what is huge for you? I meant a few billion rows at least. – ypercubeᵀᴹ Jul 26 '16 at 19:05
  • @ypercubeᵀᴹ yes, there is an index on the datetime column. and no, not that big. more like 1 billion – ac927 Jul 26 '16 at 19:07
  • I would do something like this: Declaire dateTime as datetime, oldDateTime as datetime set dateTime = '7/15/2016' set oldDateTime = dateadd(yy,-1,dateTime) select currentRowCount = count() from my_table where my_datetime >= dateTime and my_datetime < dateadd(dd,1,dateTime) select lastYearRowCount =count() from my_table where my_datetime >= oldDateTime and my_datetime < dateadd(dd,1,oldDateTime) – HunterX3 Jul 26 '16 at 19:09
  • @HunterX3 I believe this gives me the count from exactly one year ago, whereas I am looking for the day last year with the same week # and day of week as today – ac927 Jul 26 '16 at 19:23

3 Answers3

3

Why not try using a calendar table? Using the code to create one (took about 1 second) from here.

SELECT COUNT(1)
FROM My_Table
JOIN #Dim
    ON My_Table.my_datetime = #Dim.date
WHERE DATEPART(year,CAST(getdate() AS DATE))-1 = [year]
  AND DATEPART(week,CAST(getdate() AS DATE)) = [week]
  AND DATEPART(weekday,CAST(getdate() AS DATE)) = [DayOfWeek]

In general (as I'm sure has been mentioned in the comments by now) any time you use functions on a column in the WHERE clause you are going to take a performance hit. Frequently a very large one. Among other things those columns are not SARGable (search argument, ie can use an index).

Kenneth Fisher
  • 24,127
  • 12
  • 61
  • 114
  • CAST(datetime_column AS DATE) can be sargable, depending on SQL Server version. But I agree with everything in this answer, there is no need to write that code than depends on that. See Cast to date is sargable but is it a good idea? – ypercubeᵀᴹ Jul 26 '16 at 19:49
  • Shouldn't the ON My_Table.my_datetime = #Dim.date be amended to care for the fact that one column is DATETIME and the other is DATE? – ypercubeᵀᴹ Jul 26 '16 at 19:54
  • @ypercubeᵀᴹ Probably true. Best way to do that would be to either change the source table (if they don't need time) or better yet the calendar table. Maybe with a begindate and enddate columns and "my_datetime >=begindate and my_datetime < enddate" in the where clause. – Kenneth Fisher Jul 26 '16 at 20:07
  • I meant something like ON My_Table.my_datetime >= #Dim.date AND My_Table.my_datetime < DATEADD(day, 1, #Dim.date) for their current setup. Of course stripping the time from the column is a nice suggestion as well, if they can go with it. Or having an index on a computed column with date only. – ypercubeᵀᴹ Jul 26 '16 at 20:10
  • Unfortunately don't have the option of creating calendar table. I did find this recommendation many other times though, so probably would have been the way to go – ac927 Jul 27 '16 at 13:15
3

In short, you want to use this type of condition for the query to be efficient:

SELECT COUNT(*)
FROM my_table
WHERE my_datetime >= '2015-07-28' 
  AND my_datetime < '2015-07-29' ;

Now the only problem is to find the date values of the "same" day, last year. Since the weekday and week of a day of the year is a rather complicated calculation, I would try basing the calculations on the week and weekday of January 1st:

First, to find January 1st of previous year ('20010101' is an arbitrary date):

SET @prev_year_Jan_01 = DATEADD(year, 
                                DATEDIFF(year, '20010101', GETDATE()) - 1, 
                                '20010101') ;

Then we can use the week and weekday parts of the wanted date, to find the "same" day last year:

SET @week_1 = 1 ;    -- Jan 01 is always week 1
SET @weekday_1 = DATEPART(weekday, @prev_year_Jan_01) ;

SET @weeks = DATEPART(week, GETDATE()) - @week_1 ;   -- weeks pasted since Jan 01
SET @weekdays = DATEPART(weekday, GETDATE()) - @weekday_1 ;

and finally:

SET @last_year_this_day = DATEADD(day, @weekdays,     -- add day diff
                            DATEADD(week, @weeks,     -- and week diff
                              @last_year_Jan_01)) ; 

Then we can use:

SELECT COUNT(*)
FROM my_table
WHERE my_datetime >= @last_year_this_day 
  AND my_datetime < DATEADD(day, 1, @last_year_this_day) ;
ypercubeᵀᴹ
  • 97,895
  • 13
  • 214
  • 305
  • 1
    Exactly. There are probably a dozen different ways to get there but ultimately what you need is a query that does a DATE >= x and DATE < x +1 to get everything on that day. Any explicit or even implicate conversion of the data in the table effectively eliminates the ability to use any indexes and will tank your performance very quickly.

    I learned that one the hard way. It's amazing how bad performance is and how hard you hit the database server when basically everything is a full table scan.

    – Evan Steinbrenner Jul 26 '16 at 20:32
  • This answer would be more complete with an explanation of why wrapping columns in functions breaks SARGability, why one wants SARGability, and how your answer achieves it ;-). – binki Feb 20 '18 at 17:02
0

I like Kenneth Fisher's response.

If you don't have the option of creating the calendar table, I think you should looks for the exact date first, set that date as a variable, then use the variable in your WHERE clause.

Building the date takes less than a second, then you have a direct filter.

Took a little while to get the calculations right, but I think this works.

DECLARE @DateToSearch DATETIME;
SET @DateToSearch =  CAST('01/01/' + CAST((YEAR(GETDATE())-1) AS VARCHAR) AS DATETIME); --Set Variable to 01/01/LastYear
SET @DateToSearch = DATEADD(WEEK, DATEPART(WEEK,GETDATE())-1, @DateToSearch); --Set Variable to 01/01/Last year + Weeks
SET @DateToSearch = DATEADD(WEEK,DATEDIFF(WEEK,0,@DateToSearch),0); --Set Variable to Floor of week
SET @DateToSearch = DATEADD(DAY, DATEPART(dw,GETDATE())-2, @DateToSearch); --Add Current Day of Week
SELECT @DateToSearch;

SELECT COUNT(*) 
FROM my_table
WHERE CAST(my_datetime AS DATE) = @DateToSearch

Today's date "match" for last year returns: 7/28/2015.

SQLDevDBA
  • 2,254
  • 9
  • 12
  • I like this idea but depending on the way the data is structured it may not work properly. Since it is in the datetime format it will include the time in the comparison so unless the time portion is 00:00.000, like it would be in your built up datetime, for all the entries it wouldn't find everything or possibly anything on that day.

    You can do a simple >= x <x +1 to get everything on a day

     SELECT COUNT(*) 
     FROM my_table
     WHERE my_datetime >= @DateToSearch and my_datetime < DATEADD(DAY,1,@DateToSearch)
    
    – Evan Steinbrenner Jul 26 '16 at 19:51
  • 1
    I see what you are saying, but CAST (my_datetime AS DATE) would make it equivalent to 00:00:00.000. Unfortunately DATE isn't available until 2008R2 I believe, so that can be an issue. – SQLDevDBA Jul 26 '16 at 19:55
  • That would be another way to do it but casting as a DATE from DATETIME would likely have a very negative impact on performance because you loose the ability to use an Index for the query and have to do a table scan on what is apparently a large to very large table. – Evan Steinbrenner Jul 26 '16 at 20:24
  • 1
    This appears to return the value we want, but only when you do not subtract fifteen from each statement. Why subtract fifteen in the first place? – ac927 Jul 26 '16 at 20:31
  • Apologies @ac927. That was just me testing that it worked with days that were not today. Just full-proofing it. Thanks for catching that. – SQLDevDBA Jul 26 '16 at 21:00
  • @Evan Steinbrenner: I don't disagree with the performance degradation issue. You're right about that, and your solution to it is one I'd probably implement as well. ac927 stated that his query took 30 seconds when using CAST AS DATE and it didn't seem to bother them, so I left it intact. – SQLDevDBA Jul 26 '16 at 21:13
  • I think it is dangerous to rely on locale-specific date formats. Instead of something like '01/01/' + @year, please use something like DATEFROMPARTS instead. Even if it is somehow unambiguous because you’re always putting 4-digit years into it and it works in this case, suggesting this string-concatenation method of dealing with dates will result in people copying you and using it in cases where it isn’t necessarily safe. – binki Feb 20 '18 at 16:57
  • Expressing that it's dangerous to rely on default parameters, and then suggesting the use of a function that is not supported in the version of SQL Server that was tagged in the original question isn't necessarily the best approach. Please feel free to add a new answer. – SQLDevDBA Feb 20 '18 at 18:21