1

I have a table with a column for ID, Date and Frequency of occurrence for that particular date and ID.

ID    date          Frequency
1     2014-05-18      5
1     2014-05-19      4
1     2014-05-20      25
2     2014-05-20      7
3     2014-05-18      4
3     2014-05-20      1
4     2014-05-18      6

I'm looking to insert the dates for each ID that aren't currently there (assuming that the date ranges from 2014-05-18 to 2014-05-20) and add a value of 0 in Frequency for those.

For example, you can see that

ID 1 has an observation for all three dates;

ID 2 is missing 2014-05-18 and 2014-05-19;

ID 3 is missing 2014-05-19;

ID 4 is missing 2014-05-19 and 2014-05-20.

Therefore, the output should look like this

ID    date          Frequency
1     2014-05-18      5
1     2014-05-19      4
1     2014-05-20      25
2     2014-05-18      0
2     2014-05-19      0
2     2014-05-20      7
3     2014-05-18      4
3     2014-05-19      0
3     2014-05-20      1
4     2014-05-18      6
4     2014-05-19      0
4     2014-05-20      0

I've attempted the following from this question. However, that will only add dates with 0 Frequency that aren't present throughout the entire table instead of on an ID level.

DECLARE @s DATE = '20140518', @e DATE = '20140520';

SELECT c.date, Frequency = COALESCE(a.Frequency,0) FROM dbo.Calendar AS c LEFT OUTER JOIN dbo.table AS a ON c.date = s.[date] WHERE c.d >= @s AND c.d < DATEADD(DAY, 1, @e);

usr
  • 187
  • 2
  • 7

2 Answers2

3

Using your base sample query above, this code should do what you are looking for. The important changes:

  • CROSS JOIN to a list of IDs that should be in the report
  • JOIN to the data table using Date and ID
  • SELECT the id from the CROSS JOIN table

The code:

DECLARE @s DATE = '20140518', @e DATE = '20140520';

SELECT 
    i.ID, 
    c.[date], 
    Frequency = COALESCE(a.Frequency,0)
FROM 
    dbo.Calendar AS c
CROSS JOIN (SELECT DISTINCT x.ID FROM dbo.table x WHERE x.date >= @s AND x.date < DATEADD(day, 1, @e)) AS i
LEFT OUTER JOIN 
    dbo.table AS a
ON 
    c.[date] = a.[date]
    AND a.ID = i.ID
WHERE 
    c.[date]>= @s
    AND c.[date] < DATEADD(DAY, 1, @e);
usr
  • 187
  • 2
  • 7
Laughing Vergil
  • 2,058
  • 10
  • 21
2

Assuming this table/data:

CREATE TABLE dbo.MyTable
(
  ID int NOT NULL, 
  [date] date NOT NULL, 
  Frequency int,
  CONSTRAINT PK_MyTable PRIMARY KEY (ID,[date])
);

INSERT dbo.MyTable VALUES
(1,'2014-05-18',5 ),
(1,'2014-05-19',4 ),
(1,'2014-05-20',25),
(2,'2014-05-20',7 ),
(3,'2014-05-18',4 ),
(3,'2014-05-20',1 ),
(4,'2014-05-18',6 );

You need to generate the list of dates in the range, then produce a row for each date combined with each unique ID that exists within your range, then left outer join that output to your actual data.

DECLARE @s date = '20140518', @e date = '20140520';

;WITH n AS
(
  SELECT n = 0
  UNION ALL
  SELECT n + 1 FROM n
  WHERE n < DATEDIFF(DAY, @s, @e)
),
range AS
(
  SELECT [date] = DATEADD(DAY, n, @s) FROM n
),
IDs AS
(
  SELECT ID 
    FROM dbo.MyTable AS t
    INNER JOIN range
    ON t.[date] = range.[date]
    GROUP BY ID
)
SELECT 
  IDs.ID,
  range.[date],
  Frequency = COALESCE(t.Frequency, 0)
FROM range CROSS JOIN IDs
LEFT OUTER JOIN dbo.MyTable AS t
ON t.[date] = range.date 
AND IDs.ID = t.ID;

A potentially slightly more efficient alternative:

;WITH n AS
(
  SELECT n = 0 UNION ALL SELECT n + 1 FROM n
  WHERE n < DATEDIFF(DAY, @s, @e)
),
range AS
(
  SELECT [date] = DATEADD(DAY, n, @s) FROM n
),
src AS
(
  SELECT ID, [date], Frequency 
    FROM dbo.MyTable AS t
    WHERE EXISTS (SELECT 1 FROM range WHERE range.[date] = t.[date])
)
SELECT 
  IDs.ID,
  range.[date],
  Frequency = COALESCE(t.Frequency, 0)
FROM range
CROSS JOIN (SELECT ID FROM src GROUP BY ID) IDs
LEFT OUTER JOIN dbo.MyTable AS t
ON t.[date] = range.date 
AND IDs.ID = t.ID;

Output:

ID  date        Frequency
--  ----------  ---------
1   2014-05-18  5
1   2014-05-19  4
1   2014-05-20  25
2   2014-05-18  0
2   2014-05-19  0
2   2014-05-20  7
3   2014-05-18  4
3   2014-05-19  0
3   2014-05-20  1
4   2014-05-18  6
4   2014-05-19  0
4   2014-05-20  0

One issue with a recursive CTE is that if you need to support a date range > 100 days you need to add OPTION (MAXRECURSION n).

Aaron Bertrand
  • 180,303
  • 28
  • 400
  • 614