Partitioning an existing table yearly in SQL Server with huge data

Question

I have a logs table for my application which is almost 3TB data. I want to partition this table yearly. Below is the scripts for Table, View, and indexes script:


GO
CREATE TABLE [dbo].[AuditLogs](
    [Id] [bigint] IDENTITY(1,1) NOT NULL,
    [BankId] [nvarchar](2) NULL,
    [Message] [nvarchar](max) NULL,
    [Level] [nvarchar](100) NULL,
    [TimeStamp] [datetimeoffset](7) NOT NULL,
    [FunctionName] [nvarchar](400) NULL,
    [CorrelationId] [uniqueidentifier] NULL,
 CONSTRAINT [PK_AuditLogs] PRIMARY KEY CLUSTERED 
(
    [Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO
/****** Object:  View [dbo].[View_RequestTime]    Script Date: 22/10/2023 2:56:01 PM ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE VIEW [dbo].[View_RequestTime]
AS
select [FunctionName],BankId,[TimeStamp],[ElapsedTime] from(
select
  ROW_NUMBER() over(partition by CorrelationId order by l.[TimeStamp])  AS Row#,
    l.[FunctionName],
    l.BankId,
    l.[TimeStamp],
    l.CorrelationId,
   DATEDIFF(millisecond,l.[TimeStamp],lead(l.[TimeStamp]) over(partition by CorrelationId order by l.[TimeStamp])) ElapsedTime
from AuditLogs l  where l.CorrelationId is not null
)lo  where Row#=1

GO
/****** Object:  View [dbo].[View_1]    Script Date: 22/10/2023 2:56:01 PM ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO
/****** Object:  Index [NonClusteredIndex-20220821-124100]    Script Date: 22/10/2023 2:56:01 PM ******/
CREATE NONCLUSTERED INDEX [NonClusteredIndex-20220821-124100] ON [dbo].[AuditLogs]
(
[Level] ASC,
[TimeStamp] ASC,
[FunctionName] ASC,
[CorrelationId] ASC,
[BankId] ASC

)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
GO
/****** Object:  Index [NonClusteredIndex-20231011-105720]    Script Date: 22/10/2023 2:56:01 PM ******/
CREATE NONCLUSTERED INDEX [NonClusteredIndex-20231011-105720] ON [dbo].[AuditLogs]
(
    [CorrelationId] ASC
)
INCLUDE([BankId],[TimeStamp],[FunctionName]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
GO

My concern is how to create a partition function that works on yearly data partitioning. Column for datetime is [TimeStamp] and I have Id as primary key(clustered index), so how I will create clustered index that works with partition? Any help and suggestion is highly appreciated, I have read this answer before posting this link.

Query Execution Plan: Here is the execution plan link

Below is the query that I ran on the view with very small data compared to actual data. It takes more than 2 minutes to execute.

SELECT [v].[BankId], [v].[FunctionName],
AVG(CAST([v].[ElapsedTime] AS float)) AS [TotalSucceedAvgTime], CAST(COUNT(CASE
    WHEN [v].[ElapsedTime] IS NOT NULL THEN 1
END) AS float) AS [TotalSucceedReq], 
CAST(COUNT(CASE WHEN [v].[ElapsedTime] IS NULL THEN 1
END) AS float) AS [TotalFailedReq], 
(CAST(COUNT(CASE
    WHEN [v].[ElapsedTime] IS NOT NULL THEN 1
END) AS float) / CAST(COUNT() AS float))  100.0E0 AS [SuccessRate], (CAST(COUNT(CASE
    WHEN [v].[ElapsedTime] IS NULL THEN 1
END) AS float) / CAST(COUNT() AS float))  100.0E0 AS [FailedRate]
FROM [View_RequestTime] AS [v]
WHERE (([v].[BankId] = 1)) AND ((CONVERT(date, [v].[TimeStamp]) >= '2022-05-05 00:00:00') AND (CONVERT(date, [v].[TimeStamp]) <= '2024-11-11 00:00:00'))
GROUP BY [v].[FunctionName], [v].[BankId]

AND ((CONVERT(date, [v].[TimeStamp]) >= '2022-05-05 00:00:00') AND (CONVERT(date, [v].[TimeStamp]) <= '2024-11-11 00:00:00')) is just wrong. Don't put the conversion on the column, instead just do AND v.TimeStamp >= '2022-05-05 00:00:00' AND v.TimeStamp < '2024-11-12 00:00:00' in other words a half-open interval — Charlieface, Oct 23 '23 at 14:41
@Charlieface I tried your suggestion but still the same time — Mohd Waseem, Oct 23 '23 at 15:20
I didn't say it would fix all the problems but it's a start. Primarily the issue is probably the ROW_NUMBER because it needs to read all rows to do that calculation first. Try pushing the WHERE clause into the subquery (you need to use a CTE or subquery, not a view). And you need an index (CorrelationId, Timestamp) INCLUDE (FunctionName, BankId) — Charlieface, Oct 23 '23 at 15:39
@Charlieface I didn't get the idea to use CTE, will you please provide a little example based on my query? — Mohd Waseem, Oct 24 '23 at 20:26

score 0 · Answer 1 · answered Oct 24 '23 at 03:08

0

I haven't had a chance to dive in yet, but from a first look, just some quick thoughts:

Your index on ([Level] ASC, [TimeStamp] ASC, [FunctionName] ASC, [CorrelationId] ASC, [BankId] ASC) likely won't be used because Level isn't being used in your query (yet it's the leading column in your index), particularly not in any predicates.

Does the following columnstore index get used and improve the query at all?

CREATE NONCLUSTERED COLUMNSTORE INDEX IX_AuditLogs_CorrelationId_TimeStamp_BankId_FunctionName 
ON dbo.AuditLogs (CorrelationId, [TimeStamp], BankId, FunctionName)

answered Oct 24 '23 at 03:08

J.D.

37,483
8
54
121

I applied the columnstore index but no improvement at all. – Mohd Waseem Oct 24 '23 at 20:21
@MohdWaseem Did you verify if it was used in the execution plan? (It's possible it wasn't.) – J.D. Oct 25 '23 at 01:59

score 0 · Answer 2 · answered Nov 06 '23 at 22:38

I work with similar data. I agree with the others - I'm not sure what your goal is, but just based on your specific ask, what you can do to partition that table is as follows. Given the research you've posted, I'll simply say first you need to have your partition function and partition scheme with FILEGROUPS and FILES ready. Something like these JUST AS EXAMPLES -

USE master;
GO
ALTER DATABASE <yourDB> ADD FILEGROUP [2018];
ALTER DATABASE <yourDB> ADD FILEGROUP [2019];
...
GO
ALTER DATABASE <yourDB>
ADD FILE
    (
        NAME = [2018]
      , FILENAME = 'C:\Program Files\Microsoft SQL Server\MSSQL13.MSSQLSERVER\MSSQL\DATA\2018.ndf'
      , SIZE = 100GB
      , MAXSIZE = 1TB -- Or unlimited, I don't know how big your years are
      , FILEGROWTH = 50GB
    ) TO FILEGROUP [2018];
GO
ALTER DATABASE <yourDB>
ADD FILE
    (
        NAME = [2019]
      , FILENAME = 'C:\Program Files\Microsoft SQL Server\MSSQL13.MSSQLSERVER\MSSQL\DATA\2019.ndf'
      , SIZE = 100GB
      , MAXSIZE = 1TB -- Or unlimited, I don't know how big your years are
      , FILEGROWTH = 50GB
    ) TO FILEGROUP [2019];
...
CREATE PARTITION FUNCTION pf_Year (DATETIMEOFFSET)
AS RANGE RIGHT FOR VALUES
(   N'2019-01-01T00:00:00.000'
  , N'2020-01-01T00:00:00.000'
  , N'2021-01-01T00:00:00.000'
  , N'2022-01-01T00:00:00.000'
  , N'2023-01-01T00:00:00.000'
  , N'2024-01-01T00:00:00.000'
);
GO
CREATE PARTITION SCHEME ps_Year
AS PARTITION pf_Year
TO
(
    [2018]
  , [2019]
  , [2020]
  , [2021]
  , [2022]
  , [2023]
  , [2024]
);
GO

Then to build your table, you need the PARTITION KEY to be apart of the PRIMARY KEY.

CREATE TABLE AuditLogs
(
    Id BIGINT IDENTITY(1, 1) NOT NULL
  , BankId NVARCHAR(2) NULL
  , Message NVARCHAR(MAX) NULL
  , Level NVARCHAR(100) NULL
  , TimeStamp DATETIME NOT NULL
  , FunctionName NVARCHAR(400) NULL
  , CorrelationId UNIQUEIDENTIFIER NULL
  , CONSTRAINT PK_AuditLogs
        PRIMARY KEY CLUSTERED (
                                  Id ASC
                                , TimeStamp ASC
                              )
        WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON
            , ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = ON
             ) ON ps_Year(TIMESTAMP)
) ON ps_year(TimeStamp);
GO

Or alter on the existing table

ALTER TABLE dbo.AuditLogs DROP CONSTRAINT PK_AuditLogs;
GO
ALTER TABLE dbo.AuditLogs
ADD CONSTRAINT PK_AuditLogs
    PRIMARY KEY CLUSTERED (
                              Id
                            , TimeStamp
                          )
    WITH (MAXDOP = 16, ONLINE = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = ON) ON ps_Year(timestamp);
GO

Then just do the same things with the indexes to align them to the partition scheme.

CREATE NONCLUSTERED INDEX [NonClusteredIndex-20220821-124100]
ON dbo.AuditLogs (
                     Level ASC
                   , TimeStamp ASC
                   , FunctionName ASC
                   , CorrelationId ASC
                   , BankId ASC
                 )
WITH (MAXDOP = 16, DROP_EXISTING = ON, ONLINE = ON)
ON ps_Year(TimeStamp);
GO
CREATE NONCLUSTERED INDEX [NonClusteredIndex-20231011-105720]
ON dbo.AuditLogs (CorrelationId ASC)
INCLUDE (
            BankId
          , TimeStamp
          , FunctionName
        )
WITH (MAXDOP = 16, DROP_EXISTING = ON, ONLINE = ON)
ON ps_Year(TimeStamp);
GO

I hope that helps.

Why are you suggesting adding filegroups? That's not required for partitioning. — David Browne - Microsoft, Nov 07 '23 at 01:09
It is true, it is not required. I assume the db in question is on-prem, and I suggest it as it gives on-prem DBA’s more flexibility/control over optimizing storage if IOPS are a bottleneck for certain years but not others. — Caleb Carl, Nov 07 '23 at 03:03
@CalebCarl Most of the time search on log table is based on [Timestamp] (from and To), so ordering in cluster index should be like ([Timestamp],[Id]). — Mohd Waseem, Feb 11 '24 at 13:56

score 0 · Answer 3 · answered Nov 07 '23 at 18:19

Here are my thoughts about huge log type tables. I'd start with changing clustered to [TimeStamp] since vast majority of queries will use it (give me events happened yesterday or last month or between @dfrom and @dto) You can have [Id] as primary key but if this table is never referenced you can redefine your primary key as [TimeStamp],[Id]

How to actually convert your table? First of all confirm your table is 'insert only' (most of log tables do not allow any updates).

Create new table and populate it slowly from source table. On actual switch day you can add missing records and rename tables.

For older partition you can -

put it in read-only filegroup and skip backup/recovery.
you can apply compression
you can create it as columnstore but to do it you have to use partitioning views instead of actual partitions https://www.brentozar.com/archive/2016/09/partitioned-views-guide/

Partitioning an existing table yearly in SQL Server with huge data

3 Answers3