We have a very large staging table (> 80 GB). From our source system we load invoice data in the staging table. From the staging we transform the data and load into DWH/Fact. Every day we delete the current month, then reload from source into stage. The stage contains complete history over time.
In some DW loads we only need the current month. Sometimes year and previous year.
What is a better index strategy:
- Clustered index on a
datecolumn (Fiscal Period) - Primary key with
IDENTITYas surrogate key - Clustered index for natural key (some kind of line item e.g. invoice number)
All queries contain the date column (Fiscal Period) and sometimes additional columns like Invoice type as non-clustered index. In the ETL we can disable the non-clustered index but not the clustered index.
Which of the three types has the best performance for:
- Insert into Stage table
- Query the Stage table