3

I have a table with roughly 11 million rows, defined as:

CREATE TABLE [sko].[stage_närvaro]( 
    [datum_fakta] [datetime] NULL, 
    [person_id] nvarchar NULL, 
    [läsår_fakta] nvarchar NULL, 
    [termin_fakta] nvarchar NULL, 
    [period_fakta] nvarchar NULL, 
    [vecka_fakta] nvarchar NULL, 
    [veckodag_fakta] nvarchar NULL, 
    [ämne_id] nvarchar NULL, 
    [ämne] nvarchar NULL, 
    [frånvaro_min] [float] NULL,
    [närvaro_min] [float] NULL, 
    [frånvaroorsak_id] nvarchar NULL,
    [frånvaroorsak] nvarchar NULL, 
    [beskrivning] nvarchar NULL, 
    [personal_id] nvarchar NULL, 
    [försystem] nvarchar NULL 
)

With the following non-clustered index:

CREATE NONCLUSTERED INDEX [stage_skola_närvaro_ix1] ON [sko].[stage_närvaro] 
(
    [person_id] ASC,
    [termin_fakta] ASC,
    [läsår_fakta] ASC
)

When I run the following delete query, it takes at least two hours to complete.

DELETE  sko.stage_närvaro
FROM    sko.stage_närvaro e
WHERE   försystem = 'Extens'
AND EXISTS (
    SELECT  *
    FROM    ext.v_imp_närvaro v
    WHERE   e.person_id = v.person_id
    AND     e.termin_fakta = v.termin_fakta
    AND     e.läsår_fakta = v.läsår_fakta
)

Is my delete query using my index? Would it help to disable the index before deleting and enable it afterwards?
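One quick way to check this, short of capturing the actual execution plan, is the index usage DMV. A sketch (the counters are cumulative since the last service restart, so compare the values before and after running the delete):

```sql
-- Shows reads (user_seeks/user_scans/user_lookups) and maintenance
-- writes (user_updates) per index on the table since the last restart.
SELECT  i.name,
        s.user_seeks,
        s.user_scans,
        s.user_lookups,
        s.user_updates
FROM    sys.dm_db_index_usage_stats s
JOIN    sys.indexes i
  ON    i.object_id = s.object_id
 AND    i.index_id  = s.index_id
WHERE   s.database_id = DB_ID()
  AND   s.object_id   = OBJECT_ID(N'sko.stage_närvaro');
```

If user_seeks/user_scans on stage_skola_närvaro_ix1 do not move while user_updates does, the delete is only paying the index's maintenance cost without benefiting from it.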

Edit: The view ext.v_imp_närvaro has the same number of rows as the table sko.stage_närvaro.

Edit2: I suspected that it was an I/O issue, so I ran the following query as suggested by DaniSQL here:

SELECT TOP 10
        wait_type ,
        wait_time_ms ,
        signal_wait_time_ms ,
        wait_time_ms - signal_wait_time_ms AS resource_wait_time_ms ,
        100.0 * wait_time_ms / SUM(wait_time_ms) OVER ( ) AS percent_total_waits ,
        100.0 * signal_wait_time_ms / SUM(signal_wait_time_ms) OVER ( ) AS percent_total_signal_waits ,
        100.0 * ( wait_time_ms - signal_wait_time_ms )
        / SUM(wait_time_ms) OVER ( ) AS percent_total_resource_waits
FROM    sys.dm_os_wait_stats
WHERE   wait_time_ms > 0 -- remove zero wait_time
        AND wait_type NOT IN -- filter out additional irrelevant waits
( 'SLEEP_TASK', 'BROKER_TASK_STOP', 'BROKER_TO_FLUSH', 'SQLTRACE_BUFFER_FLUSH',
  'CLR_AUTO_EVENT', 'CLR_MANUAL_EVENT', 'LAZYWRITER_SLEEP', 'SLEEP_SYSTEMTASK',
  'SLEEP_BPOOL_FLUSH', 'BROKER_EVENTHANDLER', 'XE_DISPATCHER_WAIT',
  'FT_IFTSHC_MUTEX', 'CHECKPOINT_QUEUE', 'FT_IFTS_SCHEDULER_IDLE_WAIT',
  'BROKER_TRANSMITTER', 'KSOURCE_WAKEUP', 'LOGMGR_QUEUE', 'ONDEMAND_TASK_QUEUE',
  'REQUEST_FOR_DEADLOCK_SEARCH', 'XE_TIMER_EVENT', 'BAD_PAGE_PROCESS',
  'DBMIRROR_EVENTS_QUEUE', 'BROKER_RECEIVE_WAITFOR',
  'PREEMPTIVE_OS_GETPROCADDRESS', 'PREEMPTIVE_OS_AUTHENTICATIONOPS', 'WAITFOR',
  'DISPATCHER_QUEUE_SEMAPHORE', 'XE_DISPATCHER_JOIN', 'RESOURCE_QUEUE' )
ORDER BY wait_time_ms DESC

With the following result (I'm not sure how to interpret it, though):

(screenshot of wait statistics results)

Marcus
  • 135
  • 6
  • If you want to know whether the index is used, turn on the actual execution plan when you run the statement. It shows you exactly what happens and may help you find the bottleneck. To turn it on, select the "Include Actual Execution Plan" button in Management Studio; the plan is shown in a panel next to your Results and Messages tabs once the query finishes. You won't get the actual plan without running the query to completion :/ – RayofCommand Jul 18 '16 at 09:34
  • What is the definition of v_imp_närvaro and what are the indexes on those tables? – James Z Jul 18 '16 at 11:03
  • Your index is most likely totally useless for the delete, of course it must be updated, but index for försystem might help, if 'Extens' isn't a common value. Indexes that could be used for v_imp_närvaro are more important. – James Z Jul 18 '16 at 11:05
  • 1
    Without the definition of the view v_imp_närvaro and the tables (& their indexes) it uses we can't really tell, but it could be table scanning v_imp_närvaro for rows matching försystem = 'Extens', then either scanning the tables used in the view & ordering in tempdb for matching, or running the sub-query per row (in either case this could cause a lot of IO). The delete itself might be relatively inexpensive, depending on how many rows it affects in the end. I suggest you add to the question at least the estimated and actual query plans, and preferably the view/table/index definitions too. – David Spillett Jul 18 '16 at 13:57

3 Answers

5

I don't know the full details of your environment, i.e. whether the tables in question are mostly written to or read from.

How often do you run this delete?

What is the primary key and clustered index of [sko].[stage_närvaro]?

If I wanted to optimise this delete, there are a few things I would consider:

1) An index on the underlying tables of the view ext.v_imp_närvaro covering the columns used in the correlation (person_id, termin_fakta, läsår_fakta); you most likely want an index seek there (no need to INCLUDE any columns, because you are only going there for the EXISTS).

2) I have been using filtered indexes a lot, and I would consider the following:

CREATE NONCLUSTERED INDEX IDXF_STAGE_NARVARO_FORSYSTEMS_EXTENS
ON [sko].[stage_närvaro] ([försystem])
INCLUDE ([person_id],[termin_fakta],[läsår_fakta])
WHERE [försystem] = 'Extens'

3) I can see some LCK_M_S waits going on there; I'm not sure whether they are related to this query in particular, but nevertheless I always try to use ROWCOUNT and do big deletes and updates in batches, something like the following:

    USE DATABASENAME
    GO

    DECLARE @RC INT
    SELECT @RC = 0
    SET ROWCOUNT 5000

    WHILE (1 = 1)
    BEGIN
        BEGIN TRANSACTION

        DELETE  sko.stage_närvaro
        FROM    sko.stage_närvaro e
        WHERE   försystem = 'Extens'
        AND EXISTS (
            SELECT  *
            FROM    ext.v_imp_närvaro v
            WHERE   e.person_id = v.person_id
            AND     e.termin_fakta = v.termin_fakta
            AND     e.läsår_fakta = v.läsår_fakta
        )

        SELECT @RC = @@ROWCOUNT

        PRINT CAST(DB_NAME() AS VARCHAR(500)) + ' -- '
            + CAST(@RC AS VARCHAR(10)) + ' -->  '
            + CAST(GETDATE() AS VARCHAR(25))

        WAITFOR DELAY '00:00:01';

        IF @RC = 0
        BEGIN
            COMMIT TRANSACTION
            BREAK
        END

        COMMIT TRANSACTION
    END

    SET ROWCOUNT 0



 --==============================================================
 --SET ROWCOUNT 10000 -- define maximum number of rows affected at once

-- DO THE UPDATE

-- don't forget the line below
-- after everything is updated
--SET ROWCOUNT 0

-- Setting ROWCOUNT to 0 turns the limit off - don't forget it.
--===============================================================
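Note that SET ROWCOUNT is documented as having no effect on DELETE/UPDATE/INSERT statements in future versions of SQL Server, so on newer versions the same batching is usually written with DELETE TOP (n) instead. A sketch, reusing the hypothetical 5000-row batch size from above:

```sql
-- Same batching idea using DELETE TOP (n), the supported replacement
-- for SET ROWCOUNT on DML statements in newer SQL Server versions.
DECLARE @RC INT = 1;

WHILE @RC > 0
BEGIN
    DELETE TOP (5000) e
    FROM    sko.stage_närvaro e
    WHERE   e.försystem = 'Extens'
    AND EXISTS (
        SELECT  1
        FROM    ext.v_imp_närvaro v
        WHERE   e.person_id    = v.person_id
        AND     e.termin_fakta = v.termin_fakta
        AND     e.läsår_fakta  = v.läsår_fakta
    );

    SET @RC = @@ROWCOUNT;

    -- brief pause between batches, as in the loop above
    WAITFOR DELAY '00:00:01';
END
```

Each DELETE TOP batch is its own implicit transaction, so the log can be truncated (simple recovery) or backed up (full recovery) between batches instead of growing for one giant delete.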

This may not be a comprehensive solution, because there are bits missing from the question too, but it should give you some ideas about possible ways to improve big delete operations.

Marcello Miorelli
  • 16,170
  • 52
  • 163
  • 300
  • I ended up with a solution close to 2), creating a temp-table from the view which I put a non-clustered index on instead of indexing the underlying table of the view. – Marcus Jul 18 '16 at 14:23
  • I meant 1), not 2) – Marcus Jul 18 '16 at 14:42
2

It seems that you created the view ext.v_imp_närvaro on the base table sko.stage_närvaro. If that is right, a good option is to create a temp table with the rows qualifying under the WHERE clause, then execute the delete statement on the table, using the temp table to check for the existence of matching rows:

SELECT person_id, termin_fakta, läsår_fakta INTO #Temp
FROM   [ext].[v_imp_närvaro]

CREATE NONCLUSTERED INDEX IX_Temp ON #Temp (person_id, termin_fakta, läsår_fakta)

DELETE  sko.stage_närvaro
FROM    sko.stage_närvaro e
WHERE   försystem = 'Extens'
AND EXISTS (
    SELECT  *
    FROM    #Temp v
    WHERE   e.person_id = v.person_id
    AND     e.termin_fakta = v.termin_fakta
    AND     e.läsår_fakta = v.läsår_fakta
)
Tim
  • 36
  • 1
  • The view is not created on sko.stage_närvaro, but I did use your solution, which improved the execution time from 2 hours 30 minutes to 25 minutes. – Marcus Jul 18 '16 at 14:18
  • @Tim be very careful when using tempdb in this way, especially with a high volume of data, as stated in this question. Please don't do this in production unless it has been tested and proven effective. Have a look at this question and the comments there: http://dba.stackexchange.com/questions/143719/how-to-find-out-what-processes-caused-the-autogrowth-of-a-database-or-a-databa – Marcello Miorelli Jul 18 '16 at 14:22
  • @marcellomiorelli Can you elaborate on why this could be a problem? If I do not use TempDB the database logs will grow instead, causing the drive on which they are located to fill up rather quick. It seems that TempDB handles the query better. – Marcus Jul 18 '16 at 14:41
  • 1
    @Tim you can tackle in 2 ways - 1 delete in batches and 2 if using simple recovery - use checkpoint or if in FULL recovery take T-Log backups. What you are doing above is putting everything into temp table and then deleting it in one shot which will be (depending on how much data there is to delete) heavy on tempdb and T-log since delete is a fully logged operation. – Kin Shah Jul 18 '16 at 14:54
  • I would use a non-clustered index on #temp and then use distinct on the select and order by person_id, termin_fakta, läsår_fakta – paparazzo Jul 18 '16 at 15:37
  • @Marcus, the bottom line is that since tempDB is shared across all databases and all connections in SQL Server, it might become a point of contention. just imagine you had thousands of queries making use of tempDB at the same time, while you do this delete. of course the best course of action highly depends on your particular circumstances, but at least they should be considered BEFORE you decide to put such a load on the tempDB. – Marcello Miorelli Jul 18 '16 at 15:44
  • @marcellomiorelli OK, I see the problem using TempDB in this manner. However, my delete is run during the night, when close to no other operations are run simultaneously. Is it safe to assume that during these circumstances, using TempDB like this is fine? – Marcus Jul 18 '16 at 16:00
  • @Marcus, it is difficult to tell without knowing the full pic. if it was my environment here, it would be a no no. We have 24/7 full processing and we need to create windows, for index rebuilds, backups, DBCC checkdb, all these processes are resource intensive. you need to carefully check when it would be best to run such a delete. whatever way you do, I would highly recommend the delete in batches though, due to the high volume of data. why take the chance?? Better to have a good night sleep without worrying about your DBs and servers. Just my 2 cents though... – Marcello Miorelli Jul 18 '16 at 16:25
  • @marcellomiorelli I read that I could create a new permanent table instead of a temp table, read all the rows that I want to keep into the new table, drop the old and rename the new. Would that be a possible solution as well? – Marcus Jul 18 '16 at 16:30
  • @Marcus, in my personal opinion that would be an unnecessary overkill exposing the database to long transactions and possibly a schema locking. you did not mention another disk, so I assume it would be all on the same disk, so lots of I/O reading and writing on the same disk at the same time - shared locks, exclusive locks. you better test your ideas in a test system first :) – Marcello Miorelli Jul 18 '16 at 16:38
0

A delete does not always use the same indexes as a select, but optimizing the select will typically help.

Try this:

select count(*) 
FROM   sko.stage_närvaro e
JOIN   ext.v_imp_närvaro v
  ON   e.person_id    = v.person_id
 AND   e.termin_fakta = v.termin_fakta
 AND   e.läsår_fakta  = v.läsår_fakta
 AND   e.försystem = 'Extens'

An index on försystem should help.
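A sketch of such an index (the name is made up); including the correlation columns lets the EXISTS probe be answered from the index alone:

```sql
-- Hypothetical index: the filter column as the key, with the
-- correlation columns included so the semi-join does not have to
-- look up the base rows just to evaluate the EXISTS.
CREATE NONCLUSTERED INDEX IX_stage_narvaro_forsystem
ON sko.stage_närvaro (försystem)
INCLUDE (person_id, termin_fakta, läsår_fakta);
```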

An index on the view's underlying tables on (person_id, termin_fakta, läsår_fakta) should also help.
Do you need to use the view at all? Try going straight to the tables.
You may be doing work in the view that you don't really need for the delete.

Optimize the select, then try it on the delete.

But note that an index on [sko].[stage_närvaro] may help the select while hurting the delete: every index adds maintenance overhead to the delete.

paparazzo
  • 5,043
  • 1
  • 18
  • 32