9

I have a database structure similar to this,

CREATE TABLE [dbo].[Dispatch](
    [DispatchId] [int] NOT NULL,
    [ContractId] [int] NOT NULL,
    [DispatchDescription] [nvarchar](50) NOT NULL,
CONSTRAINT [PK_Dispatch] PRIMARY KEY CLUSTERED 
(
    [DispatchId] ASC,
    [ContractId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

GO

CREATE TABLE [dbo].[DispatchLink](
    [ContractLink1] [int] NOT NULL,
    [DispatchLink1] [int] NOT NULL,
    [ContractLink2] [int] NOT NULL,
    [DispatchLink2] [int] NOT NULL
) ON [PRIMARY]

GO
INSERT [dbo].[Dispatch] ([DispatchId], [ContractId], [DispatchDescription]) VALUES (1, 1, N'Test')
GO
INSERT [dbo].[Dispatch] ([DispatchId], [ContractId], [DispatchDescription]) VALUES (2, 1, N'Test')
GO
INSERT [dbo].[Dispatch] ([DispatchId], [ContractId], [DispatchDescription]) VALUES (3, 1, N'Test')
GO
INSERT [dbo].[Dispatch] ([DispatchId], [ContractId], [DispatchDescription]) VALUES (4, 1, N'Test')
GO
INSERT [dbo].[DispatchLink] ([ContractLink1], [DispatchLink1], [ContractLink2], [DispatchLink2]) VALUES (1, 1, 1, 2)
GO
INSERT [dbo].[DispatchLink] ([ContractLink1], [DispatchLink1], [ContractLink2], [DispatchLink2]) VALUES (1, 1, 1, 3)
GO
INSERT [dbo].[DispatchLink] ([ContractLink1], [DispatchLink1], [ContractLink2], [DispatchLink2]) VALUES (1, 3, 1, 2)
GO

The point of the DispatchLink table is to link two Dispatch records together. By the way I am using a composite primary key on my dispatch table because of legacy, so I cannot change that without a lot of pain. Also the link table may not be the correct way to do it? But again legacy.

So my question, if I run this query

select * from Dispatch d
inner join DispatchLink dl on d.DispatchId = dl.DispatchLink1 and d.ContractId = dl.ContractLink1
or d.DispatchId = dl.DispatchLink2 and d.ContractId = dl.ContractLink2

I can never get it to do an index seek on the DispatchLink table. It always does a full index scan. That is fine with a few records, but when you have 50000 in that table it scans 50000 records in the index according to the query plan. It is because there are 'ands' and 'ors' in the join clause, but I can't get my head around why SQL can't do a couple of index seeks instead, one for the left side of the 'or', and one for the right side of the 'or'.

I would like an explanation for this, not a suggestion to make the query faster unless that can be done without adjusting the query. The reason is that I am using the above query as a merge replication join filter, so I cannot just add in another type of query unfortunately.

UPDATE: For instance these are the types of indexes I have been adding,

CREATE NONCLUSTERED INDEX IDX1 ON DispatchLink (ContractLink1, DispatchLink1)
CREATE NONCLUSTERED INDEX IDX2 ON DispatchLink (ContractLink2, DispatchLink2)
CREATE NONCLUSTERED INDEX IDX3 ON DispatchLink (ContractLink1, DispatchLink1, ContractLink2, DispatchLink2)

So it uses the indexes, but does an index scan across the whole index, so 50000 records it scans 50000 records in the index.

peter
  • 2,187
  • 5
  • 19
  • 38
  • In your query: "select * from Dispatch d inner join DispatchLink dl on d.DispatchId = dl.DispatchLink1 and d.ContractId = dl.ContractLink1 or d.DispatchId = dl.DispatchLink2 and d.ContractId = dl.ContractLink2" try to remove the "OR" condition and replace it by UNION of 2 SELECT statements each using no "OR", also use the only key columns in both SELECTs instead of the "*", just to make the test as pure as possible. – NoChance Sep 05 '12 at 23:59
  • Thanks SQL Kiwi, this is something I previously tried but it didn't work unfortunately. – peter Sep 06 '12 at 00:08
  • Emmad Kareem are you suggesting to not use an inner join? I'm not really following you sorry. – peter Sep 06 '12 at 00:09
  • The inner join logic is fine, however, it can be written using the old style WHERE condition to avoid the "OR" operator and hence improve the possibility for utilizing the indexes. See section "Rewriting query with OR conditions as a UNION" here: http://www.componentace.com/help/absdb_manual/increasesqlperformance.htm – NoChance Sep 06 '12 at 01:08
  • Sure so you mean two separate queries. Unfortunately I am working with merge replication and this query represents a join clause. This is basically what merge replication is doing behind the scenes so I cannot change it. I can understand that the reason is the 'or' but I cannot understand why it can't do two index seeks for each side of the 'or'. – peter Sep 06 '12 at 02:07
  • 1
    Can you have replication issue a simpler query: select * from Dispatch d inner join DispatchLink dl on d.DispatchId = dl.DispatchLink1 and d.ContractId = dl.ContractLink1

    If yes, we can duplicate data in DispatchLink so that the results are still valid ...

    – A-K Sep 06 '12 at 03:26
  • AlexKuznetsov, thanks. I can do what you suggest as a work around for the moment, even without duplicating data. It just means that some records in the link table might be replicated when they don't have to be. This can be handled in the client software. Long term removing the repeating columns is the answer, but not a straightforward fix. Lots of software to change. – peter Sep 06 '12 at 04:33

1 Answers1

12

The optimizer can consider many plan alternatives (including ones with multiple seeks) but for disjunctions (OR predicates) it does not consider plans involving index intersections by default. Given the indexes:

CREATE CLUSTERED INDEX cx 
ON dbo.DispatchLink (DispatchLink1, ContractLink1);

CREATE NONCLUSTERED INDEX nc1 
ON dbo.DispatchLink (DispatchLink2, ContractLink2);

We can force index seeks (assuming SQL Server 2008 or later):

SELECT * 
FROM dbo.Dispatch AS d
INNER JOIN dbo.DispatchLink AS dl WITH (FORCESEEK) ON 
    (d.DispatchId = dl.DispatchLink1 AND d.ContractId = dl.ContractLink1)
    OR (d.DispatchId = dl.DispatchLink2 AND d.ContractId = dl.ContractLink2);

FORCESEEK plan

Using your sample data, the seek plan costs at 0.0332551 units compared with 0.0068057 for the scan plan:

Scan plan

There are all sorts of possible query rewrites and hints we can try. One example of a rewrite to promote an option the optimizer does not consider for the original plan is:

SELECT * 
FROM dbo.Dispatch AS d
CROSS APPLY
(
    SELECT TOP (1) * FROM
    (
        SELECT * FROM dbo.DispatchLink AS dl
        WHERE dl.DispatchLink1 = d.DispatchId
        AND dl.ContractLink1 = d.ContractId
        UNION ALL
        SELECT * FROM dbo.DispatchLink AS dl
        WHERE dl.DispatchLink2 = d.DispatchId
        AND dl.ContractLink2 = d.ContractId
    ) SQ1
) AS F1;

This execution plan does not seek the second index if it finds a match on the first:

APPLY TOP Plan

This may perform very slightly better than the default FORCESEEK plan.

Without adding any new indexes, we can also force a seek into the Dispatch table:

SELECT * 
FROM dbo.DispatchLink AS dl
JOIN dbo.Dispatch AS d WITH (FORCESEEK) ON
    (d.DispatchId = dl.DispatchLink1 AND d.ContractId = dl.ContractLink1)
    OR (d.DispatchId = dl.DispatchLink2 AND d.ContractId = dl.ContractLink2);

Seek 2

This may be better or worse than the first example depending on things like how many rows are in each of the tables. The APPLY + TOP improvement is still possible:

SELECT * 
FROM dbo.DispatchLink AS dl
CROSS APPLY
(
    SELECT TOP (1) * FROM
    (
        SELECT * FROM dbo.Dispatch AS d
        WHERE dl.DispatchLink1 = d.DispatchId
        AND dl.ContractLink1 = d.ContractId
        UNION ALL
        SELECT * FROM dbo.Dispatch AS d
        WHERE dl.DispatchLink2 = d.DispatchId
        AND dl.ContractLink2 = d.ContractId
    ) SQ1
) AS F1;
Paul White
  • 83,961
  • 28
  • 402
  • 634
  • That's a very useful answer. I have asked another question http://dba.stackexchange.com/questions/23773/analysing-a-query-plan which shows the actual query plan on real data (not my test data). I don't have the knowledge to understand exactly what the bottleneck on the query plan is. Perhaps you can take a look? – peter Sep 06 '12 at 02:38