Need a way to query a table, and JOIN it with the TOP 1 related record from an other table

Question

As a follow up to this question, I'm wondering if there is a better and/or more efficient way to gather the data in question.

As stated, this query does return me 95% of the data I need --

SELECT dv.Name
      ,MAX(hb.[DateEntered]) as DE
FROM 
    [Devices] as dv
    INNER JOIN 
    [Heartbeats] as hb ON hb.DeviceID = dv.ID
GROUP BY dv.Name
HAVING MAX(hb.[DateEntered]) < '20130304';

Is there a way to achieve a the same result (for each Device, select the TOP Heartbeat ordered DESC by DateEntered) but also select the entire row from the [Heartbeats] table? Right now, I only get the DateTime for that row.

If I include the additional columns in the GROUP BY clause, I can then add them to the select; but then I get multiple rows per [Devices] row which I don't want. It sounds odd, but what I basically want to do is do a query against [Devices] and then do a for...each over that set and add the top [Heartbeats] row for that [Devices] row. Is that possible?

update This is the structure of the Heartbeats table:

CREATE TABLE [dbo].[Heartbeats](
    [ID] [int] IDENTITY(1,1) NOT NULL,
    [DeviceID] [int] NOT NULL,
    [IsFMSFMPUp] [bit] NOT NULL,
    [IsFMSWebUp] [bit] NOT NULL,
    [IsPingUp] [bit] NOT NULL,
    [DateEntered] [datetime] NOT NULL,
 CONSTRAINT [PK_Heartbeats] PRIMARY KEY CLUSTERED 
(
    [ID] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [CommonQueryIndex] ON [dbo].[Heartbeats] 
(
    [DateEntered] ASC,
    [DeviceID] ASC
)
INCLUDE ( [ID],
[IsFMSFMPUp],
[IsFMSWebUp],
[IsPingUp]) WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [HeartbeatDeviceIndex] ON [dbo].[Heartbeats] 
(
    [DeviceID] ASC
)
INCLUDE ( [ID]) WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
GO
/****** Object:  Default [DF_Heartbeats_DateEntered]    Script Date: 03/05/2013 10:45:45 ******/
ALTER TABLE [dbo].[Heartbeats] ADD  CONSTRAINT [DF_Heartbeats_DateEntered]  DEFAULT (getdate()) FOR [DateEntered]
GO
/****** Object:  ForeignKey [FK_Heartbeats_Devices]    Script Date: 03/05/2013 10:45:45 ******/
ALTER TABLE [dbo].[Heartbeats]  WITH CHECK ADD  CONSTRAINT [FK_Heartbeats_Devices] FOREIGN KEY([DeviceID])
REFERENCES [dbo].[Devices] ([ID])
GO
ALTER TABLE [dbo].[Heartbeats] CHECK CONSTRAINT [FK_Heartbeats_Devices]
GO

score 6 · Accepted Answer · answered Mar 05 '13 at 16:07

6

You can do this pretty easily with OUTER APPLY (if you're on 2005 or newer). Note that there may be better performing ways of achieving the result, such as using ROW_NUMBER() - check execution plans if in doubt. Also, SELECT * is lazy and inadvisable; I'm just doing it here for illustrative purposes, and because I don't know the real structure of the Heartbeats table.

SELECT
    dv.Name,
    hb.*
FROM [Devices] as dv
    OUTER APPLY (
        SELECT TOP 1 *
        FROM Heartbeats
        WHERE Heartbeats.DeviceID = dv.ID
        ORDER BY DateEntered DESC
    ) hb
WHERE ISNULL(hb.DateEntered, '1900-01-01T00:00') < '2013-03-04T00:00'

See Books Online for the finer points of OUTER APPLY vs. CROSS APPLY (it's much like OUTER JOIN vs. INNER JOIN). It was always such a pain doing queries like this in SQL Server 2000 where you didn't have OUTER/CROSS APPLY or the ROW_NUMBER() function.

answered Mar 05 '13 at 16:07

db2

9,658
3
34
58

Can you explain WHERE ISNULL(hb.DateEntered, '1900-01-01T00:00') < '2013-03-04T00:00' a bit for me? – Nate Mar 05 '13 at 16:13
Sure. If a device has no heartbeat records, then OUTER APPLY will return DateEntered as NULL, which has to be accounted for. You could also do WHERE (hb.DateEntered < '2013-03-04T00:00' OR hb.DateEntered IS NULL) to get the same results. – db2 Mar 05 '13 at 16:19
So this accounts for the case where a device does not have a heartbeat, and would be omitted completely, this includes it without the associated heartbeat? – Nate Mar 05 '13 at 16:22
@Nate Yup, that's correct. If you want to ignore machines that have never had a heartbeat, you can change the query to use CROSS APPLY instead. This makes it behave like INNER JOIN; if the subquery returns no rows, the parent row is omitted from the results. – db2 Mar 05 '13 at 16:28
Makes sense, thanks! So, this query has been running for 20 minutes and not yet returned, the original query in my question runs in about 20 seconds, as does the one posted below using the "self-join" -- any thoughts on why that is? "Estimated" execution plan shows its using the same index as the other queries. (Heartbeats table has 40+ million rows, Devices has a few hundred) – Nate Mar 05 '13 at 16:34
@Nate That does seem a little excessive. Does Heartbeats have an index defined on (DeviceID, DateEntered)? – db2 Mar 05 '13 at 16:41
It has two non-clustered, one is it is (DateEntered, DeviceID) the other is just (DeviceID) and the clustered index is on ID (the PK). Should I create one with (DeviceID, DateEntered)? – Nate Mar 05 '13 at 16:42
@Nate: You should post your table structure (CREATE TABLE) and indexes, if efficiency is an issue, too. You should probably have an index on (DeviceID, DateEntered DESC) but it would be good if you added that info in your question. – ypercubeᵀᴹ Mar 05 '13 at 16:43
I created the index (DeviceID, DateEntered DESC) and it didn't improve the query; I had to modify the query to WITH(INDEX(MyNewIndex)) in order to use it; but now it is very fast. – Nate Mar 05 '13 at 17:08

score 0 · Answer 2 · answered Mar 05 '13 at 15:55

You could try using this type of a "self-join":

select dat.*
from 
(SELECT dv.Name,MAX(hb.[DateEntered]) as DE
FROM [Devices] as dv
    INNER JOIN [Heartbeats] as hb ON hb.DeviceID = dv.ID
GROUP BY dv.Name
HAVING MAX(hb.[DateEntered]) < '20130304') grp 
    inner join 
(SELECT dv.Name, hb.*
FROM [Devices] as dv
    INNER JOIN [Heartbeats] as hb ON hb.DeviceID = dv.ID) dat 
        on grp.name = dat.name and grp.DE = dat.[DateEntered]

Need a way to query a table, and JOIN it with the TOP 1 related record from an other table

2 Answers2