MIN function returning multiple rows after adding column to SELECT

Question

I'm trying to get the minimum value of Price together with product Description in the table ProductVariation.

When creating my query this way:

SELECT productId, MIN([price]) AS price
FROM ProductVariation
GROUP BY ProductId

It works as expected, grouping by ProductId and minimum value of Price:

ProductId	Price
3	299.99
6	299.99
7	299.99
8	199.99
9	299.99
10	299.99
11	159.99
12	159.99
13	189.99
14	189.99

However, when I add Description column to the SELECT:

SELECT productId, [Description], MIN([price]) AS price
FROM ProductVariation
GROUP BY ProductId, [Description]

It shows all rows, repeting ProductId column.

ProductId	Description	Price
3	Natural	299.99
6	Cookies & Cream	299.99
7	Morango	299.99
8	Baunilha	199.99
8	Chocolate	299.99
9	Caramelo	299.99
10	Baunilha	299.99
10	Cookies & Cream	399.99
10	Morango	399.99
11	Baunilha	199.99
11	Morango	159.99

How can I fix it to show unique ProductId values together with its Description like this?

ProductId	Description	Price
3	Natural	299.99
6	Cookies & Cream	299.99
7	Morango	299.99
8	Baunilha	199.99
9	Caramelo	299.99
10	Baunilha	299.99
11	Morango	159.99
12	Baunilha	159.99
13	Caramelo	189.99
14	Morango	189.99

And if you could explain why it's not working my way?

J.D. · Accepted Answer · 2022-11-26T14:50:07.260

You can use a window function like ROW_NUMBER() to enumerate the rows in groupings (partitions) of the ProductId, sorted by the Price ascending, and then SELECT only the minimum Price row for each group.

For example:

WITH _ProductVariationsSorted AS
(
    SELECT 
        ProductId, 
        Description,
        Price,
        ROW_NUMBER() OVER (PARTITION BY ProductId ORDER BY Price) AS SortId -- Generate a unique ID within each group of ProductId rows, sorted by the Price
    FROM dbo.ProductVariation
)
SELECT 
    ProductId, 
    Description,
    Price
FROM _ProductVariationsSorted
WHERE SortId = 1; -- Only return the rows with the lowest Price per ProductId grouping

Another approach (shoutout to Erik Darling for reminding me) is just straight up windowing the MIN() aggregate function. You'll also need to DISTINCT the rows. You may find this simplifies the code like so:

SELECT DISTINCT
    ProductId, 
    Description,
    MIN(Price) OVER (PARTITION BY ProductId) AS Price
FROM dbo.ProductVariation

And finally a third but similar approach is using the FIRST_VALUE() window function as well:

SELECT DISTINCT
    ProductId, 
    Description,
    FIRST_VALUE(Price) OVER (PARTITION BY ID ORDER BY PRICE) AS Price
FROM dbo.ProductVariation

Though depending on your data size, and how your tables and indexes are architected, you may find one of these solutions more performant than the other. Always test accordingly.

One other important thing to note, is when using window functions, if there are ties for the value in the column you're windowing (within the same grouping), different window functions will handle those ties differently.

The ROW_NUMBER() function will nondeterministically choose one of the rows with the tie at random, when ordering by that tied value, to enumerate the rows. Then one of those ties will get filtered out further down when filtering on WHERE SortId = 1. The solution using DISTINCT and either the SUM() OVER or FIRST_VALUE() window functions, will include both tied rows.

For example, in your data, it looks like for ProductId = 10 there are two rows with the same Price = 399.99, Cookies & Cream and Morango. Now this doesn't actually pose a problem in your data at the moment, because there's another row for ProductId = 10 with a lower Price.

But let's pretend one day that variation is no longer sold and the row gets deleted. Now your lowest Price for ProductId = 10 is 399.99 which is a tie between two variations. With the ROW_NUMBER() solution you'd only get one of those rows back at random, every time you ran the query. With the SUM() OVER or FIRST_VALUE() solutions, you'd get both rows back.

Depending on your goals, neither of these outcomes may be what you want. I.e. if you truly only wanted to show only 1 variation at a time, even when there's a tie in Price within the same ProductId, and you wanted it to reliably always be the same variation every time you ran the query.

To achieve this, you'd have to add a deterministic tie-breaker expression to the ORDER BY clause of the ROW_NUMBER() window function, for example, a uniquifying column such as the Id column. The code would then look like this: ROW_NUMBER() OVER (PARTITION BY ProductId ORDER BY Price, Id) AS SortId. And the rules of which row to return are now the one with the lowest Price, and in a tie for lowest Price then the row in the tie that has the lowest Id (theoretically was entered into the ProductVariations table first).

@gregoryp you should post a new question with appropriate details, and that'll maximize the chances of getting the best help. — J.D., Jun 05 '23 at 12:15

MIN function returning multiple rows after adding column to SELECT

1 Answers1

Linked