1

In PostgreSQL (version 9.4) I am trying to construct a query to determine in various tranches how many times various numbers appear in the data set between ranges. When I query group-ing by "SettlementPointPrice" the count() function works correctly bucketing prices into individual tranches as I would expect. However, this creates hundreds of rows. I am looking for the ability (sub-query?) to aggregate the sum of count()'s in each bucket into one single row. What is the best way to manage this in SQL?

I am using a statement like this (full SQL below) for each bucket/tranche:

CASE WHEN (round(sum("DA-A"."SettlementPointPrice"),2)) BETWEEN 0 AND 10
     THEN count(*) ELSE 0 END AS "DA $0 - $10",

When I group by "SettlementPointPrice" (two prices in this example) to confirm the data counts the buckets correctly. As the table below shows.

Raw data from individual two days:

Row | "SettlementPointPrice" | 0-10 | 11-20 | 21-30
1   | 18                     | 0    | 1     | 0
2   | 22                     | 0    | 0     | 1

However, I am unable to get the aggregated summation by grouping them all together. I assume this is a sub-query?

I would like the result to be as such:

Row | 0-10 | 11-20 | 21-30
1   | 0    | 1     | 1

Full SQL code:

SELECT
  "DA-A"."SettlementPointPrice",
  CASE WHEN (round(sum("DA-A"."SettlementPointPrice"),2)) BETWEEN 0 AND 10
       THEN count(*) ELSE 0 END AS "DA $0 - $10",
  CASE WHEN (round(sum("DA-A"."SettlementPointPrice"),2)) BETWEEN 11 AND 20
       THEN COUNT(*) ELSE 0 END AS "DA $11 - $20",
  CASE WHEN (round(sum("DA-A"."SettlementPointPrice"),2)) BETWEEN 21 AND 30
       THEN COUNT(*) ELSE 0 END AS "DA $21 - $30"
FROM 
  public.da "DA-A", 
  public.rt_aggregate "RT-A"
WHERE 
  "RT-A"."DeliveryDate" = "DA-A"."DeliveryDate" AND
  "RT-A"."SettlementPointName" = "DA-A"."SettlementPointName" AND
  "DA-A"."SettlementPointName" = 'John' AND 
  "DA-A"."DeliveryDate" >= '2015-02-01' AND
  "DA-A"."DeliveryDate" <= '2015-02-20' AND
 ("RT-A"."DeliveryHour" = 14) and 
  date_part('hour', "DA-A"."DeliveryHour") = "RT-A"."DeliveryHour"
GROUP BY
  "DA-A"."SettlementPointPrice",
  "DA-A"."SettlementPointName"
Erwin Brandstetter
  • 175,982
  • 27
  • 439
  • 600
JAS
  • 115
  • 1
  • 4
  • As always your version of Postgres, please. And the name is PostgreSQL or Postgres for short. Never "Postgre". – Erwin Brandstetter Feb 26 '15 at 04:10
  • When you write grouping them all together, do you mean all selected rows, so that you get a single row aggregating all counts? Your column "Row" is a bit confusing, that's just noise, right? And it's unclear where the column Source DA in the result comes from. Does not agree with the query you show. – Erwin Brandstetter Feb 26 '15 at 04:27
  • EB: Yes, I mean aggregating all the rows and the count values into a single value. For example say there are ten rows (ten days) I just want the summation of those ten days sum(count(*)) as an example by bucket. – JAS Feb 26 '15 at 04:36
  • EB: "Row" is just just noise, but also highlights how I am getting two rows now, but really just want one single row with everything as a sum in each bucket. – JAS Feb 26 '15 at 04:36
  • EB: Sorry, Source DA should read: "SettlementPointPrice". Fixed. – JAS Feb 26 '15 at 04:38

1 Answers1

0

After some processing this boiled down to:

While your predicate d."SettlementPointName" = 'John' is filtering a single value for "SettlementPointName" anyway, simplify to:

SELECT count(                                     d."SettlementPointPrice" < 10.5 OR NULL) AS da_00_10
     , count(d."SettlementPointPrice" >= 10.5 AND d."SettlementPointPrice" < 20.5 OR NULL) AS da_11_20
     , count(d."SettlementPointPrice" >= 20.5 AND d."SettlementPointPrice" < 30.5 OR NULL) AS da_21_30
FROM   public.da d
JOIN   public.rt_aggregate r USING ("DeliveryDate", "SettlementPointName")
WHERE  d."SettlementPointName" = 'John'
AND    d."DeliveryDate" >= '2015-02-01'
AND    d."DeliveryDate" <= '2015-02-20'
AND    r."DeliveryHour" = 14
AND    date_part('hour', d."DeliveryHour") = r."DeliveryHour";

About the counting technique:

Or better, yet, use the new aggregate filter technique in pg 9.4:

SELECT d."SettlementPointName"
     , count(*) FILTER (WHERE d."SettlementPointPrice" <  10.5) AS da_00_10
     , count(*) FILTER (WHERE d."SettlementPointPrice" >= 10.5
                        AND   d."SettlementPointPrice" <  20.5) AS da_11_20
     , count(*) FILTER (WHERE d."SettlementPointPrice" >= 20.5
                        AND   d."SettlementPointPrice" <  30.5) AS da_21_30
FROM   public.da d
JOIN   public.rt_aggregate r USING ("DeliveryDate", "SettlementPointName")
WHERE  d."DeliveryDate" >= '2015-02-01'
AND    d."DeliveryDate" <= '2015-02-20'
AND    r."DeliveryHour" = 14
AND    date_part('hour', d."DeliveryHour") = r."DeliveryHour"
GROUP  BY 1;

This time, selecting all names and returning one row per name like you asked in the comment.

Details for FILTER:

Erwin Brandstetter
  • 175,982
  • 27
  • 439
  • 600
  • @JAS: Consider the added notes. – Erwin Brandstetter Feb 26 '15 at 04:48
  • One additional question to your code. If remove the WHERE d."SettlementPointName" = 'John' and want a GROUP BY d."SettlementPointName" how is that done through the sub select? – JAS Feb 26 '15 at 04:53
  • Your question is a bit light on your actual objective. My guess is you just need the second query, where you can just add GROUP BY d."SettlementPointName" to get one row with counts *per name*. No subquery. – Erwin Brandstetter Feb 26 '15 at 04:56
  • EB: Your second update with additional SQL solved what I was looking for: – JAS Feb 26 '15 at 04:58