1

I have this table:

CREATE TABLE spp.rtprices (
  "interval" timestamp without time zone NOT NULL,
  rtlmp numeric(12,6),
  rtmcc numeric(12,6),
  rtmcl numeric(12,6),
  node_id integer NOT NULL,
  CONSTRAINT rtprices_pkey PRIMARY KEY ("interval", node_id),
  CONSTRAINT rtprices_node_id_fkey FOREIGN KEY (node_id)
      REFERENCES spp.nodes (node_id) MATCH SIMPLE
      ON UPDATE RESTRICT ON DELETE RESTRICT
)

And one more relevant index:

CREATE INDEX rtprices_node_id_interval_idx ON spp.rtprices (node_id, "interval");

Against it I've made this view:

  CREATE OR REPLACE VIEW spp.rtprices_hourly AS 
  SELECT (rtprices."interval" - '00:05:00'::interval)::date::timestamp without time zone AS pricedate,
  date_part('hour'::text, date_trunc('hour'::text, rtprices."interval" - '00:05:00'::interval))::integer + 1 AS hour,
  rtprices.node_id,
  round(avg(rtprices.rtlmp), 2) AS rtlmp,
  round(avg(rtprices.rtmcc), 2) AS rtmcc,
  round(avg(rtprices.rtmcl), 2) AS rtmcl
  FROM spp.rtprices
  GROUP BY date_part('hour'::text, date_trunc('hour'::text, rtprices."interval" - '00:05:00'::interval))::integer + 1,
           rtprices.node_id,
           (rtprices."interval" - '00:05:00'::interval)::date::timestamp without time zone;

The point of which is to give averages of the numeric columns for each hour (The timestamps have data every 5 minutes). The problem is that a query for a single day for a single node_id takes upwards of 30 seconds for 24 records.

explain analyze select * from spp.rtprices_hourly
where node_id=20 and pricedate='2015-02-02'

Returns this:

   "HashAggregate  (cost=1128767.71..1128773.79 rows=135 width=28) (actual time=31155.023..31155.065 rows=24 loops=1)"
"  Group Key: ((date_part('hour'::text, date_trunc('hour'::text, (rtprices."interval" - '00:05:00'::interval))))::integer + 1), rtprices.node_id, (((rtprices."interval" - '00:05:00'::interval))::date)::timestamp without time zone"
"  ->  Bitmap Heap Scan on rtprices  (cost=10629.42..1128732.91 rows=2320 width=28) (actual time=25071.410..31153.715 rows=288 loops=1)"
"        Recheck Cond: (node_id = 20)"
"        Rows Removed by Index Recheck: 7142233"
"        Filter: (((("interval" - '00:05:00'::interval))::date)::timestamp without time zone = '2015-02-02 00:00:00'::timestamp without time zone)"
"        Rows Removed by Filter: 124909"
"        Heap Blocks: exact=43076 lossy=82085"
"        ->  Bitmap Index Scan on rtprices_node_id_interval_idx  (cost=0.00..10628.84 rows=464036 width=0) (actual time=68.999..68.999 rows=125197 loops=1)"
"              Index Cond: (node_id = 20)"
"Planning time: 5.243 ms"
"Execution time: 31155.392 ms"
Dean MacGregor
  • 699
  • 1
  • 8
  • 21
  • rtprices_node_id_interval_idx does not seem to be the underlying index of the PK on ("interval", node_id)? Please also provide the definition of all relevant indexes on the table. The order of index columns is important. – Erwin Brandstetter May 19 '15 at 21:24
  • @ErwinBrandstetter I added the missing index. – Dean MacGregor May 19 '15 at 22:18
  • Why would you use the reserved word (in standard SQL) interval as column name for a timestamp? That's backwards twice. 2. Why do you subtract 5 minutes from interval before you aggregate? There is nothing in the explanation for this odd move.
  • – Erwin Brandstetter May 20 '15 at 00:20
  • @ErwinBrandstetter 1. I brain farted the column name because that's what the data source called it. I can/will change that. 2. I subtract 5 minutes because I the average from x:05 to x+1:00 so I subtract 5 minutes so that when I truncate off the minutes all those times become the same value. – Dean MacGregor May 20 '15 at 00:34