I have this table in PostgreSQL 9.4:
CREATE TABLE user_operations(
id SERIAL PRIMARY KEY,
operation_id integer,
user_id integer )
The table consists of ~1000-2000 different operations each of which corresponds to some subset (consisting of approximately 80000-120000 elements each) of the set S of all users:
S = {1, 2, 3, ... , 122655}
Parameters:
work_mem = 128MB
table_size = 880MB
I also have an index on the operation_id.
QUESTION: What would be the most optimal plan for querying all distinct user_id for a significant part of the operation_id set (20%-60%) like:
SELECT DISTINCT user_id FROM user_operation WHERE operation_id < 500
It's possible to create more indexes on the table. Currently, the plan for the query is:
HashAggregate (cost=196173.56..196347.14 rows=17358 width=4) (actual time=1227.408..1359.947 rows=598336 loops=1)
-> Bitmap Heap Scan on user_operation (cost=46392.24..189978.17 rows=2478155 width=4) (actual time=233.163..611.182 rows=2518122 loops=1)
Recheck Cond: (operation_id < 500)
-> Bitmap Index Scan on idx (cost=0.00..45772.70 rows=2478155 width=0) (actual time=230.432..230.432 rows=2518122 loops=1)
Index Cond: (operation_id < 500)
Is such query plan really optimal in such circumstances? I mean, I'm not sure about correctness of using Bitmap Heap Scan. I'll appreciate any references to relevant articles.
EXPLAIN (ANALYZE, BUFFERS)could possibly tell a bit more about the optimalness of your plan. What is you expectiation (time-wise)? There are chances that an index on(operation_id, user_id)would result in a somewhat more efficient query. – András Váczi Oct 21 '15 at 08:41ANALYZEthe table? Also, afterSET enable_bitmapscan TO off;, does it show up in the plan? Is it worse or better? – András Váczi Oct 21 '15 at 09:00ANALYZEon the able after creating the index. About, Hmmm, maybe. But I really don't see how that index may help the query, we don't restrict it byuser_id... – St.Antario Oct 21 '15 at 09:06DISTINCTon it. My idea is that a single index-only scan would be enough to collect the data for the aggregation - lowering the I/O needs. – András Váczi Oct 21 '15 at 09:22