Is the jsonb_column @> '{"key":value}'::jsonb operator equivalent to jsonb_column->'key' = value in terms of results, performance, and the indexes that will be used?
1 Answer
I have created a small test for checking this:
CREATE TABLE jsonb_test (id serial, data jsonb);
Then inserted some generated data:
INSERT INTO jsonb_test (data)
SELECT row_to_json(x.*)::jsonb
FROM (SELECT i AS bla,
             ARRAY[i, i+1, i+2] AS foo,
             ARRAY[('v' || i::text, i), ('v' || (i * 10)::text, i)] AS bar
      FROM generate_series(1, 100000) t(i)
     ) x;
Let's find something in there (version 1):
SELECT * FROM jsonb_test WHERE data @> '{"bla": 545}'::jsonb;
The execution plan looks like
Seq Scan on jsonb_test  (cost=0.00..7604.89 rows=369 width=36) (actual time=0.425..47.600 rows=1 loops=1)
  Filter: (data @> '{"bla": 545}'::jsonb)
  Rows Removed by Filter: 99999
  Buffers: shared hit=2997 dirtied=676
Planning time: 0.078 ms
Execution time: 47.692 ms
If I do the same with the other version you propose (version 2), with the operator changed from -> to ->>, as I think that is what you meant:
SELECT * FROM jsonb_test WHERE data ->> 'bla' = '545';
I get a plan like
Seq Scan on jsonb_test  (cost=0.00..8526.47 rows=1843 width=36) (actual time=0.613..34.657 rows=1 loops=1)
  Filter: ((data ->> 'bla'::text) = '545'::text)
  Rows Removed by Filter: 99999
  Buffers: shared hit=2997
Planning time: 0.116 ms
Execution time: 34.727 ms
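As an aside, the -> operator from your question returns jsonb rather than text, so with -> the comparison would have to be against a jsonb value. A minimal variant (not part of my measurements) would look like this:
SELECT * FROM jsonb_test WHERE data -> 'bla' = '545'::jsonb;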
Running them repeatedly, I see no significant difference between the execution times. It is no wonder - the expensive operation here is the sequential scan, while the two different filter conditions shouldn't be very different in this regard.
Now I add a GIN index on data:
CREATE INDEX ON jsonb_test USING gin (data);
When executing the queries now, version 2 does the same as above, while version 1 has a new plan:
Bitmap Heap Scan on jsonb_test  (cost=28.77..372.05 rows=100 width=36) (actual time=0.081..0.083 rows=1 loops=1)
  Recheck Cond: (data @> '{"bla": 545}'::jsonb)
  Rows Removed by Index Recheck: 2
  Heap Blocks: exact=1
  Buffers: shared hit=10
  ->  Bitmap Index Scan on jsonb_test_data_idx  (cost=0.00..28.75 rows=100 width=0) (actual time=0.062..0.062 rows=3 loops=1)
        Index Cond: (data @> '{"bla": 545}'::jsonb)
        Buffers: shared hit=9
Planning time: 0.084 ms
Execution time: 0.122 ms
This already shows that the two versions are not fully equivalent - a GIN index on the jsonb column supports only one of them (version 1).
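As a side note, PostgreSQL also offers the jsonb_path_ops operator class for GIN indexes. It produces a smaller index that still supports the @> containment operator, but not the key-existence operators (?, ?| and ?&). If you only ever query with @>, something like this would also work:
CREATE INDEX ON jsonb_test USING gin (data jsonb_path_ops);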
One can define another index to help version 2:
CREATE INDEX ON jsonb_test ((data->>'bla'));
Now version 2 gives the following:
Bitmap Heap Scan on jsonb_test  (cost=12.17..1323.49 rows=500 width=36) (actual time=0.065..0.070 rows=1 loops=1)
  Recheck Cond: ((data ->> 'bla'::text) = '545'::text)
  Heap Blocks: exact=1
  Buffers: shared hit=1 read=2
  ->  Bitmap Index Scan on jsonb_test_expr_idx  (cost=0.00..12.04 rows=500 width=0) (actual time=0.046..0.046 rows=1 loops=1)
        Index Cond: ((data ->> 'bla'::text) = '545'::text)
        Buffers: shared read=2
Planning time: 0.480 ms
Execution time: 0.133 ms
In this experiment, I don't see a huge performance difference between the two, given proper indexing. However, you should consider the following:
Insertion into a GIN index can be slow because many keys are likely to be inserted for each item. So, for bulk insertions into a table, it is advisable to drop the GIN index and recreate it after the bulk insertion has finished.
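A sketch of that workflow, using the index name PostgreSQL generated automatically in the plans above (jsonb_test_data_idx):
DROP INDEX jsonb_test_data_idx;
-- ... run the bulk INSERTs here ...
CREATE INDEX jsonb_test_data_idx ON jsonb_test USING gin (data);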
It is usually said that BTree indexes are cheaper to maintain than GIN indexes (meaning data changes on the table cause a smaller overhead). From reading around, this is not always the case - you have to test both solutions on your own data set. For example, I cannot tell off the top of my head whether a set of very complex jsonb values will favour one or the other.
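One simple way to get a feeling for this on your own data is to compare the sizes of the two indexes (and to time some representative UPDATEs) after loading. The index names below are the ones that appear in the plans above:
SELECT relname, pg_size_pretty(pg_relation_size(oid)) AS size
FROM pg_class
WHERE relname IN ('jsonb_test_data_idx', 'jsonb_test_expr_idx');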
Notes:
- To understand the similarities and differences of the json(b) operators, the official documentation is an excellent source.
- It might make sense to read Bruce Momjian's presentation about indexing techniques.