7

I'm attempting to use JSONB with JDBC, which means that I have to avoid any of the operators which use the '?' character (as the PostgreSQL JDBC driver has no escaping for this character). Taking a simple table:

CREATE TABLE jsonthings(d JSONB NOT NULL);
INSERT INTO jsonthings VALUES
    ('{"name":"First","tags":["foo"]}')
  , ('{"name":"Second","tags":["foo","bar"]}')
  , ('{"name":"Third","tags":["bar","baz"]}')
  , ('{"name":"Fourth","tags":["baz"]}');
CREATE INDEX idx_jsonthings_name ON jsonthings USING GIN ((d->'name'));

Using the command line I can run a simple select and it uses the index as expected:

EXPLAIN ANALYZE SELECT d FROM jsonthings WHERE d->'name' ? 'First';

                                                            QUERY PLAN                                                            
----------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on jsonthings  (cost=113.50..30236.13 rows=10000 width=61) (actual time=0.024..0.025 rows=1 loops=1)
   Recheck Cond: ((d -> 'name'::text) ? 'First'::text)
   Heap Blocks: exact=1
   ->  Bitmap Index Scan on idx_jsonthings_name  (cost=0.00..111.00 rows=10000 width=0) (actual time=0.015..0.015 rows=1 loops=1)
         Index Cond: ((d -> 'name'::text) ? 'First'::text)
 Planning time: 0.073 ms
 Execution time: 0.047 ms
(7 rows)

Since I cannot use the ? character I resorted to using a function which underpins the ? operator. However, it is not using the index:

EXPLAIN ANALYZE SELECT d FROM jsonthings WHERE jsonb_exists(d->'name','First');
                                                            QUERY PLAN                                                            
----------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on jsonthings  (cost=10000000000.00..10000263637.06 rows=3333334 width=61) (actual time=0.016..3135.119 rows=1 loops=1)
   Filter: jsonb_exists((d -> 'name'::text), 'First'::text)
   Rows Removed by Filter: 10000003
 Planning time: 0.051 ms
 Execution time: 3135.138 ms
(5 rows)

Why is this happening, and what can I do to make the function use the index? Note that in reality the table has another 10MM rows in it and I also have enable_seqscan turned off so this isn't a case of the planner deciding not to use the index.

In response to a comment I tried using a custom operator instead:

CREATE OPERATOR ### (
  PROCEDURE = jsonb_exists,
  LEFTARG = jsonb,
  RIGHTARG = text,
  RESTRICT = contsel,
  JOIN = contjoinsel);

But that has the same problem:

EXPLAIN ANALYZE SELECT d FROM jsonthings WHERE d->'name' ### 'First';
                                                           QUERY PLAN                                                           
--------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on jsonthings  (cost=10000000000.00..10000263637.06 rows=10000 width=61) (actual time=0.012..3381.608 rows=1 loops=1)
   Filter: ((d -> 'name'::text) ### 'First'::text)
   Rows Removed by Filter: 10000003
 Planning time: 0.046 ms
 Execution time: 3381.623 ms
(5 rows)

Update

The latest PostgreSql driver (as of March 2015) has the ability to escape the ? character so this specific case is no longer a problem.

jgm
  • 305
  • 1
  • 4
  • 10
  • http://stackoverflow.com/a/14786469/246260 about how to get around the limitation of question marks (basically define an other operator name). However, this is really interesting - apparently, the planner is not aware that it can optimize the latter query just as it does with the first one (see cost estimates). – András Váczi Jan 23 '15 at 15:02
  • Thanks for the comment. Unfortunately though the planner doesn't seem to use indexes for custom operators either. I'll update the question accordingly. – jgm Jan 23 '15 at 15:44
  • 1
    Do you have to use a PreparedStatement? As long as you don't use that, you can use a ? without any problems. –  Jan 23 '15 at 16:27
  • 1
    I'd be more likely to stick with my current infrastructure (which is way more than just preparedstatements), give up on using PostgreSql to store JSON, and just use a traditional table. But I'd far prefer to find a real solution to what appears to be a valid issue with PostgreSql (unless I'm seriously misunderstanding something about the planner). – jgm Jan 23 '15 at 16:32
  • 1
    To me it smells like a bug, it may make sense to visit the bugs mailing list... Temporarily, you may be able to wrap this statement into a function and call it instead of the statement itself. – András Váczi Jan 23 '15 at 17:12
  • Please provide the CREATE FUNCTION statement for jsonb_exists(). (Without testing) I would speculate that an IMMUTABLE sql function might work. – Erwin Brandstetter Jan 24 '15 at 19:23
  • 1
    jsonb_exists() is part of the PostgreSql distribution, no idea where to find its definition. – jgm Jan 24 '15 at 20:20
  • 2
    @deszo: It's a built-in conceptual limitation of Postgres that index support is bound to operators, not functions. – Erwin Brandstetter Feb 04 '15 at 01:24
  • @ErwinBrandstetter interesting. I can confirm it trying the same with some other operators and their implementing functions, too. – András Váczi Mar 03 '15 at 09:56
  • Dear jgm. Can you add a link and/or version number for the updated postgresql JDBC driver that allows escaping the ? please. I can't find a hint at https://jdbc.postgresql.org/documentation/changelog.html – alfonx May 19 '15 at 19:30
  • It is version 1201 of the driver. Maven repository has it at http://mvnrepository.com/artifact/org.postgresql/postgresql/ – jgm May 19 '15 at 20:15
  • @jgm could you please tell how to escape '?' character in JDBC statements? I'm using postgresql driver 42.2.1 but, for example, doubling the '?' doesn't help for me (I'm using Spring Data JPA queries). – Cepr0 May 22 '18 at 21:30

1 Answers1

4

Supported operators and workaround

The default operator class for GIN indexes on json columns jsonb_ops only supports these operators (per documentaion):

Name        Indexed Data Type   Indexable Operators
...
jsonb_ops   jsonb               ? ?& ?| @>

You can achieve this the other way round: Create a simple IMMUTABLE SQL function using the ? operator, which can be inlined and will use the index just like the operator itself:

CREATE OR REPLACE FUNCTION jb_contains(jsonb, text)
  RETURNS bool AS
'SELECT $1 ? $2' LANGUAGE sql IMMUTABLE;

This works, I tested in Postgres 9.4 ...

Misunderstandings

However, you have been asking the wrong question. There are two basic misconceptions in your question.

  1. The jsonb operator ? cannot be used to search for values. Only for keys or array elements. The manual:

    Description:
    Does the string exist as a top-level key within the JSON value?

    You got the wrong operator, the WHERE condition cannot work: WHERE d->'name' ? 'First'

  2. The expression index you have doesn't makes sense either way

    CREATE INDEX idx_jsonthings_name ON jsonthings USING GIN ((d->'name'));
    • The expression d->'name' returns a jsonb value. You would need d->>'name' to get the value as text.

    • But that would still be pointless. Since the value of the name key is a simple string, a GIN index (while possible) makes no sense to begin with.

Solutions

You don't need the operator ? - so no workaround either.
Here are two ways that would actually work:

  1. Plain GIN index on d and use the "contains" operator @>:

    CREATE INDEX idx_jsonthings_d_gin ON jsonthings USING GIN (d);
    
    SELECT d FROM jsonthings WHERE d @> '{"name":"First"}'
    

    You could even use the more specialized operator class jsonb_path_ops. See:

  2. B-tree expression index on d->>'email' and test with good old =:

    CREATE INDEX idx_jsonthings_d_email ON jsonthings ((d->>'email'));
    
    SELECT d FROM jsonthings WHERE d->>'email' = 'First';
    

The second index would be considerably smaller and the query faster.

Erwin Brandstetter
  • 175,982
  • 27
  • 439
  • 600