Can I create a faster version of unnest() over int[]?

Question

I have a column of type int[]. I want to convert an int[] value into rows without using the unnest() function. I would like to write my own function that takes an int[] as input and returns its elements as rows.

How can I do this?

- line_info table             
    line_id   bigint             
    data  text

- unique_wordhash_mappings table
    wordhash
    line_ids bigint[]

explain (analyze, buffers)
select line_id, line_text
from line_info
where line_id in (
    select unnest(line_ids) as line_id
    from unique_wordhash_mappings
    where wordhash in (22472126689, 24578126689, 109225126689, 115504126689)
);

and the EXPLAIN output is:

Nested Loop  (cost=41.23..1455.72 rows=1188170 width=135) (actual time=0.886..10.205 rows=2372 loops=1)
  Buffers: shared hit=9511
  ->  HashAggregate  (cost=40.80..42.80 rows=200 width=8) (actual time=0.871..1.298 rows=2372 loops=1)
        Group Key: unnest(unique_wordhash_mappings.line_ids)
        Buffers: shared hit=19
        ->  ProjectSet  (cost=0.42..35.80 rows=400 width=8) (actual time=0.039..0.274 rows=2383 loops=1)
              Buffers: shared hit=19
              ->  Index Scan using unique_wordhash_mappings_word_hash_idx on unique_wordhash_mappings  (cost=0.42..33.77 rows=4 width=32) (actual time=0.033..0.059 rows=4 loops=1)
                    Index Cond: (word_hash = ANY ('{22472126689,24578126689,109225126689,115504126689}'::bigint[]))
                    Buffers: shared hit=16
  ->  Index Scan using line_id_pkey on line_info  (cost=0.43..8.39 rows=1 width=135) (actual time=0.003..0.003 rows=1 loops=2372)
        Index Cond: (line_id = (unnest(unique_wordhash_mappings.line_ids)))
        Buffers: shared hit=9492
Planning time: 0.378 ms
Execution time: 10.401 ms

Not a PostgreSQL expert, but as a guess, you'd probably need a loop which would either populate a temporary table or build a dynamic VALUES constructor or something. But do you think you could elaborate a little on why you want to do this? That information might make your question more useful. — Andriy M, May 21 '18 at 08:38
unnest() takes a long time when there are a huge number of records. I don't know much about the unnest() function, but it might be using temporary tables internally to store and retrieve values. Basically, I want to convert int[] to int and then pass that int to the IN clause of a query. I can't pass int[] directly where int is expected, so I need to convert the int[] to int first and then pass it to the IN clause. — user3098231, May 21 '18 at 09:43
So you want to create a cheaper substitute for unnest(). I don't think that's possible. — Erwin Brandstetter, May 21 '18 at 14:40
How fast do you need that query to be? I agree with ypercube that 10 milliseconds seems rather quick. — , May 22 '18 at 14:55
The nested loop could prevent this query from scaling to larger tables. You could try a join instead. That changes the meaning of the query and will only work if you never get duplicate unique_wordhash_mappings, but it might be worth a try. Something like: http://dpaste.com/33GXC9G Normalizing the data model and providing proper indexes could also be beneficial. Another option might be an EXISTS query instead: http://dpaste.com/13TP2WY — , May 22 '18 at 14:59
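
The pastes linked in the last comment are not reproduced here, so the following is only a guess at what the join and EXISTS rewrites might look like. It assumes the table and column names used in the question's query (wordhash, line_ids, line_text), and neither form is claimed to be faster; compare the plans with EXPLAIN (ANALYZE, BUFFERS) on your own data.

```sql
-- Join form: expand each array with a lateral unnest() and join to line_info.
-- Unlike IN (...), this returns a line more than once if its id appears in
-- several of the matched arrays.
SELECT li.line_id, li.line_text
FROM unique_wordhash_mappings AS m
CROSS JOIN LATERAL unnest(m.line_ids) AS u(line_id)
JOIN line_info AS li ON li.line_id = u.line_id
WHERE m.wordhash IN (22472126689, 24578126689, 109225126689, 115504126689);

-- EXISTS form: no unnest() at all; "= ANY (array)" tests membership directly.
-- Each line is returned at most once, as in the original IN query.
SELECT li.line_id, li.line_text
FROM line_info AS li
WHERE EXISTS (
    SELECT 1
    FROM unique_wordhash_mappings AS m
    WHERE m.wordhash IN (22472126689, 24578126689, 109225126689, 115504126689)
      AND li.line_id = ANY (m.line_ids)
);
```

Note that the EXISTS form gives the planner no obvious way to use the primary key on line_info for the membership test, so it may well end up scanning the whole table.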

Evan Carroll · Answer 1 · 2018-05-22T15:20:54.290

3

There are some operations that become dramatically faster if you install the intarray extension, but unnest isn't one of them. unnest is backed by an internal function named array_unnest, written in C, and it doesn't do much. To make unnest faster, you'd have to optimize that C code itself. You can't do less work in SQL, which sits at a higher level and ends up calling into that C code anyway.

As pointed out in the comments, your query takes about 10 ms. That's not exactly "slow".
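
For reference, here is a minimal sketch of the kind of hand-written function the question asks about. The name my_unnest is hypothetical and not part of PostgreSQL. It is shown to illustrate the point above, not as a recommendation: it does the same job as unnest() but through the PL/pgSQL interpreter, so it should be expected to be slower, not faster.

```sql
-- Hypothetical user-defined replacement for unnest() over bigint[].
-- STRICT makes a NULL argument produce zero rows instead of an error.
CREATE OR REPLACE FUNCTION my_unnest(arr bigint[])
RETURNS SETOF bigint
LANGUAGE plpgsql
IMMUTABLE
STRICT
AS $$
DECLARE
    elem bigint;
BEGIN
    -- FOREACH walks the array one element at a time.
    FOREACH elem IN ARRAY arr LOOP
        -- RETURN NEXT appends a row to the function's result set.
        RETURN NEXT elem;
    END LOOP;
    RETURN;
END;
$$;

-- Usage: same shape as calling unnest().
SELECT * FROM my_unnest(ARRAY[1, 2, 3]::bigint[]);
```

PL/pgSQL collects every RETURN NEXT row and only hands the result set back once the function finishes, which is extra work that the built-in function does not need to do.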

See also,

Why is intarray's push `+` so much faster than array-to-element concatenation `||`?

edited May 22 '18 at 15:20

answered May 22 '18 at 14:47


The query executes faster on the second run. On the first run it takes around 6 to 13 seconds. – user3098231 May 23 '18 at 09:37

score 1 · Answer 2 · answered May 21 '18 at 14:08

unnest() takes a long time when there are a huge number of records. I don't know much about the unnest() function, but it might be using temporary tables internally to store and retrieve values. Basically, I want to convert int[] to int and then pass that int to the IN clause of a query. I can't pass int[] directly where int is expected, so I need to convert the int[] to int first and then pass it to the IN clause.

I'm not sure what your query looks like. Suppose it is something like this:

WITH tmp AS 
(
  SELECT array_agg(a)::int[] as arr FROM generate_series(1, 100000) a
)
SELECT a
FROM generate_series(1, 10) a
WHERE a IN (SELECT unnest (arr) FROM tmp);

Hash Semi Join  (cost=15.28..33.47 rows=500 width=4) (actual time=66.792..75.411 rows=10 loops=1)
  Hash Cond: (a.a = (unnest(tmp.arr)))
  CTE tmp
    ->  Aggregate  (cost=12.50..12.51 rows=1 width=32) (actual time=40.285..40.285 rows=1 loops=1)
          ->  Function Scan on generate_series a_1  (cost=0.00..10.00 rows=1000 width=4) (actual time=19.320..28.482 rows=100000 loops=1)
  ->  Function Scan on generate_series a  (cost=0.00..10.00 rows=1000 width=4) (actual time=0.009..0.011 rows=10 loops=1)
  ->  Hash  (cost=1.52..1.52 rows=100 width=4) (actual time=66.689..66.689 rows=100000 loops=1)
        Buckets: 131072 (originally 1024)  Batches: 2 (originally 1)  Memory Usage: 3073kB
        ->  CTE Scan on tmp  (cost=0.00..0.52 rows=100 width=4) (actual time=40.694..49.072 rows=100000 loops=1)
Planning time: 0.111 ms
Execution time: 78.435 ms

Without unnest, I think you can use the array containment operator instead (tested on Postgres 9.6):

WITH tmp AS 
(
  SELECT array_agg(a)::int[] as arr FROM generate_series(1, 100000) a
)
SELECT a
FROM generate_series(1, 10) a
WHERE (SELECT arr FROM tmp) @> array[a]::int[]; -- the left side contains the right one

Function Scan on generate_series a  (cost=12.54..25.04 rows=5 width=4) (actual time=34.777..37.465 rows=10 loops=1)
  Filter: ($1 @> ARRAY[a])
  CTE tmp
    ->  Aggregate  (cost=12.50..12.51 rows=1 width=32) (actual time=33.543..33.543 rows=1 loops=1)
          ->  Function Scan on generate_series a_1  (cost=0.00..10.00 rows=1000 width=4) (actual time=12.424..22.094 rows=100000 loops=1)
  InitPlan 2 (returns $1)
    ->  CTE Scan on tmp  (cost=0.00..0.02 rows=1 width=32) (actual time=33.964..33.964 rows=1 loops=1)
Planning time: 0.111 ms
Execution time: 42.226 ms

The comment above is very useful. I have to do something like: "select * from table1 where val1 in (select val2 from table2 where val3=1234);" table1's val1 column is of type int, and table2's val2 column is of type int[], holding an array of integers. When I run the query above, it gives me an error. So I need to convert int[] to int so that I can pass the subquery's output to the main query's IN clause. If there is another way, please guide me. — user3098231, May 22 '18 at 05:05
As @a_horse_with_no_name mentioned above, please update your post to provide your scenario. — Luan Huynh, May 22 '18 at 06:25
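
For the table1/table2 query in the comment above, the error most likely comes from comparing an int (val1) with an int[] (val2). A minimal sketch of the unnest() form, assuming the table and column names given in that comment:

```sql
-- table1.val1 is int, table2.val2 is int[] (names taken from the comment).
-- unnest() in the subquery's select list expands each array into one row per
-- element, so IN compares int with int rather than int with int[].
SELECT *
FROM table1
WHERE val1 IN (
    SELECT unnest(val2)
    FROM table2
    WHERE val3 = 1234
);
```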
