2

While aggregating an array, I need to remove empty strings and then combine all adjacent identical values. E.g.:

["","product","product","","product","","product","product","","product","product","","product","","","collection","product","","","product","product","","collection","order","checkout",""]

Should become:

["product","collection","product","collection","order","checkout"]

I have a working query with 4 nested selects:

SELECT array_agg( page_type_unique_pre) FILTER (WHERE page_type_unique_pre != '')
                                        OVER   (ORDER BY event_time) AS page_type_journey_unique

FROM ( SELECT CASE WHEN lag(last_page_type) OVER (ORDER BY event_time) LIKE '%' || page_type || '%' THEN '' ELSE page_type END AS page_type_unique_pre , page_type , event_time FROM ( SELECT string_agg(page_type, ',') OVER (ORDER BY event_time) AS page_type_journey , first_value(page_type) OVER (PARTITION BY last_page_type_partition ORDER BY event_time) AS last_page_type , page_type , event_time FROM ( SELECT sum(CASE WHEN page_type IS NULL OR page_type = '' THEN 0 ELSE 1 END) OVER (ORDER BY event_time) AS last_page_type_partition, page_type, event_time FROM ( SELECT * FROM tes ) a ) b ) c ) d;

See test case in this fiddle.

I'm sure there is a better way to achieve this?

Erwin Brandstetter
  • 175,982
  • 27
  • 439
  • 600
AJ AJ
  • 125
  • 4
  • 1
    Your fiddle starts without arrays. Do you have to use arrays, or can you use normal aggregation of your rows? – Charlieface Sep 08 '22 at 16:37
  • I can use without arrays. but I would like to check if it's correct, by seeing the array. – AJ AJ Sep 11 '22 at 04:50

2 Answers2

1

A single subquery should do:

SELECT ARRAY (
   SELECT page_type
   FROM  (
      SELECT event_time, page_type
           , lag(page_type, 1, '') OVER (ORDER BY event_time) AS last_page_type
      FROM   tes
      WHERE  page_type <> ''
      ) sub
   WHERE  page_type <> last_page_type
   ORDER  BY event_time
   )
 AS page_type_journey_unique;

fiddle

Eliminate null and empty string with WHERE page_type <> '' right away. See:

Then get the previous page_type with the window function lag(), placing '' as default. This way last_page_type can never be null (and the empty string '' does not collide with an existing value, after having just been eliminated). See:

Hence, we can use a plain <> (not the more expensive IS DISTINCT FROM) in the outer SELECT to identify rows with a new page type.

Feed the resulting set to an ARRAY constructor. Simplest and cheapest. See:

Erwin Brandstetter
  • 175,982
  • 27
  • 439
  • 600
0

As mentioned, it seems you can just use a LAG window function, combined with array aggregation.

If you want to see both before and after combining duplicates, you can use the ARRAY_AGG aggregate with and without a FILTER

SELECT
  ARRAY_AGG(tes.page_type ORDER BY tes.event_time)
      FILTER (WHERE tes.page_type <> tes.last_page_type) AS consecutives_combined,
  ARRAY_AGG(tes.page_type ORDER BY tes.event_time) AS all_values
FROM (
    SELECT
      *,
      LAG(page_type, 1, '') OVER (ORDER BY tes.event_time) AS last_page_type
    FROM tes
    WHERE tes.page_type <> ''
) tes;

db<>fiddle

Charlieface
  • 12,780
  • 13
  • 35