Summary: I have a simple database schema, but even with just a few tens of thousands of records, performance on basic queries is already becoming a problem.
Database: PostgreSQL 9.6
Simplified schema:
CREATE TABLE article (
  id    bigint PRIMARY KEY,
  title text   NOT NULL,
  score int    NOT NULL
);

CREATE TABLE tag (
  id   bigint PRIMARY KEY,
  name text   NOT NULL
);

CREATE TABLE article_tag (
  article_id bigint NOT NULL REFERENCES article (id),
  tag_id     bigint NOT NULL REFERENCES tag (id),
  PRIMARY KEY (article_id, tag_id)
);

CREATE INDEX ON article (score);
Production data info:
All tables are read/write. Write volume is low: only a new record every couple of minutes or so.
Approximate record counts:
- ~66K articles
- ~63K tags
- ~147K article_tags (an average of 5 tags per article)
Question: I want to create a view article_tags that includes an array of tags for every article record, and that can be ordered by article.score and paginated, with or without additional filtering.
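The view I have in mind looks roughly like this sketch (array_agg collects the tag names per article; the filter value in the usage example is illustrative):

CREATE VIEW article_tags AS
SELECT a.id, a.title, a.score, array_agg(t.name) AS tags
FROM   article a
LEFT   JOIN article_tag atag ON atag.article_id = a.id
LEFT   JOIN tag t            ON t.id = atag.tag_id
GROUP  BY a.id;  -- a.id is the PK, so title and score are functionally dependent

-- Intended usage: order by score and paginate, optionally with a filter:
SELECT * FROM article_tags
WHERE  'postgres' = ANY (tags)
ORDER  BY score DESC
LIMIT  25 OFFSET 0;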
In my first attempt I was surprised to see that the query took ~350 ms to execute and wasn't using the indexes. In subsequent attempts I was able to get it down to ~5 ms, but I don't understand what is going on: I would expect all of these queries to take roughly the same amount of time. What crucial concept am I missing here?
Attempts (SQL Fiddles; simplified sketches of each query shape follow the list):
- multi-table join (~350 ms; ~5 ms if ordered by article.id!) -- seemed like the most natural solution
- subquery join (~300 ms) -- also seemed like a natural solution
- limited subquery join (~5 ms) -- super awkward, and can't be used for a view
- lateral join (~5 ms) -- is this really what I should be using? It seems like a misuse of LATERAL.
- ...something else?
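For reference, the shapes of those queries look approximately like the sketches below (simplified; the fiddles have the exact versions). Each one is meant to return a page of 25 articles ordered by score:

-- 1. Multi-table join: join everything, aggregate per article,
--    then order and paginate.
SELECT a.id, a.title, a.score, array_agg(t.name) AS tags
FROM   article a
JOIN   article_tag atag ON atag.article_id = a.id
JOIN   tag t            ON t.id = atag.tag_id
GROUP  BY a.id
ORDER  BY a.score DESC
LIMIT  25;

-- 2. Subquery join: pre-aggregate tags per article, then join
--    the aggregated rows back to article.
SELECT a.id, a.title, a.score, ta.tags
FROM   article a
JOIN  (
    SELECT atag.article_id, array_agg(t.name) AS tags
    FROM   article_tag atag
    JOIN   tag t ON t.id = atag.tag_id
    GROUP  BY atag.article_id
) ta ON ta.article_id = a.id
ORDER  BY a.score DESC
LIMIT  25;

-- 3. Limited subquery join: paginate article first, then attach tags.
--    The LIMIT lives inside the query, so this can't be a plain view.
SELECT a.id, a.title, a.score, array_agg(t.name) AS tags
FROM  (
    SELECT id, title, score
    FROM   article
    ORDER  BY score DESC
    LIMIT  25
) a
JOIN   article_tag atag ON atag.article_id = a.id
JOIN   tag t            ON t.id = atag.tag_id
GROUP  BY a.id, a.title, a.score  -- derived table, so no PK shortcut
ORDER  BY a.score DESC;

-- 4. Lateral join: per article, aggregate its tags in a correlated
--    subquery; the aggregate always yields exactly one row.
SELECT a.id, a.title, a.score, ta.tags
FROM   article a
CROSS  JOIN LATERAL (
    SELECT array_agg(t.name) AS tags
    FROM   article_tag atag
    JOIN   tag t ON t.id = atag.tag_id
    WHERE  atag.article_id = a.id
) ta
ORDER  BY a.score DESC
LIMIT  25;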