Disclaimer: I am relatively new to PostgreSQL.
I'm wondering how to optimize a query that does 2 INNER JOINs. My scenario is fairly simple:
Select Posts with a photo (Posts.photo IS NOT NULL) and a Hashtag that has the name 'dead' (Hashtags.name = 'dead').
The associations are as follows:
Posts <- PostHashtags -> Hashtags
Posts.id = PostHashtags.postId (FK)
Hashtags.id = PostHashtags.hashtagId (FK)
Here is the query:
SELECT
"Posts".*,
"hashtags"."id" AS "hashtags.id",
"hashtags"."count" AS "hashtags.count",
"hashtags"."name" AS "hashtags.name",
"hashtags"."createdAt" AS "hashtags.createdAt",
"hashtags"."updatedAt" AS "hashtags.updatedAt",
"hashtags"."objectId" AS "hashtags.objectId",
"hashtags"."_etl" AS "hashtags._etl",
"hashtags.PostHashtag"."id" AS "hashtags.PostHashtag.id",
"hashtags.PostHashtag"."createdAt" AS "hashtags.PostHashtag.createdAt",
"hashtags.PostHashtag"."updatedAt" AS "hashtags.PostHashtag.updatedAt",
"hashtags.PostHashtag"."postId" AS "hashtags.PostHashtag.postId",
"hashtags.PostHashtag"."hashtagId" AS "hashtags.PostHashtag.hashtagId",
"hashtags.PostHashtag"."objectId" AS "hashtags.PostHashtag.objectId",
"hashtags.PostHashtag"."_etl" AS "hashtags.PostHashtag._etl"
FROM (
SELECT
"Posts"."id",
"Posts"."note",
"Posts"."photo",
"Posts"."createdAt",
"user"."id" AS "user.id",
"user"."name" AS "user.name"
FROM "Posts" AS "Posts"
INNER JOIN "Users" AS "user" ON "Posts"."userId" = "user"."id"
WHERE "Posts"."photo" IS NOT NULL
AND (
SELECT "PostHashtags"."id" FROM "PostHashtags" AS "PostHashtags"
INNER JOIN "Hashtags" AS "Hashtag" ON "PostHashtags"."hashtagId" = "Hashtag"."id"
WHERE "Posts"."id" = "PostHashtags"."postId"
LIMIT 1
) IS NOT NULL
ORDER BY "Posts"."createdAt" DESC LIMIT 10
) AS "Posts"
INNER JOIN (
"PostHashtags" AS "hashtags.PostHashtag"
INNER JOIN "Hashtags" AS "hashtags" ON "hashtags"."id" = "hashtags.PostHashtag"."hashtagId"
)
ON "Posts"."id" = "hashtags.PostHashtag"."postId"
AND "hashtags"."name" = 'dead'
ORDER BY "Posts"."createdAt" DESC;
EXPLAIN results:
Nested Loop (cost=886222912.89..886223769.55 rows=1 width=277)
Join Filter: ("hashtags.PostHashtag"."postId" = "Posts".id)
-> Limit (cost=886220835.39..886220835.42 rows=10 width=189)
-> Sort (cost=886220835.39..886220988.88 rows=61394 width=189)
Sort Key: "Posts"."createdAt"
-> Nested Loop (cost=0.42..886219508.69 rows=61394 width=189)
-> Seq Scan on "Posts" (cost=0.00..885867917.51 rows=78196 width=177)
Filter: ((photo IS NOT NULL) AND ((SubPlan 1) IS NOT NULL))
SubPlan 1
-> Limit (cost=0.42..815.70 rows=1 width=4)
-> Nested Loop (cost=0.42..815.70 rows=1 width=4)
-> Seq Scan on "PostHashtags" (cost=0.00..811.25 rows=1 width=8)
Filter: ("Posts".id = "postId")
-> Index Only Scan using "Hashtags_pkey" on "Hashtags" "Hashtag" (cost=0.42..4.44 rows=1 width=4)
Index Cond: (id = "PostHashtags"."hashtagId")
-> Index Scan using "Users_pkey" on "Users" "user" (cost=0.42..4.49 rows=1 width=16)
Index Cond: (id = "Posts"."userId")
-> Materialize (cost=2077.50..2933.89 rows=1 width=88)
-> Hash Join (cost=2077.50..2933.89 rows=1 width=88)
Hash Cond: ("hashtags.PostHashtag"."hashtagId" = hashtags.id)
-> Seq Scan on "PostHashtags" "hashtags.PostHashtag" (cost=0.00..721.00 rows=36100 width=40)
-> Hash (cost=2077.49..2077.49 rows=1 width=48)
-> Seq Scan on "Hashtags" hashtags (cost=0.00..2077.49 rows=1 width=48)
Filter: ((name)::text = 'dead'::text)
This query has been simplified slightly. It also performs OUTER JOINS on other data related to Posts, which is why the SELECT must be performed on Posts instead of, say, PostHashtags.
Any help in translating the EXPLAIN to a useful index would be greatly appreciated.
My ideas:
- Build an index on
Posts.photo, but should it be a partial indexWHERE "photo" IS NOT NULL? - Build a
UNIQUEindex onHashtags.name.
I'm not sure if those are necessarily the bottlenecks, though.
EXPLAIN (BUFFERS, ANALYZE)would be more helpful. Plus your version of Postgres. What do you get for:SELECT count(*) AS ct, count(photo) AS pct FROM "Posts"? Most importantly: Your description does not match what the query does exactly: from the latest 10 posts with photo and any tags, display only those with the tag 'dead'. Is that intented or isLIMIT 10in the subquery a debugging residue? Changes the nature of the query. – Erwin Brandstetter Jan 16 '15 at 00:47