- What are the cons of using the below approach?
- What steps can be taken to prevent someone from incorrectly updating the count?
- Is the below approach an anti-pattern? If yes, what better approaches exist?
Assumptions:
Application is a public wiki/forum (like stackexchange)
- The application is read-heavy. Selects will be 99% of the queries made by end-users
- bulk inserts are not performed by end-users. They might only be needed for development/maintenance work
- Concurrent writes to the table are necessary
- Count estimates are good enough. Exact count will be cherry on top
Approach
Consider two entities, Students and Courses each having their corresponding tables.
We also have a table student_courses that stores the Many-to-Many mapping b/w Students and Courses.
create table students (
id bigserial primary key,
name text
);
create table courses (
id bigserial primary key,
content text
);
create table student_courses (
student_id bigint not null references students,
course_id bigint not null references courses,
primary key (student_id, course_id)
);
create index on student_courses (course_id);
To find the number of courses for a given student, we can perform
select count(1)
from student_courses
where student_id = 123;
If these count queries are frequent (say you wish to always display student name with course count), one way to optimize would be to maintain a count variable inside both entities, and then place triggers for insert and delete.
alter table students add column course_count bigint default 0;
alter table courses add column student_count bigint default 0;
create function increment_student_course_count() returns trigger as
$$begin
update students
set course_count := course_count + 1
where student_id = new.student_id;
update courses
set student_count := student_count + 1
where course_id = new.course_id;
return new;
end;$$
create trigger after_insert_update_counts
after insert
on student_courses
for each row execute procedure increment_student_course_count();
create function decrement_student_course_count() returns trigger as
$$begin
update students
set course_count := course_count - 1
where student_id = new.student_id;
update courses
set student_count := student_count - 1
where course_id = new.course_id;
return new;
end;$$
create trigger after_delete_update_counts
after delete
on student_courses
for each row execute procedure decrement_student_course_count();
Once this is in place, finding the course count for a student is simply
select course_count
from students where student_id = 123;
SELECT count(*)is faster thanSELECT count(1)... – Erwin Brandstetter Nov 23 '21 at 17:31Assumptions: Application is a public wiki/forum (like StackExchange).
- The application is read-heavy. Selects will be 99% of the queries made by end-users
- bulk inserts are not performed by end-users. They might only be needed for development/maintenance work
- Concurrent writes to the table are necessary
- Count estimates are good enough. Exact count will be cherry on top
– Pragy Agarwal Nov 23 '21 at 17:39SELECT COUNT(*)faster thanSELECT COUNT(1)in PostgreSQL? I know it makes no difference in some other database systems (Microsoft SQL Server, Oracle, etc) but wondering if there's something different in PostgreSQL? – J.D. Nov 24 '21 at 02:28