What strategy should I use for my gigant MYSQL Tables?

Question

I'm working in a system that is using MySQL as primary DB. One of the biggest tables is almost 1TB in size, others are around 100-300 GBs.

Queries started to be really slow.

I search some info, and I was planning to use partitioning by ID or by the created timestamp, but I dont really know what or how to do it.

I was thinking something like this (this is my local environment)

ALTER TABLE records
PARTITION BY RANGE (id) (
  PARTITION p0 VALUES LESS THAN (100),
  PARTITION p1 VALUES LESS THAN (200),
  PARTITION p2 VALUES LESS THAN (300),
  PARTITION p3 VALUES LESS THAN MAXVALUE
);

but I get Foreign keys are not yet supported in conjunction with partitioning.

So, I dont really know what to do now. I dont know if droping the FK is a good idea, specially for this kind of systems. Any help?

Also, I thought of migrating to something like PostGres or other engine that could help me of maintaining partitioning easily, but we really need to use Django ORM to query in our system. (Because all aplication is build on top of the ORM).

Thanks for reading. Any help would be appreciated.

This is not about MySQL but it answers your question nevertheless. — mustaccio, May 15 '23 at 20:23
Asking about query tuning you must provide compete CREATE TABLE script and complete text of the query (or queries) to be improved. — Akina, May 16 '23 at 05:02

Rick James · Answer 1 · 2023-05-16T20:16:40.323

0

Re MySQL...

Partition will not help with SELECTs. Especially not by the PRIMARY KEY. Partition by timestamp is useful only if you will be deleting "old" data.

FOREIGN KEYS add some burden to INSERTs. Maybe you should get rid of them?

Queries started to be really slow.

Let's see a really slow query or two. Please also provide SHOW CREATE TABLE.

migrating to ...

The main cost is disk access. Any decent RDBMS will need the same number of disk hits to achieve the apps goals.

For WHERE userid=123 AND date BETWEEN ... AND ..., The 'composite' INDEX(userid, date) does at least as well as partitioning and much better than the single column INDEX(date).

edited May 16 '23 at 20:16

answered May 15 '23 at 23:11

Rick James

78,038
5
47
113

Partition will not help with SELECTs. ?? Partition Pruning may improve SELECT queries, sometimes dramatically. – Akina May 16 '23 at 05:05
1

@Akina So will a good index and no partitioning, by pretty much the same effect (unless skip-scanning is relevant) – Charlieface May 16 '23 at 10:37
@Charlieface Partition pruning is similar to index seek. Main difference - the pruning is performed on the execution plan building step whereas index seek is performed on the query execution step. – Akina May 16 '23 at 10:56
1

So why is partitioning better than indexing? Partitioninng requires higher compilation times, a single index seek on a B+tree is going to be fast – Charlieface May 16 '23 at 10:58
@Akina - Pruning vs an extra level in the BTree -- not much difference. (The exception is when you need a 2-dimensional index and use partitioning for one dimension.) And pruning takes time -- whether in the plan building step or the execution step. – Rick James May 16 '23 at 16:38
1

@Charlieface - I argue against Partitioning in my blog: Partition. Part of my argument is to point out how few use cases actually benefit from partitioning. – Rick James May 16 '23 at 16:42
@Charlieface - I am coming from MySQL; you are coming from SQL Server. I would be interested to hear if SQL Server has some magic that I don't see in MySQL. – Rick James May 16 '23 at 16:45
Hi, @RickJames . After thinking it a bit more, I thought that partitioning by date will be more usefull, because people usually use a date picker to select ranges. I think that'd be better – Danilo Bassi May 16 '23 at 17:29
@DaniloBassi - It depends. For WHERE userid=123 AND date BETWEEN...AND ... this index does at least as well as partitioning: INDEX(userid, date) -- with the index columns in this order. If the WHERE has two "ranges", Partitioning may help with one while an INDEX may help with the other. – Rick James May 16 '23 at 18:19
I already have one index by that date, the thing is that my table started to be really heavy, and we're struggling even with simple queries. People usually make queries that need perform a Join with other table, in that case, partitioning is better, I assume.
edit: BTW, we're exploring options right now.
– Danilo Bassi May 16 '23 at 18:41
1

@DaniloBassi "queries that need perform a Join with other table, in that case, partitioning is better, I assume" - Wrong. Partitioning is a tool for data management not for improving SELECT performance, as everyone else here already mentioned. Indexes already partition the data in a way that's much more efficient than the linear distribution of partitioning. If your queries are slowing down, it's likely because they're poorly written and / or have incorrectly defined indexes. You should be asking questions specific to those slow queries on how to improve them, if you want performance help. – J.D. May 16 '23 at 19:12
SQL Server has the same performance implications as MySQL for partitioning. Compilation time is an issue in both I think, as it's basically a UNIONed view which it's doing dynamic slicing (for elimination) during compilation. Doing it at runtime on a normal index is not any slower, as a B+tree has very high lookup efficiency. The only case it will help querying performance is when doing skip-scanning: it's difficult to do otherwise in some situations. – Charlieface May 16 '23 at 20:24

What strategy should I use for my gigant MYSQL Tables?

1 Answers1