I use Amazon RDS as the MySQL backend for a high-load project and am trying to optimize insert performance for InnoDB tables.
According to various sources, a good candidate for speeding up inserts is the innodb_flush_log_at_trx_commit parameter.
My local tests show a significant boost when I set it to 2 (or 0), but changing this parameter on RDS has no effect.
What might the problem be?
I tried all the available values and the results are the same (30-40 inserts per second into an empty table), while local tests show ~9,000 inserts per second.
I also tried an m3.large General Purpose SSD RDS test instance (with zero load), with the same result.
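For reference, this is the check I use to confirm what value is actually in effect (a minimal sketch, assuming the parameter is changed through a custom DB parameter group, since the RDS master user normally lacks SUPER and cannot use SET GLOBAL):

    -- Show the value currently in effect; it should match the DB parameter group setting
    SHOW GLOBAL VARIABLES LIKE 'innodb_flush_log_at_trx_commit';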
Phantom
- http://dba.stackexchange.com/questions/16968/innodb-flush-settings-on-amazon-rds – jkavalik Jan 06 '16 at 09:58
1 Answer
Complain to RDS.
Meanwhile...
Plan A: Design the code to gather rows and do a multi-row INSERT or LOAD DATA.
Plan B: Put BEGIN and COMMIT around small batches of rows. (Suggest 1 second, 100 rows, or 1 MB, whichever comes first.)
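A minimal sketch of both plans, assuming a hypothetical table t (id, val):

    -- Plan A: one multi-row INSERT instead of many single-row statements
    INSERT INTO t (id, val) VALUES
        (1, 'a'),
        (2, 'b'),
        (3, 'c');
    -- (or bulk-load from a file, e.g. LOAD DATA LOCAL INFILE 'rows.csv' INTO TABLE t ...)

    -- Plan B: commit a small batch in one transaction, so the log is
    -- flushed once per batch instead of once per row
    BEGIN;
    INSERT INTO t (id, val) VALUES (4, 'd');
    INSERT INTO t (id, val) VALUES (5, 'e');
    -- ... up to ~100 rows / ~1 MB / ~1 second of data ...
    COMMIT;

Either way, the cost of the per-commit log flush is amortized over many rows instead of being paid for every single INSERT.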
Rick James
- Only 100 rows? For a big import we did on a live server, batches of 100k rows seemed to give the best performance. Granted, the table itself was otherwise inactive, the server quite idle, and the rows short, but 1000x less sounds very conservative. – jkavalik Jan 07 '16 at 06:17
- Yeah, 100 may be too conservative. However, large numbers lead to slowdowns because of the undo log and/or replication delays. – Rick James Jan 07 '16 at 06:49
- Thanks. I usually suggest testing with 1000 when asked, but I would not want to give bad advice. – jkavalik Jan 07 '16 at 07:10
- 100K rows is likely to blow out the undo log and cause replication stalls. 100KB for the batch insert size is arguably unnecessarily small (think packet, buffer, etc. sizes). – Rick James Jan 07 '16 at 07:15
- My experiments with batched INSERTs show that 1000 rows runs at about 99% of the theoretical max speed. Going beyond that is "diminishing returns" (and could lead to degradation, as I mentioned). My conservative "100" is about 90% of max; I think of it as a "safe" tradeoff. – Rick James Jan 07 '16 at 07:17
- The 100K rows were really a one-time thing when importing a 120GB table (one row was something like 180B), which seemed to work fastest in quick testing (Galera cluster; maybe the optimum is shifted a bit in that case). But yes, I was a bit worried about the undo logs, so we monitored it, and hopefully we will not have to repeat it on a live system. Thank you for explaining the numbers; I am going to suggest smaller batches in the future. – jkavalik Jan 07 '16 at 07:24