
Why did MySQL remove the query cache feature in version 8.0?

Laurenz Albe
shaoyihe
  • This was the BEST favor anyone could do to reduce the CPU cycles your server uses every minute. – Wilson Hauck Sep 14 '18 at 16:40
  • @WilsonHauck For small databases, yes. It's also great that MySQL's responses become more predictable in the CPU and time they take. However, when a count(*) on a large table can take 2-3 hours to complete, the query cache could cut that time down to a few milliseconds.
    Man-in-the-middle caches are no replacement; they don't know whether the data has changed. Given that, it was a near-fatal decision for huge databases.
    – John Feb 06 '20 at 13:40
  • @john Post text results of SHOW CREATE TABLE (yourtablename); and I will suggest something that will be well under 2-3 hours to return your row count. – Wilson Hauck Feb 06 '20 at 13:45
  • @WilsonHauck I have dozens of such tables on many servers; it barely matters what structure you use. Here is an example: https://pastebin.com/MHVACAny Fill it up with 1 billion rows and try to make a quick count using an index without query caching. With query caching it's a matter of milliseconds; without it, probably half an hour to an hour per count. Now update one of the columns at random places 1 billion times with different values and you'll notice its index degrades down to 4-5 hours (linear performance loss with updates). (The second part is a distinct issue of InnoDB, so no fair here. :) – John Feb 06 '20 at 15:57
  • @John What are the results, and how many seconds does it take, with this query? SELECT COUNT(scs_id) FROM x_full; – Wilson Hauck Feb 06 '20 at 19:07
  • @WilsonHauck It doesn't play a big role whether you use * or a field; I tried literally all of those variants. The best performance comes from deleting the indexes and recreating them, but even that is in the area of 20+ minutes in the best case. I've spent literally hundreds of hours debugging and experimenting with MySQL since 5.x, including Percona variants (currently using Percona 8.0.18).

    The only way to count this in under 5 minutes is to spawn 30 threads, separate the data into 30 id brackets, and sum up the results.

    – John Feb 06 '20 at 23:42
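John's workaround (splitting the count into contiguous id brackets and summing the per-bracket counts in parallel) can be sketched roughly like this. The table/column names in the comment and the assumption of a dense integer primary key are hypothetical; `run_count` stands in for whatever per-connection query function you use:

```python
from concurrent.futures import ThreadPoolExecutor

def id_brackets(min_id, max_id, n):
    """Split [min_id, max_id] into at most n contiguous, non-overlapping ranges."""
    step = -(-(max_id - min_id + 1) // n)  # ceiling division
    brackets = []
    lo = min_id
    while lo <= max_id:
        hi = min(lo + step - 1, max_id)
        brackets.append((lo, hi))
        lo = hi + 1
    return brackets

def parallel_count(run_count, min_id, max_id, threads=30):
    """run_count(lo, hi) is expected to issue something like
         SELECT COUNT(*) FROM x_full WHERE scs_id BETWEEN lo AND hi
       on its own connection and return the count (names are hypothetical)."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        results = pool.map(lambda b: run_count(*b),
                           id_brackets(min_id, max_id, threads))
    return sum(results)
```

Each bracket runs on its own connection, so InnoDB can scan the ranges concurrently; the client then sums the partial counts.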
  • @WilsonHauck My use cases are all unusual; MySQL is mostly designed and tested for shops and small web apps. When you work with billions of data rows you start to find unbelievable performance problems, including a number of performance bugs in InnoDB itself (and occasional segfaults, but those are rare).
    The removal of the query cache hit me hard; probably 1,000 development hours have gone into various caching mechanisms since then. Though the query cache was horrible as well: unpredictable, and often hard to understand what was going on.
    – John Feb 06 '20 at 23:46
  • @john What is result of SELECT @@innodb_parallel_read_threads on your Percona 8.0.18 ? I have been involved in tuning MySQL on servers with more than 700 Connections per second - so we know it can work with billions of rows, when tuned. – Wilson Hauck Feb 07 '20 at 11:35
  • @WilsonHauck It's currently at 8. I can see great performance from MySQL when the indexes are fresh and many connections have to be served; it has good multithreading, easily reaching 500+ MB/sec IO read throughput. Primary key reads are doing "ok" with the newest versions; any other access is horrible.

    The problem is that MySQL has low single-threaded performance. I get the best performance when going for 20-30 threads (manually constructing the queries).

    Currently it can reach about 100 MB/sec read on a primary index count, and ~20-30 MB/sec on non-primary access.

    – John Feb 07 '20 at 17:17
  • Please create a new question indicating a 'need for speed', post the following on pastebin.com, and share the links: A) your complete (not edited) my.cnf or my.ini; then, from an SSH root login, the text results of B) SHOW GLOBAL STATUS; after a minimum of 24 hours UPTIME, C) SHOW GLOBAL VARIABLES;, and D) SHOW FULL PROCESSLIST;. Optional but very helpful information, if available, includes: htop OR top for the most active apps, ulimit -a for a Linux/Unix list of limits, and iostat -xm 5 3 for IOPS by device and core/CPU count, for server workload tuning analysis to provide suggestions.
    – Wilson Hauck Feb 07 '20 at 21:28

3 Answers


There is a detailed blog post from the MySQL server team about this, in which Matt Lord says:

The query cache has been disabled-by-default since MySQL 5.6 (2013) as it is known to not scale with high-throughput workloads on multi-core machines.

We considered what improvements we could make to query cache versus optimizations that we could make which provide improvements to all workloads.

While these choices themselves are orthogonal, engineering resources are finite. That is to say that we are shifting strategy to invest in improvements that are more generally applicable to all workloads.

danblack

Good riddance !!!

It is a challenge for most database developers to correctly estimate the size of the most common result sets in their applications. Having a large query cache was just a big bandage for that.

There is a bigger reason that foreshadowed the demise of the query cache: Four years ago (June 07, 2014), I answered the post Why query_cache_type is disabled by default start from MySQL 5.6?. In short, the query cache was always inspecting the InnoDB Buffer Pool for changes. You can find this on Pages 209-215 of High Performance MySQL (2nd Edition).
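The coarse invalidation behavior behind this can be illustrated with a toy model (a hypothetical class, not MySQL's actual implementation): any write to a table evicts every cached result that read from that table, with no partial invalidation:

```python
class ToyQueryCache:
    """Toy model of query-cache invalidation (illustrative only)."""

    def __init__(self):
        self._cache = {}  # exact SQL text -> (result, tables the query read)

    def get(self, sql):
        entry = self._cache.get(sql)
        return entry[0] if entry else None

    def put(self, sql, result, tables):
        self._cache[sql] = (result, frozenset(tables))

    def invalidate(self, table):
        # Any INSERT/UPDATE/DELETE on `table` prunes every cached
        # result that touched it -- all or nothing.
        self._cache = {sql: entry for sql, entry in self._cache.items()
                       if table not in entry[1]}
```

On a write-heavy table such a cache is emptied almost as fast as it fills, so the bookkeeping cost is paid without any hits to show for it.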

I mentioned this over the years:

RIP Query Cache !!!

RolandoMySQLDBA
  • I agree that the query cache was a bandage, mostly for counts being so slow on InnoDB. It made queries and applications completely unpredictable in response time; you can see milliseconds to hours on the same query depending on whether it was cached or not.

    However, at least it was a bandage. Now you're left bleeding :)

    – John Feb 06 '20 at 16:02
  • The QC existed before InnoDB existed. The old story I heard (perhaps around 2002) was that the QC was added to win a benchmark with MyISAM. – Rick James Feb 06 '20 at 16:36
  • @RickJames Hah, sounds plausible, though it's cheating. MySQL needs one thing more than anything else in my opinion: full multithreading on all types of queries. They seem to be working in that direction (a bit here, a bit there), but it should have been implemented long ago. That also would have been a killer feature to not just win but 'own' many types of benchmarks. – John Feb 07 '20 at 17:53
  • @John - MySQL 8.0.17(?) has a few parallel queries. I'm not excited yet. A product called "shard query" has been around for a long time; it demonstrates that parallelism typically gives only half the theoretical capacity (e.g., 8 cores yields only a 4x speedup). Also, keep in mind that most I/O has no parallelism unless you have certain RAID setups. – Rick James Feb 08 '20 at 01:23
  • @John - and... "Long ago" servers had one, maybe two CPU cores. So parallelism was not very useful. Once there were multiple cores, there was a flurry of activity to clean up the Mutexes so that multiple connections could work in parallel. – Rick James Feb 08 '20 at 01:25
  • @John - and... I suspect that most of the competition has already implemented parallelism. So, "too late". But... Most apps have lots of connections, not a single connection with a long running query. That is, there is a tradeoff between more connections and longer-running queries. So, I focus on speeding up queries. – Rick James Feb 08 '20 at 01:28
  • @RickJames Yeah, in my case I (currently) have a low number of connections for the really big queries, usually just one, but that one should be served quickly. You are right about the performance loss: using parallel connections you "should" lose significant performance, since sequential IO reads are much faster and CPU time shoots up from the overhead.
    Ridiculously, I consistently get 25-30x speedups until the CPU limit is reached; with more CPUs (currently 36 cores) I should get 250-300x speedups, even for lock-heavy UPDATEs. Is there a competitive, better server that's easy to migrate to from Linux MySQL?
    – John Feb 08 '20 at 03:36

(I agree with the other Answer, but here is my 2-cents-worth.)

As implemented, ...

  • The QC cannot work with Galera or Group Replication, both of which are getting more traction in the HA arena.

  • When query_cache_size got big, it got less efficient. This is due to inefficiencies in "pruning". (Note: Aurora reimplemented it, and seems to have fixed this issue.)

  • There is an overhead in every SELECT because the server does not know in advance whether the QC will come into play. Decades ago, this was estimated at 11%. Before the QC was removed entirely, the only way to avoid this overhead was to set both query_cache_size = 0 and query_cache_type = 0 in the config file. (Few people realized both were needed.)

  • In the typical Production server, inserts happen frequently. Since each insert caused a pruning of all QC entries for the affected table(s), the QC was virtually useless for such tables.

  • Perhaps 95% of the hundreds of systems I have reviewed for performance problems are better off without the QC.
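For servers still on 5.6/5.7 (where the QC exists but is unwanted), fully disabling it means setting both variables, as noted above; a minimal sketch of the relevant my.cnf lines (these variables were removed in 8.0):

```ini
[mysqld]
query_cache_type = 0   # never attempt to use the cache
query_cache_size = 0   # free the cache buffer as well; both are needed
```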

Rick James