We have a MariaDB table (stories) holding more than 1TB of data, and we periodically run a query that fetches recently added rows so they can be indexed somewhere else.
innodb_version: 5.6.36-82.1
version: 10.1.26-MariaDB-0+deb9u1
The query works just fine when the query optimizer decides to use the secondary index and do a range scan (in batches of 1000):
explain extended SELECT stories.item_guid
FROM `stories`
WHERE (updated_at >= '2018-09-21 15:00:00')
AND (updated_at <= '2018-09-22 05:30:00')
ORDER BY `stories`.`id` ASC
LIMIT 1000;
+------+-------------+---------+-------+-----------------------------+-----------------------------+---------+------+--------+----------+---------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+------+-------------+---------+-------+-----------------------------+-----------------------------+---------+------+--------+----------+---------------------------------------+
| 1 | SIMPLE | stories | range | index_stories_on_updated_at | index_stories_on_updated_at | 5 | NULL | 192912 | 100.00 | Using index condition; Using filesort |
+------+-------------+---------+-------+-----------------------------+-----------------------------+---------+------+--------+----------+---------------------------------------+
1 row in set, 1 warning (0.00 sec)
But occasionally, even with a small difference in the data set (note that only the end timestamp differs from the query above; the whole table holds several years of data and several dozen million rows), the optimizer decides to use the primary key index instead:
explain extended SELECT stories.item_guid
FROM `stories`
WHERE (updated_at >= '2018-09-21 15:00:00')
AND (updated_at <= '2018-09-22 06:30:00')
ORDER BY `stories`.`id` ASC
LIMIT 1000;
+------+-------------+---------+-------+-----------------------------+---------+---------+------+--------+----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+------+-------------+---------+-------+-----------------------------+---------+---------+------+--------+----------+-------------+
| 1 | SIMPLE | stories | index | index_stories_on_updated_at | PRIMARY | 8 | NULL | 240259 | 83.81 | Using where |
+------+-------------+---------+-------+-----------------------------+---------+---------+------+--------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
This causes it to walk through the whole primary key index (sequentially, I assume) and then filter on the updated_at field, taking several hours to complete instead.
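For what it's worth, a more detailed view of the chosen plan can be obtained with the JSON EXPLAIN output available in this MariaDB version; this is only a sketch of what to run, not output captured from the real table:

-- Sketch: inspect the optimizer's row estimates and ordering choice
-- for the slow variant in more detail than tabular EXPLAIN shows.
EXPLAIN FORMAT=JSON
SELECT stories.item_guid
FROM `stories`
WHERE (updated_at >= '2018-09-21 15:00:00')
  AND (updated_at <= '2018-09-22 06:30:00')
ORDER BY `stories`.`id` ASC
LIMIT 1000;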
The query was generated by the ActiveRecord ORM and is probably far from ideal. As a workaround we came up with a couple of manually crafted queries that move the ORDER BY stories.id out into an outer query and/or add USE/FORCE INDEX to avoid the PK, since we are really filtering the data set by updated_at.
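Roughly, the workarounds look like this (sketches, not the exact production queries):

-- Sketch 1: force the range scan on the secondary index, since the filter
-- is really on updated_at.
SELECT stories.item_guid
FROM `stories` FORCE INDEX (index_stories_on_updated_at)
WHERE updated_at >= '2018-09-21 15:00:00'
  AND updated_at <= '2018-09-22 05:30:00'
ORDER BY `stories`.`id` ASC
LIMIT 1000;

-- Sketch 2: filter by updated_at in a derived table first, then sort the
-- (much smaller) result set by id outside. Note: the optimizer may merge a
-- simple derived table back into the outer query, so this can be combined
-- with the index hint above.
SELECT t.item_guid
FROM (
  SELECT id, item_guid
  FROM `stories` FORCE INDEX (index_stories_on_updated_at)
  WHERE updated_at >= '2018-09-21 15:00:00'
    AND updated_at <= '2018-09-22 05:30:00'
) AS t
ORDER BY t.id ASC
LIMIT 1000;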
What I'm interested in understanding here is how/why the query optimizer chooses that execution plan. I understand that the optimizer uses index/table statistics to make that decision, but in this case, if my understanding of how InnoDB works is correct, it seems pretty clear that walking through a huge PK while not filtering on the PK at all is not ideal.
I'm essentially trying to understand where that "bad" decision comes from and which statistics or hidden variables are involved, so that we reliably end up with the good plan (the one that is usually chosen and which is orders of magnitude faster).
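For reference, these are the statistics-related things I know how to inspect, though I'm not sure which of them actually drive the decision (sketch only):

-- Sketch: per-index cardinality estimates the optimizer works from.
SHOW INDEX FROM `stories`;

-- Number of index pages sampled for persistent InnoDB statistics
-- (default 20); a low value on a very large table can skew the estimates.
SHOW VARIABLES LIKE 'innodb_stats_persistent_sample_pages';

-- Recompute the statistics for the table.
ANALYZE TABLE `stories`;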
Feel free to correct any of my assumptions as I'm definitely not a DBA expert.
Comments:

ORDER BY with LIMIT query planning doesn't use this information as much as it should in query planning. – danblack Sep 22 '18 at 23:26

stories.id from your first query (i.e. add it to the result), and then in the second query add a criteria stories.id > {prev_max}. This is more efficient than OFFSET clauses. This might hint the query back to using the PK, which might be ok if most stories are updated/(created?) at the same time near the same story id. – danblack Sep 22 '18 at 23:31

SHOW CREATE TABLE? Are there 48M rows in the table? And only 0.2M rows are in the range? – Rick James Oct 10 '18 at 00:19
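A rough sketch of the keyset-pagination approach danblack describes, where {prev_max} is a placeholder for the largest stories.id returned by the previous batch (an illustration, not a tested query):

-- Sketch: carry the largest id from the previous batch forward instead of
-- paging with OFFSET; {prev_max} is supplied by the application.
SELECT stories.id, stories.item_guid
FROM `stories`
WHERE updated_at >= '2018-09-21 15:00:00'
  AND updated_at <= '2018-09-22 05:30:00'
  AND id > {prev_max}
ORDER BY `stories`.`id` ASC
LIMIT 1000;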