
I have an InnoDB table levels:

+--------------------+--------------+------+-----+---------+-------+
| Field              | Type         | Null | Key | Default | Extra |
+--------------------+--------------+------+-----+---------+-------+
| id                 | int(9)       | NO   | PRI | NULL    |       |
| level_name         | varchar(20)  | NO   |     | NULL    |       |
| user_id            | int(10)      | NO   |     | NULL    |       |
| user_name          | varchar(45)  | NO   |     | NULL    |       |
| rating             | decimal(5,4) | NO   |     | 0.0000  |       |
| votes              | int(5)       | NO   |     | 0       |       |
| plays              | int(5)       | NO   |     | 0       |       |
| date_published     | date         | NO   | MUL | NULL    |       |
| user_comment       | varchar(255) | NO   |     | NULL    |       |
| playable_character | int(2)       | NO   |     | 1       |       |
| is_featured        | tinyint(1)   | NO   | MUL | 0       |       |
+--------------------+--------------+------+-----+---------+-------+

There are ~4 million rows. Because of the front-end functionality, I need to query this table with a variety of filters and sorts: on playable_character, rating, plays, and date_published. The date_published filter can cover the last day, week, month, or anytime (the last 3 years). There is also paging. So, depending on the user's choices, the queries can look, for example, like one of these:

SELECT * FROM levels
WHERE playable_character = 0 AND
    date_published BETWEEN date_sub(now(), INTERVAL 3 YEAR) AND now()
ORDER BY date_published DESC
LIMIT 0, 1000;

SELECT * FROM levels
WHERE playable_character = 4 AND
    date_published BETWEEN date_sub(now(), INTERVAL 1 WEEK) AND now()
ORDER BY rating DESC
LIMIT 4000, 1000;

SELECT * FROM levels
WHERE playable_character = 5 AND
    date_published BETWEEN date_sub(now(), INTERVAL 1 MONTH) AND now()
ORDER BY plays DESC
LIMIT 1000, 1000;

I should add that rating and plays are always queried as DESC. Only date_published may be either DESC or ASC.

I started out with an index idx_date_char(date_published, playable_character) that worked great on the first example query here. Based on some other answers, I changed to two other indexes (date_published, playable_character, plays) and (date_published, playable_character, rating).
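
(Expressed as DDL, that is roughly the following; the index names here are placeholders of mine, not the originals:)

ALTER TABLE levels
    ADD INDEX idx_date_char_plays (date_published, playable_character, plays),
    ADD INDEX idx_date_char_rating (date_published, playable_character, rating);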

The first query still runs very fast; however, something unusual happens in EXPLAIN when playable_character = x matches more than a certain number of rows (~700,000): "Using where" appears in the EXPLAIN output.

So, my first question is whether any improvements to the queries or indexes are possible, and my second is which MySQL settings should be altered to handle the large result sets.

Any suggestions greatly appreciated. TIA.

Hal50000
  • I sure would like to see the EXPLAIN and SHOW INDEXES FROM levels output when the unusual things happen. It looks like the deep scanning (larger start value in LIMIT) triggers the use of a temporary table and a filesort. – Raymond Nijland Oct 18 '13 at 17:39
  • Also, how slowly do the other queries run? How much time is needed to get the results? – Mindaugas Riauba Oct 18 '13 at 17:57
  • Isn't this a duplicate of this question?: http://stackoverflow.com/questions/19385610/optimizing-query-in-mysql5-6 – ypercubeᵀᴹ Oct 18 '13 at 17:59
  • @ypercube It sure looks like a duplicate; I thought I had déjà vu, but I could not find the topic on Stack Overflow anymore. – Raymond Nijland Oct 18 '13 at 18:11
  • @Hal50000 If your EXPLAIN shows "Using filesort", I think this blogs.oracle.com/realneel/entry/… could be your problem. Note that the article is old, so most likely the filesort will now be hinted in EXPLAIN to give an estimate_rows_upper_bound(); see sqlfiddle.com/#!2/4f56a/2 — I think filesort gets 30 rows there. I also think I need to track down estimate_rows_upper_bound() in the source code to give a better explanation; this is theory. – Raymond Nijland Oct 19 '13 at 09:48
  • @ypercube You're right. Sorry for gumming up with the duplicate question - I wasn't sure if cross-posting was frowned upon. To your second point, I mistyped the order in my repost here. I used the indexes in the accepted answer. – Hal50000 Oct 22 '13 at 17:36
  • No worries. If this is a "follow-up" question, because you are not satisfied with the efficiency of the answers you got in that question, it's not a problem to make a second or a cross-post question. But make sure to state that in the question. – ypercubeᵀᴹ Oct 22 '13 at 17:50
  • Now, for the issue, have you tried to play with the configuration, like Bill Karwin's suggestion there? – ypercubeᵀᴹ Oct 22 '13 at 17:50
  • And also check this similar question (of mine) with some great answers: Can spatial index help a “range - order by - limit” query? It's for Postgres but I'm sure the answers can be converted for MySQL. – ypercubeᵀᴹ Oct 22 '13 at 17:51
  • Have a look at this. It might help: http://www.infoq.com/presentations/Index-SQL – Wodin Dec 08 '15 at 13:51

3 Answers

WHERE playable_character = 0 AND
    date_published BETWEEN date_sub(now(), INTERVAL 3 YEAR) AND now()

Start with the "=" item, then do the range:

INDEX(playable_character, date_published);

"Pagination", a la ORDER BY rating DESC LIMIT 4000, 1000; is best done by remember where you "left off". That way, you don't have scan over the 4000 records that you don't need.

Rick James

Create both indexes for better performance if there are not many INSERT or UPDATE queries on the table:

ALTER TABLE levels ADD INDEX (playable_character, date_published, rating DESC);

ALTER TABLE levels ADD INDEX (playable_character, date_published, plays DESC);

(Note: MySQL before 8.0 parses the DESC here but ignores it; the index is stored ascending either way.)

Otherwise, if INSERT and UPDATE queries are used more on this table, create only:

ALTER TABLE levels ADD INDEX (playable_character, date_published);

or don't create any index
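
Whichever you choose, it's worth checking with EXPLAIN which index each query actually uses, for example:

EXPLAIN SELECT * FROM levels
WHERE playable_character = 5
    AND date_published BETWEEN date_sub(now(), INTERVAL 1 MONTH) AND now()
ORDER BY plays DESC
LIMIT 1000;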

Saddam Khan

Everybody here is missing one point: the queries in question retrieve 1,000 rows, and you cannot retrieve 1,000 rows quickly unless most of the data is cached. If the data is not cached, you have to perform up to 1,000 random reads, one after another, to fetch them.

Good disk-based storage with fast disks will give you up to ~200 random reads per second; I doubt that common disks, even in RAID, manage even 100. That means 10+ seconds to get the results, even with the best indexes.

So in the longer run, such a data model and such queries won't work. For now, the queries run fast when you hit cached data.
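
If you do want the working set to stay cached, the usual knob is the InnoDB buffer pool. A sketch of checking and raising it (the 8G figure is only an assumption for a host with memory to spare; before MySQL 5.7.5 changing it requires a server restart):

-- Current buffer pool size, in bytes:
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';

-- Raise it in my.cnf and restart, e.g.:
--   [mysqld]
--   innodb_buffer_pool_size = 8G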

Mindaugas Riauba
  • Even with covering indexes? – ypercubeᵀᴹ Oct 18 '13 at 19:23
  • That's one of the possible ideas how to reorganize the data. – Mindaugas Riauba Oct 18 '13 at 19:26
  • Covering indexes also give best results when they are cached in the buffer pool. Otherwise, you have to expect to wait for disk I/O. – Bill Karwin Nov 17 '13 at 21:46
  • And for what it's worth, SSD-based systems with thousands of IOPS are becoming mainstream for RDBMS servers. – Bill Karwin Nov 17 '13 at 21:47
  • Reading 1,000 rows is very unlikely to be 1,000 I/O operations. I'm no expert on InnoDB's internal structures, but as it is based on some variation of B-trees, it can't be unrecognisably different from MS SQL Server or similar. It apparently uses 16 KB pages, so with those column sizes that is between ~40 and ~400 rows per page for the base data. Assuming an index that covers everything and little fragmentation, that is 25 pages or less, and depending on the allocation method they might be sequential (or close to it) on disk, rendering random-I/O-per-second figures less meaningful. – David Spillett Jan 20 '16 at 16:26
  • It could be as little as just three data pages, plus one or two internal index (branch) pages that need to be read to locate those data (leaf) pages. – David Spillett Jan 20 '16 at 16:35
  • InnoDB is essentially an index-organized table: rows in InnoDB are organized (sorted) by primary key. But these queries are not by primary key. Of course, 1 row per I/O is the worst possible case. – Mindaugas Riauba Jan 20 '16 at 17:22
  • Even worse is having to do a table scan with the table too big to be cached. – Rick James Oct 25 '16 at 17:44
  • There are many flavors of good/bad indexing -- from a covering index to a table scan. In between are an optimal non-covering index, a sub-optimal index, a range scan, unnecessary joins, etc. Fetching 1,000 rows may take zero I/O (good index and good caching), all the way to more than 1,000 disk hits (huge table, no useful index). – Rick James Oct 25 '16 at 17:48
  • There are so many variations, that it is best to take specific examples one at a time. – Rick James Oct 25 '16 at 17:48