
Preface

Our application runs several threads that execute DELETE queries in parallel. The queries affect disjoint sets of data, i.e. there should be no possibility of concurrent DELETEs touching the same rows from separate threads. However, according to the documentation MySQL uses a so-called 'next-key' lock for DELETE statements, which locks both the matching key and a gap. This leads to deadlocks, and the only solution we have found is to use the READ COMMITTED isolation level.
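
For reference, we set the isolation level per connection before starting each transaction; it could also be configured server-wide. A minimal sketch (not our exact configuration):

SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;

-- or in my.cnf, for all new connections:
-- [mysqld]
-- transaction-isolation = READ-COMMITTED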

The Problem

The problem arises when executing complex DELETE statements that JOIN huge tables. In our particular case we have a warnings table with only two rows, and the query needs to drop all warnings that belong to particular entities found via two INNER JOINed tables. The query is as follows:

DELETE pw 
FROM proc_warnings pw 
INNER JOIN day_position dp 
   ON dp.transaction_id = pw.transaction_id 
INNER JOIN ivehicle_days vd 
   ON vd.id = dp.ivehicle_day_id 
WHERE vd.ivehicle_id=? AND dp.dirty_data=1

When the day_position table is large enough (1448 rows in my test case), any such transaction, even at the READ COMMITTED isolation level, blocks the entire proc_warnings table.

The issue is consistently reproducible with this sample data - http://yadi.sk/d/QDuwBtpW1BxB9 - in both MySQL 5.1 (checked on 5.1.59) and MySQL 5.5 (checked on 5.5.24).

EDIT: The linked sample data also contains the schema and indexes for the tables involved, reproduced here for convenience:

CREATE TABLE  `proc_warnings` (
    `id` int(11) NOT NULL AUTO_INCREMENT,
    `transaction_id` int(10) unsigned NOT NULL,
    `warning` varchar(2048) NOT NULL,
    PRIMARY KEY (`id`),
    KEY `proc_warnings__transaction` (`transaction_id`)
);

CREATE TABLE  `day_position` (
    `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
    `transaction_id` int(10) unsigned DEFAULT NULL,
    `sort_index` int(11) DEFAULT NULL,
    `ivehicle_day_id` int(10) unsigned DEFAULT NULL,
    `dirty_data` tinyint(4) DEFAULT NULL,
    PRIMARY KEY (`id`),
    KEY `day_position__trans` (`transaction_id`),
    KEY `day_position__is` (`ivehicle_day_id`,`sort_index`),
    KEY `day_position__id` (`ivehicle_day_id`,`dirty_data`)
) ;

CREATE TABLE  `ivehicle_days` (
    `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
    `d` date DEFAULT NULL,
    `sort_index` int(11) DEFAULT NULL,
    `ivehicle_id` int(10) unsigned DEFAULT NULL,
    PRIMARY KEY (`id`),
    KEY `ivehicle_days__is` (`ivehicle_id`,`sort_index`),
    KEY `ivehicle_days__d` (`d`)
);

The queries run in each transaction are as follows:

  • Transaction 1

    set transaction isolation level read committed;
    set autocommit=0;
    begin;
    DELETE pw 
    FROM proc_warnings pw 
    INNER JOIN day_position dp 
        ON dp.transaction_id = pw.transaction_id 
    INNER JOIN ivehicle_days vd 
        ON vd.id = dp.ivehicle_day_id 
    WHERE vd.ivehicle_id=2 AND dp.dirty_data=1;
    
  • Transaction 2

    set transaction isolation level read committed;
    set autocommit=0;
    begin;
    DELETE pw 
    FROM proc_warnings pw 
    INNER JOIN day_position dp 
        ON dp.transaction_id = pw.transaction_id 
    INNER JOIN ivehicle_days vd 
        ON vd.id = dp.ivehicle_day_id 
    WHERE vd.ivehicle_id=13 AND dp.dirty_data=1;
    

One of them always fails with a 'Lock wait timeout exceeded...' error. information_schema.innodb_trx contains the following rows:

| trx_id     | trx_state   | trx_started           | trx_requested_lock_id  | trx_wait_started      | trx_weight | trx_mysql_thread_id | trx_query |
| '1A2973A4' | 'LOCK WAIT' | '2012-12-12 20:03:25' | '1A2973A4:0:3172298:2' | '2012-12-12 20:03:25' | '2'      | '3089'              | 'DELETE pw FROM proc_warnings pw INNER JOIN day_position dp ON dp.transaction_id = pw.transaction_id INNER JOIN ivehicle_days vd ON vd.id = dp.ivehicle_day_id WHERE vd.ivehicle_id=13 AND dp.dirty_data=1' |
| '1A296F67' | 'RUNNING'   | '2012-12-12 19:58:02' | NULL                   | NULL | '7' | '3087' | NULL |

information_schema.innodb_locks

| lock_id                | lock_trx_id | lock_mode | lock_type | lock_table | lock_index | lock_space | lock_page | lock_rec | lock_data |
| '1A2973A4:0:3172298:2' | '1A2973A4'  | 'X'       | 'RECORD'  | '`deadlock_test`.`proc_warnings`' | '`PRIMARY`' | '0' | '3172298' | '2' | '53' |
| '1A296F67:0:3172298:2' | '1A296F67'  | 'X'       | 'RECORD'  | '`deadlock_test`.`proc_warnings`' | '`PRIMARY`' | '0' | '3172298' | '2' | '53' |

As far as I can see, both queries want an exclusive X lock on the row with primary key = 53. However, neither of them should actually delete any rows from the proc_warnings table. I just don't understand why that index record is locked. Moreover, the record is not locked when the proc_warnings table is empty or when the day_position table contains fewer rows (e.g. one hundred).
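
For completeness, the waiting/blocking relationship shown above can be cross-checked against information_schema.innodb_lock_waits (available with the InnoDB plugin in 5.1 and built in since 5.5); a query along these lines:

SELECT w.requesting_trx_id, w.requested_lock_id,
       w.blocking_trx_id, w.blocking_lock_id,
       r.trx_mysql_thread_id AS waiting_thread,
       b.trx_mysql_thread_id AS blocking_thread
FROM information_schema.innodb_lock_waits w
INNER JOIN information_schema.innodb_trx r ON r.trx_id = w.requesting_trx_id
INNER JOIN information_schema.innodb_trx b ON b.trx_id = w.blocking_trx_id;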

As a further investigation I ran EXPLAIN on the equivalent SELECT query. It shows that the query optimizer does not use an index to access the proc_warnings table, and that is the only reason I can imagine why it locks the entire primary key index.
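
For reference, an equivalent SELECT that can be EXPLAINed looks like this (selecting instead of deleting, since EXPLAIN only accepts SELECT statements in 5.1/5.5):

EXPLAIN
SELECT pw.id
FROM proc_warnings pw
INNER JOIN day_position dp
   ON dp.transaction_id = pw.transaction_id
INNER JOIN ivehicle_days vd
   ON vd.id = dp.ivehicle_day_id
WHERE vd.ivehicle_id=2 AND dp.dirty_data=1;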

Simplified case

The issue can also be reproduced in a simpler case with only two tables containing a couple of records each, where the child table has no index on the column that references the parent table.

Create parent table

CREATE TABLE `parent` (
  `id` int(10) unsigned NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB;

Create child table

CREATE TABLE `child` (
  `id` int(10) unsigned NOT NULL,
  `parent_id` int(10) unsigned DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB;

Fill tables

INSERT INTO `parent` (id) VALUES (1), (2);
INSERT INTO `child` (id, parent_id) VALUES (1, NULL), (2, NULL);

Test in two parallel transactions:

  • Transaction 1

    SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
    SET AUTOCOMMIT=0;
    BEGIN;
    DELETE c FROM child c 
      INNER JOIN parent p ON p.id = c.parent_id 
    WHERE p.id = 1;
    
  • Transaction 2

    SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
    SET AUTOCOMMIT=0;
    BEGIN;
    DELETE c FROM child c 
      INNER JOIN parent p ON p.id = c.parent_id 
    WHERE p.id = 2;
    

The common factor in both cases is that MySQL does not use an index. I believe that is why the entire table gets locked.
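
If the missing index really is the cause, the obvious experiment in the simplified case is to add an index on the referencing column and re-check the plan; a sketch (I have not verified that the same approach helps the original query):

-- hypothetical index on the join column of the child table
ALTER TABLE child ADD KEY idx_child_parent_id (parent_id);

-- re-check which access path the optimizer chooses
EXPLAIN
SELECT c.id
FROM child c
INNER JOIN parent p ON p.id = c.parent_id
WHERE p.id = 1;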

Our Solution

The only solution we can see for now is to increase the lock wait timeout from the default 50 seconds to 500 seconds, to let the threads finish their cleanup. Then keep our fingers crossed.
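
The timeout is controlled by innodb_lock_wait_timeout (default 50 seconds). In 5.5, and in 5.1 with the InnoDB plugin, it can be changed at runtime; with the built-in InnoDB in 5.1 it has to go into my.cnf:

SET SESSION innodb_lock_wait_timeout = 500;
-- or
SET GLOBAL innodb_lock_wait_timeout = 500;

-- my.cnf:
-- [mysqld]
-- innodb_lock_wait_timeout = 500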

Any help appreciated.

  • I have a question: Did you execute COMMIT in any of the transactions? – RolandoMySQLDBA Dec 13 '12 at 16:47
  • Of course. The problem is that all other transactions must wait until one of them commits its changes. The simple test case doesn't contain a commit statement in order to show how the issue is reproduced. If you run commit or rollback in the non-waiting transaction, it releases the lock immediately and the waiting transaction completes its work. – vitalidze Dec 13 '12 at 16:50
  • When you say MySQL does not use indices in either case, is it because there are none in the real scenario? If there are indexes could you provide the code for them? Is it possible to try any of the below posted index suggestions? If there are no indexes, and it's not possible to try adding any, then MySQL can't restrict the data set processed by each thread. If that's the case then N threads would simply multiply the server workload by N times, and it would be more efficient to just let one thread run with a parameter list like {WHERE vd.ivehicle_id IN (2, 13) AND dp.dirty_data=1;}. – JM Hicks Dec 17 '12 at 18:56
  • Ok, found the indexes tucked away in the linked sample data file. – JM Hicks Dec 17 '12 at 19:04
  • couple more questions: 1) how many rows does the day_position table normally contain, when it starts to run so slow that you have to bump the timeout limit to 500 sec? 2) How long does it take to run when you have only the sample data? – JM Hicks Dec 17 '12 at 19:40
  • Added an answer with a more novel approach. Split the single delete statement into a stored procedure with an isolated read statement and isolated delete statement. The delete statement is dynamically generated using DELETE FROM proc_warnings WHERE IN (...) where the IN clause is the query result from the first statement. Still think that if there's much data, trying indexes that better cover will make a big difference, but this seems like it should keep the threads from blocking for any non-trivial time due to the mutually incompatible exclusive lock requests. – JM Hicks Dec 18 '12 at 00:49
  • 1) In real cases day_position contains ~300-400K rows and ivehicle_days 50-100K rows. 2) On the sample data it runs fast, in less than 1 second. 3) About stored procedures - we're not using them now, thanks for the suggestion. In the end we settled on a two-stage (double-query) cleaning scheme: the first query is a SELECT that checks whether any rows match the WHERE clause (it doesn't lock rows), and the second is the actual DELETE, run only if the check succeeds. This helps. – vitalidze Dec 18 '12 at 09:33
  • Hmmm... "two stages", a "SELECT" and then a "DELETE", SELECT "doesn't lock rows", DELETE "if check succeeds" that's exactly what was in the stored procedure that was posted here about 9 hours before this last comment. If that's the solution you accept for yourself, why not mark the one here, that was posted 9 hours before this revelation for future readers to help them realize this work-around/solution more easily. And if it's someone else's answer you mark you get 2 points to your name :-) Happy New Year. – JM Hicks Jan 05 '13 at 19:09