0

I am running the following DELETE SQL on a MySQL database on a MyISAM/utf8 table. The SQL is called in a loop, once time per Users (in this case):-

DELETE FROM regulatory WHERE documentid  IN
( 
    SELECT * FROM
    (
        SELECT t1.`DocumentID` FROM regulatory T1, regulatory T2
        WHERE (t1.`group_ID`=v_user_id AND t2.`group_ID`=v_user_id AND
        TRIM(t1.`title`)=TRIM(t2.`title`) AND  TRIM(t1.`spare1`='red.gif')
        AND (t1.`docref`=v_temp) AND t1.`DocumentID`>t2.`DocumentID`) 
    ) DocumentID
) ;

But it is taking over a minute per User - how can I optimize this at all so it takes far less time each user?

RolandoMySQLDBA
  • 182,700
  • 33
  • 317
  • 520
user2462433
  • 33
  • 1
  • 7

3 Answers3

1

Try removing the nesting levels, mysql can not delete from a table that's referenced as a child, instead there's a propriety trick like:

    DELETE t1.* 
    FROM regulatory T1
    JOIN regulatory T2 
        ON t1.group_ID=v_user_id  
       AND t2.group_ID=v_user_id 
       AND TRIM(t1.title)=TRIM(t2.title) 
       AND TRIM(t1.spare1='red.gif') 
       AND t1.docref=v_temp
       AND t1.DocumentID>t2.DocumentID

Also, like @Mihal suggested try removing all function calls like trim from predicates, they are expensive.

Lennart - Slava Ukraini
  • 23,240
  • 3
  • 32
  • 69
  • Brilliant thank you - i will try properly in my loop later! As per the other comment added, would also removing the TRIMs help too? – user2462433 Jul 11 '14 at 09:00
  • It will help, how much depends on how indexes are defined. I think mysql contrary to what the standard says, trims trailing whitespace. But make sure that your insert process removes the whitespace instead of doing it in the queries. Regarding your loop over users, check if you can insert users into a temp table and join with that instead of looping. It will likely be more efficient as you will only query once. – Lennart - Slava Ukraini Jul 11 '14 at 14:06
1

It may be due to unnecessary full materialization of the self-join.

The only reason you are using T2 is to check that there is a lower DocumentID than the one you've gotten from T1. But if there are 100 such, the join will have 100 rows for the same v_user_id/T1.DocumentID. That's not necessary. Try to rewrite as

... AND EXISTS ( SELECT * FROM regulatory T2 WHERE ...) 

EDIT

That is, from the innermost SELECT,

  • remove everything that references T2 :

SELECT t1.DocumentID FROM regulatory T1, regulatory T2
WHERE (t1.group_ID=v_user_id AND t2.group_ID=v_user_id AND TRIM(t1.title)=TRIM(t2.title)
AND TRIM(t1.spare1='red.gif') AND (t1.docref=v_temp) AND t1.DocumentID>t2.DocumentID)

  • and add the existential predicate :

AND EXISTS ( SELECT * FROM regulatory T2 WHERE ...)

  • replacing those ellipsis with all the WHERE clauses that were removed just prior :

... WHERE T2.group_ID=v_user_id AND TRIM(T2.title) = TRIM(T1.title) and T2.DocumentID < T1. ...

The main restricting predicate in the WHERE clauses here are on GROUP_ID and DocumentID. Check that there are indexes on these columns.

Erwin Smout
  • 1,746
  • 10
  • 12
  • oh ok - thanks for that - i am just trying to visualise how the full query would look/read using the EXISTS modification – user2462433 Jul 11 '14 at 11:50
  • I have changed the SQL using you suggestion above so i now have:-

    DELETE FROM regulatory WHERE documentid IN ( SELECT t1.DocumentID FROM regulatory T1 WHERE (t1.group_ID=2898 AND t1.spare1='red.gif' AND t1.docref=1642) AND EXISTS (SELECT * FROM regulatory T2 WHERE T2.group_ID=2898 AND T2.title = T1.title AND T2.DocumentID < T1.DocumentID) );

    But when i test it i get:- Error 1093: You can't specify table 'regulatory' for update in FROM clause (?)

    – user2462433 Jul 13 '14 at 14:35
  • Aaaaargh. http://stackoverflow.com/questions/45494/mysql-error-1093-cant-specify-target-table-for-update-in-from-clause Apply solutions suggested over there or switch to a better dbms. – Erwin Smout Jul 14 '14 at 06:46
1

You are using a selfie-selfie (SELF JOIN within a SELF JOIN). Queries that do self joins in the middle of UPDATEs and DELETEs can have unpredictable results because of two possibilities:

  • rows not to be deleted when it should have been deleted
  • rows deleted that were not meant to be deleted

I wrote about this quirky behavior 3 years ago : Problem with MySQL subquery

You should decompose the DELETE into separate steps

CREATE TABLE DocsToDelete
    SELECT DocumentID FROM
    (
        SELECT t1.`DocumentID` FROM regulatory T1, regulatory T2
        WHERE (t1.`group_ID`=v_user_id AND t2.`group_ID`=v_user_id AND
        TRIM(t1.`title`)=TRIM(t2.`title`) AND  TRIM(t1.`spare1`='red.gif')
        AND (t1.`docref`=v_temp) AND t1.`DocumentID`>t2.`DocumentID`) 
    ) DocumentIDSubquery;
DELETE B.* FROM DocsToDelete A LEFT JOIN regulatory B USING (DocumentID);
DROP TABLE DocsToDelete;

Give it as Try !!!

RolandoMySQLDBA
  • 182,700
  • 33
  • 317
  • 520
  • 1
    Thanks - i tried it and delete time per employee rapidly reduced to between 10 and 18 seconds a user (of which there are a lot) – user2462433 Jul 15 '14 at 07:01