How can I get MySQL to accept both resume and resumé on a primary key?

Question

If I add both then the uniqueness constraints fail, even though my field is in UTF-8. What can be done? I'm running 5.5.16

mysql> show create table foo_person;
+------------+------------------------------------------------+
| Table      | Create Table                                   |
+------------+------------------------------------------------+
| foo_person | CREATE TABLE `foo_person` (
  `name` varchar(100) COLLATE utf8_unicode_ci NOT NULL,
  PRIMARY KEY (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci |
+------------+------------------------------------------------+

mysql> INSERT INTO foo_person (`name`) VALUES ('resumé');
Query OK, 1 row affected (0.01 sec)

mysql> INSERT INTO foo_person (`name`) VALUES ('resume');
ERROR 1062 (23000): Duplicate entry 'resume' for key 'PRIMARY'

mysql> select * from foo_person;
+---------+
| name    |
+---------+
| resumé  |
+---------+
1 row in set (0.00 sec)

First thing would be to reconsider using a text field for the primary key - numeric keys typically offer much better performance, and you wouldn't have the e/é issue — Hannah Vernon, Aug 09 '13 at 23:10
@MaxVernon He would still have the same issue, if he wanted to apply a Unique constraint on the name column. — ypercubeᵀᴹ, Aug 10 '13 at 09:16
@ypercube Correct and adding to that, SELECT * FROM foo_person WHERE name='resume' would return 2 results. Which is unwanted since they are two different words. — Kit Sunde, Aug 10 '13 at 09:25
@MaxVernon: Is there a logical explanation for integer keys offering better performance over text keys or is it just theory? when an index is created(on primary key column here), i don't think the actual values are used anywhere, index just holds pointers to the rows and so i say there's no difference in performance. I have a table with about 4.7M rows with a primary key on VARCHAR column and it's equally fast and it actually saves me space on additional column for id. — Fr0zenFyr, Aug 13 '13 at 08:18
@Fr0zenFyr I was curious about that as well, this could be another question. I'd up vote it. — Kit Sunde, Aug 13 '13 at 08:43
If your varchar index doesn't hold the actual values, how can a lookup be performed against it? Clearly the index contains pointers, but it also contains the exact value of each key, in your case a possibly very long string, and in the case if an integer key, typically 32 bits at most. You tell me which is faster to lookup. — Hannah Vernon, Aug 13 '13 at 11:08
@MaxVernon: I must say that i partially agree with you there. Considering performance, I think an INT maybe 8 or 10 bytes will hardly give any benefit over VARCHAR(15). Here I save space on a column for surrogate key on id and hardly lose any on the index size. By the way, I say VARCHAR(15) because in my case its patent number. Of course, there will be considerable difference if I use a long VARCHAR, say 100 bytes, that wouldn't make any sense. — Fr0zenFyr, Aug 14 '13 at 09:36
@MaxVernon: You may want to look at this question for some shocking results on comparisions. Time on VARCHAR FK was always shorter. — Fr0zenFyr, Aug 14 '13 at 09:46
@KitSunde: you may want to take a look at this SO question and this dba.SO question. These are really interesting threads. — Fr0zenFyr, Aug 14 '13 at 09:48
@Fr0zenFyr Thanks I did find the SO one, I also tried googling around for some benchmarks, but I couldn't find much more than opinion. If I get bored I'll benchmark it myself I think. :) — Kit Sunde, Aug 14 '13 at 15:32

score 3 · Accepted Answer · edited May 23 '17 at 12:40

3

It seems that MySQL does not support Accent Sensitive collations. (See http://dev.mysql.com/doc/refman/5.6/en/charset-collation-implementations.html)

Therefore your only option is to go with a binary collation, e.g. utf8_bin. However, that would also make the column case sensitive, which is different from your current collation.

See also https://stackoverflow.com/questions/500826/how-to-conduct-an-accent-sensitive-search-in-mysql

edited May 23 '17 at 12:40

Community

1

answered Aug 10 '13 at 01:29

Sebastian Meine

9,125
1
27
32

I winder if this can be used to define a collation that meets the requirements (case insensitive and accent sensitive): Adding a UCA Collation to a Unicode Character Set – ypercubeᵀᴹ Aug 10 '13 at 09:29
Thank you that was helpful, the question linked to http://bugs.mysql.com/bug.php?id=19567 which covers this issue. I believe I can (for this particular use case) switch to utf8_bin as everything is lowercased. – Kit Sunde Aug 10 '13 at 09:41

How can I get MySQL to accept both resume and resumé on a primary key?

1 Answers1