0

If I add both then the uniqueness constraints fail, even though my field is in UTF-8. What can be done? I'm running 5.5.16

mysql> show create table foo_person;
+------------+------------------------------------------------+
| Table      | Create Table                                   |
+------------+------------------------------------------------+
| foo_person | CREATE TABLE `foo_person` (
  `name` varchar(100) COLLATE utf8_unicode_ci NOT NULL,
  PRIMARY KEY (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci |
+------------+------------------------------------------------+

mysql> INSERT INTO foo_person (`name`) VALUES ('resumé');
Query OK, 1 row affected (0.01 sec)

mysql> INSERT INTO foo_person (`name`) VALUES ('resume');
ERROR 1062 (23000): Duplicate entry 'resume' for key 'PRIMARY'

mysql> select * from foo_person;
+---------+
| name    |
+---------+
| resumé  |
+---------+
1 row in set (0.00 sec)
Kit Sunde
  • 323
  • 1
  • 3
  • 15
  • First thing would be to reconsider using a text field for the primary key - numeric keys typically offer much better performance, and you wouldn't have the e/é issue – Hannah Vernon Aug 09 '13 at 23:10
  • @MaxVernon He would still have the same issue, if he wanted to apply a Unique constraint on the name column. – ypercubeᵀᴹ Aug 10 '13 at 09:16
  • @ypercube Correct and adding to that, SELECT * FROM foo_person WHERE name='resume' would return 2 results. Which is unwanted since they are two different words. – Kit Sunde Aug 10 '13 at 09:25
  • 1
    @MaxVernon: Is there a logical explanation for integer keys offering better performance over text keys or is it just theory? when an index is created(on primary key column here), i don't think the actual values are used anywhere, index just holds pointers to the rows and so i say there's no difference in performance. I have a table with about 4.7M rows with a primary key on VARCHAR column and it's equally fast and it actually saves me space on additional column for id. – Fr0zenFyr Aug 13 '13 at 08:18
  • @Fr0zenFyr I was curious about that as well, this could be another question. I'd up vote it. – Kit Sunde Aug 13 '13 at 08:43
  • If your varchar index doesn't hold the actual values, how can a lookup be performed against it? Clearly the index contains pointers, but it also contains the exact value of each key, in your case a possibly very long string, and in the case if an integer key, typically 32 bits at most. You tell me which is faster to lookup. – Hannah Vernon Aug 13 '13 at 11:08
  • @MaxVernon: I must say that i partially agree with you there. Considering performance, I think an INT maybe 8 or 10 bytes will hardly give any benefit over VARCHAR(15). Here I save space on a column for surrogate key on id and hardly lose any on the index size. By the way, I say VARCHAR(15) because in my case its patent number. Of course, there will be considerable difference if I use a long VARCHAR, say 100 bytes, that wouldn't make any sense. – Fr0zenFyr Aug 14 '13 at 09:36
  • @MaxVernon: You may want to look at this question for some shocking results on comparisions. Time on VARCHAR FK was always shorter. – Fr0zenFyr Aug 14 '13 at 09:46
  • @KitSunde: you may want to take a look at this SO question and this dba.SO question. These are really interesting threads. – Fr0zenFyr Aug 14 '13 at 09:48
  • @Fr0zenFyr Thanks I did find the SO one, I also tried googling around for some benchmarks, but I couldn't find much more than opinion. If I get bored I'll benchmark it myself I think. :) – Kit Sunde Aug 14 '13 at 15:32

1 Answers1

3

It seems that MySQL does not support Accent Sensitive collations. (See http://dev.mysql.com/doc/refman/5.6/en/charset-collation-implementations.html)

Therefore your only option is to go with a binary collation, e.g. utf8_bin. However, that would also make the column case sensitive, which is different from your current collation.

See also https://stackoverflow.com/questions/500826/how-to-conduct-an-accent-sensitive-search-in-mysql

Sebastian Meine
  • 9,125
  • 1
  • 27
  • 32