
I need a tool or command that will check a table and then say something like this:

  • columnA: use TINYINT instead of INT
  • columnB: use ENUM instead of VARCHAR

I saw something like this a while ago, so I know it exists, but searching turns up nothing, or I'm asking in the wrong way.

Putnik

2 Answers


This is such a quick and dirty question.

This deserves a quick and dirty answer.

Good News : PROCEDURE ANALYSE()

Bad News : It was deprecated in MySQL 5.7.18 and is not available in MySQL 8.0

I have discussed PROCEDURE ANALYSE() over the years: See my old posts.

In your case, you would just do this:

SELECT columnA, columnB FROM yourtable PROCEDURE ANALYSE();

and the output will tell you the min value, max value, avg value, and recommended datatype.
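PROCEDURE ANALYSE() also accepts two optional arguments, max_elements and max_memory, which control when it is willing to recommend ENUM; the comment thread below runs into exactly this. As a sketch (yourtable and the columns are placeholders):

```sql
-- PROCEDURE ANALYSE([max_elements [, max_memory]])
-- max_elements (default 256): the most distinct values per column that
--   ANALYSE will track; once a column exceeds it, no ENUM is suggested.
-- max_memory (default 8192): bytes of memory per column used to collect
--   those distinct values.
-- Lowering max_elements discourages ENUM suggestions for
-- high-cardinality columns.
SELECT columnA, columnB
FROM yourtable
PROCEDURE ANALYSE(16, 256);
```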

RolandoMySQLDBA
  • Sorry, it does not work properly for me (Aurora 5.6): for example, I have a table with 76,776 records and a varchar(255) field. It suggests using an ENUM of 26,812 values for that field! What am I doing wrong? I've checked your answer https://dba.stackexchange.com/questions/141841/procedure-analyse-suggests-to-turn-timestamp-into-char19/141868#141868 but I'm still lost :( select * from my_table PROCEDURE ANALYSE(1000000,2560000) – Putnik Dec 29 '17 at 22:49
  • If the field is a VARCHAR, then PROCEDURE ANALYSE() treats it as such. Replacing a VARCHAR full of numbers with an ENUM is expected behavior since an ENUM can represent 65535 distinct values. 26812 seems to fit within an ENUM. – RolandoMySQLDBA Dec 29 '17 at 22:57
  • If the VARCHAR column is full of integer values, that column should be converted to INT UNSIGNED. Then, run PROCEDURE ANALYSE() again. – RolandoMySQLDBA Dec 29 '17 at 23:00
  • It is not integers but log lines: text+numbers. Is there a better way? – Putnik Dec 29 '17 at 23:26
  • It responds with ENUM for any and all columns of every table I have, regardless of type and amount of content :( – Putnik Dec 30 '17 at 16:45

For huge tables, the dominant performance cost is I/O. The smaller the datatypes, the less I/O is needed.

When laying out a schema, think about

  • Use the smallest ...INT that can safely hold the possible values. Most people blindly use the 4-byte INT, even for true/false flags. Some products blindly use the 8-byte BIGINT. BIGINT is not justified 98% of the time.
  • Use UNSIGNED when appropriate (which is most of the time).
  • Do not blindly say VARCHAR(255) for strings.
  • There is a 3-way battle between ENUM, TINYINT UNSIGNED, and VARCHAR; each has advantages and disadvantages. The first two are only 1 byte.
  • Indexes take space, too, and have copies of the columns. But that is another large discussion. Indexes are worth having, but don't blindly "index every column".
  • Use the appropriate CHARACTER SET.
  • Use CHAR only for truly fixed-length strings. Such are almost always CHARACTER SET ascii -- think country_code, postal_code, etc.
  • Never use TINYTEXT. (The other sizes of TEXT are useful.)
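A table that follows the points above might look like this; the table, columns, and sizes are purely illustrative, not from the question:

```sql
CREATE TABLE orders (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT,  -- smallest ...INT that safely covers the expected row count
    status TINYINT UNSIGNED NOT NULL,          -- 1 byte; a small set of status codes
    is_paid TINYINT(1) NOT NULL,               -- true/false flag; no need for a 4-byte INT
    country_code CHAR(2) CHARACTER SET ascii,  -- truly fixed-length, ASCII-only
    note VARCHAR(100),                         -- sized to the data, not a blind VARCHAR(255)
    PRIMARY KEY (id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
```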

It is good to get into the habit of using smaller datatypes, even for non-huge tables.

For tables that I write, I "know" what size the ints are or will be. Without PROCEDURE ANALYSE, I simply run SELECT MIN(a), MAX(a) ... for numbers and SELECT MAX(LENGTH(s)) ... for VARCHAR and TEXT, adding a fudge factor for future strings.
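That manual check might look like the following, with t, a, and s standing in for your own table and columns:

```sql
-- Numeric columns: check the actual value range before picking an ...INT size
SELECT MIN(a), MAX(a) FROM t;

-- String columns: check the longest stored value, then add a fudge factor
SELECT MAX(LENGTH(s)) FROM t;        -- length in bytes
SELECT MAX(CHAR_LENGTH(s)) FROM t;   -- length in characters (differs for multibyte charsets)
```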

Rick James