I am in charge of a MySQL database which is about 12 GB, and it grows quickly (it was 3 GB a few months ago). I wanted to know what measures I can take, when the database grows even larger (say into the 50-100 GB range), to keep performance good. I have heard about sharding and archiving, but it was pretty overwhelming, so I was hoping to get some clarification here. Thank you very much.

user69153
  • Are you sure the stored data are not redundant and all of them are needed? That the database size and growth are not just an effect of bad design? Not saying they are, just asking you to check. – jkavalik Sep 09 '15 at 08:43
  • Is it a must that the DB be managed by you? Can you use cloud-based solutions? – Tzach Sep 09 '15 at 09:24
  • http://dba.stackexchange.com/questions/19821/how-to-manage-a-huge-database – Md Haidar Ali Khan Sep 09 '15 at 10:25
  • The design is not amazing, but there is natural growth as more data is inserted. What I'm looking for is a solution for when it grows too big. – user69153 Sep 09 '15 at 10:25
  • @user69153, if most of your DB tables use the InnoDB engine, you can keep some tables in MyISAM, and you can also split your data over several tables to maintain good performance. You can also refer to this: (http://stackoverflow.com/questions/2899161/how-to-handle-large-table-in-mysql) – Md Haidar Ali Khan Sep 09 '15 at 11:00
  • I would not suggest MyISAM: https://jeremystein.com/journal/innodb-versus-myisam-no-comparison/ – jkavalik Sep 09 '15 at 11:09
  • @jkavalik OK, I agree with you. RolandoMySQLDBA has also explained the MyISAM and InnoDB engines very well, parameter by parameter, here: (http://dba.stackexchange.com/questions/17431/which-is-faster-innodb-or-myisam) – Md Haidar Ali Khan Sep 09 '15 at 11:15

2 Answers

It depends on your table structures, data layout, and usage patterns. However, I wouldn't consider 12 GB big data; nor, really, anything that can fit into the RAM of a reasonably priced commodity server (~100-400 GB). I work with MySQL servers happily running with terabytes of data. If your tables are well designed and indexed, and your queries are well written, you won't see issues.
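
A quick sanity check for that last point is EXPLAIN; the table and query below are hypothetical, just to show the idea:

    -- Hypothetical table and query
    EXPLAIN SELECT * FROM orders WHERE customer_id = 42;
    -- "type: ref" with a named key means an index is used;
    -- "type: ALL" means a full table scan, i.e. a missing index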

After that, it is usually much easier to split your dataset by common usage patterns and logical dependencies. Move tables to a different database; for example, split your log tables off to a new server. You can repeat that as long as you want, until you reach the write limit of a single server and you cannot split your database any further (all remaining tables are strongly related). Then you can think about sharding.
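
As a minimal sketch (the schema and table names are made up), the first step can be as cheap as renaming a table into its own schema, which can later be moved to its own server:

    -- RENAME TABLE across schemas is a fast metadata operation
    CREATE DATABASE logs_db;
    RENAME TABLE app_db.access_log TO logs_db.access_log;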

Károly Nagy
  • It's a pretty hardcore star schema, so it's basically one huge table and many extensions. So won't it be ineffective to split the database? Thank you very much for your help – user69153 Sep 09 '15 at 10:30
  • In that case, if it's your main table that is growing significantly, then splitting is not an option. You can use MySQL partitioning to keep partition sizes at a reasonable level, if you can find a good key to partition on. If you reach the physical limits of your main table (space, IOPS, or throughput), you can start sharding it. – Károly Nagy Sep 09 '15 at 11:03
  • That's the main issue: there is no good key to do it. The system was not meant to be that big, but eventually it will be, as there are a lot more users than expected. When would you suggest starting to consider sharding? – user69153 Sep 09 '15 at 13:31
  • I wouldn't bother until you reach a couple of hundred gigabytes or terabyte scale. Sharding would bring a lot of complexity to your infrastructure (backups, schema changes, etc.), to your app (querying multiple databases, then merging and sorting in the app), and to the users as well. There are promising initiatives to overcome this and provide "transparent sharding", like https://mariadb.com/kb/en/mariadb/spider-storage-engine-overview/ - you might want to check that out. – Károly Nagy Sep 09 '15 at 13:51

InnoDB, not MyISAM. Subject closed.
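
The conversion is one statement per table (the table name here is hypothetical); note that it rewrites the whole table, so allow for the time and disk space:

    -- Rebuilds the table in the new engine
    ALTER TABLE mytable ENGINE=InnoDB;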

Shard? Only if you have a logical way to split the data across machines. For example, are there "users" that are "independent" of each other? That is, can all the info about two users sit on two separate servers, with no need to JOIN or otherwise combine data between the users? If so, sharding may be in your future.

The benefit of sharding is to scale (size or writes) beyond what one server can handle. A few GB is not a problem. You did not imply that your queries will melt down a single server.

Partitioning? This does a similar thing, but on a single machine. However, partitioning does not provide any intrinsic benefit. Will you eventually be purging "old" data? If so, then partitioning on 'time' and using DROP PARTITION is a good reason to partition. That and 3 other use cases are in my partitioning blog.
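
A minimal sketch of time-based partitioning (all names are made up); the payoff is that purging a month becomes a near-instant metadata operation instead of a huge DELETE:

    -- MySQL requires the partitioning column in every unique key,
    -- hence the composite primary key
    CREATE TABLE fact_log (
        id      BIGINT NOT NULL AUTO_INCREMENT,
        created DATE   NOT NULL,
        msg     TEXT,
        PRIMARY KEY (id, created)
    )
    PARTITION BY RANGE (TO_DAYS(created)) (
        PARTITION p201508 VALUES LESS THAN (TO_DAYS('2015-09-01')),
        PARTITION p201509 VALUES LESS THAN (TO_DAYS('2015-10-01')),
        PARTITION pmax    VALUES LESS THAN MAXVALUE
    );

    -- Drop a whole month at once
    ALTER TABLE fact_log DROP PARTITION p201508;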

If you want a more specific answer, please provide more information about your data and your application. Give us more clues about the type of data, the schema, the queries to be performed, the data flow, etc.

If it is a Data Warehouse app, then Normalization (is that what you meant by 'extensions'?) and Summary Tables are a must.
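
A minimal summary-table sketch (the fact table and its columns are hypothetical): aggregate once, then let reports hit the small table instead of scanning the big one:

    CREATE TABLE daily_summary (
        dy      DATE NOT NULL,
        user_id INT  NOT NULL,
        cnt     INT  NOT NULL,
        PRIMARY KEY (dy, user_id)
    );

    -- Re-run periodically (e.g. nightly); idempotent thanks to the upsert
    INSERT INTO daily_summary (dy, user_id, cnt)
        SELECT DATE(created), user_id, COUNT(*)
        FROM events
        WHERE created >= CURDATE() - INTERVAL 1 DAY
          AND created <  CURDATE()
        GROUP BY DATE(created), user_id
    ON DUPLICATE KEY UPDATE cnt = VALUES(cnt);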

The sooner you decide to partition or shard, the easier it will be. But if you do it without careful thought, it could be a fiasco. Or, let's say, "a costly learning experience".

Rick James
  • Hi Rick, and thank you very much for your response. It is a data warehouse app with a MySQL database and a hardcore star schema (you might be familiar with it; it's an Elgg database). I have around 90% SELECTs and no logical reason to either partition or shard (the huge table has only id, name, and description columns). I've done some research and found out that I can split the SELECTs between slaves and do the INSERTs on the master. If that is possible, I think it will solve my problem perfectly (above 90% SELECTs). Could you advise me on the best way to handle my issue? Thank you very much – user69153 Sep 17 '15 at 07:10
  • Since all inserts are done on all servers, Master-Slaves will help a lot for 90% SELECTs. (A minimal replica setup is sketched after these comments.) – Rick James Sep 17 '15 at 14:52
  • Raw size is rarely the real problem. Increased insert rate, table scans, increased read rate, etc, tend to be the things that cause a DW app to get in trouble. I think we need to see the schema, sample inserts, and sample selects, before we can carry this discussion further. – Rick James Sep 17 '15 at 14:54
  • OK, thank you very much. I'll try to provide it as soon as possible. – user69153 Sep 18 '15 at 10:20
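
A minimal sketch of attaching a replica for that read/write split (2015-era syntax; the host, credentials, and binlog coordinates are placeholders):

    -- Run on the replica, after loading a consistent copy of the master's data
    CHANGE MASTER TO
        MASTER_HOST='master.example.com',
        MASTER_USER='repl',
        MASTER_PASSWORD='secret',
        MASTER_LOG_FILE='mysql-bin.000001',
        MASTER_LOG_POS=4;
    START SLAVE;

    -- The app then sends SELECTs to the replicas and all writes to the master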