18

I am developing an application to run on the client PC (Win) which is configured with a MySQL server 5.1 instance that will act as read-only slave to the remote master. The remote master has dozens of schemas, but I only need one per client so I supply the replication-do-db setting in my.ini to only replicate the schema that the client needs. The replication works, but when our clients get into regions of the world where internet access is only available via 3G wireless, which charge by data usage, they quickly exceed their data plan limits and run into expensive problems.

As I understand it, MySQL writes all transactions for all schemas into a single binlog file which means that each client has to download all of the transactions that are performed on every schema on the master, then once downloaded, apply the database filter per replication-do-db settings in the client's my.ini file.

To minimize this inefficiency I have employed the slave_compressed_protocol = 1 setting, which seems to reduce the transmitted data by 50%, but still causes our client's to quickly exceed their data limit rack up the 3G bill.

I can't imagine I'm the only one facing this, so I'm sure I'll get a ton of answers on how to achieve this by setting x = y. However, I can't find any documentation of such a setting, nor a recommended approach to take.

So far, here's my thought to a possible solution, please provide feedback or alternate routes:


  1. Set up a "proxy" slave for each schema (on different box, or same box with a different MySQL instance/port)
  2. Configure the proxy slave to replicate-do-db only the one database that the clients wish to replicate.
  3. Configure the client's MySQL instance as slaves to the appropriate proxy slave.

This should result in the client only pulling the binlog data for their schema. The downside (as far as I can tell) is that it dramatically increases the complexity of our setup, likely making it more fragile.

Thoughts? Will this approach even work?

Note, we are running the MySQL 5.0 server on RedHat, but we could upgrade to 5.5 if it produces a solution.

RolandoMySQLDBA
  • 182,700
  • 33
  • 317
  • 520
Abram
  • 283
  • 2
  • 6

2 Answers2

11

SUGGESTION #1 : Use Distribution Masters

A Distribution Master is a mysql slave with log-bin enabled, log-slave-updates enabled and contains only tables with the BLACKHOLE Storage Engine. You can apply replicate-do-db to the Distribution Master and create binary logs at the Distribution Master that contains only the DB schema(s) you want binlogged. In this way you reduce the size of outgoing binlogs from the Distribution Master.

You can setup a Distribution Master as follows:

  1. mysqldump your database(s) using --no-data option to generate a schema-only dump.
  2. Load the schema-only dump to the Distribution Master.
  3. Convert every table in the Distribution Master to the BLACKHOLE storage engine.
  4. Setup replication to the Distribution Master from a master with real data.
  5. Add replicate-do-db option(s) to /etc/my.cnf of the Distribution Master.

For steps 2 and 3 you could also edit the schema-only dump and replace ENGINE=MyISAM and ENGINE=InnoDB with ENGINE=BLACKHOLE and then load that edited schema-only dump into the Distribution Master.

In step 3 only, if you want to script the conversion of all MyISAM and InnoDB tables to BLACKHOLE in the Distribution Master, run the following query and output it to a text file:

mysql -h... -u... -p... -A --skip-column-names -e"SELECT CONCAT('ALTER TABLE ',table_schema,'.',table_name', ENGINE=BLACKHOLE;') BlackholeConversion FROM information_schema.tables WHERE table_schema NOT IN ('information_schema','mysql') AND engine <> 'BLACKHOLE'" > BlackholeMaker.sql

An added bonus to scripting the conversion of table to the BLACKHOLE storage engine is that MEMORY storage engine tables are converted as well. While MEMORY storage engine table do not take up disk space for data storage, it will take up memory. Converting MEMORY tables to BLACKHOLE will keep memory in the Distribution Master uncluttered.

As long as you do not send any DDL into the Distribution Master, you can transmit any DML (INSERT,UPDATE,DELETE) you so desire before letting clients replicate just the DB info they want.

I already wrote a post in another StackExchange site that discusses using a Distribution Master.

SUGGESTION #2 : Use Smaller Binary Logs and Relay Logs

If you set max_binlog_size to something ridiculously small, then binlogs can be collected and shipped out in smaller chunks. There is also a separate option to set the size of relay logs, max_relay_log_size. If max_relay_log_size = 0, it will default to whatever max_binlog_size is set to.

SUGGESTION #3 : Use Semisynchronous Replication (MySQL 5.5 only)

Setup your main database and multiple Distribution Masters as MySQL 5.5. Enable Semisynchronous Replication so that the main database can quickly ship binlogs to the Distribution Master. If ALL your slaves are Distribution Masters, you may not need Semisynchronous Replication or MySQL 5.5. If any of the slaves, other than Distribution Masters, have real data for reporting, high availability, passive standby or backup purposes, then go with MySQL 5.5 in conjunction with Semisynchronous Replication.

SUGGESTION #4 : Use Statement-Based Binary Logging NOT Row-Based

If an SQL statement updates multiple rows in a table, Statement-Based Binary Logging (SBBL) stores only the SQL statement. The same SQL statement using Row-Based Binary Logging (RBBL) will actual record the row change for each row. This makes it obvious that transmitting SQL statements will save space on binary logs doing SBBL over RBBL.

Another problem is using RBBL in conjunction with replicate-do-db where table name has the database prepended. This cannot be good for a slave, especially for a Distribution Master. Therefore, make sure all DML does not have a a database and a period in front of any table names.

RolandoMySQLDBA
  • 182,700
  • 33
  • 317
  • 520
  • Interesting ideas @RolandoMySQLDBA, Suggestion 1 sounds like what I was trying to describe with my "proxy" slave setup. However, DDL is something I will need to relate to the slaves. I suppose I can handle this in the app layer, but would rather not if it can be avoided. I can see how suggestion 2 would help if traffic/speed was an issue, but not sure how it would help net bandwidth usage. For suggestion 3, could you elaborate a little for me? I thought semisynchronous would be more for "safe" replication for when you need to know at least 1 slave got the update. Great suggestions BTW! – Abram Jun 01 '11 at 23:07
  • @Abram Please make sure that distribution masters never receive InnoDB or MyISAM tables to limit disk I/O to binlog management !!! – RolandoMySQLDBA Jun 01 '11 at 23:14
  • I'm currently setting up a test environment where I'll have several MySQL 5.5 instances running on the same box (diff port) as distribution masters. Each DM will have a blackhole version of the respective DB from the master. Then I'll set up some remote slaves that I will hang onto the DM. I'll come back with my results.It sounds like the best option, though for some reason I have anxiety of running multiple MySQL instances. Perhaps a job for a micro cloud server from amazon. – Abram Jun 01 '11 at 23:18
  • @RolandoMySQLDBA, regarding the dm never receiving InnoDB or MyISAM, how would I prevent this? – Abram Jun 01 '11 at 23:21
  • 2
    @Abram you should add skip-innodb to /etc/my.cnf. You cannot disable MyISAM since it a stock storage engine. You will have to manually do ALTER TABLE tblname ENGINE=BLACKHOLE if any tables on a distribution master ends up being MyISAM. Maybe create a script from this query : SELECT CONCAT('ALTER TABLE ',table_schema,'.',table_name,' ENGINE=BLACKHOLE;') AlterCommand FROM information_schema.tables WHERE engine='MyISAM' and table_schema NOT IN ('information_schema','mysql'); If you find any, just convert them from the output of this query. – RolandoMySQLDBA Jun 01 '11 at 23:25
  • @RolandoMySQLDBA, gotcha. Thanks for the explanation. We only use InnoDB, so that may be easier than I thought. – Abram Jun 01 '11 at 23:28
  • 1
    As for suggestion # 3, semi synch replication has the master receive acknowledgement from slave that log entry made it to the slave. Under mysql 5.0, master waits until slave is done processing the SQL before sending the same statement to the next slave. Thus, semi synch is faster. – RolandoMySQLDBA Jun 02 '11 at 01:12
  • Thanks again @RolandoMySQLDBA. I did not realize that the master waited in 5.0 - time to upgrade! – Abram Jun 02 '11 at 15:01
  • My testing environment (per suggestion 1) has yielded desired results, and with the scripts you provided it will be a piece of cake to setup new DM instances. I do need to mention that my default install of MySQL 5.1 (installed via YUM on Fedora Core 14) did not include Blackhole engine support, though the MySQL docs suggests it should. – Abram Jun 02 '11 at 21:39
  • @RolandoMySQLDBA,. "Another problem is using RBBL in conjunction with replicate-do-db where table name has the database prepended." Regarding this statement this is not actually a problem with RBR,. in fact it allows much correct usage of replicate-do-db and ignore-do-db,. because as opposed to SBR which ignores or replicates statements based on the default database, RBR actually ignores/replicates statements exactly as they were executed on the master, it honors cases where you append the database name with the table name,.., which is how it should actually be,. – ovais.tariq Jun 03 '11 at 12:10
  • @Abram,. together with setting skip-innodb., you should also set default-table-type=BLACKHOLE., because by default if you define DDL with a table type that is not enable/installed then the default would be to create that table using MyISAM, which you dont want,. – ovais.tariq Jun 03 '11 at 12:17
  • @Abram Stick with my original suggestions. I have tried using BLACKHOLE as a default storage and mysqld just does not like it. Even if mysql accepts it, that's dangerous setup to have, not worth the risk. @tariq "Another problem is using RBBL in conjunction with replicate-do-db where table name has the database prepended." THIS IS A PROBLEM !!!! not with RBR in itself, that works. It is more of a sematics problem no one thought of when making RBR. Since many developers prepend db names to the table name intheir SQL, RBR creates limitations that simply should not exist in the first place. – RolandoMySQLDBA Jun 03 '11 at 12:32
  • @RolandoMySQLDBA, I am finally getting around to disabling innodb per your suggestion (adding skip-innodb to my.cnf) and I am getting an error when restarting mysql: "[ERROR] Unknown/unsupported storage engine: InnoDB". The only database I have on this server has all its tables set to the BLACKHOLE engine. Any ideas? – Abram Jun 09 '11 at 23:55
  • Ok, I figured it out. The default engine is set to Innodb by default so I changed it to MyISAM and now it starts up. Thanks anyway @RolandoMySQLDBA! – Abram Jun 10 '11 at 00:05
2

The max_binlog_size should be irrelevant -- binlog data is streamed out continually.

Caution about a "Distribution Master" -- it is a "single point of failure". That is, if it dies, all the slave(s) beyond it will not be receiving data, and rebuilding the relay will take work.

SBR vs RBR -- it depends on the query. Either can be better or worse than the other.

Put the Distribution Masters on the same machine as the real Master, or on a machine "near" the Master. Use separate ports to keep the instances separate.

Rick James
  • 78,038
  • 5
  • 47
  • 113