原文地址:http://www.codeweblog.com/mysql-replication-principle/
1, the replication process
Mysql replication (replication) is an asynchronous replication, from a Mysql instace (called Master) to another Mysql instance (call it Slave). Implement the copy operation completed mainly by three processes, two processes in the Slave (Sql process and the IO process), another process in the Master (IO process) on the
To implement replication, you must first open the Master end of the binary log (bin-log) function, or can not be achieved. Because the whole process is actually copied from the Master Slave terminal access to the log and then himself fully in the implementation of the order of log records in the various operations.
Copy the basic process is as follows:
(1) Slave IO above the process connection Master, and requests the specified log file from the specified location (or from the beginning of the log) after the log contents;
(2) Master Slave receives the IO process from request, through the IO is responsible for copying the information, upon request, to read the development process of the log after the log information specified location and return to the Slave of the IO process. Return information in addition to the information contained in the log, but also including this information has been returned to the Master end of the bin-log file name and the bin-log position;
(3) Slave of the IO process of receiving information, will receive the log contents in turn added to the Slave end of the relay-log files of the end, and read to the Master end of the bin-log file name and location of the records to the master- info file so next time can be clearly read the high-speed Master "I need a bin-log of where to begin later in the log, please send it to me";
(4) Slave to Sql process detected a new relay-log increase in the content, it will immediately relay-log parsing the contents of a real implementation of the Master side executable content of those times, and in its implementation.
In fact the old version of Mysql replication to achieve two processes in the Slave side is not done, but by a process to complete. But later discovered there is a big risk to do so and performance issues, notably the following:
First, a process to copy the bin-log to log and parse the log and its implementation process in a serial process, subject to certain restrictions on the performance, asynchronous replication of the delay will be longer.
In addition, Slave-side from the Master to get bin-log over the side, the need to then parse the log, then in its implementation. In this process, Master client may have a lot of changes and has claimed a large number of logs. Master-side at this stage if the storage can not fix the error occurred, then generated at this stage all the changes will never be recovered. If the pressure in the Slave side, when relatively large, this process may take longer.
Therefore, the latter version of Mysql To address this risk and improve the performance of replication, replication of the Slave-side to the two processes to complete. The improved scheme proposed Yahoo! who is an engineer "Jeremy Zawodny". This will not only solve the performance problem, but also shorten the delay time of asynchronous, but also reduces the amount of possible loss of data. Of course, even if two threads into this deal now after the slave data is also still exists the possibility of delay and data loss, after all, the replication is asynchronous. As long as the data changes are not a thing, these problems will always exist. If you want to completely avoid these problems, the cluster can only be used to solve the mysql. However, the cluster is the memory mysql database solution, you need to load all data into memory, so that the memory requirements of very large, and for general applications is not much sex can be implemented
2, copy to achieve level
Mysql replication can be based on a statement (Statement level), it can be based on a record (Row level), configuration parameters can be set in Mysql replication level of the different levels of replication settings affect the Master end of the bin-log records into different forms.
(1) Row Level: log data will be recorded as each line is a modified form, and then again in the slave side to modify the same data.
Benefits: In the row level mode, bin-log records can be executed sql statement without the context of relevant information, it only need to record a record that is modified, what kind of a modified. Therefore, the contents of row level log record will be very clear details of each line of data modification is very easy to understand. And will not appear under certain circumstances a stored procedure or function, and the trigger and trigger a call to the problem can not be copied correctly.
Disadvantages: row level, all statements executed when the records to the log, it also will record each line to record the changes, this may produce a large number of log content, such as there is such an update statement: update product set owner_member_id = 'b' where owner_member_id = 'a', after the implementation of the log is not recorded in this update statement, the amount corresponding to the event (mysql in the form of events to record bin-log log), but this statement for each of the Updated a record of the changes, so a lot of records into the records to be updated many events. Nature, bin-log log volume will be enormous. Especially when the implementation of alter table statements like, when the log volume is staggering. Mysql alter table because the table structure changes like handling statement is a record in the table are required for each change, in fact, is to rebuild the entire table. Then the table will be recorded for each record to the log.
(2) Statement Level: modifies the data in each record to the sql will be the bin-log in the master. slave sql process in the copy when the original master will resolve to the same end sql executed to run again.
Advantages: statement level benefits under the first is to solve the shortcomings of the next row level, the data do not record changes in each row, reducing the amount of bin-log log, save IO, improve performance. Because he only recorded in the Master's statement on the implementation details and the implementation of the context when the information statement.
Disadvantages: Since the implementation of his statement is recorded, so, in order to make these statements in the slave side can be correctly implemented, then he must also record the time each statement in the implementation of some information, that is, context information, to ensure that all statements When the slave side can be implemented and the implementation of the master side when the same results. The other is, You Yu Mysql is relatively fast development, many continue to add new features to make mysql be reproduced encountered no small challenge, when natural reproduction involves the more complex content, bug the easier it appears. In the statement level, the current number of cases have been found in the copy will cause problems with mysql, when the data was modified to use some particular function or functions will occur when, for example: sleep () function in some version can not be true copy, in the stored procedure used last_insert_id () function may cause the slave and the master id and so on are inconsistent. The row level is based on each line to record the change, so no similar problem.
Seen from the official documents, has been before the only Mysql statement based replication mode, until the 5.1.5 version of Mysql started to support the row level replication. From 5.0, Mysql replication has been solved a lot of the old version can not appear in the correct copy of the issue. However, due to the emergence of a stored procedure, a copy of Mysql brings even greater challenges. Also, see the official documents that the 5.1.8 version from the start, Mysql provides a Statement Level and Row Level than a third beyond the copy mode: Mixed, is actually a combination of the first two modes. In Mixed mode, Mysql based on the specific implementation of each sql statement to distinguish between the log form of treatment records, that is, choose one between the Statement and Row. The new version of the Statment level as before, only the implementation of the statement recorded. The new version of Mysql Squadron row level model has also been optimized to do, not all changes will be to record the row level, as encountered when the table structure will change the mode to record the statement, if the sql statement to do is update or delete statements modify data, etc., then the line will still record all the changes.
3, copy common framework
Mysql replication environment is a more than 90% with one or more Master Slave architecture model, mainly used for reading the application of pressure for a low-cost expansion of the database-side solutions. Master and slave as long as the pressure is not too great (especially the slave-side pressure), then asynchronous replication is generally very little delay. Especially since the slave side of the replication processed into two processes, but also reduces the slave side of the delay. The benefits that real-time requirements for the data is not particularly sensitive to the application, just by cheap pc server to expand the number of slave, will read the pressure distributed to multiple slave machines above, you can solve the database side Reading pressure bottleneck. This is in large part to resolve the current pressure on many small and medium site's database bottlenecks, and even some large sites are using similar programs to address the bottlenecks in the database.
Master slave with a number of very simple implementation of the framework, a number of slave and implementation of a single slave, and there is not much difference. In the Master does not care how many client connected to the master side slave, slave as long as the process of certification by the connection to the information he requested binlog, he would connect up the io process in accordance with the requirements, read their own binlog information returned to the slave of the IO process. Configuration details for the slave, the official documents in the Mysql has been said above, very clearly, and even introduced a variety of methods to achieve the configuration of slave.
Mysql does not support the more than subordinate to the Master Slave instance of a structure. That is, a slave instance can only accept a master sync source, I heard a patch that can be improved features, but not practiced. Mysql AB does not realized the reason why such a function, mainly on account of conflict resolution issues.
Mysql can be built into a dual master mode, which means that the two Mysql instance each other's Master, also for the other side of the Slave. But most of this architecture is only one end of the service, to avoid conflicts. Even if both sides of the implementation of changes is in order, due to asynchronous replication mechanism to achieve the same result even if the changes made in the evening may also be covered as early as the changes made, like the following scenario:
Time Mysql A Mysql B
An update is recorded as 10 x table y
2 updates are recorded as 20 x table y
3 A log and access to the application, update the x table y record is 10 (does not meet expectations)
Get updated 4 x table y B log record is 20 (in line with expectations)
Thus, not only in the B library user data is not above the desired results, A and B also appeared on both sides of the data inconsistency. Except under certain conditions can write separately in A and B fixed at both ends to ensure no cross-write, to be able to avoid the above problems