在mysql的备库的监控中有一项很重要的指标:Seconds_Behind_Master,这个值是怎么得到的呢?下面从5.1.58的代码中分析一下:
mysql的replication中有2个比较重要的class:Master_info(rpl_mi.h), Relay_log_info(rpl_rli.h),他们分别对应于master,info文件和slave.info文件;很显然,Master_info是io_thread需要的,Relay_log_info是sql_thread需要的。Master_info中有一个变量 clock_diff_with_master,这个值记录着mysql的主库和备库的时间差,可以理解为主备的主机时间差。clock_diff_with_master变量的定义如下:
- /*
- The difference in seconds between the clock of the master and the clock of
- the slave (second - first). It must be signed as it may be <0 or >0.
- clock_diff_with_master is computed when the I/O thread starts; for this the
- I/O thread does a SELECT UNIX_TIMESTAMP() on the master.
- "how late the slave is compared to the master" is computed like this:
- clock_of_slave - last_timestamp_executed_by_SQL_thread - clock_diff_with_master
- */
- long clock_diff_with_master;
这个变量的注释直接提到了Seconds_Behind_Master的计算方法:clock_of_slave - last_timestamp_executed_by_SQL_thread - clock_diff_with_master。clock_of_slave是slave的当前时间--执行show slave status的当前时间。
先看一下clock_diff_with_master的计算:(slave.cc)。执行”start slave;“/“start slave io_thread;”后,会执行start_slave_threads来启动io thread,io thread启动后首先做的就是获取主库的mysql版本和主库的当前时间(mysql_real_query(mysql, STRING_WITH_LEN("SELECT UNIX_TIMESTAMP()"))),获取到主库的当前时间后,用备库的当前时间减去主库的时间,得到clock_diff_with_master。
- int start_slave_threads(bool need_slave_mutex, bool wait_for_start,
- Master_info* mi, const char* master_info_fname,
- const char* slave_info_fname, int thread_mask)
- {
- ...
- if (thread_mask & SLAVE_IO)
- error=start_slave_thread(handle_slave_io,lock_io,lock_cond_io,
- cond_io,
- &mi->slave_running, &mi->slave_run_id,
- mi, 1); //high priority, to read the most possible
- ...
- }
- ...
- /**
- Slave IO thread entry point.
- @param arg Pointer to Master_info struct that holds information for
- the IO thread.
- @return Always 0.
- */
- pthread_handler_t handle_slave_io(void *arg)
- {
- ...
- thd_proc_info(thd, "Checking master version");
- ret= get_master_version_and_clock(mysql, mi);
- ...
- }
- ...
- static int get_master_version_and_clock(MYSQL* mysql, Master_info* mi)
- {
- ...
- master_res= NULL;
- if (!mysql_real_query(mysql, STRING_WITH_LEN("SELECT UNIX_TIMESTAMP()")) &&
- (master_res= mysql_store_result(mysql)) &&
- (master_row= mysql_fetch_row(master_res)))
- {
- mi->clock_diff_with_master=
- (long) (time((time_t*) 0) - strtoul(master_row[0], 0, 10));
- }
- else if (is_network_error(mysql_errno(mysql)))
- {
- mi->report(WARNING_LEVEL, mysql_errno(mysql),
- "Get master clock failed with error: %s", mysql_error(mysql));
- goto network_err;
- }
- else
- {
- mi->clock_diff_with_master= 0; /* The "most sensible" value */
- sql_print_warning("\"SELECT UNIX_TIMESTAMP()\" failed on master, "
- "do not trust column Seconds_Behind_Master of SHOW "
- "SLAVE STATUS. Error: %s (%d)",
- mysql_error(mysql), mysql_errno(mysql));
- }
- ...
- }
last_timestamp_executed_by_SQL_thread则是sql slave thread最近一次执行的event的时间戳(rpl.rli.cc中的last_event_start_time变量)。而这个变量的更新则是在stmt_done()函数中完成的(rpl.rli.cc),相关定义如下:
- /*
- Used by row-based replication to detect that it should not stop at
- this event, but give it a chance to send more events. The time
- where the last event inside a group started is stored here. If the
- variable is zero, we are not in a group (but may be in a
- transaction).
- */
- time_t last_event_start_time;
- /**
- Helper function to do after statement completion.
- This function is called from an event to complete the group by
- either stepping the group position, if the "statement" is not
- inside a transaction; or increase the event position, if the
- "statement" is inside a transaction.
- @param event_log_pos
- Master log position of the event. The position is recorded in the
- relay log info and used to produce information for <code>SHOW
- SLAVE STATUS</code>.
- @param event_creation_time
- Timestamp for the creation of the event on the master side. The
- time stamp is recorded in the relay log info and used to compute
- the <code>Seconds_behind_master</code> field.
- */
- void stmt_done(my_off_t event_log_pos,
- time_t event_creation_time);
具体的逻辑则是,sql thread启动后,读取relaylog(netxt_event()),apply & update pos(apply_event_and_update_pos()),update_pos的时候判断是否执行到了事务的结束位置,如果执行到了,则调用stmt_done(),stmt_done()会将last_master_timestamp更新为最近一次event的创建时间(event_creation_time)。因此如果在主备基本无延迟的时候,主库执行了一个大事务,你会发现备库延迟突然很大,然后又没了,"延迟跳跃"。
- /**
- Slave SQL thread entry point.
- @param arg Pointer to Relay_log_info object that holds information
- for the SQL thread.
- @return Always 0.
- */
- pthread_handler_t handle_slave_sql(void *arg)
- {
- ...
- if (exec_relay_log_event(thd,rli))
- ...
- }
- ...
- static int exec_relay_log_event(THD* thd, Relay_log_info* rli)
- {
- ...
- Log_event * ev = next_event(rli);
- ...
- exec_res= apply_event_and_update_pos(ev, thd, rli);
- ...
- }
- ...
- int apply_event_and_update_pos(Log_event* ev, THD* thd, Relay_log_info* rli)
- {
- ...
- exec_res= ev->apply_event(rli);
- ...
- int error= ev->update_pos(rli);
- ...
- }
- ....log_event.cc
- int
- Rows_log_event::do_update_pos(Relay_log_info *rli)
- {
- ...
- if (get_flags(STMT_END_F))
- {
- /*
- Indicate that a statement is finished.
- Step the group log position if we are not in a transaction,
- otherwise increase the event log position.
- */
- rli->stmt_done(log_pos, when);
- }
- ...
- }
- ...rpl_rli.cc
- void Relay_log_info::stmt_done(my_off_t event_master_log_pos,
- time_t event_creation_time)
- {
- ...
- if ((sql_thd->options & OPTION_BEGIN) && opt_using_transactions)
- inc_event_relay_log_pos();
- else
- {
- inc_group_relay_log_pos(event_master_log_pos);
- flush_relay_log_info(this);
- /*
- Note that Rotate_log_event::do_apply_event() does not call this
- function, so there is no chance that a fake rotate event resets
- last_master_timestamp. Note that we update without mutex
- (probably ok - except in some very rare cases, only consequence
- is that value may take some time to display in
- Seconds_Behind_Master - not critical).
- */
- #ifndef DBUG_OFF
- if (!(event_creation_time == 0 && debug_not_change_ts_if_art_event > 0))
- #else
- if (event_creation_time != 0)
- #endif
- last_master_timestamp= event_creation_time;
- }
- }
另外一种可能会出现延迟跳跃的情况是:主库的网络有问题,io thread拉binlog的速度很慢,备库实际有很大的延迟,而sql thread执行的时候永远是hot log,这个时候会出现,备库一直有很大的延迟,突然一下延迟没了(=0),接下去延迟又变得很大。这种情况的原因为:sql thread读取relay log是在next_event(slave.cc)中完成的,而next_event中会判断当前的读取的relay log是否是“热的”--正在被io thread使用,如果是"热的",并且读取到当前relay log的最后一个event 或io thread没有再更新relay log的时候,会将last_master_time_stamp置为0,并new一个 Rotate_log_event。而如果在这个时候执行"show slave status",延迟就是0了。这个地方需要补充一点的是“show slave status”和前面提到的逻辑,唯一有争用的锁是&rli->data_lock,但是next_event在恢复last_master_time_stamp前就释放了这个锁:
- ...slave.cc...
- static Log_event* next_event(Relay_log_info* rli)
- {
- ...
- safe_mutex_assert_owner(&rli->data_lock);
- ...
- if (!cur_log->error) /* EOF */
- {
- /*
- On a hot log, EOF means that there are no more updates to
- process and we must block until I/O thread adds some and
- signals us to continue
- */
- if (hot_log)
- {
- time_t save_timestamp= rli->last_master_timestamp;
- rli->last_master_timestamp= 0;
- ...
- /*
- We can, and should release data_lock while we are waiting for
- update. If we do not, show slave status will block
- */
- pthread_mutex_unlock(&rli->data_lock);
- ...
- rli->last_master_timestamp= save_timestamp;
- ...
- }
- ...
- }
- ...
- }
- ....
- /**
- Execute a SHOW SLAVE STATUS statement.
- @param thd Pointer to THD object for the client thread executing the
- statement.
- @param mi Pointer to Master_info object for the IO thread.
- @retval FALSE success
- @retval TRUE failure
- */
- bool show_master_info(THD* thd, Master_info* mi)
- {
- ...
- pthread_mutex_lock(&mi->run_lock);
- protocol->store(mi->io_thd ? mi->io_thd->proc_info : "", &my_charset_bin);
- pthread_mutex_unlock(&mi->run_lock);
- pthread_mutex_lock(&mi->data_lock);
- pthread_mutex_lock(&mi->rli.data_lock);
- pthread_mutex_lock(&mi->err_lock);
- pthread_mutex_lock(&mi->rli.err_lock);
- ...
- /*
- Seconds_Behind_Master: if SQL thread is running and I/O thread is
- connected, we can compute it otherwise show NULL (i.e. unknown).
- */
- if ((mi->slave_running == MYSQL_SLAVE_RUN_CONNECT) &&
- mi->rli.slave_running)
- {
- long time_diff= ((long)(time(0) - mi->rli.last_master_timestamp)
- - mi->clock_diff_with_master);
- /*
- Apparently on some systems time_diff can be <0. Here are possible
- reasons related to MySQL:
- - the master is itself a slave of another master whose time is ahead.
- - somebody used an explicit SET TIMESTAMP on the master.
- Possible reason related to granularity-to-second of time functions
- (nothing to do with MySQL), which can explain a value of -1:
- assume the master's and slave's time are perfectly synchronized, and
- that at slave's connection time, when the master's timestamp is read,
- it is at the very end of second 1, and (a very short time later) when
- the slave's timestamp is read it is at the very beginning of second
- 2. Then the recorded value for master is 1 and the recorded value for
- slave is 2. At SHOW SLAVE STATUS time, assume that the difference
- between timestamp of slave and rli->last_master_timestamp is 0
- (i.e. they are in the same second), then we get 0-(2-1)=-1 as a result.
- This confuses users, so we don't go below 0: hence the max().
- last_master_timestamp == 0 (an "impossible" timestamp 1970) is a
- special marker to say "consider we have caught up".
- */
- protocol->store((longlong)(mi->rli.last_master_timestamp ?
- max(0, time_diff) : 0));
- }
- else
- {
- protocol->store_null();
- }
- ...
- }
到这里Seconds_Behind_Master的计算就结束了。总结起来就是Seconds_Behind_Master的计算有3部分组成,备库当前时间,sql thread最近一次执行的event的时间,主备系统的时间差。备库出现延迟跳跃,可能是大事务引起,也可能是网络问题。(ps:这2种情况,都被我遇到过:))
链接: