MySQL :: MySQL 8.0 Reference Manual :: 17.4.2 Handling an Unexpected Halt of a Replica

version 8.0

5.7
5.6

8.0 Japanese

MySQL 8.0 Reference Manual/。../ Handling an Unexpected Halt of a Replica

17.4.2 Handling an Unexpected Halt of a Replica

In order for replication to be resilient to unexpected halts of the server (sometimes described as crash-safe) it must be possible for the replica to recover its state before halting. This section describes the impact of an unexpected halt of a replica during replication, and how to configure a replica for the best chance of recovery to continue replication.

After an unexpected halt of a replica, upon restart the replication SQL thread must recover information about which transactions have been executed already. The information required for recovery is stored in the replica's applier metadata repository. From MySQL 8.0, this repository is created by default as anInnoDBtable namedmysql.slave_relay_log_info。通过使用这种事务性存储引擎nformation is always recoverable upon restart. Updates to the applier metadata repository are committed together with the transactions, meaning that the replica's progress information recorded in that repository is always consistent with what has been applied to the database, even in the event of an unexpected server halt. For more information on the applier metadata repository, seeSection 17.2.4, “Relay Log and Replication Metadata Repositories”。

DML transactions and also atomic DDL update the replication positions in the replica's applier metadata repository in themysql.slave_relay_log_infotable together with applying the changes to the database, as an atomic operation. In all other cases, including DDL statements that are not fully atomic, and exempted storage engines that do not support atomic DDL, themysql.slave_relay_log_infotable might be missing updates associated with replicated data if the server halts unexpectedly. Restoring updates in this case is a manual process. For details on atomic DDL support in MySQL 8.0, and the resulting behavior for the replication of certain statements, seeSection 13.1.1, “Atomic Data Definition Statement Support”。

The recovery process by which a replica recovers from an unexpected halt varies depending on the configuration of the replica. The details of the recovery process are influenced by the chosen method of replication, whether the replica is single-threaded or multithreaded, and the setting of relevant system variables. The overall aim of the recovery process is to identify what transactions had already been applied on the replica's database before the unexpected halt occurred, and retrieve and apply the transactions that the replica missed following the unexpected halt.

For GTID-based replication, the recovery process needs the GTIDs of the transactions that were already received or committed by the replica. The missing transactions can be retrieved from the source using GTID auto-positioning, which automatically compares the source's transactions to the replica's transactions and identifies the missing transactions.
For file position based replication, the recovery process needs an accurate replication SQL thread (applier) position showing the last transaction that was applied on the replica. Based on that position, the replication I/O thread (receiver) retrieves from the source's binary log all of the transactions that should be applied on the replica from that point on.

Using GTID-based replication makes it easiest to configure replication to be resilient to unexpected halts. GTID auto-positioning means the replica can reliably identify and retrieve missing transactions, even if there are gaps in the sequence of applied transactions.

The following information provides combinations of settings that are appropriate for different types of replica to guarantee recovery as far as this is under the control of replication.

Important

Some factors outside the control of replication can have an impact on the replication recovery process and the overall state of replication after the recovery process. In particular, the settings that influence the recovery process for individual storage engines might result in transactions being lost in the event of an unexpected halt of a replica, and therefore unavailable to the replication recovery process. Theinnodb_flush_log_at_trx_commit=1setting mentioned in the list below is a key setting for a replication setup that usesInnoDBwith transactions. However, other settings specific toInnoDBor to other storage engines, especially those relating to flushing or synchronization, can also have an impact. Always check for and apply recommendations made by your chosen storage engines about crash-safe settings.

The following combination of settings on a replica is the most resilient to unexpected halts:

When GTID-based replication is in use (gtid_mode=ON), setSOURCE_AUTO_POSITION=1|MASTER_AUTO_POSITION=1, which activates GTID auto-positioning for the connection to the source to automatically identify and retrieve missing transactions. This option is set using aCHANGE REPLICATION SOURCE TOstatement (from MySQL 8.0.23) orCHANGE MASTER TOstatement (before MySQL 8.0.23). If the replica has multiple replication channels, you need to set this option for each channel individually. For details of how GTID auto-positioning works, seeSection 17.1.3.3, “GTID Auto-Positioning”。When file position based replication is in use,SOURCE_AUTO_POSITION=1|MASTER_AUTO_POSITION=1is not used, and instead the binary log position or relay log position is used to control where replication starts.
From MySQL 8.0.27, when GTID-based replication is in use (gtid_mode=ON), setGTID_ONLY=1, which makes the replica use only GTIDs in the recovery process, and stop persisting binary log and relay log file names and file positions in the replication metadata repositories. This option is set using aCHANGE REPLICATION SOURCE TOstatement. If the replica has multiple replication channels, you need to set this option for each channel individually. WithGTID_ONLY=1, during recovery, the file position information is ignored and GTID auto-skip is used to skip transactions that have already been supplied, rather than identifying the correct file position. This strategy is more efficient provided that you purge relay logs using the default setting forrelay_log_purge, which means only one relay log file needs to be inspected.
Setsync_relay_log=1, which instructs the replication receiver thread to synchronize the relay log to disk after each received transaction is written to it. This means the replica's record of the current position read from the source's binary log (in the applier metadata repository) is never ahead of the record of transactions saved in the relay log. Note that although this setting is the safest, it is also the slowest due to the number of disk writes involved. Withsync_relay_log > 1, orsync_relay_log=0(where synchronization is handled by the operating system), in the event of an unexpected halt of a replica there might be committed transactions that have not been synchronized to disk. Such transactions can cause the recovery process to fail if the recovering replica, based on the information it has in the relay log as last synchronized to disk, tries to retrieve and apply the transactions again instead of skipping them. Settingsync_relay_log=1是尤其重要的,因为一个多线程代表lica, where the recovery process fails if gaps in the sequence of transactions cannot be filled using the information in the relay log. For a single-threaded replica, the recovery process only needs to use the relay log if the relevant information is not available in the applier metadata repository.
Setinnodb_flush_log_at_trx_commit=1, which synchronizes theInnoDBlogs to disk before each transaction is committed. This setting, which is the default, ensures thatInnoDBtables and theInnoDBlogs are saved on disk so that there is no longer a requirement for the information in the relay log regarding the transaction. Combined with the settingsync_relay_log=1, this setting further ensures that the content of theInnoDBtables and theInnoDBlogs is consistent with the content of the relay log at all times, so that purging the relay log files cannot cause unfillable gaps in the replica's history of transactions in the event of an unexpected halt.
Setrelay_log_info_repository = TABLE, which stores the replication SQL thread position in theInnoDBtablemysql.slave_relay_log_info, and updates it together with the transaction commit to ensure a record that is always accurate. This setting is the default from MySQL 8.0, and theFILEsetting is deprecated. From MySQL 8.0.23, the use of the system variable itself is deprecated, so omit it and allow it to default. If theFILEsetting is used, which was the default in earlier releases, the information is stored in a file in the data directory that is updated after the transaction has been applied. This creates a risk of losing synchrony with the source depending at which stage of processing a transaction the replica halts at, or even corruption of the file itself. With the settingrelay_log_info_repository = FILE, recovery is not guaranteed.
Setrelay_log_recovery = ON, which enables automatic relay log recovery immediately following server startup. This global variable defaults toOFFand is read-only at runtime, but you can set it toONwith the--relay-log-recoveryoption at replica startup following an unexpected halt of a replica. Note that this setting ignores the existing relay log files, in case they are corrupted or inconsistent. The relay log recovery process starts a new relay log file and fetches transactions from the source beginning at the replication SQL thread position recorded in the applier metadata repository. The previous relay log files are removed over time by the replica's normal purge mechanism.

For a multithreaded replica, settingrelay_log_recovery = ONautomatically handles any inconsistencies and gaps in the sequence of transactions that have been executed from the relay log. These gaps can occur when file position based replication is in use. (For more details, seeSection 17.5.1.34, “Replication and Transaction Inconsistencies”。) The relay log recovery process deals with gaps using the same method as theSTART REPLICA UNTIL SQL_AFTER_MTS_GAPS(or before MySQL 8.0.22, START SLAVE instead of START REPLICA) statement would. When the replica reaches a consistent gap-free state, the relay log recovery process goes on to fetch further transactions from the source beginning at the replication SQL thread position. When GTID-based replication is in use, from MySQL 8.0.18 a multithreaded replica checks first whetherMASTER_AUTO_POSITIONis set toON, and if it is, omits the step of calculating the transactions that should be skipped or not skipped, so that the old relay logs are not required for the recovery process.

PREV HOME UP NEXT