20.9.1 LOAD DATA INFILE Events

LOAD DATA INFILEis not written to the binary log like other statements. It is written as one or more events in a packed format, not as a cleartext statement in the binary log. The events indicate what options are present in the statement and how to process the data file.

Historically, there seem to have been at least four event sequences for representingLOAD DATA INFILEoperations.

1) In MySQL 3.23, there was only one event:Load_log_event(type codeLOAD_EVENT= 6).Load_log_eventonly contains the filename, not the file itself. When the slave sees aLoad_log_event, it requests that the master send the file in a separate connection. This has the drawback that the binary log is not self-contained: If the file has been removed on the master or the slave cannot access the master, the file transfer fails.

2) In MySQL 4.0.0, the file contents were included in the binary log. Several new event types were introduced:Create_file_log_event(type codeCREATE_FILE_EVENT= 8),Append_block_log_event(type codeAPPEND_BLOCK_EVENT= 9),Execute_load_log_event(type codeEXEC_LOAD_EVENT= 10), andDelete_file_log_event(type codeDELETE_FILE_EVENT= 11). The event sequence is:

  • Create_file_log_event: 1 instance

  • Append_block_log_event: 0 or more instances

  • Execute_load_log_event(success) orDelete_file_log_event(失败):1实例

TheCreate_file_log_eventcontains the options toLOAD DATA INFILE. This was a design flaw since the file cannot be loaded until theExec_load_log_eventis seen. To work around this, the slave, when executing theCreate_file_log_event, wrote theCreate_file_log_eventto a temporary file. When theExecute_load_log_eventwas seen, this temporary file was read back so that theLOAD DATA INFILEstatement could be constructed.

Append_block_log_eventis used for files larger than a threshold. In this case, the file is split and the pieces are sent in separate events. The threshold is around 2^17 = 131072 bytes.

CREATE_FILE_EVENTtells the slave to create a temporary file and fill it with a first data block. Later, zero or moreAPPEND_BLOCK_EVENTevents append blocks to this temporary file.EXEC_LOAD_EVENTtells the slave to load the temporary file into the table, orDELETE_FILE_EVENTtells the slave not to do the load and to delete the temporary file.DELETE_FILE_EVENToccurs when theLOAD DATAfailed on the master: On the master we start to write loaded blocks to the binary log before the end of the statement. If for some reason there is an error, we must tell the slave to abort the load.

MySQL 4.0.0 also introduced theNEW_LOAD_EVENT= 12 type code.

If a slave reads aNEW_LOAD_EVENTfrom a binlog, it will use it as aLOAD_EVENT(but allowing longer separator names). Lost in the mysteries of time is the knowledge of whether there was ever a server version capable of writingNEW_LOAD_EVENT.

3) The originalLoad_log_eventused one character for each of the delimiters (FIELDS TERMINATED BY, and so forth). At an unknown point in the version history, the format was modified to allow multiple-character strings as separators. This uses the same class,Load_log_event, but has the type codeNEW_LOAD_EVENT= 12. This affectsCreate_file_log_event, since that inherits from Load_log_event. So the new feature inLoad_log_eventallowsCreate_file_log_eventto use multiple-character delimiters.

[A guess for when this occurred would be MySQL 4.0.0: That is when thesql_exstructure that holds the single-character field/line options was renamed toold_sql_exand a newsql_exstructure was created that allows multiple-character values.]

4) In 5.0.3, the event sequence was changed again with the addition of two new event types:Begin_load_query_log_event(type codeBEGIN_LOAD_QUERY_EVENT= 17) andExecute_load_query_log_event(type codeEXECUTE_LOAD_QUERY_EVENT= 18). The event sequence is:

  • Begin_load_query_log_event: 1 instance

  • Append_block_log_event: 0 or more instances

  • Execute_load_query_log_event(success) orDelete_file_log_event(失败):1实例

With the new sequence, information about the options toLOAD DATA INFILEis moved from the first event to the last event. Consequently,Begin_load_query_log_eventis almost the same asAppend_file_log_event(it contains only file data), whereasExecute_load_query_log_eventcontains the text of theLOAD DATA INFILEstatement. The revised event sequence fixes the design flaw in the 4.0 format.

Also, the temp file that stores the parameters toLOAD DATA INFILEis not needed anymore. There is still a temp file containing all the data to be loaded.

Here is a concrete example (it applies to MySQL 4.0 and 4.1):

On the master we have a file named/m/tmp/u.txt包含这些线:

>1,2,3 >4,5,6 >7,8,9 >10,11,12

And we issue this statement on the master while the default database istest:

load data infile '/m/tmp/u.txt' replace into table x fields terminated by ',' optionally enclosed by '"' escaped by '\\' lines starting by '>' terminated by '\n' ignore 2 lines (a,b,c);

Then in the master's binary log we have thisCREATE_FILE_EVENT(hexadecimal dump):

00000180: db4f 153f 0801 0000 .........O.?.... 00000190: 006f 0000 0088 0100 0000 0004 0000 0000 .o.............. 000001a0: 0000 0002 0000 0001 0403 0000 0003 0000 ................ 000001b0: 0001 2c01 2201 0a01 3e01 5c06 0101 0161 ..,."...>.\....a 000001c0: 0062 0063 0078 0074 6573 7400 2f6d 2f74 .b.c.x.test./m/t 000001d0: 6d70 2f75 2e74 7874 003e 312c 322c 330a mp/u.txt.>1,2,3. 000001e0: 3e34 2c35 2c36 0a3e 372c 382c 390a 3e31 >4,5,6.>7,8,9.>1 000001f0: 302c 3131 2c31 32db 4f15 3f0a 0100 0000 0,11,12.O.?..... 00000200: 1700 0000 f701 0000 0000 0300 0000 ..............
  • Line 180:

    • Timestamp (db4f 153f)

    • Event type (08)

    • Server ID (01 0000 00)

  • Line 190:

    • Event size (6f 0000 00)

    • Position in the binary log (88 0100 00) (that's 392 in decimal base)

    • 旗帜(0000)

    • Thread ID (04 0000 00)

    • Time it took (00 0000 00)

  • Line 1a0:

    • Number of lines to skip at the beginning of the file (02 0000 00)

    • Length of the table name (01)

    • Length of the database name (04)

    • Number of columns to load (03 0000 00)

    • The file ID (03 0000 00)

  • Line 1b0:

    • Length of the field terminating string (01)

    • Field terminating string (2c = ,)

    • Length of the field enclosing string (01)

    • Field enclosing string (22 = ")

    • Length of the line terminating string (01)

    • Line terminating string (0a = newline)

    • Length of the line starting string (01)

    • Line starting string (3e = >)

    • Length of the escaping string (01)

    • Escaping string (5c = \)

    • Flags (06) (that'sOPT_ENCLOSED_FLAG|REPLACE_FLAG)

    • Length of the name of the first column to load (01)

    • Length of the name of the second column to load (01)

    • Length of the name of the third column to load (01)

    • Name of the first column to load (61 00 = "a")

  • Line 1c0:

    • Name of the second column to load (62 00 = "b")

    • Name of the third column to load (63 00 = "c")

    • Name of the table to load (78 00 = "x"), name of the database that contains the table (74 6573 7400 = "test")

    • Name of the file on the master (2f6d 2f74 6d70 2f75 2e74 7874 00 = "/m/tmp/u.txt")

  • Line 1d0 and following:

    • Raw data to load (3e 312c 322c 330a 3e34 2c35 2c36 0a3e 372c 382c 390a 3e31 302c 3131 2c31 32)

The next byte in the file is the beginning of theEXEC_LOAD_EVENTevent.