10bet网址
MySQL 8.0 Reference Manual
Related Documentation Download this Manual Excerpts from this Manual

12.12 XML Functions

Table 12.16 XML Functions

Name Description
ExtractValue() Extract a value from an XML string using XPath notation
UpdateXML() Return replaced XML fragment

This section discusses XML and related functionality in MySQL.

Note

It is possible to obtain XML-formatted output from MySQL in themysqlandmysqldumpclients by invoking them with the--xmloption. See4.5.1节,“mysql——mysql命令行Client”, andSection 4.5.4, “mysqldump — A Database Backup Program”.

两个函数提供基本XPath 1.0 (XML路径Language, version 1.0) capabilities are available. Some basic information about XPath syntax and usage is provided later in this section; however, an in-depth discussion of these topics is beyond the scope of this manual, and you should refer to theXML Path Language (XPath) 1.0 standardfor definitive information. A useful resource for those new to XPath or who desire a refresher in the basics is theZvon.org XPath Tutorial, which is available in several languages.

Note

These functions remain under development. We continue to improve these and other aspects of XML and XPath functionality in MySQL 8.0 and onwards. You may discuss these, ask questions about them, and obtain help from other users with them in theMySQL XML User Forum.

XPath expressions used with these functions support user variables and local stored program variables. User variables are weakly checked; variables local to stored programs are strongly checked (see also Bug #26518):

  • User variables (weak checking).Variables using the syntax$@variable_name(that is, user variables) are not checked. No warnings or errors are issued by the server if a variable has the wrong type or has previously not been assigned a value. This also means the user is fully responsible for any typographical errors, since no warnings are given if (for example)$@myvairableis used where$@myvariablewas intended.

    Example:

    mysql> SET @xml = 'XY'; Query OK, 0 rows affected (0.00 sec) mysql> SET @i =1, @j = 2; Query OK, 0 rows affected (0.00 sec) mysql> SELECT @i, ExtractValue(@xml, '//b[$@i]'); +------+--------------------------------+ | @i | ExtractValue(@xml, '//b[$@i]') | +------+--------------------------------+ | 1 | X | +------+--------------------------------+ 1 row in set (0.00 sec) mysql> SELECT @j, ExtractValue(@xml, '//b[$@j]'); +------+--------------------------------+ | @j | ExtractValue(@xml, '//b[$@j]') | +------+--------------------------------+ | 2 | Y | +------+--------------------------------+ 1 row in set (0.00 sec) mysql> SELECT @k, ExtractValue(@xml, '//b[$@k]'); +------+--------------------------------+ | @k | ExtractValue(@xml, '//b[$@k]') | +------+--------------------------------+ | NULL | | +------+--------------------------------+ 1 row in set (0.00 sec)
  • Variables in stored programs (strong checking).Variables using the syntax$variable_namecan be declared and used with these functions when they are called inside stored programs. Such variables are local to the stored program in which they are defined, and are strongly checked for type and value.

    Example:

    mysql> DELIMITER | mysql> CREATE PROCEDURE myproc () -> BEGIN -> DECLARE i INT DEFAULT 1; -> DECLARE xml VARCHAR(25) DEFAULT 'XYZ'; -> -> WHILE i < 4 DO -> SELECT xml, i, ExtractValue(xml, '//a[$i]'); -> SET i = i+1; -> END WHILE; -> END | Query OK, 0 rows affected (0.01 sec) mysql> DELIMITER ; mysql> CALL myproc(); +--------------------------+---+------------------------------+ | xml | i | ExtractValue(xml, '//a[$i]') | +--------------------------+---+------------------------------+ | XYZ | 1 | X | +--------------------------+---+------------------------------+ 1 row in set (0.00 sec) +--------------------------+---+------------------------------+ | xml | i | ExtractValue(xml, '//a[$i]') | +--------------------------+---+------------------------------+ | XYZ | 2 | Y | +--------------------------+---+------------------------------+ 1 row in set (0.01 sec) +--------------------------+---+------------------------------+ | xml | i | ExtractValue(xml, '//a[$i]') | +--------------------------+---+------------------------------+ | XYZ | 3 | Z | +--------------------------+---+------------------------------+ 1 row in set (0.01 sec)

    Parameters.Variables used in XPath expressions inside stored routines that are passed in as parameters are also subject to strong checking.

Expressions containing user variables or variables local to stored programs must otherwise (except for notation) conform to the rules for XPath expressions containing variables as given in the XPath 1.0 specification.

Note

A user variable used to store an XPath expression is treated as an empty string. Because of this, it is not possible to store an XPath expression as a user variable. (Bug #32911)

  • ExtractValue(xml_frag,xpath_expr)

    ExtractValue()takes two string arguments, a fragment of XML markupxml_fragand an XPath expressionxpath_expr(also known as alocator); it returns the text (CDATA) of the first text node which is a child of the element or elements matched by the XPath expression.

    Using this function is the equivalent of performing a match using thexpath_exprafter appending/text(). In other words,ExtractValue('Sakila', '/a/b')andExtractValue('Sakila', '/a/b/text()')produce the same result.

    If multiple matches are found, the content of the first child text node of each matching element is returned (in the order matched) as a single, space-delimited string.

    If no matching text node is found for the expression (including the implicit/text())—for whatever reason, as long asxpath_expris valid, andxml_fragconsists of elements which are properly nested and closed—an empty string is returned. No distinction is made between a match on an empty element and no match at all. This is by design.

    If you need to determine whether no matching element was found inxml_fragor such an element was found but contained no child text nodes, you should test the result of an expression that uses the XPathcount()function. For example, both of these statements return an empty string, as shown here:

    mysql> SELECT ExtractValue('', '/a/b'); +-------------------------------------+ | ExtractValue('', '/a/b') | +-------------------------------------+ | | +-------------------------------------+ 1 row in set (0.00 sec) mysql> SELECT ExtractValue('', '/a/b'); +-------------------------------------+ | ExtractValue('', '/a/b') | +-------------------------------------+ | | +-------------------------------------+ 1 row in set (0.00 sec)

    However, you can determine whether there was actually a matching element using the following:

    mysql> SELECT ExtractValue('', 'count(/a/b)'); +-------------------------------------+ | ExtractValue('', 'count(/a/b)') | +-------------------------------------+ | 1 | +-------------------------------------+ 1 row in set (0.00 sec) mysql> SELECT ExtractValue('', 'count(/a/b)'); +-------------------------------------+ | ExtractValue('', 'count(/a/b)') | +-------------------------------------+ | 0 | +-------------------------------------+ 1 row in set (0.01 sec)
    Important

    ExtractValue()returns onlyCDATA, and does not return any tags that might be contained within a matching tag, nor any of their content (see the result returned asval1in the following example).

    mysql> SELECT -> ExtractValue('cccddd', '/a') AS val1, -> ExtractValue('cccddd', '/a/b') AS val2, -> ExtractValue('cccddd', '//b') AS val3, -> ExtractValue('cccddd', '/b') AS val4, -> ExtractValue('cccdddeee', '//b') AS val5; +------+------+------+------+---------+ | val1 | val2 | val3 | val4 | val5 | +------+------+------+------+---------+ | ccc | ddd | ddd | | ddd eee | +------+------+------+------+---------+

    This function uses the current SQL collation for making comparisons withcontains(), performing the same collation aggregation as other string functions (such asCONCAT()), in taking into account the collation coercibility of their arguments; seeSection 10.8.4, “Collation Coercibility in Expressions”, for an explanation of the rules governing this behavior.

    (Previously, binary—that is, case-sensitive—comparison was always used.)

    NULLis returned ifxml_fragcontains elements which are not properly nested or closed, and a warning is generated, as shown in this example:

    mysql> SELECT ExtractValue('cc SHOW WARNINGS\G *************************** 1. row *************************** Level: Warning Code: 1525 Message: Incorrect XML value: 'parse error at line 1 pos 11: END-OF-INPUT unexpected ('>' wanted)' 1 row in set (0.00 sec) mysql> SELECT ExtractValue('c', '//a'); +-------------------------------------+ | ExtractValue('c', '//a') | +-------------------------------------+ | c | +-------------------------------------+ 1 row in set (0.00 sec)
  • UpdateXML(xml_target,xpath_expr,new_xml)

    This function replaces a single portion of a given fragment of XML markupxml_targetwith a new XML fragmentnew_xml, and then returns the changed XML. The portion ofxml_targetthat is replaced matches an XPath expressionxpath_exprsupplied by the user.

    If no expression matchingxpath_expris found, or if multiple matches are found, the function returns the originalxml_targetXML fragment. All three arguments should be strings.

    mysql> SELECT -> UpdateXML('ccc', '/a', 'fff') AS val1, -> UpdateXML('ccc', '/b', 'fff') AS val2, -> UpdateXML('ccc', '//b', 'fff') AS val3, -> UpdateXML('ccc', '/a/d', 'fff') AS val4, -> UpdateXML('ccc', '/a/d', 'fff') AS val5 -> \G *************************** 1. row *************************** val1: fff val2: ccc val3: fff val4: cccfff val5: ccc
Note

A discussion in depth of XPath syntax and usage are beyond the scope of this manual. Please see theXML Path Language (XPath) 1.0 specificationfor definitive information. A useful resource for those new to XPath or who are wishing a refresher in the basics is theZvon.org XPath Tutorial, which is available in several languages.

Descriptions and examples of some basic XPath expressions follow:

  • /tag

    Matches<tag/>if and only if<tag/>is the root element.

    Example:/ahas a match inbecause it matches the outermost (root) tag. It does not match the inneraelement inbecause in this instance it is the child of another element.

  • /tag1/tag2

    Matches<tag2/>if and only if it is a child of<tag1/>, and<tag1/>is the root element.

    Example:/a/bmatches thebelement in the XML fragmentbecause it is a child of the root elementa. It does not have a match inbecause in this case,bis the root element (and hence the child of no other element). Nor does the XPath expression have a match in; here,bis a descendant ofa, but not actually a child ofa.

    This construct is extendable to three or more elements. For example, the XPath expression/a/b/cmatches thecelement in the fragment.

  • //tag

    Matches any instance of<tag>.

    Example://amatches theaelement in any of the following:;;.

    //can be combined with/. For example,//a/bmatches thebelement in either of the fragmentsor.

    Note

    //tagis the equivalent of/descendant-or-self::*/tag. A common error is to confuse this with/descendant-or-self::tag, although the latter expression can actually lead to very different results, as can be seen here:

    mysql> SET @xml = 'wxyz'; Query OK, 0 rows affected (0.00 sec) mysql> SELECT @xml; +-----------------------------------------+ | @xml | +-----------------------------------------+ | wxyz | +-----------------------------------------+ 1 row in set (0.00 sec) mysql> SELECT ExtractValue(@xml, '//b[1]'); +------------------------------+ | ExtractValue(@xml, '//b[1]') | +------------------------------+ | x z | +------------------------------+ 1 row in set (0.00 sec) mysql> SELECT ExtractValue(@xml, '//b[2]'); +------------------------------+ | ExtractValue(@xml, '//b[2]') | +------------------------------+ | | +------------------------------+ 1 row in set (0.01 sec) mysql> SELECT ExtractValue(@xml, '/descendant-or-self::*/b[1]'); +---------------------------------------------------+ | ExtractValue(@xml, '/descendant-or-self::*/b[1]') | +---------------------------------------------------+ | x z | +---------------------------------------------------+ 1 row in set (0.06 sec) mysql> SELECT ExtractValue(@xml, '/descendant-or-self::*/b[2]'); +---------------------------------------------------+ | ExtractValue(@xml, '/descendant-or-self::*/b[2]') | +---------------------------------------------------+ | | +---------------------------------------------------+ 1 row in set (0.00 sec) mysql> SELECT ExtractValue(@xml, '/descendant-or-self::b[1]'); +-------------------------------------------------+ | ExtractValue(@xml, '/descendant-or-self::b[1]') | +-------------------------------------------------+ | z | +-------------------------------------------------+ 1 row in set (0.00 sec) mysql> SELECT ExtractValue(@xml, '/descendant-or-self::b[2]'); +-------------------------------------------------+ | ExtractValue(@xml, '/descendant-or-self::b[2]') | +-------------------------------------------------+ | x | +-------------------------------------------------+ 1 row in set (0.00 sec)
  • The*operator acts as awildcardthat matches any element. For example, the expression/*/bmatches thebelement in either of the XML fragmentsor. However, the expression does not produce a match in the fragmentbecausebmust be a child of some other element. The wildcard may be used in any position: The expression/*/b/*matches any child of abelement that is itself not the root element.

  • You can match any of several locators using the|(UNION) operator. For example, the expression//b|//cmatches allbandcelements in the XML target.

  • It is also possible to match an element based on the value of one or more of its attributes. This done using the syntaxtag[@attribute="value"]. For example, the expression//b[@id="idB"]matches the secondbelement in the fragment. To match againstanyelement havingattribute="value", use the XPath expression//*[attribute="value"].

    To filter multiple attribute values, simply use multiple attribute-comparison clauses in succession. For example, the expression//b[@c="x"][@d="y"]matches the elementoccurring anywhere in a given XML fragment.

    To find elements for which the same attribute matches any of several values, you can use multiple locators joined by the|operator. For example, to match allbelements whosecattributes have either of the values 23 or 17, use the expression//b[@c="23"]|//b[@c="17"]. You can also use the logicalor为此运营商://b[@c="23" or @c="17"].

    Note

    The difference betweenorand|is thatorjoins conditions, while|joins result sets.

XPath Limitations.The XPath syntax supported by these functions is currently subject to the following limitations:

  • Nodeset-to-nodeset comparison (such as'/a/b[@c=@d]') is not supported.

  • All of the standard XPath comparison operators are supported. (Bug #22823)

  • 相对定位器表达式解析context of the root node. For example, consider the following query and result:

    mysql> SELECT ExtractValue( -> 'XY', -> 'a/b' -> ) AS result; +--------+ | result | +--------+ | X Y | +--------+ 1 row in set (0.03 sec)

    In this case, the locatora/bresolves to/a/b.

    Relative locators are also supported within predicates. In the following example,d[../@c="1"]is resolved as/a/b[@c="1"]/d:

    mysql >选择ExtractValue(- >的< > - > < b c = " 1 " > < d>X -> X -> ', -> 'a/b/d[../@c="1"]') -> AS result; +--------+ | result | +--------+ | X | +--------+ 1 row in set (0.00 sec)
  • Locators prefixed with expressions that evaluate as scalar values—including variable references, literals, numbers, and scalar function calls—are not permitted, and their use results in an error.

  • The::operator is not supported in combination with node types such as the following:

    • axis::comment()

    • axis::text()

    • axis::processing-instructions()

    • axis::node()

    However, name tests (such asaxis::nameandaxis::*) are supported, as shown in these examples:

    mysql> SELECT ExtractValue('xy','/a/child::b'); +-------------------------------------------------------+ | ExtractValue('xy','/a/child::b') | +-------------------------------------------------------+ | x | +-------------------------------------------------------+ 1 row in set (0.02 sec) mysql> SELECT ExtractValue('xy','/a/child::*'); +-------------------------------------------------------+ | ExtractValue('xy','/a/child::*') | +-------------------------------------------------------+ | x y | +-------------------------------------------------------+ 1 row in set (0.01 sec)
  • Up-and-downnavigation is not supported in cases where the path would leadabovethe root element. That is, you cannot use expressions which match on descendants of ancestors of a given element, where one or more of the ancestors of the current element is also an ancestor of the root element (see Bug #16321).

  • The following XPath functions are not supported, or have known issues as indicated:

    • id()

    • lang()

    • local-name()

    • name()

    • namespace-uri()

    • normalize-space()

    • starts-with()

    • string()

    • substring-after()

    • substring-before()

    • translate()

  • The following axes are not supported:

    • following-sibling

    • following

    • preceding-sibling

    • preceding

XPath expressions passed as arguments toExtractValue()andUpdateXML()may contain the colon character (:) in element selectors, which enables their use with markup employing XML namespaces notation. For example:

mysql> SET @xml = '111222333444'; Query OK, 0 rows affected (0.00 sec) mysql> SELECT ExtractValue(@xml, '//e:f'); +-----------------------------+ | ExtractValue(@xml, '//e:f') | +-----------------------------+ | 444 | +-----------------------------+ 1 row in set (0.00 sec) mysql> SELECT UpdateXML(@xml, '//b:c', '555'); +--------------------------------------------+ | UpdateXML(@xml, '//b:c', '555') | +--------------------------------------------+ | 111555 | +--------------------------------------------+ 1 row in set (0.00 sec)

This is similar in some respects to what is permitted byApache Xalanand some other parsers, and is much simpler than requiring namespace declarations or the use of thenamespace-uri()andlocal-name()functions.

Error handling.For bothExtractValue()andUpdateXML(), the XPath locator used must be valid and the XML to be searched must consist of elements which are properly nested and closed. If the locator is invalid, an error is generated:

mysql> SELECT ExtractValue('c', '/&a'); ERROR 1105 (HY000): XPATH syntax error: '&a'

Ifxml_fragdoes not consist of elements which are properly nested and closed,NULLis returned and a warning is generated, as shown in this example:

mysql> SELECT ExtractValue('cc SHOW WARNINGS\G *************************** 1. row *************************** Level: Warning Code: 1525 Message: Incorrect XML value: 'parse error at line 1 pos 11: END-OF-INPUT unexpected ('>' wanted)' 1 row in set (0.00 sec) mysql> SELECT ExtractValue('c', '//a'); +-------------------------------------+ | ExtractValue('c', '//a') | +-------------------------------------+ | c | +-------------------------------------+ 1 row in set (0.00 sec)
Important

The replacement XML used as the third argument toUpdateXML()isnotchecked to determine whether it consists solely of elements which are properly nested and closed.

XPath Injection.code injectionoccurs when malicious code is introduced into the system to gain unauthorized access to privileges and data. It is based on exploiting assumptions made by developers about the type and content of data input from users. XPath is no exception in this regard.

A common scenario in which this can happen is the case of application which handles authorization by matching the combination of a login name and password with those found in an XML file, using an XPath expression like this one:

//user[login/text()='neapolitan' and password/text()='1c3cr34m']/attribute::id

This is the XPath equivalent of an SQL statement like this one:

SELECT id FROM users WHERE login='neapolitan' AND password='1c3cr34m';

A PHP application employing XPath might handle the login process like this:

xpath($xpath)) echo "You are now logged in as user $result[0]."; else echo "Invalid login name or password."; } else exit("Failed to open $file."); ?>

No checks are performed on the input. This means that a malevolent user canshort-circuitthe test by entering' or 1=1for both the login name and password, resulting in$xpathbeing evaluated as shown here:

//user[login/text()='' or 1=1 and password/text()='' or 1=1]/attribute::id

Since the expression inside the square brackets always evaluates astrue, it is effectively the same as this one, which matches theidattribute of everyuserelement in the XML document:

//user/attribute::id

One way in which this particular attack can be circumvented is simply by quoting the variable names to be interpolated in the definition of$xpath, forcing the values passed from a Web form to be converted to strings:

$xpath = "//user[login/text()='$login' and password/text()='$password']/attribute::id";

This is the same strategy that is often recommended for preventing SQL injection attacks. In general, the practices you should follow for preventing XPath injection attacks are the same as for preventing SQL injection:

  • Never accepted untested data from users in your application.

  • Check all user-submitted data for type; reject or convert data that is of the wrong type

  • Test numeric data for out of range values; truncate, round, or reject values that are out of range. Test strings for illegal characters and either strip them out or reject input containing them.

  • Do not output explicit error messages that might provide an unauthorized user with clues that could be used to compromise the system; log these to a file or database table instead.

Just as SQL injection attacks can be used to obtain information about database schemas, so can XPath injection be used to traverse XML files to uncover their structure, as discussed in Amit Klein's paperBlind XPath Injection(PDF file, 46KB).

It is also important to check the output being sent back to the client. Consider what can happen when we use the MySQLExtractValue()function:

mysql> SELECT ExtractValue( -> LOAD_FILE('users.xml'), -> '//user[login/text()="" or 1=1 and password/text()="" or 1=1]/attribute::id' -> ) AS id; +-------------------------------+ | id | +-------------------------------+ | 00327 13579 02403 42354 28570 | +-------------------------------+ 1 row in set (0.01 sec)

BecauseExtractValue()returns multiple matches as a single space-delimited string, this injection attack provides every valid ID contained withinusers.xmlto the user as a single row of output. As an extra safeguard, you should also test output before returning it to the user. Here is a simple example:

mysql> SELECT @id = ExtractValue( -> LOAD_FILE('users.xml'), -> '//user[login/text()="" or 1=1 and password/text()="" or 1=1]/attribute::id' -> ); Query OK, 0 rows affected (0.00 sec) mysql> SELECT IF( -> INSTR(@id, ' ') = 0, -> @id, -> 'Unable to retrieve user ID') -> AS singleID; +----------------------------+ | singleID | +----------------------------+ | Unable to retrieve user ID | +----------------------------+ 1 row in set (0.00 sec)

In general, the guidelines for returning data to users securely are the same as for accepting user input. These can be summed up as:

  • Always test outgoing data for type and permissible values.

  • Never permit unauthorized users to view error messages that might provide information about the application that could be used to exploit it.