MySQL 5.7 Reference Manual
12.10.9 MeCab Full-Text Parser Plugin

The built-in MySQL full-text parser uses the white space between words as a delimiter to determine where words begin and end, which is a limitation when working with ideographic languages that do not use word delimiters. To address this limitation for Japanese, MySQL provides a MeCab full-text parser plugin. The MeCab full-text parser plugin is supported for use with InnoDB and MyISAM.

Note

MySQL also provides an ngram full-text parser plugin that supports Japanese. For more information, see Section 12.10.8, “ngram Full-Text Parser”.

The MeCab full-text parser plugin is a full-text parser plugin for Japanese that tokenizes a sequence of text into meaningful words. For example, MeCab tokenizes データベース管理 (Database Management) into データベース (Database) and 管理 (Management). By comparison, the ngram full-text parser tokenizes text into a contiguous sequence of n characters, where n represents a number between 1 and 10.
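
To see the ngram side of this comparison, the following sketch indexes the same phrase with the built-in ngram parser and lists the resulting tokens. The table name articles_ngram is hypothetical, and the sketch assumes a test schema, the default ngram_token_size of 2, and sufficient privileges to set innodb_ft_aux_table. Under those assumptions, the tokens should be overlapping two-character sequences rather than the words データベース and 管理.

```sql
-- Sketch only: articles_ngram is a hypothetical table name.
USE test;
SET NAMES utf8;

CREATE TABLE articles_ngram (
  id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
  title VARCHAR(200),
  FULLTEXT (title) WITH PARSER ngram
) ENGINE=InnoDB CHARACTER SET utf8;

INSERT INTO articles_ngram (title) VALUES ('データベース管理');

-- Point the InnoDB full-text diagnostic tables at the new table,
-- then list the tokens in the order they occur in the text.
SET GLOBAL innodb_ft_aux_table = 'test/articles_ngram';
SELECT word, position
  FROM INFORMATION_SCHEMA.INNODB_FT_INDEX_CACHE
 ORDER BY position;
```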

Besides producing meaningful words as tokens, the MeCab parser typically yields smaller indexes than the ngram parser, and MeCab full-text searches are generally faster. One drawback is that the MeCab full-text parser may take longer to tokenize documents than the ngram full-text parser.

The full-text search syntax described in Section 12.10, “Full-Text Search Functions” applies to the MeCab parser plugin. Differences in parsing behavior are described in this section. Full-text related configuration options are also applicable.

For additional information about the MeCab parser, refer to the MeCab: Yet Another Part-of-Speech and Morphological Analyzer project on GitHub.

Installing the MeCab Parser Plugin

The MeCab parser plugin requires mecab and mecab-ipadic.

On supported Fedora, Debian, and Ubuntu platforms (except Ubuntu 12.04, where the system mecab version is too old), MySQL dynamically links to the system mecab installation if it is installed to the default location. On other supported Unix-like platforms, libmecab.so is statically linked in libpluginmecab.so, which is located in the MySQL plugin directory. mecab-ipadic is included in MySQL binaries and is located in MYSQL_HOME/lib/mecab.

You can install mecab and mecab-ipadic using a native package management utility (on Fedora, Debian, and Ubuntu), or you can build mecab and mecab-ipadic from source. For information about installing mecab and mecab-ipadic using a native package management utility, see Installing MeCab From a Binary Distribution (Optional). If you want to build mecab and mecab-ipadic from source, see Installing MeCab From Source (Optional).

On Windows, libmecab.dll is found in the MySQL bin directory. mecab-ipadic is located in MYSQL_HOME/lib/mecab.

To install and configure the MeCab parser plugin, perform the following steps:

  1. In the MySQL configuration file, set the mecab_rc_file configuration option to the location of the mecabrc configuration file, which is the configuration file for MeCab. If you are using the MeCab package distributed with MySQL, the mecabrc file is located in MYSQL_HOME/lib/mecab/etc/.

    [mysqld]
    loose-mecab-rc-file=MYSQL_HOME/lib/mecab/etc/mecabrc

    The loose prefix is an option modifier. The mecab_rc_file option is not recognized by MySQL until the MeCab parser plugin is installed, but it must be set before attempting to install the plugin. The loose prefix allows you to restart MySQL without encountering an error due to an unrecognized variable.

    If you use your own MeCab installation, or build MeCab from source, the location of the mecabrc configuration file may differ.

    For information about the MySQL configuration file and its location, see Section 4.2.2.2, “Using Option Files”.

  2. Also in the MySQL configuration file, set the minimum token size to 1 or 2, which are the values recommended for use with the MeCab parser. For InnoDB tables, minimum token size is defined by the innodb_ft_min_token_size configuration option, which has a default value of 3. For MyISAM tables, minimum token size is defined by ft_min_word_len, which has a default value of 4.

    [mysqld]
    innodb_ft_min_token_size=1

  3. Modify the mecabrc configuration file to specify the dictionary you want to use. The mecab-ipadic package distributed with MySQL binaries includes three dictionaries (ipadic_euc-jp, ipadic_sjis, and ipadic_utf-8). The mecabrc configuration file packaged with MySQL contains an entry similar to the following:

    dicdir = /path/to/mysql/lib/mecab/lib/mecab/dic/ipadic_euc-jp

    To use the ipadic_utf-8 dictionary, for example, modify the entry as follows:

    dicdir=MYSQL_HOME/lib/mecab/dic/ipadic_utf-8

    If you are using your own MeCab installation or have built MeCab from source, the default dicdir entry in the mecabrc file differs, as do the dictionaries and their location.

    Note

    After the MeCab parser plugin is installed, you can use the mecab_charset status variable to view the character set used with MeCab. The three MeCab dictionaries provided with the MySQL binary support the following character sets.

    • The ipadic_euc-jp dictionary supports the ujis and eucjpms character sets.

    • The ipadic_sjis dictionary supports the sjis and cp932 character sets.

    • The ipadic_utf-8 dictionary supports the utf8 and utf8mb4 character sets.

    mecab_charset only reports the first supported character set. For example, the ipadic_utf-8 dictionary supports both utf8 and utf8mb4. mecab_charset always reports utf8 when this dictionary is in use.

  4. Restart MySQL.

  5. Install the MeCab parser plugin:

    The MeCab parser plugin is installed using INSTALL PLUGIN syntax. The plugin name is mecab, and the shared library name is libpluginmecab.so. For additional information about installing plugins, see Section 5.5.1, “Installing and Uninstalling Plugins”.

    INSTALL PLUGIN mecab SONAME 'libpluginmecab.so';

    Once installed, the MeCab parser plugin loads at every normal MySQL restart.

  6. Verify that the MeCab parser plugin is loaded using the SHOW PLUGINS statement.

    mysql> SHOW PLUGINS;

    A mecab plugin should appear in the list of plugins.
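
As a narrower alternative to scanning the full SHOW PLUGINS output, the following sketch checks the plugin entry and the settings made in the preceding steps. It assumes the plugin was installed under the name mecab as described above. The values returned depend on your installation, so none are shown here.

```sql
-- Confirm that the plugin is registered and see which library provides it.
SELECT PLUGIN_NAME, PLUGIN_STATUS, PLUGIN_TYPE, PLUGIN_LIBRARY
  FROM INFORMATION_SCHEMA.PLUGINS
 WHERE PLUGIN_NAME = 'mecab';

-- Confirm the mecabrc path set in step 1 and the token size set in step 2.
SHOW VARIABLES LIKE 'mecab_rc_file';
SHOW VARIABLES LIKE 'innodb_ft_min_token_size';

-- Confirm the character set of the dictionary selected in step 3.
SHOW STATUS LIKE 'mecab_charset';
```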

Creating a FULLTEXT Index that uses the MeCab Parser

To create a FULLTEXT index that uses the mecab parser, specify WITH PARSER mecab with CREATE TABLE, ALTER TABLE, or CREATE INDEX.

This example demonstrates creating a table with a mecab FULLTEXT index, inserting sample data, and viewing tokenized data in the INFORMATION_SCHEMA.INNODB_FT_INDEX_CACHE table:

mysql> USE test;
mysql> CREATE TABLE articles (
         id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
         title VARCHAR(200),
         body TEXT,
         FULLTEXT (title,body) WITH PARSER mecab
       ) ENGINE=InnoDB CHARACTER SET utf8;
mysql> SET NAMES utf8;
mysql> INSERT INTO articles (title,body) VALUES
         ('データベース管理','このチュートリアルでは、私はどのようにデータベースを管理する方法を紹介します'),
         ('データベースアプリケーション開発','データベースアプリケーションを開発することを学ぶ');
mysql> SET GLOBAL innodb_ft_aux_table="test/articles";
mysql> SELECT * FROM INFORMATION_SCHEMA.INNODB_FT_INDEX_CACHE ORDER BY doc_id, position;

To add a FULLTEXT index to an existing table, you can use ALTER TABLE or CREATE INDEX. For example:

CREATE TABLE articles (
  id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
  title VARCHAR(200),
  body TEXT
) ENGINE=InnoDB CHARACTER SET utf8;

ALTER TABLE articles ADD FULLTEXT INDEX ft_index (title,body) WITH PARSER mecab;

# Or:
CREATE FULLTEXT INDEX ft_index ON articles (title,body) WITH PARSER mecab;
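
After adding the index, you may want to confirm that it exists and which parser it uses. The following is a minimal sketch, assuming the articles table lives in the test schema. SHOW CREATE TABLE includes the parser clause in the index definition, while the INFORMATION_SCHEMA query lists only the indexed columns.

```sql
-- The FULLTEXT KEY line of the output should name the parser in use.
SHOW CREATE TABLE articles;

-- List the columns covered by FULLTEXT indexes on the table.
-- Assumes the table was created in the test schema.
SELECT INDEX_NAME, COLUMN_NAME, INDEX_TYPE
  FROM INFORMATION_SCHEMA.STATISTICS
 WHERE TABLE_SCHEMA = 'test'
   AND TABLE_NAME = 'articles'
   AND INDEX_TYPE = 'FULLTEXT';
```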

MeCab Parser Space Handling

The MeCab parser uses spaces as separators in query strings. For example, the MeCab parser tokenizes データベース 管理 as データベース and 管理.

MeCab Parser Stopword Handling

By default, the MeCab parser uses the default stopword list, which contains a short list of English stopwords. For a stopword list applicable to Japanese, you must create your own. For information about creating stopword lists, see Section 12.10.4, “Full-Text Stopwords”.
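
For InnoDB tables, a custom Japanese stopword list might be set up along the following lines. This is a sketch under several assumptions: the table name ja_stopwords and the four sample words are hypothetical, the table is placed in the test schema, and the FULLTEXT index is named ft_index as in the ALTER TABLE example earlier. InnoDB expects a stopword table to have a single VARCHAR column named value, and an existing index must be rebuilt before a new stopword list takes effect.

```sql
SET NAMES utf8;

-- Hypothetical stopword table with the single VARCHAR column named value.
CREATE TABLE test.ja_stopwords (value VARCHAR(30))
  ENGINE=InnoDB CHARACTER SET utf8;

-- Sample entries only; choose words that suit your own data.
INSERT INTO test.ja_stopwords (value) VALUES ('の'), ('を'), ('は'), ('する');

-- Register the table as the server-wide InnoDB stopword list (db_name/table_name).
SET GLOBAL innodb_ft_server_stopword_table = 'test/ja_stopwords';

-- Rebuild the index so that the new stopword list is applied.
ALTER TABLE articles DROP INDEX ft_index;
ALTER TABLE articles ADD FULLTEXT INDEX ft_index (title,body) WITH PARSER mecab;
```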

MeCab Parser Term Search

For natural language mode search, the search term is converted to a union of tokens. For example, データベース管理 is converted to データベース 管理.

SELECT COUNT(*) FROM articles WHERE MATCH(title,body) AGAINST('データベース管理' IN NATURAL LANGUAGE MODE);

For boolean mode search, the search term is converted to a search phrase. For example, データベース管理 is converted to "データベース 管理".

SELECT COUNT(*) FROM articles WHERE MATCH(title,body) AGAINST('データベース管理' IN BOOLEAN MODE);
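
To compare the two modes on the same data, it can help to return the matching rows rather than a count. The following sketch assumes the articles table and sample rows from the earlier example. The score column alias is illustrative, and no result rows are shown because they depend on the dictionary and configuration in use.

```sql
SET NAMES utf8;

-- Natural language mode: rows that match any of the tokens, ranked by relevance.
SELECT id, title,
       MATCH(title,body) AGAINST('データベース管理' IN NATURAL LANGUAGE MODE) AS score
  FROM articles
 WHERE MATCH(title,body) AGAINST('データベース管理' IN NATURAL LANGUAGE MODE)
 ORDER BY score DESC;

-- Boolean mode: the same search term, treated as a phrase.
SELECT id, title
  FROM articles
 WHERE MATCH(title,body) AGAINST('データベース管理' IN BOOLEAN MODE);
```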

MeCab Parser Wildcard Search

Wildcard search terms are not tokenized. A search on データベース管理* is performed on the prefix, データベース管理.

SELECT COUNT(*) FROM articles WHERE MATCH(title,body) AGAINST('データベース*' IN BOOLEAN MODE);

MeCab Parser Phrase Search

Phrases are tokenized. For example, "データベース管理" is tokenized as "データベース 管理".

SELECT COUNT(*) FROM articles WHERE MATCH(title,body) AGAINST('"データベース管理"' IN BOOLEAN MODE);

Installing MeCab From a Binary Distribution (Optional)

This section describes how to install mecab and mecab-ipadic from a binary distribution using a native package management utility. For example, on Fedora, you can use Yum to perform the installation:

yum install mecab-devel

On Debian or Ubuntu, you can perform an APT installation:

apt-get install mecab
apt-get install mecab-ipadic

Installing MeCab From Source (Optional)

If you want to build mecab and mecab-ipadic from source, basic installation steps are provided below. For additional information, refer to the MeCab documentation.

  1. Download the tar.gz packages for mecab and mecab-ipadic from http://taku910.github.io/mecab/#download. As of February 2016, the latest available packages are mecab-0.996.tar.gz and mecab-ipadic-2.7.0-20070801.tar.gz.

  2. Install mecab:

    tar zxfv mecab-0.996.tar.gz
    cd mecab-0.996
    ./configure
    make
    make check
    su
    make install

  3. Install mecab-ipadic:

    tar zxfv mecab-ipadic-2.7.0-20070801.tar.gz
    cd mecab-ipadic-2.7.0-20070801
    ./configure
    make
    su
    make install

  4. Compile MySQL using the WITH_MECAB CMake option. Set the WITH_MECAB option to system if you have installed mecab and mecab-ipadic to the default location.

    -DWITH_MECAB=system

    If you defined a custom installation directory, set WITH_MECAB to the custom directory. For example:

    -DWITH_MECAB=/path/to/mecab