
MySQL ngram Fulltext Parser
The MySQL ngram full-text parser is a specialized text parser that allows you to perform n-gram based full-text searches. In general, n-grams are contiguous sequences of ‘n’ items extracted from a text; MySQL’s ngram parser works on characters, so each token is a run of n consecutive characters. Using the ngram full-text parser, you can perform searches based on such n-grams.
Here’s how you can use the ngram full-text parser for full-text searches in MySQL:
- Create a Full-Text Index with ngram Parser: You need to create a full-text index on the column you want to search using the ngram parser. You can specify this when creating the table or by altering an existing table.
CREATE TABLE articles (
    id INT AUTO_INCREMENT PRIMARY KEY,
    title VARCHAR(255),
    content TEXT,
    FULLTEXT (content) WITH PARSER ngram
);
In the above example, we’ve specified the ngram parser for the content column.
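For a table that already exists, the index can be added afterwards. The following is a minimal sketch, assuming an articles table whose content column has no full-text index yet; the index name ft_content is made up for illustration.

```sql
-- Option 1: add the ngram full-text index with ALTER TABLE.
ALTER TABLE articles
    ADD FULLTEXT INDEX ft_content (content) WITH PARSER ngram;

-- Option 2: the equivalent CREATE INDEX form (use one or the other).
-- CREATE FULLTEXT INDEX ft_content ON articles (content) WITH PARSER ngram;
```

Building a full-text index on a large table can take a while, so it is usually worth trying this on a copy first.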
- Perform a Full-Text Search: To perform a full-text search using the ngram parser, you use the MATCH ... AGAINST clause in your SQL query. The search string is split into n-grams in the same way as the indexed text, so you can search for words, phrases, or fragments of words.
SELECT * FROM articles
WHERE MATCH(content) AGAINST('search_term' IN NATURAL LANGUAGE MODE);
Replace 'search_term' with the text you want to find. MySQL splits the search string into n-grams itself, so you do not have to do that by hand.
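The search mode changes how the n-grams of the search string are combined. According to the MySQL manual, natural language mode turns the string into a union of n-gram terms, while boolean mode turns it into an n-gram phrase. The sketch below illustrates the difference, assuming the articles table from above, the default ngram_token_size of 2, and made-up sample rows.

```sql
-- Illustrative sample rows (not from any real data set).
INSERT INTO articles (title, content) VALUES
    ('First',  'mysql'),
    ('Second', 'postgresql');

-- Natural language mode: 'sqlx' becomes the n-grams 'sq', 'ql', 'lx',
-- and a row qualifies if it contains any of them.
SELECT id, title FROM articles
WHERE MATCH(content) AGAINST('sqlx' IN NATURAL LANGUAGE MODE);

-- Boolean mode: 'sqlx' becomes the phrase "sq ql lx", so the n-grams
-- must all appear together and in that order.
SELECT id, title FROM articles
WHERE MATCH(content) AGAINST('sqlx' IN BOOLEAN MODE);
```

With these rows, the first query should be the more permissive of the two, since neither row contains the full sequence 'sqlx'.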
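- Sorting by Relevance: You can sort the results by relevance by using the MATCH ... AGAINST expression in the ORDER BY clause, similar to the standard full-text search.
- Word Length Limit and Stopwords: The ngram parser does not apply the word length limits of the standard full-text search. The innodb_ft_min_token_size, innodb_ft_max_token_size, ft_min_word_len, and ft_max_word_len settings are ignored for ngram indexes, so short terms can be indexed and found. Stopwords still apply, but differently: instead of dropping tokens that are equal to a stopword, the ngram parser drops every token that contains a stopword. The default stopword list is English, so for Chinese, Japanese, or Korean text you need to supply your own list.
- Customizing n-gram Length: By default, the ngram parser uses bigrams (2-grams). You can change the n-gram length with the ngram_token_size system variable, which accepts values from 1 to 10. The variable is read-only at runtime, so set it at server startup or in your MySQL configuration file, and rebuild existing ngram full-text indexes after changing it.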
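The points above can be sketched in SQL. This is a minimal example, assuming the articles table from earlier; the column alias score is a name chosen here for illustration.

```sql
-- Show the n-gram length the server is currently using.
-- It can only be changed at startup, e.g. ngram_token_size=2
-- under [mysqld] in the option file.
SHOW VARIABLES LIKE 'ngram_token_size';

-- Return the relevance score and list the best matches first.
SELECT id,
       title,
       MATCH(content) AGAINST('search_term' IN NATURAL LANGUAGE MODE) AS score
FROM articles
WHERE MATCH(content) AGAINST('search_term' IN NATURAL LANGUAGE MODE)
ORDER BY score DESC;
```

Repeating the same MATCH ... AGAINST expression in the select list and the WHERE clause is the usual pattern for exposing the score.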
Keep in mind that the ngram full-text parser is particularly useful for languages with no clear word boundaries, such as Chinese or Japanese. It can also be helpful in certain search scenarios where standard full-text search might not provide the desired results.
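To see how such text is tokenized, you can inspect the InnoDB full-text index cache. The sketch below assumes an InnoDB articles table in a schema named test (the schema name is an assumption), the default ngram_token_size of 2, and an account with enough privileges to set global variables and read INFORMATION_SCHEMA.

```sql
-- Illustrative Chinese sample row.
INSERT INTO articles (title, content) VALUES ('示例', '数据库管理');

-- Point the InnoDB full-text diagnostic tables at the table
-- ('test' is an assumed schema name).
SET GLOBAL innodb_ft_aux_table = 'test/articles';

-- List the tokens held in the index cache for recently inserted rows.
SELECT word, doc_id, position
FROM INFORMATION_SCHEMA.INNODB_FT_INDEX_CACHE
ORDER BY doc_id, position;
```

With bigrams, the sample text should be split into overlapping two-character tokens (数据, 据库, 库管, 管理), which is why a search for 数据库 can match even though the text contains no spaces.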
As with any full-text search method, the choice of the parser and search approach should depend on your specific use case and the characteristics of your text data.