在mysql中使用全文索引

时间:2022-09-22 11:29:32
在使用like进行模糊查询,当数据量大到一定程序的时候,我们会发现查询的效率是相当低下的。下面就此介绍一下mysql提供全文索引和搜索的功能。

正文:

下面是从参考资料里头摘录下来的内容,具体使用见参考资料,那里更详细哦


1) 很多互联网应用程序都提供了全文搜索功能,用户可以使用一个词或者词语片断作为查询项目来定位匹配的记录。在后台,这些程序使用在一个SELECT查询中的LIKE语句来执行这种查询,尽管这种方法可行,但对于全文查找而言,这是一种效率极端低下的方法,尤其在处理大量数据的时候。
  MySQL 针对这一问题提供了一种基于内建的全文查找方式的解决方案。在此,开发者只需要简单地标记出需要全文查找的字段,然后使用特殊的MySQL方法在那些字段运行搜索,这不仅仅提高了性能和效率(因为MySQL对这些字段做了索引来优化搜索),而且实现了更高质量的搜索,因为MySQL使用自然语言来智能地对结果评级,以去掉不相关的项目。

2)到 3.23.23 时,MySQL 开始支持全文索引和搜索。全文索引在 MySQL 中是一个 FULLTEXT 类型索引。FULLTEXT 索引用于 MyISAM 表,可以在 CREATE TABLE 时或之后使用 ALTER TABLE 或 CREATE INDEX 在 CHAR、VARCHAR 或 TEXT 列上创建。对于大的数据库,将数据装载到一个没有 FULLTEXT 索引的表中,然后再使用 ALTER TABLE (或 CREATE INDEX) 创建索引,这将是非常快的。将数据装载到一个已经有 FULLTEXT 索引的表中,将是非常慢的。

全文搜索通过 MATCH() 函数完成。 

下面大概介绍一下创建全文索引的过程,主要有两种办法:
第一种是创建表的时候就创建全文索引,第二种类是在创建表以后再增加全文索引,通过上面的引文我们知道后者比前者有些好处

1、先创建表,然后通过"AlTER TABLE"增加全文索引

mysql> create database ftt;          --创建数据库
Query OK, 1 row affected (0.03 sec)
mysql> use ftt;                  --使用数据库
Database changed
mysql> create table reviews            --创建表
         -> (
         -> id int(5) primary key not null auto_increment,
         -> data text
         -> );
Query OK, 0 rows affected (0.09 sec)

mysql> INSERT INTO `reviews` (`id`, `data`) VALUES(1, 'Gingerboy has a new singl
e out called Throwing Rocks. It\'sgreat!'); --下面三句都是插入数据
Query OK, 1 row affected (0.06 sec)

mysql> INSERT INTO `reviews` (`id`, `data`) VALUES (2, 'Hello all, I really like
the new Madonna single. One of the hottest tracks currently playing...I\'ve bee
n listening to it all day');
Query OK, 1 row affected (0.00 sec)

mysql> INSERT INTO `reviews` (`id`, `data`)VALUES (3, 'Have you heard the new ba
nd Hotter Than Hell?They have five members and they burn their instruments when
they play in concerts. These guys totally rock! Like, awesome, dude!');
Query OK, 1 row affected (0.00 sec)

mysql> alter table reviews add fulltext index (data);          --通过alter增加全文索引
Query OK, 3 rows affected (0.16 sec)
Records: 3 Duplicates: 0 Warnings: 0

mysql> select id from reviews where match (data) against ('Madonna'); --通过match和against实现全文检索
+----+
| id |
+----+
| 2 |
+----+
1 row in set (0.00 sec)

mysql> select id from reviews where match (data) against ('+Madonna+instruments'
); --搜索出包含Madonna或者instruments的所有记录的id号
+----+
| id |
+----+
| 2 |
| 3 |
+----+
2 rows in set (0.00 sec)

mysql> select id from reviews where match (data) against ('Mado*' in boolean mod
e); --检索出包含Mado的所有记录的id
+----+
| id |
+----+
| 2 |
+----+
1 row in set (0.00 sec) mysql> SELECT MATCH(AAA,BBBB) AGAINST('XIEGUOKING') FROM TEST这样在2个字段中检索条件



补充
         从上面的结果不难看出,mysql的全文索引和检索支持多关键字,从下面的参考资料,你还可以发现更多实现,看看去哦

2、在创建表的时候就创建全文索引

由于其他的内容和上面差不多,下面只给出创建时的语法

create table reviews              --创建表的同时创建全文索引
(
id int(5) primary key not null auto_increment,
data text,
fulltext(data)
);

=========================================================================

一个简单的表
我们将在整个过程中使用下面的表.
CREATE TABLE fulltext_sample(copy TEXT,FULLTEXT(copy)) TYPE=MyISAM;
如果你没有把默认的表类型设置成MyISAM以外的类型那么TYPE=MyISAM可以省略.建表之后,向其中填充一些数据,例如:
INSERT INTO fulltext_sample VALUES
('It appears good from here'),
('The here and the past'),
('Why are we hear'),
('An all-out alert'),
('All you need is love'),
('A good alert');

如果你已经建立好了一个表,你可以使用ALTER TABLE(就像CREATE INDEX语句一样)语句添加一个全文索引,例如:
ALTER TABLE fulltext_sample ADD FULLTEXT(copy)
查找文本
全文索引搜索的语法很简单,你只要MATCH字段,AGAINST你要查找的文本,例如:
mysql> SELECT * FROM fulltext_sample WHERE MATCH(copy) AGAINST('love');
+----------------------+
| copy |
+----------------------+
| All you need is love |
+----------------------+

在全文索引上进行搜索是不区分大小写的,因此下面的语句也可以正常运行:
mysql> SELECT * FROM fulltext_sample WHERE MATCH(copy) AGAINST('LOVE');
+----------------------+
| copy |
+----------------------+
| All you need is love |
+----------------------+

全文索引通常用来搜索自然语言文本,例如报纸文章,网页内容等等.因此MySQL为这类搜索添加了很多特性.MySQL不索引任何长度小于等于3的文本, 也不索引有50%机会出现的单词.这意味着如果你的表少于2条记录,基于全文索引的搜索不会返回任何东西.将来,MySQL会使这项功能更灵活,但是现在 它应该可以适合大部分自然语言的使用.如果你的数据库中的大部分记录都包含”music”,你很可能不希望返回这些记录,你可以使用IN BOOLEAN MODE修饰符来获得50%左右的阀值,见本文第二部分.
结果将按照关联性从高到底的顺序返回.


主要特性
下面是标准的全文索引搜索的主要特性:
1.排除重复词语
2.排除长度小于4的词语
3.排除在多于一半记录中出现的词语(就是说只要要有3条记录)
4.带连字符的词语被认为两个词语
5.结果按照关联度降序返回
6.忽略列表中的词语也被从搜索结果中排除.忽略列表基于普通的英文单词,因此如果你的数据用作不同的目的,你可能希望改变忽略列表.不幸的是,这样作并 不容易.你需要编辑文件myisam/ft_static.c,重新编辑MySQL,并重建索引!这里有一个忽略列表.注意,这些在不同的版本里有所更 改.
忽略列表
"a", "a's", "able", "about", "above", "according", "accordingly", "across", "actually", "after", "afterwards", "again", "against", "ain't", "all", "allow", "allows", "almost", "alone", "along", "already", "also", "although", "always", "am", "among", "amongst",
"an", "and", "another", "any", "anybody", "anyhow", "anyone", "anything", "anyway", "anyways", "anywhere", "apart", "appear", "appreciate", "appropriate", "are", "aren't", "around", "as", "aside", "ask", "asking", "associated", "at", "available", "away", "awfully",
"b", "be", "became", "because", "become", "becomes", "becoming", "been", "before", "beforehand", "behind", "being", "believe", "below", "beside", "besides", "best", "better", "between", "beyond", "both", "brief", "but", "by", "c", "c'mon", "c's", "came", "can",
"can't", "cannot", "cant", "cause", "causes", "certain", "certainly", "changes", "clearly", "co", "com", "come", "comes", "concerning", "consequently", "consider", "considering", "contain", "containing", "contains", "corresponding", "could", "couldn't", "course",
"currently", "d", "definitely", "described", "despite", "did", "didn't", "different", "do", "does", "doesn't", "doing", "don't", "done", "down", "downwards", "during", "e", "each", "edu", "eg", "eight", "either", "else", "elsewhere", "enough", "entirely",
"especially", "et", "etc", "even", "ever", "every", "everybody", "everyone", "everything", "everywhere", "ex", "exactly", "example", "except", "f", "far", "few", "fifth", "first", "five", "followed", "following", "follows", "for", "former", "formerly", "forth",
"four", "from", "further", "furthermore", "g", "get", "gets", "getting", "given", "gives", "go", "goes", "going", "gone", "got", "gotten", "greetings", "h", "had", "hadn't", "happens", "hardly", "has", "hasn't", "have", "haven't", "having", "he", "he's", "hello",
"help", "hence", "her", "here", "here's", "hereafter", "hereby", "herein", "hereupon", "hers", "herself", "hi", "him", "himself", "his", "hither", "hopefully", "how", "howbeit", "however", "i", "i'd", "i'll", "i'm", "i've", "ie", "if", "ignored", "immediate",
"in", "inasmuch", "inc", "indeed", "indicate", "indicated", "indicates", "inner", "insofar", "instead", "into", "inward", "is", "isn't", "it", "it'd", "it'll", "it's", "its", "itself", "j", "just", "k", "keep", "keeps", "kept", "know", "knows", "known", "l",
"last", "lately", "later", "latter", "latterly", "least", "less", "lest", "let", "let's", "like", "liked", "likely", "little", "look", "looking", "looks", "ltd", "m", "mainly", "many", "may", "maybe", "me", "mean", "meanwhile", "merely", "might", "more", "moreover",
"most", "mostly", "much", "must", "my", "myself", "n", "name", "namely", "nd", "near", "nearly", "necessary", "need", "needs", "neither", "never", "nevertheless", "new", "next", "nine", "no", "nobody", "non", "none", "noone", "nor", "normally", "not", "nothing",
"novel", "now", "nowhere", "o", "obviously", "of", "off", "often", "oh", "ok", "okay", "old", "on", "once", "one", "ones", "only", "onto", "or", "other", "others", "otherwise", "ought", "our", "ours", "ourselves", "out", "outside", "over", "overall", "own",
"p", "particular", "particularly", "per", "perhaps", "placed", "please", "plus", "possible", "presumably", "probably", "provides", "q", "que", "quite", "qv", "r", "rather", "rd", "re", "really", "reasonably", "regarding", "regardless", "regards", "relatively",
"respectively", "right", "s", "said", "same", "saw", "say", "saying", "says", "second", "secondly", "see", "seeing", "seem", "seemed", "seeming", "seems", "seen", "self", "selves", "sensible", "sent", "serious", "seriously", "seven", "several", "shall", "she",
"should", "shouldn't", "since", "six", "so", "some", "somebody", "somehow", "someone", "something", "sometime", "sometimes", "somewhat", "somewhere", "soon", "sorry", "specified", "specify", "specifying", "still", "sub", "such", "sup", "sure", "t", "t's",
"take", "taken", "tell", "tends", "th", "than", "thank", "thanks", "thanx", "that", "that's", "thats", "the", "their", "theirs", "them", "themselves", "then", "thence", "there", "there's", "thereafter", "thereby", "therefore", "therein", "theres", "thereupon",
"these", "they", "they'd", "they'll", "they're", "they've", "think", "third", "this", "thorough", "thoroughly", "those", "though", "three", "through", "throughout", "thru", "thus", "to", "together", "too", "took", "toward", "towards", "tried", "tries", "truly",
"try", "trying", "twice", "two", "u", "un", "under", "unfortunately", "unless", "unlikely", "until", "unto", "up", "upon", "us", "use", "used", "useful", "uses", "using", "usually", "v", "value", "various", "very", "via", "viz", "vs", "w", "want", "wants",
"was", "wasn't", "way", "we", "we'd", "we'll", "we're", "we've", "welcome", "well", "went", "were", "weren't", "what", "what's", "whatever", "when", "whence", "whenever", "where", "where's", "whereafter", "whereas", "whereby", "wherein", "whereupon", "wherever",
"whether", "which", "while", "whither", "who", "who's", "whoever", "whole", "whom", "whose", "why", "will", "willing", "wish", "with", "within", "without", "won't", "wonder", "would", "would", "wouldn't", "x", "y", "yes", "yet", "you", "you'd", "you'll", "you're",
"you've", "your", "yours", "yourself", "yourselves", "z", "zero",

让我们看一下其中的一些词.如果你懒的输入,但是想查找”love”这个词,象下面这样:
mysql> SELECT * FROM fulltext_sample WHERE MATCH(copy) AGAINST('lov');
Empty set (0.00 sec)

什么都没返回,因为全文索引只包含完整的单词,不是部分单词.如果想得到返回,你必须把单词写完整,就像第一个例子里一样.
就像我们提过的,连字符单词在全文索引中被排除(它们被作为单独的单词索引),因此下面的语句什么都不返回:
mysql> SELECT * FROM fulltext_sample WHERE MATCH(copy) AGAINST('all-out');
Empty set (0.00 sec)

很不幸,两个单词都小于4个字符,因此单独搜索时也不会出现,而且通常的搜索中也不会出现.本文的第二部分中使用BOOLEAN MODE搜索可以搜索部分的或者包含连字符的单词.
你也可以一次搜索多个单词,用逗号分隔.下面的例子查找包含”here”和”appears”的记录:
mysql> SELECT * FROM fulltext_sample WHERE MATCH(copy) AGAINST('here','appears');
Empty set (0.01 sec)

出乎意料这个语句没有返回.但是仔细看看忽略列表,这个词被列在其中,因此被从索引中排除了.忽略列表可能是人们解释MySQL全文索引没有生效的通常原因.如果你的查询返回了一个结果,那么你的版本的MySQL的忽略列表不包含”here”这个词.
关联度
下面的例子说明记录返回的优先级
mysql> SELECT * FROM fulltext_sample WHERE MATCH(copy) AGAINST('good,alert');
+---------------------------+
| copy |
+---------------------------+
| A good alert |
| It appears good from here |
| An all-out alert |
+---------------------------+

记录”A good alert”首先出现,因为它同时包含要搜索的两个词.你不必相信我-只需要看看MySQL在结果中显示的优先级.简单的在字段列表中重复MATCH()函数,例如:
mysql> SELECT copy,MATCH(copy) AGAINST('good,alert') AS relevance
FROM fulltext_sample WHERE MATCH(copy) AGAINST('good,alert');
+---------------------------+------------------+
| copy | relevance |
+---------------------------+------------------+
| A good alert | 1.3551264824316 |
| An all-out alert | 0.68526663197496 |
| It appears good from hear | 0.67003110026735 |
+---------------------------+------------------+

关联度的计算非常复杂,它基于索引中单词的数量,记录中不同单词的个数,索引和返回结果中单词的总数,以及单词的重要程度.这个数字可能在你的MySQL版本中有所不同,MySQL偶尔会强化计算逻辑.
对大多数应用来说标准的全文索引搜索非常有用而充分,MySQL 4让它更加强大.

============================================================================Mysql的中文检索问题很多人觉得全文检索不支持中文,但其实是可以的。但你必须要解决MySQL的中文插入问题。建议大家在我的BLOG上查一篇关于MYSQL中文问题解决文章。

=============================================================================

注意:全文索引只支持MyISAM 格式的表,InnoDB不支持的。如果你的表中有外键那就没办法使用全文索引功能了。其实这也是为什么全文索引使用的不广泛的原因。那么只好使用模糊查询咯:下面这个模糊查询的范例也许你用的上:

SELECT * FROM `ad` WHERE CONCAT(`ad_content`,`ad_id`,`ad_flack`,`ad_prod`) LIKE '%110%'

这是一个单表多列的模糊查询方式。很适用哦!

----------------------------------------------------------------------------------------------------------------------------

另附转载全文索引can't find full text matching 报错的一片文章

关键字: mysql, mysql全文搜索, can't find fulltext index matching the colu

MySQL的全文搜索问题,如果你对表info的title,content(...)等字段设置了fulltext索引,则在match时,就必须match全部的索引字段,例如只match(title)则会报错: 

Java代码
  1. java.sql.SQLException: Can't find FULLTEXT index matching the column list  
java.sql.SQLException: Can't find FULLTEXT index matching the column list




搞了我一上午,总算是搞清楚了,有用到的可以做个借鉴。 


再补充一下: 

如果是中文搜索,就不要考虑了,命中率低的要命,还要另外考虑中文分词,词的长度..... 

只好用LIKE了。

---------------------------------------------------------------------------------------------------------------------

关于Mysql模糊查询的优化-全文检索和Like的使用 表A:
CREATE TABLE `tableA` (
`id` int(11) NOT NULL auto_increment,
`content` varchar(256) default NULL,
PRIMARY KEY (`id`),

如果我要在content中进行模糊查询,那么使用like的话,肯定要这样写:
select * from tableA from content like "%xxx%"

这里有个问题,即使我为content加了index索引,那么在下面这两种情况索引也是无效的
content like "%xxx" / like "%xxx%" 都不能使用索引

如果想索引作用只能使用content like "xxx%"

这时候就要使用全文索引来处理了
ALTER TABLE `tableA` ADD FULLTEXT `fidx_content` (`content`)

然后用以下方式查询
SELECT * FROM `tableA`WHERE MATCH (content)AGAINST ('xxx')

EXPLAIN SELECT * FROM `wf_master`WHERE MATCH (cpr_no)AGAINST ('801876')
id select_type table       type      possible_keys key         key_len ref rows Extra
1   SIMPLE       wf_master fulltext fidx_cpr_no     fidx_cpr_no 0       1          Using where

需要注意的是全文检索的对象是一个单词,被检索的词需要用非文本隔开的

也就是说如果你在"abcd,efg,hijklmn"中检索"hi",那么全文检索也没有用,如果你检索efg,那么可以使用全文检索

再有一点需要说明的是无论建立哪种索引,MYSQL内建函数: FIND_IN_SET, POSITION,LOCATE都不能使用索引