比较全文索引和传统的模糊查询的性能

--查看表数据的内容和分布
SQL> select count(*) from members;
COUNT(*)
----------
345567

SQL> CREATE INDEX ct_ind ON members(member_name) INDEXTYPE IS CTXSYS.CONTEXT;
索引已创建。

实验1:全文索引性能差于模糊查询
SQL> set autotrace traceonly
SQL> SELECT * FROM members WHERE CONTAINS(member_name, '亚') > 0;
已选择2629行。

执行计划
----------------------------------------------------------
Plan hash value: 420797271
---------------------------------------------------------------------------------------
| Id  | Operation                | Name | Rows  | Bytes | Cost (%CPU)| Time    |
---------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT          |       | 790 | 47400 | 289 (0)| 00:00:04 |
| 1 |  TABLE ACCESS BY INDEX ROWID| MEMBERS | 790 | 47400 | 289 (0)| 00:00:04 |
|*  2 | DOMAIN INDEX             | CT_IND  |    |    | 139 (0)| 00:00:02 |
---------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("CTXSYS"."CONTAINS"("MEMBER_NAME",'亚')>0)

统计信息
----------------------------------------------------------
      15  recursive calls
      0  db block gets
   3904  consistent gets
      0  physical reads
      0  redo size
   206863  bytes sent via SQL*Net to client
   2445  bytes received via SQL*Net from client
      177  SQL*Net roundtrips to/from client
      0  sorts (memory)
      0  sorts (disk)
   2629  rows processed

SQL> select * from members where member_name like '%亚%';
已选择2629行。

执行计划
----------------------------------------------------------
Plan hash value: 3682585452
-----------------------------------------------------------------------------
| Id  | Operation       | Name | Rows  | Bytes | Cost (%CPU)| Time    |
-----------------------------------------------------------------------------
| 0 | SELECT STATEMENT  |       | 17252 |  1010K| 877 (1)| 00:00:11 |
|*  1 |  TABLE ACCESS FULL| MEMBERS | 17252 |  1010K| 877 (1)| 00:00:11 |
-----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("MEMBER_NAME" LIKE '%亚%' AND "MEMBER_NAME" IS NOT NULL)

统计信息
----------------------------------------------------------
      1  recursive calls
      0  db block gets
   3329  consistent gets
      0  physical reads
      0  redo size
   182298  bytes sent via SQL*Net to client
   2445  bytes received via SQL*Net from client
      177  SQL*Net roundtrips to/from client
      0  sorts (memory)
      0  sorts (disk)
   2629  rows processed

结论：
全文索引逻辑读：3904
模糊查询逻辑读：3329
经过测试可以看到在查询结果集比较多时，全文索引的性能优势并不明显，甚至比模糊查询的全表扫描还要低。

实验二：全文索引性能优于模糊查询的情况
--同样的SQL只改变查询条件
SQL> SELECT * FROM members WHERE CONTAINS(member_name, '亚南')>0;
已选择15行。

执行计划
----------------------------------------------------------
Plan hash value: 420797271
---------------------------------------------------------------------------------------
| Id  | Operation                | Name | Rows  | Bytes | Cost (%CPU)| Time    |
---------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT          |       | 20 |  1200 | 16 (0)| 00:00:01 |
| 1 |  TABLE ACCESS BY INDEX ROWID| MEMBERS | 20 |  1200 | 16 (0)| 00:00:01 |
|*  2 | DOMAIN INDEX             | CT_IND  |    |    | 12 (0)| 00:00:01 |
---------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("CTXSYS"."CONTAINS"("MEMBER_NAME",'亚南')>0)

统计信息
----------------------------------------------------------
      144  recursive calls
      0  db block gets
   1172  consistent gets
      0  physical reads
      0  redo size
   2075  bytes sent via SQL*Net to client
      520  bytes received via SQL*Net from client
      2  SQL*Net roundtrips to/from client
      0  sorts (memory)
      0  sorts (disk)
      15  rows processed

SQL> select * from members where member_name like '%亚南%';
已选择15行。

执行计划
----------------------------------------------------------
Plan hash value: 3682585452
-----------------------------------------------------------------------------
| Id  | Operation       | Name | Rows  | Bytes | Cost (%CPU)| Time    |
-----------------------------------------------------------------------------
| 0 | SELECT STATEMENT  |       | 17252 |  1010K| 877 (1)| 00:00:11 |
|*  1 |  TABLE ACCESS FULL| MEMBERS | 17252 |  1010K| 877 (1)| 00:00:11 |
-----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("MEMBER_NAME" LIKE '%亚南%' AND "MEMBER_NAME" IS NOT NULL)

统计信息
----------------------------------------------------------
      1  recursive calls
      0  db block gets
   3155  consistent gets
      0  physical reads
      0  redo size
   1965  bytes sent via SQL*Net to client
      520  bytes received via SQL*Net from client
      2  SQL*Net roundtrips to/from client
      0  sorts (memory)
      0  sorts (disk)
      15  rows processed

结论：
全文索引逻辑读：1172
模糊查询逻辑读：3155
经过测试可以看到在查询结果集比较少时，全文索引的性能优于传统的全表扫描的模糊查询。

总结：
1.全文索引在查询结果比较多和跟其他索引的配合使用时，性能并不会一定会高，而且结果不一定会完全正确，这取决于使用何种分词方法。
2.由于全文索引是以空间换时间的思想，创建时占用大量的存储空间，适用于在长度比较小的字段上创建，减少分词量占用的空间，如name,title类似的字段。
3由于全文索引维护成本比较高，需要定期进行同步刷新索引数据，频繁刷新数据会对生产环境带来压力，这点物化视图有点类似，所以同步频率根据数据量和需求进行权衡，在OLTP生产环境要慎用。
4.由于全文索引bug较多，使用前一定做好测试，确保对其它表和其它应用环境不生产影响。笔者就遇到过创建全文索引导致logic dataguard同步失效的情况。

秒客网

比较全文索引和传统的模糊查询的性能

相关文章