当时数量量比较庞大的时候,做模糊查询效率很慢,为了优化查询效率,尝试如下方法做效率对比
一、对比情况说明:
1、数据量100w条数据
2、执行sql
二、对比结果
1
2
3
4
5
6
7
8
9
10
11
12
|
explain analyze SELECT
c_patent,
c_applyissno,
d_applyissdate,
d_applydate,
c_patenttype_dimn,
c_newlawstatus,
c_abstract
FROM
public .t_knowl_patent_zlxx_temp
WHERE
c_applicant LIKE '%本溪满族自治县连山关镇安平安养殖场%' ;
|
1、未建索时执行计划:
1
2
3
4
5
6
7
8
|
"Gather (cost=1000.00..83803.53 rows=92 width=1278) (actual time=217.264..217.264 rows=0 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Seq Scan on t_knowl_patent_zlxx (cost=0.00..82794.33 rows=38 width=1278) (actual time=212.355..212.355 rows=0 loops=3)
Filter: ((c_applicant)::text ~~ '%本溪满族自治县连山关镇安平安养殖场%'::text)
Rows Removed by Filter: 333333
Planning time: 0.272 ms
Execution time: 228.116 ms"
|
2、btree索引
建索引语句
1
|
CREATE INDEX idx_public_t_knowl_patent_zlxx_applicant ON public .t_knowl_patent_zlxx(c_applicant varchar_pattern_ops);
|
执行计划
1
2
3
4
5
6
7
8
|
"Gather (cost=1000.00..83803.53 rows=92 width=1278) (actual time=208.253..208.253 rows=0 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Seq Scan on t_knowl_patent_zlxx (cost=0.00..82794.33 rows=38 width=1278) (actual time=203.573..203.573 rows=0 loops=3)
Filter: ((c_applicant)::text ~~ '%本溪满族自治县连山关镇安平安养殖场%'::text)
Rows Removed by Filter: 333333
Planning time: 0.116 ms
Execution time: 218.189 ms"
|
但是如果将查询sql稍微改动一下,把like查询中的前置%去掉是这样的
1
2
3
4
5
|
Index Scan using idx_public_t_knowl_patent_zlxx_applicant on t_knowl_patent_zlxx_temp (cost=0.55..8.57 rows =92 width=1278) (actual time =0.292..0.292 rows =0 loops=1)
Index Cond: (((c_applicant)::text ~>=~ '本溪满族自治县连山关镇安平安养殖场' ::text) AND ((c_applicant)::text ~<~ '本溪满族自治县连山关镇安平安养殖圻' ::text))
Filter: ((c_applicant)::text ~~ '本溪满族自治县连山关镇安平安养殖场%' ::text)
Planning time : 0.710 ms
Execution time : 0.378 ms
|
3、gin索引
创建索引语句(postgresql要求在9.6版本及以上)
1
2
|
create extension pg_trgm;
CREATE INDEX idx_public_t_knowl_patent_zlxx_applicant ON public .t_knowl_patent_zlxx USING gin (c_applicant gin_trgm_ops);
|
执行计划
1
2
3
4
5
6
|
Bitmap Heap Scan on t_knowl_patent_zlxx (cost=244.71..600.42 rows =91 width=1268) (actual time =0.649..0.649 rows =0 loops=1)
Recheck Cond: ((c_applicant)::text ~~ '%本溪满族自治县连山关镇安平安养殖场%' ::text)
-> Bitmap Index Scan on idx_public_t_knowl_patent_zlxx_applicant (cost=0.00..244.69 rows =91 width=0) (actual time =0.647..0.647 rows =0 loops=1)
Index Cond: ((c_applicant)::text ~~ '%本溪满族自治县连山关镇安平安养殖场%' ::text)
Planning time : 0.673 ms
Execution time : 0.740 ms
|
三、结论
btree索引可以让后置% "abc%"的模糊匹配走索引,gin + gp_trgm可以让前后置% "%abc%" 走索引。但是gin 索引也有弊端,以下情况可能导致无法命中:
搜索字段少于3个字符时,不会命中索引,这是gin自身机制导致。
当搜索字段过长时,比如email检索,可能也不会命中索引,造成原因暂时未知。
补充:PostgreSQL LIKE 查询效率提升实验
一、未做索引的查询效率
作为对比,先对未索引的查询做测试
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
|
EXPLAIN ANALYZE select * from gallery_map where author = '曹志耘' ;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
Seq Scan on gallery_map (cost=0.00..7002.32 rows =1025 width=621) (actual time =0.011..39.753 rows =1031 loops=1)
Filter: ((author)::text = '曹志耘' ::text)
Rows Removed by Filter: 71315
Planning time : 0.194 ms
Execution time : 39.879 ms
(5 rows )
Time : 40.599 ms
EXPLAIN ANALYZE select * from gallery_map where author like '曹志耘' ;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
Seq Scan on gallery_map (cost=0.00..7002.32 rows =1025 width=621) (actual time =0.017..41.513 rows =1031 loops=1)
Filter: ((author)::text ~~ '曹志耘' ::text)
Rows Removed by Filter: 71315
Planning time : 0.188 ms
Execution time : 41.669 ms
(5 rows )
Time : 42.457 ms
EXPLAIN ANALYZE select * from gallery_map where author like '曹志耘%' ;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
Seq Scan on gallery_map (cost=0.00..7002.32 rows =1028 width=621) (actual time =0.017..41.492 rows =1031 loops=1)
Filter: ((author)::text ~~ '曹志耘%' ::text)
Rows Removed by Filter: 71315
Planning time : 0.307 ms
Execution time : 41.633 ms
(5 rows )
Time : 42.676 ms
|
很显然都会做全表扫描
二、创建btree索引
PostgreSQL默认索引是btree
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
|
CREATE INDEX ix_gallery_map_author ON gallery_map (author);
EXPLAIN ANALYZE select * from gallery_map where author = '曹志耘' ;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on gallery_map (cost=36.36..2715.37 rows =1025 width=621) (actual time =0.457..1.312 rows =1031 loops=1)
Recheck Cond: ((author)::text = '曹志耘' ::text)
Heap Blocks: exact=438
-> Bitmap Index Scan on ix_gallery_map_author (cost=0.00..36.10 rows =1025 width=0) (actual time =0.358..0.358 rows =1031 loops=1)
Index Cond: ((author)::text = '曹志耘' ::text)
Planning time : 0.416 ms
Execution time : 1.422 ms
(7 rows )
Time : 2.462 ms
EXPLAIN ANALYZE select * from gallery_map where author like '曹志耘' ;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on gallery_map (cost=36.36..2715.37 rows =1025 width=621) (actual time =0.752..2.119 rows =1031 loops=1)
Filter: ((author)::text ~~ '曹志耘' ::text)
Heap Blocks: exact=438
-> Bitmap Index Scan on ix_gallery_map_author (cost=0.00..36.10 rows =1025 width=0) (actual time =0.560..0.560 rows =1031 loops=1)
Index Cond: ((author)::text = '曹志耘' ::text)
Planning time : 0.270 ms
Execution time : 2.295 ms
(7 rows )
Time : 3.444 ms
EXPLAIN ANALYZE select * from gallery_map where author like '曹志耘%' ;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
Seq Scan on gallery_map (cost=0.00..7002.32 rows =1028 width=621) (actual time =0.015..41.389 rows =1031 loops=1)
Filter: ((author)::text ~~ '曹志耘%' ::text)
Rows Removed by Filter: 71315
Planning time : 0.260 ms
Execution time : 41.518 ms
(5 rows )
Time : 42.430 ms
EXPLAIN ANALYZE select * from gallery_map where author like '%研究室' ;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
Seq Scan on gallery_map (cost=0.00..7002.32 rows =2282 width=621) (actual time =0.064..52.824 rows =2152 loops=1)
Filter: ((author)::text ~~ '%研究室' ::text)
Rows Removed by Filter: 70194
Planning time : 0.254 ms
Execution time : 53.064 ms
(5 rows )
Time : 53.954 ms
|
可以看到,等于、like的全匹配是用到索引的,like的模糊查询还是全表扫描
三、创建gin索引
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
|
CREATE EXTENSION pg_trgm;
CREATE INDEX ix_gallery_map_author ON gallery_map USING gin (author gin_trgm_ops);
EXPLAIN ANALYZE select * from gallery_map where author like '曹%' ;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on gallery_map (cost=19.96..2705.69 rows =1028 width=621) (actual time =0.419..1.771 rows =1031 loops=1)
Recheck Cond: ((author)::text ~~ '曹%' ::text)
Heap Blocks: exact=438
-> Bitmap Index Scan on ix_gallery_map_author (cost=0.00..19.71 rows =1028 width=0) (actual time =0.312..0.312 rows =1031 loops=1)
Index Cond: ((author)::text ~~ '曹%' ::text)
Planning time : 0.358 ms
Execution time : 1.916 ms
(7 rows )
Time : 2.843 ms
EXPLAIN ANALYZE select * from gallery_map where author like '%耘%' ;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
Seq Scan on gallery_map (cost=0.00..7002.32 rows =1028 width=621) (actual time =0.015..51.641 rows =1031 loops=1)
Filter: ((author)::text ~~ '%耘%' ::text)
Rows Removed by Filter: 71315
Planning time : 0.268 ms
Execution time : 51.957 ms
(5 rows )
Time : 52.899 ms
EXPLAIN ANALYZE select * from gallery_map where author like '%研究室%' ;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on gallery_map (cost=31.83..4788.42 rows =2559 width=621) (actual time =0.914..4.195 rows =2402 loops=1)
Recheck Cond: ((author)::text ~~ '%研究室%' ::text)
Heap Blocks: exact=868
-> Bitmap Index Scan on ix_gallery_map_author (cost=0.00..31.19 rows =2559 width=0) (actual time =0.694..0.694 rows =2402 loops=1)
Index Cond: ((author)::text ~~ '%研究室%' ::text)
Planning time : 0.306 ms
Execution time : 4.403 ms
(7 rows )
Time : 5.227 ms
|
gin_trgm索引的效果好多了
由于pg_trgm的索引是把字符串切成多个3元组,然后使用这些3元组做匹配,所以gin_trgm索引对于少于3个字符(包括汉字)的查询,只有前缀匹配会走索引
另外,还测试了btree_gin,效果和btree一样
注意:
gin_trgm要求数据库必须使用UTF-8编码
1
2
3
4
5
|
demo_v1 # \l demo_v1
List of databases
Name | Owner | Encoding | Collate | Ctype | Access privileges
---------+-----------+----------+-------------+-------------+-------------------
demo_v1 | wmpp_user | UTF8 | en_US.UTF-8 | en_US.UTF-8 |
|
以上为个人经验,希望能给大家一个参考,也希望大家多多支持服务器之家。如有错误或未考虑完全的地方,望不吝赐教。
原文链接:https://blog.csdn.net/qq_23986087/article/details/104021214