MySQL - How to find the word with the most similar beginning

Time: 2022-12-06 19:14:21

How can I find the varchar word in a MySQL database whose beginning is most similar to the beginning of a specified word?

For example:

+-------------------+
|    word_column    | 
+-------------------+
| StackOferflow     |
| StackExchange     |
| MetaStackExchange |
|       ....        |
+-------------------+

query: call get_with_similar_begin('StackExch_bla_bla_bla');
output: 'StackExchange'

query: call get_with_similar_begin('StackO_bla_bla_bla');
output: 'StackOferflow'

UPDATE :

Select * from words where word_column like 'StackExch_bla_bla_bla' will not give the correct result, because 'StackExchange' does not match this filter.

Additional info: I have a BTREE index on word_column and would like to use it whenever possible.
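
To make the requirement concrete, here is a minimal Python sketch of the expected result, computed outside the database: pick the stored word that shares the longest common prefix with the search string. The helper name `common_prefix_len` and the `WORDS` list are illustrative assumptions, and a linear scan like this ignores the index entirely; it only pins down what `get_with_similar_begin` should return.

```python
from typing import Iterable, Optional


def common_prefix_len(a: str, b: str) -> int:
    """Count how many leading characters a and b share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def get_with_similar_begin(search: str, words: Iterable[str]) -> Optional[str]:
    """Return the word sharing the longest common prefix with search, or None."""
    best, best_len = None, 0
    for word in words:
        n = common_prefix_len(search, word)
        if n > best_len:  # keep the first word that reaches a new maximum
            best, best_len = word, n
    return best


# Sample data copied from the table above (assumed to be the whole column).
WORDS = ["StackOferflow", "StackExchange", "MetaStackExchange"]
print(get_with_similar_begin("StackExch_bla_bla_bla", WORDS))  # StackExchange
print(get_with_similar_begin("StackO_bla_bla_bla", WORDS))     # StackOferflow
```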

4 solutions

#1


2  

In SQL Server, we can use a recursive CTE like the query below to achieve what you want:

declare @search nvarchar(255) = 'StackExch_bla_bla_bla';

-- A cte that contains `StackExch_bla_bla_bla` sub-strings: {`StackExch_bla_bla_bla`, `StackExch_bla_bla_bl`, ...,  `S`}
with cte(part, lvl) as (  
    select @search, 1
    union all 
    select substring(@search, 1, len(@search) - lvl), lvl + 1
    from cte
    where lvl < len(@search)
), t as (   -- Now below cte will find match level of each word_column
    select t.word_column, min(cte.lvl) matchLvl
    from yourTable t
    left join cte
      on t.word_column like cte.part+'%'
    group by t.word_column
)
select top(1) word_column
from t
where matchLvl is not null   -- remove non-matched rows
order by matchLvl;

I need more time to find a way to do this in MySQL; I hope some MySQL experts answer faster ;).

My best try in MySQL is this:

select tt.word_column
from (
  select t.word_column, min(lvl) matchLvl
  from yourTable t
  join (
    select 'StackExch_bla_bla_bla' part, 1 lvl
    union all select 'StackExch_bla_bla_bl', 2
    union all select 'StackExch_bla_bla_b', 3
    union all select 'StackExch_bla_bla_', 4
    union all select 'StackExch_bla_bla', 5
    union all select 'StackExch_bla_bl', 6
    union all select 'StackExch_bla_b', 7
    union all select 'StackExch_bla_', 8
    union all select 'StackExch_bla', 9
    union all select 'StackExch_bl', 10
    union all select 'StackExch_b', 11
    union all select 'StackExch_', 12
    union all select 'StackExch', 13
    union all select 'StackExc', 14
    union all select 'StackEx', 15
    union all select 'StackE', 16
    union all select 'Stack', 17
    union all select 'Stac', 18
    union all select 'Sta', 19
    union all select 'St', 20
    union all select 'S', 21
  ) p on t.word_column like concat(p.part, '%')
  group by t.word_column
  ) tt
order by matchLvl
limit 1;

I think you can achieve what you want by creating a stored procedure and using a temp table to store the values of the p sub-select. HTH ;).

#2


2  

This is a slight variation on @shA.t's answer. The aggregation is not necessary:

select t.*, p.lvl
from yourTable t join
     (select 'StackExch_bla_bla_bla' as part, 1 as lvl union all
      select 'StackExch_bla_bla_bl', 2 union all
      select 'StackExch_bla_bla_b', 3 union all
      select 'StackExch_bla_bla_', 4 union all
      select 'StackExch_bla_bla', 5 union all
      select 'StackExch_bla_bl', 6 union all
      select 'StackExch_bla_b', 7 union all
      select 'StackExch_bla_', 8 union all
      select 'StackExch_bla', 9 union all
      select 'StackExch_bl', 10 union all
      select 'StackExch_b', 11 union all
      select 'StackExch_', 12 union all
      select 'StackExch', 13 union all
      select 'StackExc', 14 union all
      select 'StackEx', 15 union all
      select 'StackE', 16 union all
      select 'Stack', 17 union all
      select 'Stac', 18 union all
      select 'Sta', 19 union all
      select 'St', 20 union all
      select 'S', 21
     ) p
     on t.word_column like concat(p.part, '%')
order by p.lvl
limit 1;

A faster way is to use CASE:

select t.*,
       (case when t.word_column like concat('StackExch_bla_bla_bla', '%') then 'StackExch_bla_bla_bla'
             when t.word_column like concat('StackExch_bla_bla_bl', '%') then 'StackExch_bla_bla_bl'
             when t.word_column like concat('StackExch_bla_bla_b', '%') then 'StackExch_bla_bla_b'
             . . .
             when t.word_column like concat('S', '%') then 'S'
             else ''
        end) as longest_match
from t
order by length(longest_match) desc
limit 1;

Neither of these will make effective use of the index.

If you want a version that uses the index, then do the looping at the application layer, and repeatedly run the query as:

select t.*
from t
where t.word_column like 'StackExch_bla_bla_bla%'
limit 1;

Then stop when you hit the first match. MySQL should be using the index for the like comparison.
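
The application-layer loop can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's built-in `sqlite3` as a stand-in for a MySQL connection, the function names `escape_like` and `longest_prefix_match` are made up for the example, and the escaping of `_` and `%` is an addition that the query above does not show. SQLite needs the explicit `ESCAPE` clause, whereas MySQL's LIKE already treats the backslash as its default escape character.

```python
import sqlite3
from typing import Optional


def escape_like(text: str) -> str:
    """Escape LIKE wildcards so they match literally (backslash is the escape char)."""
    return text.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")


def longest_prefix_match(conn: sqlite3.Connection, search: str) -> Optional[str]:
    """Try prefixes from longest to shortest and stop at the first hit."""
    for length in range(len(search), 0, -1):
        pattern = escape_like(search[:length]) + "%"
        row = conn.execute(
            "SELECT word_column FROM t "
            "WHERE word_column LIKE ? ESCAPE '\\' LIMIT 1",
            (pattern,),
        ).fetchone()
        if row is not None:
            return row[0]
    return None


# In-memory stand-in for the real table; the data is the question's sample.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (word_column TEXT)")
conn.execute("CREATE INDEX idx_word ON t (word_column)")
conn.executemany(
    "INSERT INTO t (word_column) VALUES (?)",
    [("StackOferflow",), ("StackExchange",), ("MetaStackExchange",)],
)
print(longest_prefix_match(conn, "StackExch_bla_bla_bla"))  # StackExchange
```

In the worst case this issues one query per character of the search string, but each query is a cheap prefix lookup.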

You can come pretty close to this using a union all:

(select t.*, 'StackExch_bla_bla_bla' as matching
 from t
 where t.word_column like 'StackExch_bla_bla_bla%'
 limit 1
) union all
(select t.*, 'StackExch_bla_bla_bl'
 from t
 where t.word_column like 'StackExch_bla_bla_bl%'
 limit 1
) union all
(select t.*, 'StackExch_bla_bla_b'
 from t
 where t.word_column like 'StackExch_bla_bla_b%'
 limit 1
) union all
. . .
(select t.*, 'S'
 from t
 where t.word_column like 'S%'
 limit 1
)
order by length(matching) desc
limit 1;

#3


2  

Create table/insert data.

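-- The database name is not recoverable from the source; word_demo is a placeholder.
CREATE DATABASE IF NOT EXISTS word_demo;
USE word_demo;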

DROP TABLE IF EXISTS word;
CREATE TABLE IF NOT EXISTS word(
      word_column VARCHAR(255)
    , KEY(word_column)
)
;

INSERT INTO word
    (`word_column`)
VALUES
    ('StackOverflow'),
    ('StackExchange'),
    ('MetaStackExchange')
;

This solution depends on generating a large list of numbers, which we can do with the query below. It generates the numbers from 1 to 1000, so the approach supports search strings of up to 1000 characters.

Query

SELECT 
 @row := @row + 1 AS ROW
FROM (
  SELECT 0 UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9
) 
 row1
CROSS JOIN (
  SELECT 0 UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9
) row2
CROSS JOIN (
  SELECT 0 UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9
) row3
CROSS JOIN (
  SELECT @row := 0
) AS init_user_param

Result

  row  
--------
       1
       2
       3
       4
       5
       6
       7
       8
       9
      10
     ...
     ...
     990
     991
     992
     993
     994
     995
     996
     997
     998
     999
    1000

Now we use the last query as a derived table in combination with DISTINCT SUBSTRING('StackExch_bla_bla_bla', 1, [number]) to build a list of unique prefixes.

Query

SELECT 
 DISTINCT  
   SUBSTRING('StackExch_bla_bla_bla', 1, rows.row) AS word
FROM (

  SELECT 
   @row := @row + 1 AS ROW
  FROM (
    SELECT 0 UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9
  ) 
   row1
  CROSS JOIN (
    SELECT 0 UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9
  ) row2
  CROSS JOIN (
    SELECT 0 UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9
  ) row3
  CROSS JOIN (
    SELECT @row := 0
  ) AS init_user_param
) ROWS

Result

word                   
-----------------------
S                      
St                     
Sta                    
Stac                   
Stack                  
StackE                 
StackEx                
StackExc               
StackExch              
StackExch_             
StackExch_b            
StackExch_bl           
StackExch_bla          
StackExch_bla_         
StackExch_bla_b        
StackExch_bla_bl       
StackExch_bla_bla      
StackExch_bla_bla_     
StackExch_bla_bla_b    
StackExch_bla_bla_bl   
StackExch_bla_bla_bla  

Now we can join and use REPLACE(word_column, word, '') and CHAR_LENGTH(REPLACE(word_column, word, '')) to generate a list.

Query

SELECT 
 *
 , REPLACE(word_column, word, '') AS replaced
 , CHAR_LENGTH(REPLACE(word_column, word, '')) chars_afterreplace
FROM (
 SELECT 
   DISTINCT  
     SUBSTRING('StackExch_bla_bla_bla', 1, rows.row_number) AS word
  FROM (

    SELECT 
     @row := @row + 1 AS row_number
    FROM (
      SELECT 0 UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9
    ) 
     row1
    CROSS JOIN (
      SELECT 0 UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9
    ) row2
    CROSS JOIN (
      SELECT 0 UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9
    ) row3
    CROSS JOIN (
      SELECT @row := 0
    ) AS init_user_param
  ) ROWS
) words
INNER JOIN
  word
ON
 word.word_column LIKE CONCAT(words.word, '%')

Result

word        word_column    replaced       chars_afterreplace  
----------  -------------  -------------  --------------------
S           StackExchange  tackExchange                     12
S           StackOverflow  tackOverflow                     12
St          StackExchange  ackExchange                      11
St          StackOverflow  ackOverflow                      11
Sta         StackExchange  ckExchange                       10
Sta         StackOverflow  ckOverflow                       10
Stac        StackExchange  kExchange                         9
Stac        StackOverflow  kOverflow                         9
Stack       StackExchange  Exchange                          8
Stack       StackOverflow  Overflow                          8
StackE      StackExchange  xchange                           7
StackEx     StackExchange  change                            6
StackExc    StackExchange  hange                             5
StackExch   StackExchange  ange                              4
StackExch_  StackExchange  StackExchange                    13

Now we can clearly see that we want the word with the lowest chars_afterreplace, so we add ORDER BY CHAR_LENGTH(REPLACE(word_column, word, '')) ASC LIMIT 1.
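
One caveat may be worth checking on your own data: ordering by the number of characters left after the replacement favours short words, not necessarily the word with the longest matching prefix. The Python sketch below mimics both orderings outside the database. It is a simplification under stated assumptions: `str.startswith` stands in for the LIKE join (so `_` is not treated as a wildcard), the function names are invented for the example, and the extra word 'Stab' is hypothetical test data that is not in the table above.

```python
from typing import List, Optional, Tuple


def matching_pairs(search: str, words: List[str]) -> List[Tuple[str, str]]:
    """All (prefix, word) pairs where word starts with a prefix of search."""
    return [
        (search[:n], word)
        for n in range(1, len(search) + 1)
        for word in words
        if word.startswith(search[:n])
    ]


def pick_by_remaining_length(search: str, words: List[str]) -> Optional[str]:
    """Mimics ORDER BY CHAR_LENGTH(REPLACE(word_column, word, '')) ASC LIMIT 1."""
    pairs = matching_pairs(search, words)
    if not pairs:
        return None
    return min(pairs, key=lambda pw: len(pw[1].replace(pw[0], "")))[1]


def pick_by_prefix_length(search: str, words: List[str]) -> Optional[str]:
    """Alternative ordering: the longest matching prefix wins."""
    pairs = matching_pairs(search, words)
    if not pairs:
        return None
    return max(pairs, key=lambda pw: len(pw[0]))[1]


# 'Stab' is a hypothetical extra row added to expose the difference.
words = ["StackOverflow", "StackExchange", "MetaStackExchange", "Stab"]
print(pick_by_remaining_length("StackExch_bla_bla_bla", words))  # Stab
print(pick_by_prefix_length("StackExch_bla_bla_bla", words))     # StackExchange
```

If that matters for your data, ordering by the length of the matched prefix in descending order would be one way to avoid it.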

Query

SELECT 
 word.word_column
FROM (
 SELECT 
   DISTINCT  
     SUBSTRING('StackExch_bla_bla_bla', 1, rows.row_number) AS word
FROM (

  SELECT 
    @row := @row + 1 AS row_number
  FROM (
    SELECT 0 UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9
  ) 
   row1
  CROSS JOIN (
    SELECT 0 UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9
  ) row2
  CROSS JOIN (
    SELECT 0 UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9
  ) row3
  CROSS JOIN (
    SELECT @row := 0
  ) AS init_user_param
) ROWS

) words
INNER JOIN word
ON word.word_column LIKE CONCAT(words.word, '%')
ORDER BY CHAR_LENGTH(REPLACE(word_column, word, '')) ASC
LIMIT 1

Results

word_column    
---------------
StackExchange  

#4


0  

The following solutions need a table containing sequence numbers from 1 to (at least) the length of your word_column. Assuming the word_column is VARCHAR(190) you need a table with numbers from 1 to 190. If you use MariaDB with the sequence plugin, you can use the table seq_1_to_190. If you don't have it, there are many ways to create it. One simple way is to use the information_schema.columns table:

create table if not exists seq_1_to_190 (seq tinyint unsigned auto_increment primary key)
    select null as seq from information_schema.columns limit 190;

You can also create it on-the-fly in a subquery, but that would complicate your queries.

I will use the session variable @word to store the search string.

set @word = 'StackExch_bla_bla_bla';

But you can replace all its occurrences with the constant search string.

Now we can use the sequence table to create all prefix substrings with

select seq as l, left(@word, seq) as substr
from seq_1_to_190 s
where s.seq <= char_length(@word)

http://rextester.com/BWU18001

and use it for the LIKE condition when you join it with your words table:

select w.word_column
from (
    select seq as l, left(@word, seq) as substr
    from seq_1_to_190 s
    where s.seq <= char_length(@word)
) s
join words w on w.word_column like concat(replace(s.substr, '_', '\_'), '%')
order by s.l desc
limit 1

http://rextester.com/STQP82942

Note that _ is a single-character wildcard in LIKE patterns, so you need to escape it in your search string with \_. You also need to do that for % if your string can contain it, but I will skip this part in my answer.

The query can also be written without the subquery:

select w.word_column
from seq_1_to_190 s
join words w on w.word_column like concat(replace(left(@word, seq), '_', '\_'), '%')
where s.seq <= char_length(@word)
order by s.seq desc
limit 1

http://rextester.com/QVZI59071

These queries do the job and in theory they should also be fast. But MySQL (in my case MariaDB 10.0.19) creates a bad execution plan and doesn't use the index for the ORDER BY clause. Both queries run in about 1.8 seconds on a 100K-row data set.

The best I could do to improve the performance with a single query is:

select (
    select word_column
    from words w
    where w.word_column like concat(replace(left(@word, s.seq), '_', '\_'), '%')
    limit 1
) as word_column
from seq_1_to_190 s
where s.seq <= char_length(@word)
having word_column is not null
order by s.seq desc
limit 1

http://rextester.com/APZHA8471

This one is faster, but still needs about 670 msec. Note that Gordon's CASE query runs in 125 msec, though it needs a full table/index scan and filesort.

However, I managed to force the engine to use the index for the ORDER BY clause with an indexed temporary table:

drop temporary table if exists tmp;
create temporary table tmp(
    id tinyint unsigned auto_increment primary key,
    pattern varchar(190)
) engine=memory
    select null as id, left(@word, seq) as pattern
    from seq_1_to_190 s
    where s.seq <= char_length(@word)
    order by s.seq desc;

select w.word_column
from tmp force index for order by (primary)
join words w 
    on  w.word_column >= tmp.pattern
    and w.word_column <  concat(tmp.pattern, char(127))
order by tmp.id asc
limit 1

http://rextester.com/OOE82089

This query is "instant" (less than 1 msec) on my 100K rows test table. If I remove FORCE INDEX or use a LIKE condition, it will be slow again.

Note that char(127) seems to work for ASCII strings. You might need to find another character according to your character set.
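
To see why the range condition is equivalent to a prefix match, here is a small Python sketch that imitates an index range scan with a sorted list and `bisect`. It assumes plain ASCII data compared by code point (roughly what a binary collation does); the function names are illustrative only. A nice side effect of the range form is that `_` and `%` need no escaping, because no LIKE pattern is involved.

```python
import bisect
from typing import List, Optional


def first_with_prefix(sorted_words: List[str], prefix: str) -> Optional[str]:
    """Return a word w with prefix <= w < prefix + chr(127), or None."""
    upper = prefix + chr(127)  # sorts after every ASCII continuation of prefix
    i = bisect.bisect_left(sorted_words, prefix)  # first word >= prefix
    if i < len(sorted_words) and sorted_words[i] < upper:
        return sorted_words[i]
    return None


def longest_prefix_match(sorted_words: List[str], search: str) -> Optional[str]:
    """Probe prefixes from longest to shortest, like the ordered tmp table."""
    for n in range(len(search), 0, -1):
        hit = first_with_prefix(sorted_words, search[:n])
        if hit is not None:
            return hit
    return None


# The sorted list plays the role of the BTREE index on word_column.
index = sorted(["StackOferflow", "StackExchange", "MetaStackExchange"])
print(longest_prefix_match(index, "StackO_bla_bla_bla"))     # StackOferflow
print(longest_prefix_match(index, "StackExch_bla_bla_bla"))  # StackExchange
```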

After all that, I must say that my first thought was to use a UNION ALL query, which was also suggested by Gordon Linoff. However, here is a SQL-only solution:

set @subquery = '(
    select word_column
    from words
    where word_column like {pattern}
    limit 1
)';

set session group_concat_max_len = 1000000;
set @sql = (
    select group_concat(
        replace(
            @subquery,
            '{pattern}',
            replace(quote(concat(left(@word, seq), '%')), '_', '\_')
        )
        order by s.seq desc
        separator ' union all '
    )
    from seq_1_to_190 s
    where s.seq <= char_length(@word)
);
set @sql = concat(@sql, ' limit 1');

prepare stmt from @sql;
execute stmt;

http://rextester.com/OPTJ37873

It is also "instant".

If you like stored procedures/functions, here's a function:

create function get_with_similar_begin(search_str text) returns text
begin
    declare l integer;
    declare res text;
    declare pattern text;

    set l = char_length(search_str);
    while l > 0 and res is null do
        set pattern = left(search_str, l);
        set pattern = replace(pattern, '_', '\_');
        set pattern = replace(pattern, '%', '\%');
        set pattern = concat(pattern, '%');
        set res = (select word_column from words where word_column like pattern limit 1);
        set l = l - 1;
    end while;
    return res;
end

Use it as

select get_with_similar_begin('StackExch_bla_bla_bla');
select get_with_similar_begin('StackO_bla_bla_bla');

http://rextester.com/CJTU4629

It is probably the fastest way, though for long strings a kind of divide-and-conquer algorithm might decrease the average number of lookups. But it might also just be overkill.
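
The divide-and-conquer idea could look like a binary search over the prefix length. It works because the match condition is monotone: if some word starts with the first n characters of the search string, some word also starts with every shorter prefix. The sketch below is Python rather than a stored function, and `exists` is a hypothetical stand-in for one indexed `LIKE 'prefix%' LIMIT 1` lookup; the `lookups` list only records how many probes were made.

```python
from typing import Callable, List

WORDS: List[str] = ["StackOferflow", "StackExchange", "MetaStackExchange"]
lookups: List[str] = []  # records every prefix probed, to count lookups


def exists(prefix: str) -> bool:
    """Stand-in for one indexed prefix lookup against the words table."""
    lookups.append(prefix)
    return any(word.startswith(prefix) for word in WORDS)


def longest_matching_length(search: str, probe: Callable[[str], bool]) -> int:
    """Largest n such that probe(search[:n]) is true, or 0 if there is none."""
    lo, hi = 0, len(search)  # invariant: length lo matches, lengths > hi do not
    while lo < hi:
        mid = (lo + hi + 1) // 2  # round up so the loop always makes progress
        if probe(search[:mid]):
            lo = mid
        else:
            hi = mid - 1
    return lo


n = longest_matching_length("StackExch_bla_bla_bla", exists)
print(n, len(lookups))  # matched prefix length and number of probes
```

One more lookup with the prefix of length `n` then fetches the word itself. For a 21-character search string this needs about five probes, where the linear loop above may need up to 21.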

If you want to test your queries on a big table, I used the following code to create my test table (for MariaDB with the sequence plugin):

drop table if exists words;
create table words(
    id mediumint auto_increment primary key,
    word_column varchar(190),
    index(word_column)
);

insert into words(word_column)
    select concat('Stack', rand(1)) as word_column
    from seq_1_to_100000;

insert into words(word_column)values('StackOferflow'),('StackExchange'),('MetaStackExchange');

#1


2  

In SQL Server we can use CTE like below query to achieve what you want:

在SQL Server中,我们可以像下面的查询一样使用CTE来实现你想要的:

declare @search nvarchar(255) = 'StackExch_bla_bla_bla';

-- A cte that contains `StackExch_bla_bla_bla` sub-strings: {`StackExch_bla_bla_bla`, `StackExch_bla_bla_bl`, ...,  `S`}
with cte(part, lvl) as (  
    select @search, 1
    union all 
    select substring(@search, 1, len(@search) - lvl), lvl + 1
    from cte
    where lvl < len(@search)
), t as (   -- Now below cte will find match level of each word_column
    select t.word_column, min(cte.lvl) matchLvl
    from yourTable t
    left join cte
      on t.word_column like cte.part+'%'
    group by t.word_column
)
select top(1) word_column
from t
where matchLvl is not null   -- remove non-matched rows
order by matchLvl;

SQL Server Fiddle Demo

SQL Server小提琴演示

I need more time to find a way in MySQL for it, Hope some MySQL experts answer faster ;).

我需要更多时间在MySQL中找到一种方法,希望一些MySQL专家能够更快地回答;)。

My best try in MySQL is this:

我在MySQL中的最佳尝试是:

select tt.word_column
from (
  select t.word_column, min(lvl) matchLvl
  from yourTable t
  join (
    select 'StackExch_bla_bla_bla' part, 1 lvl
    union all select 'StackExch_bla_bla_bl', 2
    union all select 'StackExch_bla_bla_b', 3
    union all select 'StackExch_bla_bla_', 4
    union all select 'StackExch_bla_bla', 5
    union all select 'StackExch_bla_bl', 6
    union all select 'StackExch_bla_b', 7
    union all select 'StackExch_bla_', 8
    union all select 'StackExch_bla', 9
    union all select 'StackExch_bl', 10
    union all select 'StackExch_b', 11
    union all select 'StackExch_', 12
    union all select 'StackExch', 13
    union all select 'StackExc', 14
    union all select 'StackEx', 15
    union all select 'StackE', 16
    union all select 'Stack', 17
    union all select 'Stac', 18
    union all select 'Sta', 19
    union all select 'St', 20
    union all select 'S', 21
  ) p on t.word_column like concat(p.part, '%')
  group by t.word_column
  ) tt
order by matchLvl
limit 1;

I think by creating a stored procedure and using a temp table to store values in p sub-select you can achieve what you want -HTH ;).

我认为通过创建存储过程并使用临时表在p子选择中存储值,您可以实现您想要的--HTH;)。

MySQL Fiddle Demo

MySQL小提琴演示

#2


2  

This is a slight variation on @shA.t's answer. The aggregation is not necessary:

@ shA.t的回答略有不同。聚合不是必需的:

select t.*, p.lvl
from yourTable t join
     (select 'StackExch_bla_bla_bla' as part, 1 as lvl union all
      select 'StackExch_bla_bla_bl', 2 union all
      select 'StackExch_bla_bla_b', 3 union all
      select 'StackExch_bla_bla_', 4 union all
      select 'StackExch_bla_bla', 5 union all
      select 'StackExch_bla_bl', 6 union all
      select 'StackExch_bla_b', 7 union all
      select 'StackExch_bla_', 8 union all
      select 'StackExch_bla', 9 union all
      select 'StackExch_bl', 10 union all
      select 'StackExch_b', 11 union all
      select 'StackExch_', 12 union all
      select 'StackExch', 13 union all
      select 'StackExc', 14 union all
      select 'StackEx', 15 union all
      select 'StackE', 16 union all
      select 'Stack', 17 union all
      select 'Stac', 18 union all
      select 'Sta', 19 union all
      select 'St', 20 union all
      select 'S', 21
     ) p
     on t.word_column like concat(p.part, '%')
order by matchLvl
limit 1;

A faster way is to use case:

更快的方法是使用案例:

select t.*,
       (case when t.word_column like concat('StackExch_bla_bla_bla', '%') then 'StackExch_bla_bla_bla'
             when t.word_column like concat('StackExch_bla_bla_bl', '%') then 'StackExch_bla_bla_bl'
             when t.word_column like concat('StackExch_bla_bla_b', '%') then 'StackExch_bla_bla_b'
             . . .
             when t.word_column like concat('S', '%') then 'S'
             else ''
        end) as longest_match
from t
order by length(longest_match) desc
limit 1;

Neither of these will make effective use of the index.

这些都不能有效地利用该指数。

If you want a version that uses the index, then do the looping at the application layer, and repeated run the query as:

如果您需要使用索引的版本,请在应用程序层执行循环,然后重复运行查询:

select t.*
from t
where t.word_column like 'StackExch_bla_bla_bla%'
limit 1;

Then stop when you hit the first match. MySQL should be using the index for the like comparison.

然后在第一场比赛时停止。 MySQL应该使用索引进行类似的比较。

You can come pretty close to this using a union all:

你可以使用union all来接近这个:

(select t.*, 'StackExch_bla_bla_bla' as matching
 from t
 where t.word_column like 'StackExch_bla_bla_bla%'
 limit 1
) union all
(select t.*, 'StackExch_bla_bla_bl'
 from t
 where t.word_column like 'StackExch_bla_bla_bl%'
 limit 1
) union all
(select t.*, 'StackExch_bla_bla_b'
 from t
 where t.word_column like 'StackExch_bla_bla_b%'
 limit 1
) union al
. . .
(select t.*, 'S'
 from t
 where t.word_column like 'S%'
 limit 1
)
order by length(matching) desc
limit 1;

#3


2  

Create table/insert data.

创建表/插入数据。

CREATE DATABASE IF NOT EXISTS *;
USE *;

DROP TABLE IF EXISTS word;
CREATE TABLE IF NOT EXISTS word(
      word_column VARCHAR(255)
    , KEY(word_column)
)
;

INSERT INTO word
    (`word_column`)
VALUES
    ('*'),
    ('StackExchange'),
    ('MetaStackExchange')
;

This solution depends on generating a large number list. We can do that with this query. This query generates numbers from 1 to 1000. I do this so this query will support searches up to 1000 chars.

此解决方案取决于生成大量列表。我们可以使用此查询执行此操作。此查询生成从1到1000的数字。我这样做,因此此查询将支持最多1000个字符的搜索。

Query

询问

SELECT 
 @row := @row + 1 AS ROW
FROM (
  SELECT 0 UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9
) 
 row1
CROSS JOIN (
  SELECT 0 UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9
) row2
CROSS JOIN (
  SELECT 0 UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9
) row3
CROSS JOIN (
  SELECT @row := 0
) AS init_user_param

result

结果

  row  
--------
       1
       2
       3
       4
       5
       6
       7
       8
       9
      10
     ...
     ...
     990
     991
     992
     993
     994
     995
     996
     997
     998
     999
    1000

Now we use the last query as delivered table in combination with DISTINCT SUBSTRING('StackExch_bla_bla_bla', 1, [number])to find a unique word list.

现在我们将最后一个查询作为已传递的表与DISTINCT SUBSTRING('StackExch_bla_bla_bla',1,[number])结合使用,以查找唯一的单词列表。

Query

询问

SELECT 
 DISTINCT  
   SUBSTRING('StackExch_bla_bla_bla', 1, rows.row) AS word
FROM (

  SELECT 
   @row := @row + 1 AS ROW
  FROM (
    SELECT 0 UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9
  ) 
   row1
  CROSS JOIN (
    SELECT 0 UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9
  ) row2
  CROSS JOIN (
    SELECT 0 UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9
  ) row3
  CROSS JOIN (
    SELECT @row := 0
  ) AS init_user_param
) ROWS

Result

结果

word                   
-----------------------
S                      
St                     
Sta                    
Stac                   
Stack                  
StackE                 
StackEx                
StackExc               
StackExch              
StackExch_             
StackExch_b            
StackExch_bl           
StackExch_bla          
StackExch_bla_         
StackExch_bla_b        
StackExch_bla_bl       
StackExch_bla_bla      
StackExch_bla_bla_     
StackExch_bla_bla_b    
StackExch_bla_bla_bl   
StackExch_bla_bla_bla  

Now want can join and use REPLACE(word_column, word, '') and CHAR_LENGTH(REPLACE(word_column, word, ''))to generate a list.

现在想要加入并使用REPLACE(word_column,word,'')和CHAR_LENGTH(REPLACE(word_column,word,''))来生成列表。

Query

询问

SELECT 
 *
 , REPLACE(word_column, word, '') AS replaced
 , CHAR_LENGTH(REPLACE(word_column, word, '')) chars_afterreplace
FROM (
 SELECT 
   DISTINCT  
     SUBSTRING('StackExch_bla_bla_bla', 1, rows.row_number) AS word
  FROM (

    SELECT 
     @row := @row + 1 AS row_number
    FROM (
      SELECT 0 UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9
    ) 
     row1
    CROSS JOIN (
      SELECT 0 UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9
    ) row2
    CROSS JOIN (
      SELECT 0 UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9
    ) row3
    CROSS JOIN (
      SELECT @row := 0
    ) AS init_user_param
  ) ROWS
) words
INNER JOIN
  word
ON
 word.word_column LIKE CONCAT(words.word, '%')

Result

结果

word        word_column    replaced       chars_afterreplace  
----------  -------------  -------------  --------------------
S           StackExchange  tackExchange                     12
S           *  tackOverflow                     12
St          StackExchange  ackExchange                      11
St          *  ackOverflow                      11
Sta         StackExchange  ckExchange                       10
Sta         *  ckOverflow                       10
Stac        StackExchange  kExchange                         9
Stac        *  kOverflow                         9
Stack       StackExchange  Exchange                          8
Stack       *  Overflow                          8
StackE      StackExchange  xchange                           7
StackEx     StackExchange  change                            6
StackExc    StackExchange  hange                             5
StackExch   StackExchange  ange                              4
StackExch_  StackExchange  StackExchange                    13

Now we can clearly see we want the word with the lowest chars_afterreplace. So we want to do ORDER BY CHAR_LENGTH(REPLACE(word_column, word, '')) ASC LIMIT 1

现在我们可以清楚地看到我们想要具有最低chars_afterreplace的单词。所以我们想做ORDER BY CHAR_LENGTH(REPLACE(word_column,word,''))ASC LIMIT 1

Query

询问

SELECT 
 word.word_column
FROM (
 SELECT 
   DISTINCT  
     SUBSTRING('StackExch_bla_bla_bla', 1, rows.row_number) AS word
FROM (

  SELECT 
    @row := @row + 1 AS row_number
  FROM (
    SELECT 0 UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9
  ) 
   row1
  CROSS JOIN (
    SELECT 0 UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9
  ) row2
  CROSS JOIN (
    SELECT 0 UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9
  ) row3
  CROSS JOIN (
    SELECT @row := 0
  ) AS init_user_param
) ROWS

) words
INNER JOIN word
ON word.word_column LIKE CONCAT(words.word, '%')
ORDER BY CHAR_LENGTH(REPLACE(word_column, word, '')) ASC
LIMIT 1

Results

结果

word_column    
---------------
StackExchange  

#4


0  

The following solutions need a table containing sequence numbers from 1 to (at least) the length of your word_column. Assuming the word_column is VARCHAR(190) you need a table with numbers from 1 to 190. If you use MariaDB with the sequence plugin, you can use the table seq_1_to_190. If you don't have it, there are many ways to create it. One simple way is to use the information_schema.columns table:

以下解决方案需要一个包含从word到(至少)word_column长度的序列号的表。假设word_column是VARCHAR(190),则需要一个数字从1到190的表。如果将MariaDB与序列插件一起使用,则可以使用表seq_1_to_190。如果您没有它,有很多方法可以创建它。一种简单的方法是使用information_schema.columns表:

create table if not exists seq_1_to_190 (seq tinyint unsigned auto_increment primary key)
    select null as seq from information_schema.columns limit 190;

You can also create it on-the-fly in a subquery, but that would complicate your queries.

您也可以在子查询中即时创建它,但这会使您的查询复杂化。

I will use the session variable @word to store the search string.

我将使用会话变量@word来存储搜索字符串。

set @word = 'StackExch_bla_bla_bla';

But you can replace all its occurrences with the constant search string.

但您可以使用常量搜索字符串替换所有出现的内容。

Now we can use the sequence table to create all prefix substrings with

现在我们可以使用序列表来创建所有前缀子串

select seq as l, left(@word, seq) as substr
from seq_1_to_190 s
where s.seq <= char_length(@word)

http://rextester.com/BWU18001

http://rextester.com/BWU18001

and use it for the LIKE condition when you join it with your words table:

select w.word_column
from (
    select seq as l, left(@word, seq) as substr
    from seq_1_to_190 s
    where s.seq <= char_length(@word)
) s
join words w on w.word_column like concat(replace(s.substr, '_', '\_'), '%')
order by s.l desc
limit 1

http://rextester.com/STQP82942

Note that _ is a single-character wildcard in LIKE patterns, so you need to escape it in your search string as \_. You also need to do that for % if your string can contain it, but I will skip that part in these queries.
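
A minimal sketch of escaping both wildcards, assuming the default LIKE escape character (backslash) and that the NO_BACKSLASH_ESCAPES SQL mode is not enabled. The variable @escaped and the sample value are illustrative only.

```sql
-- Escape the escape character itself first, then the two LIKE wildcards.
-- In a MySQL string literal, '\\' is one backslash.
set @word = '50%_off';
set @escaped = replace(replace(replace(@word, '\\', '\\\\'), '%', '\\%'), '_', '\\_');

-- The resulting pattern matches only values that start with the literal text 50%_off.
select concat(@escaped, '%') as pattern;   -- pattern: 50\%\_off%
```

When combining this with the prefix queries, take the prefix with left() first and escape the result afterwards, so that an escape sequence is never cut in half.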

The query can also be written without the subquery:

select w.word_column
from seq_1_to_190 s
join words w on w.word_column like concat(replace(left(@word, seq), '_', '\_'), '%')
where s.seq <= char_length(@word)
order by s.seq desc
limit 1

http://rextester.com/QVZI59071

These queries do the job, and in theory they should also be fast. But MySQL (in my case MariaDB 10.0.19) creates a bad execution plan and doesn't use the index for the ORDER BY clause. Both queries run in about 1.8 seconds on a 100K-row data set.

The best I could do to improve performance with a single query is:

select (
    select word_column
    from words w
    where w.word_column like concat(replace(left(@word, s.seq), '_', '\_'), '%')
    limit 1
) as word_column
from seq_1_to_190 s
where s.seq <= char_length(@word)
having word_column is not null
order by s.seq desc
limit 1

http://rextester.com/APZHA8471

This one is faster, but it still takes about 670 msec. Note that Gordon's CASE query runs in 125 msec, even though it needs a full table/index scan and a filesort.

However, I managed to force the engine to use the index for the ORDER BY clause by using an indexed temporary table:

drop temporary table if exists tmp;
create temporary table tmp(
    id tinyint unsigned auto_increment primary key,
    pattern varchar(190)
) engine=memory
    select null as id, left(@word, seq) as pattern
    from seq_1_to_190 s
    where s.seq <= char_length(@word)
    order by s.seq desc;

select w.word_column
from tmp force index for order by (primary)
join words w 
    on  w.word_column >= tmp.pattern
    and w.word_column <  concat(tmp.pattern, char(127))
order by tmp.id asc
limit 1

http://rextester.com/OOE82089

This query is "instant" (less than 1 msec) on my 100K-row test table. If I remove FORCE INDEX or use a LIKE condition, it becomes slow again.

Note that char(127) seems to work for ASCII strings. You might need to find another character according to your character set.

After all that, I must say that my first thought was to use a UNION ALL query, which was also suggested by Gordon Linoff. However, here is a SQL-only solution:

set @subquery = '(
    select word_column
    from words
    where word_column like {pattern}
    limit 1
)';

set session group_concat_max_len = 1000000;
set @sql = (
    select group_concat(
        replace(
            @subquery,
            '{pattern}',
            replace(quote(concat(left(@word, seq), '%')), '_', '\_')
        )
        order by s.seq desc
        separator ' union all '
    )
    from seq_1_to_190 s
    where s.seq <= char_length(@word)
);
set @sql = concat(@sql, ' limit 1');

prepare stmt from @sql;
execute stmt;

http://rextester.com/OPTJ37873

It is also "instant".

If you prefer stored procedures/functions, here is a function:

create function get_with_similar_begin(search_str text) returns text
begin
    declare l integer;
    declare res text;
    declare pattern text;

    set l = char_length(search_str);
    while l > 0 and res is null do
        set pattern = left(search_str, l);
        set pattern = replace(pattern, '_', '\_');
        set pattern = replace(pattern, '%', '\%');
        set pattern = concat(pattern, '%');
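        -- limit 1 avoids a "Subquery returns more than 1 row" error when several words share the prefix
        set res = (select word_column from words where word_column like pattern limit 1);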
        set l = l - 1;
    end while;
    return res;
end

Use it like this:

select get_with_similar_begin('StackExch_bla_bla_bla');
select get_with_similar_begin('StackO_bla_bla_bla');

http://rextester.com/CJTU4629

This is probably the fastest way. For long strings, a divide-and-conquer algorithm might reduce the average number of lookups, but it might also be overkill.
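
One way the divide-and-conquer idea could look is a binary search over the prefix length. It relies on the observation that if a prefix of length l matches some word, every shorter prefix matches too, so the longest matching prefix can be found in roughly log2(n) lookups instead of up to n. This is only a sketch under stated assumptions: the function name get_with_similar_begin_bs is hypothetical, backslash is assumed to be the LIKE escape character, and the delimiter lines are mysql command-line client commands.

```sql
delimiter //
create function get_with_similar_begin_bs(search_str text) returns text
reads sql data
begin
    declare lo integer default 1;
    declare hi integer;
    declare m integer;
    declare pattern text;
    declare candidate text;
    declare res text default null;

    set hi = char_length(search_str);
    while lo <= hi do
        set m = (lo + hi) div 2;
        -- build an escaped LIKE pattern for the prefix of length m
        set pattern = left(search_str, m);
        set pattern = replace(pattern, '\\', '\\\\');
        set pattern = replace(pattern, '_', '\\_');
        set pattern = replace(pattern, '%', '\\%');
        set pattern = concat(pattern, '%');
        set candidate = (select word_column from words
                         where word_column like pattern limit 1);
        if candidate is not null then
            set res = candidate;   -- prefix of length m matches: try a longer one
            set lo = m + 1;
        else
            set hi = m - 1;        -- no match: try a shorter one
        end if;
    end while;
    return res;
end//
delimiter ;
```

The linear loop stops early when a long prefix matches, so the binary search mainly pays off when the search string is long and the matching prefix is short.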

If you want to test your queries on a big table, I used the following code to create my test table (for MariaDB with the sequence plugin):

drop table if exists words;
create table words(
    id mediumint auto_increment primary key,
    word_column varchar(190),
    index(word_column)
);

insert into words(word_column)
    select concat('Stack', rand(1)) as word_column
    from seq_1_to_100000;

insert into words(word_column)values('StackOferflow'),('StackExchange'),('MetaStackExchange');