Suppose I have 400 rows of people's names in a database. What is the best way to search their names?

Time: 2022-09-15 20:08:25

Users will also search by part of a name, not only by whole words separated by spaces. If they type "Matt", I expect to retrieve "Matthew" too.

4 answers

#1 (score: 12)

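-- SQL Server (T-SQL) syntax: the [...] character class is not part of standard LIKE.
SELECT *
FROM mytable
WHERE name LIKE 'matt%'           -- name starts with "matt"
   OR name LIKE '%[ ,-/]matt%'    -- "matt" right after a space, ',', '-', '.' or '/'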

Notes:
1) Fancy wildcard. The reason for not using the simpler LIKE '%xyz%' form is that, depending on xyz, the database could return many irrelevant records, for example "Jeff Zermatt" for a "Matt" search.
The brackets in the second pattern list the delimiters that may indicate a break between words. An alternative pattern would be [^A-Z0-9] (which may return a few O'Brians when searching for "brian", but maybe that is not a bad thing...)
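To see the difference between a plain substring match and a word-prefix match, here is a minimal sketch using Python's built-in sqlite3 module and an in-memory table. SQLite's LIKE has no T-SQL-style [ ,-/] bracket classes, so this sketch swaps in a REGEXP user function backed by Python's re to approximate the word-prefix intent; the sample names and the helpers search_substring and search_word_prefix are made up for illustration.

```python
import re
import sqlite3

# In-memory stand-in for mytable; the sample names are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (name TEXT)")
conn.executemany(
    "INSERT INTO mytable (name) VALUES (?)",
    [("Matthew Smith",), ("Anne-Matt Jones",), ("Jeff Zermatt",), ("Bob Brown",)],
)


def regexp(pattern, value):
    # SQLite evaluates "X REGEXP Y" as regexp(Y, X): the pattern arrives first.
    return value is not None and re.search(pattern, value, re.IGNORECASE) is not None


conn.create_function("regexp", 2, regexp)


def search_substring(term):
    # Like LIKE '%matt%': any occurrence counts, including the one in "Zermatt".
    rows = conn.execute(
        "SELECT name FROM mytable WHERE name LIKE ? ORDER BY name", (f"%{term}%",)
    )
    return [r[0] for r in rows]


def search_word_prefix(term):
    # Word-prefix match: the term must start the name or follow
    # a space, ',', '-', '.' or '/' (the characters the T-SQL class covers).
    pattern = r"(?:^|[ ,\-./])" + re.escape(term)
    rows = conn.execute(
        "SELECT name FROM mytable WHERE name REGEXP ? ORDER BY name", (pattern,)
    )
    return [r[0] for r in rows]


print(search_substring("matt"))    # ['Anne-Matt Jones', 'Jeff Zermatt', 'Matthew Smith']
print(search_word_prefix("matt"))  # ['Anne-Matt Jones', 'Matthew Smith']
```

The word-prefix version drops "Jeff Zermatt" while still finding "Matthew" and "Anne-Matt", which is the behavior the note above is after.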

2) Performance. Because there are so few records in this table, the leading-wildcard approach is quite feasible, and certainly the easiest one. No reason to search any further!
If the records happen to be very wide (many columns, some of them longer than 30 characters), you can create an index on name. The leading wildcard still requires a scan, but the scan runs over the index, which is narrower and therefore fits more readily in the cache.
Likewise, if instead of SELECT * the query selects only a few columns of mytable (and the table's rows are wide), you can create an index made of all those columns, so the query can be answered from the index alone.
If the number of records grows past, say, 50,000 (or, to a lesser degree, if the application hits the database with similar queries more than, say, 40 times per minute), you may consider more efficient ways of dealing with keywords: a Full-Text Catalog or a "hand-made" table of the individual keywords.
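A "hand-made" keyword table could look roughly like the following minimal sketch (Python's sqlite3, in-memory). The table name name_keyword, the helpers tokenize, add_person and search, and the sample names are hypothetical. Each full name is split into lowercase tokens stored one per row, and the search becomes a prefix lookup on the indexed keyword column instead of a leading-wildcard scan over full names.

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE mytable (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- Hypothetical keyword table: one row per (person, name token).
    CREATE TABLE name_keyword (
        person_id INTEGER NOT NULL REFERENCES mytable(id),
        keyword   TEXT NOT NULL
    );
    CREATE INDEX ix_name_keyword ON name_keyword (keyword);
    """
)


def tokenize(full_name):
    # Split on runs of non-word characters (spaces, '-', "'", ...) and lowercase.
    return [t.lower() for t in re.split(r"[\W_]+", full_name) if t]


def add_person(full_name):
    cur = conn.execute("INSERT INTO mytable (name) VALUES (?)", (full_name,))
    conn.executemany(
        "INSERT INTO name_keyword (person_id, keyword) VALUES (?, ?)",
        [(cur.lastrowid, token) for token in set(tokenize(full_name))],
    )


def search(term):
    # Prefix match on whole tokens via a half-open range that an index on
    # keyword can serve: "matt" finds the token "matthew" but never "zermatt".
    low = term.lower()
    rows = conn.execute(
        """
        SELECT DISTINCT p.name
        FROM mytable AS p
        JOIN name_keyword AS k ON k.person_id = p.id
        WHERE k.keyword >= ? AND k.keyword < ?
        ORDER BY p.name
        """,
        (low, low + "\uffff"),
    )
    return [r[0] for r in rows]


for n in ("Matthew Smith", "Jeff Zermatt", "Anne-Matt Jones", "Kevin O'Brian"):
    add_person(n)

print(search("Matt"))   # ['Anne-Matt Jones', 'Matthew Smith']
print(search("brian"))  # ["Kevin O'Brian"]
```

The trade-off is that the application must keep name_keyword in sync whenever a name is inserted or changed.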

3) Advantages of another approach. A solution in which the application maintains a table of the individual keywords, readily parsed from each full name, doesn't only scale better (as the table and/or usage grows); it also improves the quality of the search.
For example, it can improve effective recall by adding common nicknames of first names (Bill, Will or Billy for William; Dick for Richard; Jack or Johnny for John; etc.). Another possibility opened by a more sophisticated approach is storing a Soundex or modified Soundex encoding of the name tokens, which lets users locate names even when they misspell them or don't know the precise spelling (e.g. Wilmson vs. Wilmsen vs. Willmsonn).
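Both ideas in the paragraph above can be sketched in a few lines of Python. The NICKNAMES map contains only the examples from the text and is hypothetical, not a real nickname dataset; soundex() implements the classic American Soundex rules (keep the first letter, map consonants to digits 1 to 6, collapse adjacent equal codes, pad to four characters), which is one of several Soundex variants. expand_term and matches are illustrative helpers.

```python
import re

# Hypothetical nickname map built only from the examples in the text.
NICKNAMES = {
    "bill": "william", "will": "william", "billy": "william",
    "dick": "richard",
    "jack": "john", "johnny": "john",
}

# American Soundex digit groups.
_CODES = {
    ch: digit
    for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                           ("l", "4"), ("mn", "5"), ("r", "6"))
    for ch in letters
}


def soundex(word):
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    prev = _CODES.get(first, "")
    for c in letters[1:]:
        code = _CODES.get(c, "")
        if code:
            if code != prev:  # adjacent letters with the same code count once
                digits.append(code)
            prev = code
        elif c not in "hw":
            prev = ""  # vowels (and y) separate equal codes; h and w do not
    return (first.upper() + "".join(digits) + "000")[:4]


def expand_term(term):
    # The term itself plus its formal first name, if it is a known nickname.
    t = term.lower()
    return {t, NICKNAMES[t]} if t in NICKNAMES else {t}


def matches(term, full_name):
    # True if any token of full_name starts with the term (or its expansion)
    # or has the same Soundex code.
    tokens = [t.lower() for t in re.split(r"[\W_]+", full_name) if t]
    for form in expand_term(term):
        code = soundex(form)
        for token in tokens:
            if token.startswith(form) or (code and soundex(token) == code):
                return True
    return False


print(soundex("Wilmson"), soundex("Wilmsen"), soundex("Willmsonn"))  # W452 W452 W452
print(matches("Bill", "William Gates"))      # True
print(matches("Wilmsen", "Anna Willmsonn"))  # True
print(matches("matt", "Jeff Zermatt"))       # False
```

In a keyword table, the Soundex code would typically be stored in its own indexed column next to each token rather than recomputed per query.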

#2 (score: 10)

You can use:

SELECT *
FROM mytable
WHERE name LIKE '%matt%'

#3 (score: 1)

You have the following options:

  1. Full Text Search (FTS)
  2. Regular Expressions
  3. LIKE using wildcards

...in that order of preference.
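As a rough illustration of the first option, SQLite's FTS5 extension supports prefix queries. This minimal sketch assumes your Python's SQLite build includes FTS5 (most standard builds do, but it is a compile-time option); the table name people_fts and the helper fts_prefix_search are hypothetical. SQL Server's Full-Text Search uses its own, different syntax, which is not shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumes SQLite was compiled with FTS5; otherwise this raises OperationalError.
conn.execute("CREATE VIRTUAL TABLE people_fts USING fts5(name)")
conn.executemany(
    "INSERT INTO people_fts (name) VALUES (?)",
    [("Matthew Smith",), ("Jeff Zermatt",), ("Anne-Matt Jones",), ("Kevin O'Brian",)],
)


def fts_prefix_search(term):
    # Wrap the user's text in an FTS5 string (doubling embedded quotes) and
    # append * so its last token is a prefix token: "matt"* matches matt, matthew, ...
    query = '"' + term.replace('"', '""') + '"*'
    rows = conn.execute(
        "SELECT name FROM people_fts WHERE people_fts MATCH ? ORDER BY name", (query,)
    )
    return [r[0] for r in rows]


print(fts_prefix_search("matt"))   # ['Anne-Matt Jones', 'Matthew Smith']
print(fts_prefix_search("brian"))  # ["Kevin O'Brian"]
```

The default unicode61 tokenizer folds case and splits on punctuation, so "Anne-Matt" yields the token "matt" while "Zermatt" stays a single token and is not matched.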

#4 (score: 0)

If you are searching for the names from a programming language, you can use its regular-expression package, for example java.util.regex (import java.util.regex.*;) in Java.
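To keep all sketches in one language, here is the same idea with Python's re module in place of Java's java.util.regex (a swapped-in equivalent, not Java code). With only about 400 rows, one plausible approach is to load the names once and filter them in memory; filter_names is a hypothetical helper. Note that \b also treats the apostrophe as a word break, so "brian" finds "O'Brian", much like the [^A-Z0-9] variant mentioned in answer #1.

```python
import re


def filter_names(names, term):
    # \b requires the term to begin a word, so "matt" matches "Matthew" and
    # "Anne-Matt" but not "Zermatt"; IGNORECASE makes the match case-insensitive.
    pattern = re.compile(r"\b" + re.escape(term), re.IGNORECASE)
    return [name for name in names if pattern.search(name)]


names = ["Matthew Smith", "Jeff Zermatt", "Anne-Matt Jones", "Kevin O'Brian"]
print(filter_names(names, "matt"))   # ['Matthew Smith', 'Anne-Matt Jones']
print(filter_names(names, "brian"))  # ["Kevin O'Brian"]
```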