SSIS 2005/2008 does fuzzy lookups and groupings. Is there a feature that does the same in T-SQL?
SSIS 2005/2008做模糊查找和分组。在T-SQL中是否有相同的特性?
4 个解决方案
#1
7
Fuzzy lookup uses a q-gram approach, by breaking strings up into tiny sub-strings and indexing them. You can then then search input by breaking it up into equally sized strings. You can inspect the format of their index and write a CLR function to use the same style of index but you might be talking about a fair chunk of work.
模糊查找使用q-gram方法,将字符串分解成小的子字符串并对它们进行索引。然后你可以通过将输入分成大小相同的字符串来搜索输入。您可以检查它们的索引的格式,并编写一个CLR函数来使用相同的索引样式,但是您可能正在讨论大量的工作。
It is actually quite interesting how they did it, very simple yet provides very robust matching and is very configurable.
他们的做法非常有趣,非常简单,但是提供了非常健壮的匹配,并且是非常可配置的。
From that I recall of the index when I last looked at it, each q-gram or substring is stored in a row in an table (the index). That row contains an nvarchar column (among other values) that is used as binary data and contains references to the rows that match.
我记得上次查看索引时,每个q-gram或子字符串都存储在表(索引)中的一行中。该行包含一个nvarchar列(以及其他值),该列用作二进制数据,并包含对匹配行的引用。
There is also an open feedback suggestion on Microsoft Connect for this feature.
对于这个特性,微软Connect也有一个开放的反馈建议。
#2
7
SQL Server has a SOUNDEX() function:
SQL Server具有SOUNDEX()函数:
SELECT *
FROM Customers
WHERE SOUNDEX(Lastname) = SOUNDEX('Stonehouse')
AND SOUNDEX(Firstname) = SOUNDEX('Scott')
#4
1
On March 5 2009 I will have an article posted on www.sqlservercentral.com with a sample of Jaro-Winkler TSQL
2009年3月5日,我将在www.sqlservercentral.com上发布一篇文章,其中包含Jaro-Winkler TSQL的示例
#1
7
Fuzzy lookup uses a q-gram approach, by breaking strings up into tiny sub-strings and indexing them. You can then then search input by breaking it up into equally sized strings. You can inspect the format of their index and write a CLR function to use the same style of index but you might be talking about a fair chunk of work.
模糊查找使用q-gram方法,将字符串分解成小的子字符串并对它们进行索引。然后你可以通过将输入分成大小相同的字符串来搜索输入。您可以检查它们的索引的格式,并编写一个CLR函数来使用相同的索引样式,但是您可能正在讨论大量的工作。
It is actually quite interesting how they did it, very simple yet provides very robust matching and is very configurable.
他们的做法非常有趣,非常简单,但是提供了非常健壮的匹配,并且是非常可配置的。
From that I recall of the index when I last looked at it, each q-gram or substring is stored in a row in an table (the index). That row contains an nvarchar column (among other values) that is used as binary data and contains references to the rows that match.
我记得上次查看索引时,每个q-gram或子字符串都存储在表(索引)中的一行中。该行包含一个nvarchar列(以及其他值),该列用作二进制数据,并包含对匹配行的引用。
There is also an open feedback suggestion on Microsoft Connect for this feature.
对于这个特性,微软Connect也有一个开放的反馈建议。
#2
7
SQL Server has a SOUNDEX() function:
SQL Server具有SOUNDEX()函数:
SELECT *
FROM Customers
WHERE SOUNDEX(Lastname) = SOUNDEX('Stonehouse')
AND SOUNDEX(Firstname) = SOUNDEX('Scott')
#3
#4
1
On March 5 2009 I will have an article posted on www.sqlservercentral.com with a sample of Jaro-Winkler TSQL
2009年3月5日,我将在www.sqlservercentral.com上发布一篇文章,其中包含Jaro-Winkler TSQL的示例