Closest match in full-text search

Time: 2022-09-13 07:56:46

I am trying to implement an internal search for my website that can point users in the right direction in case they mistype a word, something like the "Did you mean:" feature in Google search.

Does anybody have an idea how such a search can be done? How can we establish the relevance of the word or phrase we assume the user intended to search for?

  • I use ASP.NET and SQL Server 2005 with FTS (Full-Text Search).

Thank you.

5 answers

#1


4  

You could use a string-similarity algorithm and then suggest other strings from your search index that fall within a certain distance of what the user typed.

One such algorithm is the Levenshtein distance.
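As a rough illustration of the idea above, here is a minimal sketch in C: a textbook Levenshtein distance, plus a lookup that returns the closest indexed word within a maximum number of edits. The names `levenshtein` and `suggest`, the in-memory word array, and the threshold parameter are assumptions made for this example; a real search index would be queried differently.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Smallest of three values. */
static size_t min3(size_t x, size_t y, size_t z) {
    size_t m = x < y ? x : y;
    return m < z ? m : z;
}

/* Levenshtein distance between two NUL-terminated byte strings.
 * Keeps a single rolling row, so memory use is O(strlen(b)).
 * Returns (size_t)-1 if the row cannot be allocated. */
size_t levenshtein(const char *a, const char *b) {
    size_t la = strlen(a), lb = strlen(b);
    size_t *row = malloc((lb + 1) * sizeof *row);
    if (row == NULL) return (size_t)-1;

    /* Distance from the empty prefix of a to each prefix of b. */
    for (size_t j = 0; j <= lb; j++) row[j] = j;

    for (size_t i = 1; i <= la; i++) {
        size_t diag = row[0];          /* cell (i-1, j-1) */
        row[0] = i;                    /* a[0..i) versus "" */
        for (size_t j = 1; j <= lb; j++) {
            size_t up = row[j];        /* cell (i-1, j) */
            size_t cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
            row[j] = min3(up + 1,          /* delete a[i-1] */
                          row[j - 1] + 1,  /* insert b[j-1] */
                          diag + cost);    /* match or substitute */
            diag = up;
        }
    }
    size_t d = row[lb];
    free(row);
    return d;
}

/* Hypothetical helper: returns the word in `index` closest to `query`,
 * provided it is at most `max_dist` edits away; otherwise NULL.
 * Ties go to the earliest entry. */
const char *suggest(const char *query, const char *const *index,
                    size_t n, size_t max_dist) {
    const char *best = NULL;
    size_t best_d = max_dist + 1;
    for (size_t i = 0; i < n; i++) {
        size_t d = levenshtein(query, index[i]);
        if (d < best_d) { best_d = d; best = index[i]; }
    }
    return best;
}
```

Plain Levenshtein counts a swap of two adjacent letters as two edits, so a typo such as "serach" is two edits from "search". Scanning every word is linear in the index size, which is fine for a small vocabulary but slow for a large one.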

However, don't forget to look for existing solutions. I think Lucene, for example, can search for similar strings.

By the way, here's a related post on this topic: How does the Google “Did you mean?” Algorithm work?

#2


2  

This is done by using a regular expression to query for the closest keywords that match the phrase.

Here is a great article that might help you.

#3


0  

With T-SQL, you can use the SOUNDEX function to compare words phonetically.

If you take the user's input and compare it with other words in your database by SOUNDEX code, you should be able to come up with a list of "did you mean?" words.

For example:

select SOUNDEX('andrew')
select SOUNDEX('androo')

Both statements return the same code (A536).

There are better algorithms these days, but SOUNDEX is built into SQL Server.
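To show why 'andrew' and 'androo' end up with the same code, here is a minimal sketch of the classic American Soundex rules in C. It is an illustration only, not SQL Server's implementation, whose results may differ in edge cases; the function names `soundex` and `soundex_digit` are made up for this example.

```c
#include <ctype.h>
#include <stddef.h>

/* Soundex digit for a letter, or '0' for characters that carry no
 * code (vowels, H, W, Y, and anything that is not a letter). */
static char soundex_digit(char c) {
    switch (toupper((unsigned char)c)) {
        case 'B': case 'F': case 'P': case 'V': return '1';
        case 'C': case 'G': case 'J': case 'K':
        case 'Q': case 'S': case 'X': case 'Z': return '2';
        case 'D': case 'T': return '3';
        case 'L': return '4';
        case 'M': case 'N': return '5';
        case 'R': return '6';
        default: return '0';
    }
}

/* Writes the 4-character Soundex code of `word`, plus a terminating
 * NUL, into `out`, which must hold at least 5 bytes. A word with no
 * letters yields "0000". */
void soundex(const char *word, char out[5]) {
    size_t n = 0;
    char prev = '0';   /* digit of the last letter that was not H or W */
    for (const char *p = word; *p != '\0' && n < 4; p++) {
        unsigned char c = (unsigned char)*p;
        if (!isalpha(c)) continue;          /* ignore non-letters */
        char upper = (char)toupper(c);
        char digit = soundex_digit(upper);
        if (n == 0) {
            out[n++] = upper;               /* first letter is kept as-is */
        } else if (digit != '0' && digit != prev) {
            out[n++] = digit;               /* new consonant group */
        }
        /* H and W do not separate letters with equal codes; vowels do. */
        if (upper != 'H' && upper != 'W') prev = digit;
    }
    while (n < 4) out[n++] = '0';           /* pad short codes with zeros */
    out[4] = '\0';
}
```

Both 'andrew' and 'androo' reduce to A, then N (5), D (3), R (6), after which the code is full, so the trailing letters never matter.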

#4


0  

The simplest approach I can think of is to write a function that returns the degree of mismatch between two words, then loop through all the words and pick the ones with the lowest mismatch.

I've done this with a branch-and-bound method. Let me dig up the code:

#include <stdbool.h>

// Returns true if a can be turned into b with at most `bound` edits.
// Note: a is modified temporarily during the swap step, so it must
// point to writable memory (not a string literal).
bool matchWithinBound(char* a, char* b, int bound){
  // skip over matching characters
  while(*a && *b && *a == *b){a++; b++;}
  if (*a==0 && *b==0) return true;
  // if bound too low, quit
  if (bound <= 0) return false;
  // try assuming a has an extra character
  if (*a && matchWithinBound(a+1, b, bound-1)) return true;
  // try assuming a had a letter deleted
  if (*b && matchWithinBound(a, b+1, bound-1)) return true;
  // try assuming a had a letter replaced
  if (*a && *b && matchWithinBound(a+1, b+1, bound-1)) return true;
  // try assuming a had two adjacent letters swapped
  if (a[0] && a[1]){
    char temp;
    bool success;
    temp = a[0]; a[0] = a[1]; a[1] = temp;
    success = matchWithinBound(a, b, bound-1);
    // undo the swap so the caller's string is left unchanged
    temp = a[0]; a[0] = a[1]; a[1] = temp;
    if (success) return true;
  }
  // can try other modifications
  return false;
}

// Returns the smallest bound (0 to 9) at which the words match,
// or 1000 if they are further apart than that.
int DistanceBetweenWords(char* a, char* b){
  for (int bound = 0; bound < 10; bound++){
    if (matchWithinBound(a, b, bound)) return bound;
  }
  return 1000;
}

#5


0  

Why don't you use the power of Google? You can consume their suggest service.

Here is an example in C#.
