How do I select rows from one table that contain keywords from another table in SQL?

Time: 2022-02-12 09:22:10

I have two tables - one with sentences, and another with keywords. I would like to select rows from the sentences table that contain any of the keywords.

For Example:

Sentences

  • I like my dog

  • My favorite food is pasta

  • Programming is fun

Key Words

  • favorite food

  • dog

My goal is for the first 2 rows to be returned.

So far I have:

select a.*
from sentences a
join keywords b
on a.sentences like '%' || b.keywords || '%'

However, I am getting the error "the execution of this query involves performing one or more Cartesian product joins that can not be optimized".

Any ideas? Thanks in advance. Also, I am not sure if it matters much, but I am doing this on SAS 9.4.

1 solution

#1


There are several issues to guard against in this kind of code, but the primary concerns are: leading and trailing spaces in the search keywords, character case matching (most SAS character comparisons are case sensitive), and duplicate matches (multiple keywords matching one sentence).

The code pattern below should deal with these issues.

select distinct a.*
from sentences a cross join keywords b
where findw(a.sentences,b.keywords,' ','ir');

The "distinct" keyword removes duplicate matches. The findw call uses a space ' ' as the only delimiter of concern, and 'ir' makes the search case insensitive (modifier 'i') and removes leading and trailing delimiters/spaces from the keyword (modifier 'r' combined with the designation of ' ' as the delimiter).
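To try the query end to end, here is a minimal sketch that builds the two tables from the question's sample data and runs the same pattern inside PROC SQL. The variable lengths ($40 and $20) are assumptions chosen for illustration, and "> 0" is only an explicit spelling of the truth test. Because every sentence is still compared with every keyword, SAS will probably still write the Cartesian product message to the log.

```sas
/* Sample data from the question; the lengths are assumed for illustration */
data sentences;
  length sentences $40;
  input sentences $40.;
  datalines;
I like my dog
My favorite food is pasta
Programming is fun
;

data keywords;
  length keywords $20;
  input keywords $20.;
  datalines;
favorite food
dog
;

proc sql;
  /* 'i' = ignore case, 'r' = strip leading/trailing delimiters from the keyword */
  select distinct a.*
  from sentences a cross join keywords b
  where findw(a.sentences, b.keywords, ' ', 'ir') > 0;
quit;
```

The result can then be checked against the stated goal of returning the first two sentences.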

Depending on the data sizes involved, you might see better performance by using a data step and a hash table. This would allow you to stop testing a sentence at its first keyword match, whereas the SQL code tests every sentence against every keyword.
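The data step and hash table idea could be sketched roughly as follows. This is one possible reading of the suggestion, not tested code from the answer: the output dataset name matched, the hash and iterator names kw and it, the helper variable rc, and the sample data lengths are all assumptions. The keywords are loaded into memory once, and the inner loop leaves at the first keyword that matches.

```sas
/* Sample data from the question; the lengths are assumed for illustration */
data sentences;
  length sentences $40;
  input sentences $40.;
  datalines;
I like my dog
My favorite food is pasta
Programming is fun
;

data keywords;
  length keywords $20;
  input keywords $20.;
  datalines;
favorite food
dog
;

data matched;                          /* hypothetical output dataset name */
  if _n_ = 1 then do;
    if 0 then set keywords;            /* defines the keywords variable without reading rows */
    declare hash kw(dataset: 'keywords');
    kw.defineKey('keywords');
    kw.defineData('keywords');
    kw.defineDone();
    declare hiter it('kw');            /* iterator used to walk through all keywords */
  end;
  set sentences;
  rc = it.first();
  do while (rc = 0);
    /* same test as the SQL version: case insensitive, keyword trimmed */
    if findw(sentences, keywords, ' ', 'ir') > 0 then do;
      output;                          /* keep the sentence once */
      leave;                           /* stop at the first matching keyword */
    end;
    rc = it.next();
  end;
  drop rc keywords;
run;
```

Since each sentence is written at most once, no separate de-duplication step is needed in this version.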
