R: Compare a data.table column with a vector

I have a column of a data.table:

library(data.table)
DT = data.table(R=c(3,8,5,4,6,7))

I also have a vector of upper cluster limits for clusters 1, 2, 3 and 4:

CP=c(2,4,6,8)

Now I want to compare each entry of R with the elements of CP considering the order of CP. The result

DT[,NoC:=c(2,4,3,2,3,4)]

should be a column NoC in DT whose entries are the number of the cluster that each element of R belongs to. (I need the cluster number to look up a factor in another data.table.)

For example, take the 1st entry of R: 3 is not smaller than 2 (from CP), but it is smaller than 4 (from CP). So 3 belongs to cluster 2.

Another example, take the 6th entry of R: 7 is not smaller than 2, 4 or 6 (from CP), but it is smaller than 8 (from CP). So 7 belongs to cluster 4.

How can I do that without using if-clauses?

2 solutions

#1

From your description this would seem to be the code to deliver the correct answers, but Arun, a most skillful data.tablist, seems to have come up with a completely different way to fit your expectations, so I think there must be a different way of reading your requirements.

> DT[ , NoC:= findInterval(R, c(0, 2,4,6,8) , rightmost.closed=TRUE)]
> DT
   R NoC
1: 3   2
2: 8   4
3: 5   3
4: 4   3
5: 6   4
6: 7   4

I'm also very puzzled that findInterval is assigning the 5th item to the 4th interval since 6 is not greater than the upper boundary of the third interval (6).
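
A likely explanation for the result above: by default findInterval treats its intervals as closed on the left and open on the right, so 6 falls into [6, 8) and 4 falls into [4, 6). The sketch below is one possible way to get the asker's expected output using base R only. It assumes an R version where findInterval accepts the left.open argument (added in R 3.3.0, as far as I know). The names breaks, noc_find and noc_cut are made up for illustration.

```r
# Values and upper cluster limits from the question
R  <- c(3, 8, 5, 4, 6, 7)
CP <- c(2, 4, 6, 8)

# Assumption: 0 is a suitable lower bound for cluster 1, as in the answer above
breaks <- c(0, CP)

# left.open = TRUE makes the intervals (lower, upper], so a value equal to
# an upper limit stays in that cluster instead of moving to the next one
noc_find <- findInterval(R, breaks, left.open = TRUE)

# cut() uses right-closed intervals by default; labels = FALSE returns
# the integer interval codes rather than a factor
noc_cut <- cut(R, breaks = breaks, labels = FALSE)

print(noc_find)  # 2 4 3 2 3 4
print(noc_cut)   # 2 4 3 2 3 4
```

Both calls are vectorised, so no if-clauses are needed. Values at or below the lower bound of 0 are not handled meaningfully here (findInterval gives 0, cut gives NA).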

#2

You can accomplish this using rolling joins:

data.table(CP, key="CP")[DT, roll=-Inf, which=TRUE]
# [1] 2 4 3 2 3 4

roll=-Inf performs a NOCB rolling join (Next Observation Carried Backward). That is, when a value falls in a gap, the next observation is rolled backward. For example, 7 falls between 6 and 8; the next value, 8, is rolled backward. We simply get the corresponding index of each match using which=TRUE.

You can just add this as a column to DT using := as you've shown.
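
As a minimal sketch of that step, assuming the data.table package is installed and using the DT and CP from the question (idx is a hypothetical helper name):

```r
library(data.table)

DT <- data.table(R = c(3, 8, 5, 4, 6, 7))
CP <- c(2, 4, 6, 8)

# Rolling join: for each R, find the row of the next CP value at or above it
idx <- data.table(CP, key = "CP")[DT, roll = -Inf, which = TRUE]

# Add the cluster numbers to DT by reference
DT[, NoC := idx]

print(DT$NoC)  # 2 4 3 2 3 4
```

Computing idx first and assigning it afterwards keeps the join and the update as two separate, easy-to-read steps.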

Note that this will return the indices after ordering CP. In your example, CP is already ordered, so it returns the result as intended. If CP is not already ordered, you'll have to add an additional column and extract that column instead of using which=TRUE. But I'll leave it to you to work it out.
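
One possible way to handle an unordered CP is sketched below. It assumes that the cluster number is the position of each limit in the original CP vector, which the question does not state. The reordered CP and the names lookup and cluster are made up for illustration, and the on= syntax needs a reasonably recent data.table.

```r
library(data.table)

DT <- data.table(R = c(3, 8, 5, 4, 6, 7))
CP <- c(6, 2, 8, 4)  # hypothetical unordered limits

# Keep each limit's original position in an extra column
# (assumption: position in CP is the cluster number)
lookup <- data.table(CP = CP, cluster = seq_along(CP))

# Rolling join on CP = R, then extract the cluster column
# instead of relying on which = TRUE
noc <- lookup[DT, on = .(CP = R), roll = -Inf, cluster]

print(noc)  # 4 3 1 4 1 3
```

Here 3 rolls to the limit 4, which sits at position 4 of CP, and 5 rolls to the limit 6, which sits at position 1.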
