Histogram-like summary of interval data

Time: 2022-02-15 14:58:03

How do I get a histogram-like summary of interval data in R?

My MWE data has four intervals.

interval  range
Int1      2-7
Int2      10-14
Int3      12-18
Int4      25-28

I want a histogram-like function that splits a range into fixed-size bins and, for each bin, counts how many of the intervals Int1-Int4 overlap it and reports which ones. The function output should look like this:

bin     count  which
[0-4]   1      Int1
[5-9]   1      Int1
[10-14] 2      Int2 and Int3
[15-19] 1      Int3
[20-24] 0      None
[25-29] 1      Int4

Here the range is [minfloor(Int1, Int2, Int3, Int4), maxceil(Int1, Int2, Int3, Int4)) = [0,30) and there are six bins of size 5.
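
The range arithmetic above can be sketched in base R. This is only an illustration, assuming that "minfloor" and "maxceil" mean rounding the smallest start down and the largest end up to a multiple of the bin size; the variable names (`starts`, `ends`, `bin_size`, `lo`, `hi`) are made up for this example.

```r
# Interval endpoints from the example table (Int1..Int4)
starts <- c(2, 10, 12, 25)
ends   <- c(7, 14, 18, 28)
bin_size <- 5

# Assumption: round the overall minimum down and the overall maximum up
# to the nearest multiple of bin_size, giving the half-open range [lo, hi)
lo <- floor(min(starts) / bin_size) * bin_size
hi <- ceiling(max(ends) / bin_size) * bin_size

# Closed integer bins [0-4], [5-9], ..., [25-29]
bin_start <- seq(lo, hi - bin_size, by = bin_size)
bin_end   <- bin_start + bin_size - 1

bins <- data.frame(bin_start, bin_end)
bins
```

With the example data this gives `lo = 0`, `hi = 30`, and six bins, matching the table above.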

I would greatly appreciate any pointers to R packages or functions that implement the functionality I want.

Update:

So far, I have a partial solution from the IRanges package, which uses a fast data structure called NCList that, according to users, is faster than interval search trees.

> library(IRanges)
> subject <- IRanges(c(2,10,12,25), c(7,14,18,28))
> query <- IRanges(c(0,5,10,15,20,25), c(4,9,14,19,24,29))
> countOverlaps(query, subject)
[1] 1 1 2 1 0 1

But I still cannot get which ranges overlap each bin. I will update if I work it out.

1 Solution

#1


With IRanges, you should use findOverlaps or mergeByOverlaps instead of countOverlaps. Note, though, that by default these do not report bins that have no matches.

I'll leave that to you. Instead, I will show an alternative method using foverlaps() from the data.table package:

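library(data.table)

# Intervals to summarise (the "subject")
subject <- data.table(interval = paste0("int", 1:4),
                      start = c(2,10,12,25),
                      end = c(7,14,18,28))
# Fixed-size bins (the "query")
query <- data.table(start = c(0,5,10,15,20,25),
                    end = c(4,9,14,19,24,29))

# foverlaps() requires the second table to be keyed on its start/end columns
setkey(subject, start, end)
# One row per (bin, overlapping interval) pair; bins with no overlap get NA
ans <- foverlaps(query, subject, type="any")
# Collapse to one row per bin: count the non-NA matches and list their names
ans[, .(count = sum(!is.na(start)),
        which = paste(interval, collapse=", ")),
     by = .(i.start, i.end)]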

#    i.start i.end count      which
# 1:       0     4     1       int1
# 2:       5     9     1       int1
# 3:      10    14     2 int2, int3
# 4:      15    19     1       int3
# 5:      20    24     0         NA
# 6:      25    29     1       int4
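
For completeness, the findOverlaps route mentioned at the top of this answer could be sketched roughly as follows. Treat it as a minimal sketch rather than a tested recipe: it assumes the Bioconductor IRanges package is installed, and `hits`, `which_by_bin`, and `res` are names made up for this example. Empty bins are kept by splitting on a factor whose levels cover every bin.

```r
library(IRanges)

# Same data as in the question; the names Int1..Int4 label the intervals
subject <- IRanges(c(2, 10, 12, 25), c(7, 14, 18, 28))
names(subject) <- paste0("Int", 1:4)
query <- IRanges(c(0, 5, 10, 15, 20, 25), c(4, 9, 14, 19, 24, 29))

# Hits object: one (query index, subject index) pair per overlap
hits <- findOverlaps(query, subject)

# Group the overlapping interval names by bin; using a factor with one
# level per bin keeps bins that have no overlaps as empty groups
which_by_bin <- split(names(subject)[subjectHits(hits)],
                      factor(queryHits(hits),
                             levels = seq_len(length(query))))

res <- data.frame(start = start(query),
                  end   = end(query),
                  count = lengths(which_by_bin),
                  which = vapply(which_by_bin, paste, character(1),
                                 collapse = ", "))
res
```

In this sketch a bin with no overlapping interval ends up with a count of 0 and an empty string in the `which` column.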
