Split a numeric vector into unequal parts, then apply a custom function to each part

I have a long sequence of 1s and 0s representing bird incubation patterns: 1 means the bird is ON the nest, 0 means it is OFF.

    > Fake.data<- c(1,1,1,1,1,0,0,1,1,1,1,0,0,0,1,1,1,1,0,1,1,1,1,0,0,1,1,1,1,1,0,0,0,0,1,1,0,1,0)

As an end point, I would like a single value per bout: the ratio of each OFF period to the ON period immediately before it (OFF length divided by ON length). So for Fake.data, the result would ideally be a vector like this:

    [1] 0.4  0.75  0.25  0.5  0.8  0.5  1 (I just typed this out!)

So far I have split the vector into sections using split()

    > Diff<-diff(Fake.data)
    > SPLIT<-split(Fake.data, cumsum(c(1, Diff > 0 )))
    > SPLIT

Which returns...

    $`1`
    [1] 1 1 1 1 1 0 0
    $`2`
    [1] 1 1 1 1 0 0 0
    $`3`
    [1] 1 1 1 1 0
    $`4`
    [1] 1 1 1 1 0 0
    $`5`
    [1] 1 1 1 1 1 0 0 0 0
    $`6`
    [1] 1 1 0
    $`7`
    [1] 1 0
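
To see why this grouping works, it can help to print the index vector that split() receives. A minimal sketch (Fake.data is redefined so the block runs on its own): diff() is positive only at 0 → 1 transitions, so cumsum() starts a new group number each time the bird returns to the nest, and each group ends up as one ON run followed by its OFF run.

```r
Fake.data <- c(1,1,1,1,1,0,0,1,1,1,1,0,0,0,1,1,1,1,0,1,1,1,1,0,0,
               1,1,1,1,1,0,0,0,0,1,1,0,1,0)

# diff() is +1 at each 0 -> 1 transition (bird returns to the nest),
# -1 at each 1 -> 0 transition, and 0 elsewhere
transitions <- diff(Fake.data) > 0

# Prepend 1 for the first element; cumsum() then increments the
# group number every time a new ON bout starts
groups <- cumsum(c(1, transitions))
groups

# Group sizes, one per ON/OFF bout
table(groups)
```

The group sizes reported by table() are 7, 7, 5, 6, 9, 3 and 2, the same lengths as the list elements above.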

So I can get the ratio for a single group of the split using:

    > SPLIT$'1'<- ((length(SPLIT$'1'))-(sum(SPLIT$'1')))/sum(SPLIT$'1')
    > SPLIT$'1'
    [1] 0.4
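
Note that the line above overwrites SPLIT$'1' with its ratio, so any later step that expects the original 0/1 values in that element will no longer see them. A minimal sketch that keeps SPLIT intact by wrapping the same calculation in a function; group_ratio is a hypothetical name chosen for illustration:

```r
Fake.data <- c(1,1,1,1,1,0,0,1,1,1,1,0,0,0,1,1,1,1,0,1,1,1,1,0,0,
               1,1,1,1,1,0,0,0,0,1,1,0,1,0)
SPLIT <- split(Fake.data, cumsum(c(1, diff(Fake.data) > 0)))

# Hypothetical helper: OFF time divided by ON time for one group.
# length(x) - sum(x) counts the 0s, sum(x) counts the 1s.
group_ratio <- function(x) (length(x) - sum(x)) / sum(x)

# [[ ]] extracts one group by name (or position) without modifying SPLIT
group_ratio(SPLIT[["1"]])
```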

However, my data contain several thousand of these groups, and I would like to use tapply() or a for() loop to calculate the ratio for all of them automatically and collect the results in a single vector. I have tried both approaches with little success; the list returned by split() does not seem to fit with these functions.

I created a new vector to receive the for() loop output:

    ratio<-rep(as.character(NA),(length(SPLIT)))

Then I attempted the for() loop, reusing the code above that works for a single group:

    for(i in SPLIT$'1':'7')
    {ratio[i]<-((length(SPLIT$'[i]'))-(sum(SPLIT$'[i]')))/sum(SPLIT$'[i]')}

What I get is...

    [1] "NaN" "NaN" "NaN" "NaN" "NaN" "NaN" NA

I have tried many other variations on this theme, but now I'm really stuck!
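
For reference, a sketch of how the for() loop can be made to work. SPLIT$'[i]' does not substitute i: $ looks for an element literally named "[i]", finds none and returns NULL, so length() and sum() both give 0 and 0/0 is NaN. SPLIT$'1':'7' builds its sequence from whatever value SPLIT$'1' currently holds rather than from the group names, and ratio was created as a character vector, which is why the NaN values print in quotes. The sketch below loops over positions with seq_along(), indexes with [[i]], and preallocates a numeric result:

```r
Fake.data <- c(1,1,1,1,1,0,0,1,1,1,1,0,0,0,1,1,1,1,0,1,1,1,1,0,0,
               1,1,1,1,1,0,0,0,0,1,1,0,1,0)
SPLIT <- split(Fake.data, cumsum(c(1, diff(Fake.data) > 0)))

# Preallocate a numeric (not character) vector with one slot per group
ratio <- numeric(length(SPLIT))

# seq_along() yields positions 1, 2, ..., length(SPLIT), and [[i]]
# extracts the i-th group as a plain numeric vector
for (i in seq_along(SPLIT)) {
  x <- SPLIT[[i]]
  # number of 0s (OFF) divided by number of 1s (ON)
  ratio[i] <- (length(x) - sum(x)) / sum(x)
}

ratio
```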

2 Solutions

#1

I think you were very close with your strategy. The sapply function works happily with lists, so I would just change the last step to:

    sapply(SPLIT, function(x) sum(x==0)/sum(x==1))

which returns

       1    2    3    4    5    6    7 
    0.40 0.75 0.25 0.50 0.80 0.50 1.00

with your sample data. No additional packages are needed.
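
The question also mentions tapply(). Because tapply() takes the grouping vector directly, the split() step is not needed. A minimal sketch using the same cumsum() grouping as in the question:

```r
Fake.data <- c(1,1,1,1,1,0,0,1,1,1,1,0,0,0,1,1,1,1,0,1,1,1,1,0,0,
               1,1,1,1,1,0,0,0,0,1,1,0,1,0)

# Same group index the question passes to split()
groups <- cumsum(c(1, diff(Fake.data) > 0))

# tapply() splits Fake.data by groups and applies the function to each piece
ratios <- tapply(Fake.data, groups, function(x) sum(x == 0) / sum(x == 1))

# tapply() returns a one-dimensional array named by group;
# as.vector() drops the dim and names to give a plain numeric vector
as.vector(ratios)
```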

#2

Here are two possibilities:

1) Compute the run lengths using rle(). If the data start with a 0, the if statement drops the first length, so we are sure to start with a run of 1s. Finally, compute the ratios using rollapply from the zoo package:

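    library(zoo)

    # lengths of the alternating runs of 1s and 0s
    lengths <- rle(Fake.data)$lengths
    # drop a leading OFF run so that the first length is an ON run
    if (Fake.data[1] == 0) lengths <- lengths[-1]

    # non-overlapping windows of 2, i.e. (ON, OFF) pairs, giving OFF / ON
    rollapply(lengths, 2, by = 2, function(x) x[2]/x[1])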

giving:

    [1] 0.40 0.75 0.25 0.50 0.80 0.50 1.00

The if line can be removed if we know that the data always starts with a 1.
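
If adding the zoo dependency is not wanted, the same pairing can be done in base R by taking alternate elements of the run lengths. This is a sketch under the same assumption that, after the optional trim, the first run is an ON run; the variable is named runs rather than lengths to avoid confusion with base R's lengths() function. If the series ends with a 1, the final ON run has no OFF partner and is simply left out here.

```r
Fake.data <- c(1,1,1,1,1,0,0,1,1,1,1,0,0,0,1,1,1,1,0,1,1,1,1,0,0,
               1,1,1,1,1,0,0,0,0,1,1,0,1,0)

runs <- rle(Fake.data)$lengths
# Drop a leading OFF run so the first run is ON
if (Fake.data[1] == 0) runs <- runs[-1]

# Odd positions hold ON runs, even positions the OFF run after each one;
# integer division ignores an unpaired trailing ON run
n_pairs <- length(runs) %/% 2
on  <- runs[seq(1, by = 2, length.out = n_pairs)]
off <- runs[seq(2, by = 2, length.out = n_pairs)]

off / on
```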

2) If we can assume that the series always starts with a 1 and ends with a 0, then this one-liner works:

    with( rle(Fake.data), lengths[values == 0] / lengths[values == 1] )

giving the same answer as above.
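
When those assumptions may not hold, one option is to trim a leading OFF run and a trailing ON run from the rle() result before applying the same division. A minimal sketch; off_on_ratios is a hypothetical helper name, not part of any package:

```r
Fake.data <- c(1,1,1,1,1,0,0,1,1,1,1,0,0,0,1,1,1,1,0,1,1,1,1,0,0,
               1,1,1,1,1,0,0,0,0,1,1,0,1,0)

# Hypothetical helper: OFF/ON ratio per bout for a 0/1 vector that may
# start with a 0 or end with a 1
off_on_ratios <- function(x) {
  r <- rle(x)
  len <- r$lengths
  val <- r$values
  # Drop a leading OFF run so the series starts with ON
  if (length(val) > 0 && val[1] == 0) {
    len <- len[-1]
    val <- val[-1]
  }
  # Drop a trailing ON run that has no OFF period after it
  n <- length(val)
  if (n > 0 && val[n] == 1) {
    len <- len[-n]
    val <- val[-n]
  }
  len[val == 0] / len[val == 1]
}

off_on_ratios(Fake.data)
# Extra leading 0s and trailing 1s are ignored
off_on_ratios(c(0, 0, Fake.data, 1, 1))
```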
