Vectorized change of all but the first same/repeated values in vector b, based on values from vector a

I am trying to find a vectorized solution for updating the values of vector b based on the values of vector a. The problem I have is this:

> # Vector a is the "driver": if there is a 1 or -1 in vector a,
> # a -1 or 1 needs to follow in vector b. The challenge is that when
> # a has a 1 or -1 and b then has two or more -1 or 1 values,
> # all but the first of those values in b should be set to 0 as long
> # as the values in a do not change
> a <- c(0, 1, 0, 0, 0, 0, 0,-1, 0, 0, 1, 1,-1,-1, 0, 0, 1, 0, 0,-1, 0, 1, 0, 0, 0, 0, 0)
> b <- c(0, 0,-1, 0,-1, 0, 0, 0, 0, 1, 1,-1,-1, 1, 1, 0, 0,-1, 0, 0, 1, 0,-1,-1, 0,-1, 0)
> a
 [1]  0  1  0  0  0  0  0 -1  0  0  1  1 -1 -1  0  0  1  0  0 -1  0  1  0  0  0  0  0
> b
 [1]  0  0 -1  0 -1  0  0  0  0  1  1 -1 -1  1  1  0  0 -1  0  0  1  0 -1 -1  0 -1  0
> 
> # I need a vectorized function(a, b), if possible, that changes b
> # based on a (removing some repeated values in b),
> # like below
> b[5] <- 0
> b[11] <- 0
> b[24] <- 0
> b[26] <- 0
> a
 [1]  0  1  0  0  0  0  0 -1  0  0  1  1 -1 -1  0  0  1  0  0 -1  0  1  0  0  0  0  0
> b
 [1]  0  0 -1  0  0  0  0  0  0  1  0 -1 -1  1  1  0  0 -1  0  0  1  0 -1  0  0  0  0

Any help or hints on how to do this in a vectorized way would be highly appreciated.

I tried "standard" approaches using rle, cumsum, diff, ...

# I tried to play around with
test <- data.frame(
        a=a,
        b=b,
        a.plus.b=a + b,
        diff.a.plus.b=c(0, diff(a + b)),
        cumsum.a.plus.b=cumsum(a + b),
        diff.cumsum.a.plus.b=c(0, diff(cumsum(a + b)))
)
test 

rle(b)
rle(b)$values
rle(b)$lengths

Edit: Based on David's request to be clearer about what I am trying to do, I will explain the problem at length.

I am building simplified trading backtesting functionality (since quantstrat is too complex and too slow for my needs).

The problem above (at the top of the message) arises when I get an entry signal vector a with values 1 (go long) or -1 (go short). After an entry signal, three things can happen (kept in vector b):
- a time stop is hit (exit at the end of day, b==-1 if long and b==1 if short),
- a profit target is reached (again b==-1, b==1) or
- a stop loss is triggered (again b==-1, b==1).

So vector b represents possible events/exits after each entry (there are no overlapping trades: one closes before another is entered). Sometimes a trade goes directly in my favour and we immediately hit the profit target. Great. Sometimes we hit the stop before we hit the profit target. Sometimes neither the stop nor the profit target is hit by the end of the day, so we are left with the end-of-day exit.

I need to remove all but the first exit events after entry (a==1 or a==-1). Since not all can/will happen, just the first (from time perspective) should stay and I should remove the subsequent ones.

Let me give an example. We enter a long trade at 9:31 (on the close of the first one-minute bar of regular trading hours). So a becomes:

a <- c(1, 0, 0, 0, 0, ..., 0)

We always exit at the close of the last minute bar (time stop), so we add the last possible exit to b:

b <- c(0, 0, 0, 0, 0, ...,-1)

We also know (in the backtest) that our profit target would already be reached on the close of the bar at 9:35, so we add this fact to b (b[5] <- -1):

b <- c(0, 0, 0, 0,-1, ...,-1)

And we also know (in the backtest) that a stop would trigger at 9:33, so we add this to b (b[3] <- -1), which now becomes:

b <- c(0, 0,-1, 0,-1, ...,-1)

So, since my profit target will never be reached (the stop is hit first) and we will not be in the trade at the market close, I should set b[5] <- 0 and b[length(b)] <- 0. That is, I remove all but the first exit trigger in b after the entry (a==1). b should become:

b <- c(0, 0,-1, 0, 0, ..., 0)
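
For checking any vectorized attempt, the rule can also be written as a plain loop that walks the bars in time order. This is only a reference sketch: the function name first_exit_only and the 8-bar day are made up for illustration, and it assumes an exit counts only if it comes on a bar strictly after the entry. It does not check that the sign of the exit is opposite to the sign of the entry.

```r
# Reference (non-vectorized) version of the rule: keep an exit in b
# only if a trade is currently open, then mark the trade as closed.
first_exit_only <- function(a, b) {
  stopifnot(length(a) == length(b))
  in_trade <- FALSE
  for (i in seq_along(b)) {
    if (b[i] != 0) {
      if (in_trade) {
        in_trade <- FALSE   # first exit after the entry: keep it
      } else {
        b[i] <- 0           # no open trade: drop this exit
      }
    }
    # an entry on bar i only affects exits on later bars
    if (a[i] != 0) in_trade <- TRUE
  }
  b
}

# Hypothetical 8-bar day (a real day has many more bars)
a_day <- c(1, 0, 0, 0, 0, 0, 0, 0)
b_day <- c(0, 0, -1, 0, -1, 0, 0, -1)
first_exit_only(a_day, b_day)
# only the stop at bar 3 survives: 0 0 -1 0 0 0 0 0
```

A loop like this is slow over a thousand days of minute bars, but it is easy to reason about and can serve as the expected result when testing a faster version.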

I need to process this for, say, a thousand days in the past...

I hope this clarifies what I am trying to do.

1 solution

#1

I'm not sure if I really understand what you're trying to do, but if I do understand, I think I have a vectorized solution for you.

> f <- function(a,b){
+   b[unique(c(which(a[-length(a)] == 0 & b[-1] != 0) + 1,which(b[-length(b)] == b[-1] & b[-1] != 0)))] <- 0
+   return(b)
+ }
> f(a,b)
 [1]  0  0 -1  0  0  0  0  0  0  0  0  0 -1  0  1  0  0 -1  0  0  1  0  0  0  0  0  0

Here was my rationale. I think you want to set values of b to zero based on two different scenarios:

1) When non-zero values of b repeat. If so this should find those indices:

which(b[-length(b)] == b[-1] & b[-1] != 0)

2) When non-zero values of b occur when the previous index of a was zero. If so this should do the trick:

which(a[-length(a)] == 0 & b[-1] != 0) + 1
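
To see what each of the two expressions selects, they can be evaluated separately on the vectors from the question. This is just an illustrative sketch; the names n, rule1 and rule2 are not part of the original function.

```r
a <- c(0, 1, 0, 0, 0, 0, 0,-1, 0, 0, 1, 1,-1,-1, 0, 0, 1, 0, 0,-1, 0, 1, 0, 0, 0, 0, 0)
b <- c(0, 0,-1, 0,-1, 0, 0, 0, 0, 1, 1,-1,-1, 1, 1, 0, 0,-1, 0, 0, 1, 0,-1,-1, 0,-1, 0)
n <- length(b)

# 1) positions i where b[i] and b[i + 1] are equal and non-zero
rule1 <- which(b[-n] == b[-1] & b[-1] != 0)
rule1   # 10 12 14 23

# 2) positions j where b[j] is non-zero and a[j - 1] is zero
rule2 <- which(a[-n] == 0 & b[-1] != 0) + 1
rule2   # 5 10 11 24 26
```

Note that the first expression has no `+ 1`, so it returns the position of the first element of each repeated pair rather than the repeat itself. That may not be what was intended, and it explains why b[10], b[12] and b[14] are zeroed in the output above.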

Hopefully I didn't misunderstand your goals here.

EDIT:

Second try here. I'm still pretty sure that I don't understand what you're trying to do since my solution still flags b[10] (which you say it shouldn't), but from what you're writing the best I can understand is that you want to make the following changes:

Non-zero values of "b" that follow zero values of "a" must be set to zero.

Since this rule incorrectly flags b[10], can you please tell me why it is incorrect? I think this problem will need to be phrased that way in order for me to give you a solution, since the finance talk just sounds like gibberish to me.

Anyway, here is the vectorized solution for the rule I listed:

> f <- function(a,b) {
+   b[which(b != 0)[which(!which(b != 0) %in% (which(a[-length(a)] != 0) + 1))]] <- 0
+   return(b)
+ }
> f.indices <- function(a,b) which(b != 0)[which(!which(b != 0) %in% (which(a[-length(a)] != 0) + 1))]
> f(a,b)
 [1]  0  0 -1  0  0  0  0  0  0  0  0 -1 -1  1  1  0  0 -1  0  0  1  0 -1  0  0  0  0
> f.indices(a,b)
[1]  5 10 11 24 26
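
The nested which() calls are hard to read. Assuming the rule really is "a non-zero b whose previous a is zero", the same indices can probably be obtained by shifting a one position to the right and comparing element-wise. The names prev_a and idx are only for illustration.

```r
a <- c(0, 1, 0, 0, 0, 0, 0,-1, 0, 0, 1, 1,-1,-1, 0, 0, 1, 0, 0,-1, 0, 1, 0, 0, 0, 0, 0)
b <- c(0, 0,-1, 0,-1, 0, 0, 0, 0, 1, 1,-1,-1, 1, 1, 0, 0,-1, 0, 0, 1, 0,-1,-1, 0,-1, 0)

# value of a on the previous position (treated as 0 for the first element)
prev_a <- c(0, head(a, -1))

# non-zero b values that do not directly follow a non-zero a
idx <- which(b != 0 & prev_a == 0)
idx   # 5 10 11 24 26

b[idx] <- 0
```

This still flags b[10], so it shares the limitation of the rule itself.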

EDIT: Third try is the charm...

Now operating under the assumption that the goal is to set all non-zero values of b to zero, except for the first one that follows each non-zero value of a. I'm not sure if/how that can be fully vectorized, but here should be a quick solution:

> a <- c(0, 1, 0, 0, 0, 0, 0,-1, 0, 0, 1, 1,-1,-1, 0, 0, 1, 0, 0,-1, 0, 1, 0, 0, 0, 0, 0)
> b <- c(0, 0,-1, 0,-1, 0, 0, 0, 0, 1, 1,-1,-1, 1, 1, 0, 0,-1, 0, 0, 1, 0,-1,-1, 0,-1, 0)
> 
> f <- function(a,b){
+   #non-zero b indices
+   nz.b <- which(b != 0)
+   #non-zero a indices
+   nz.a <- which(a != 0)  
+   #non-zero b indices that do not follow non-zero a indices
+   nz.b.rm <- nz.b
+   for(i in nz.a){
+     nz.b.rm <- nz.b.rm[!nz.b.rm %in% nz.b[nz.b > i][1]] 
+   }
+   #print non-zero b indices that do not follow non-zero a indices
+   print(paste0("Indices Removed: ",paste(nz.b.rm,collapse=",")))
+   #set those indices to zero (b[-nz.b.rm] would drop the elements and shorten b)
+   b[nz.b.rm] <- 0
+   return(b)
+ }
> 
> b.new <- f(a,b)
[1] "Indices Removed: 5,11,24,26"
> b.new
 [1]  0  0 -1  0  0  0  0  0  0  1  0 -1 -1  1  1  0  0 -1  0  0  1  0 -1  0  0  0  0
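
The loop over nz.a can probably be avoided. One possible sketch, assuming the same rule (keep only the first non-zero b that comes strictly after each non-zero a), is to label every position with the number of entries seen before it using cumsum, and then keep the first non-zero b within each label using duplicated. The function name keep_first_exit is hypothetical, and unlike the loop this version returns a vector of the same length as b with the dropped exits set to zero.

```r
keep_first_exit <- function(a, b) {
  stopifnot(length(a) == length(b))
  # number of entry signals strictly before each position
  grp <- c(0, head(cumsum(a != 0), -1))
  nz <- which(b != 0)
  # keep an exit if at least one entry precedes it and it is the
  # first exit seen since the most recent entry
  keep <- nz[grp[nz] > 0 & !duplicated(grp[nz])]
  out <- numeric(length(b))
  out[keep] <- b[keep]
  out
}

a <- c(0, 1, 0, 0, 0, 0, 0,-1, 0, 0, 1, 1,-1,-1, 0, 0, 1, 0, 0,-1, 0, 1, 0, 0, 0, 0, 0)
b <- c(0, 0,-1, 0,-1, 0, 0, 0, 0, 1, 1,-1,-1, 1, 1, 0, 0,-1, 0, 0, 1, 0,-1,-1, 0,-1, 0)
b.vec <- keep_first_exit(a, b)
which(b != 0 & b.vec == 0)   # 5 11 24 26
```

On the question's data this zeroes the same four positions (5, 11, 24, 26) that the question sets by hand, without any explicit loop.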
