Applying a function to a dataframe or to multiple lists

Time: 2022-07-01 22:56:56

Edit as per the comments: the OP would like to calculate:

100 * (1 - 10^-(Do - Do[Do==0])) / (1 - 10^-(Do[Do==100] - Do[Do==0])) - Do

For each combination of Cl, In, and Sa in the data.frame.
-RS


I am trying to apply a function, called dG, to a dataframe. Because the function's arguments differ in length, recycling produced unpredictable results.

To work around this, I split the dataframe into lists and tried to apply the dG function (below) to each list, after identifying each list with a helper function called 'ids'.

  • Please feel free to suggest a different solution. FYI, my specific requests start with bullet points

Please let me start by providing synthetic data that shows the issues:

Do <- rep(c(0,2,4,6,8,10,15,20,30,40,45,50,55,60,65,70,80,85,90,92,94,96,98,100), each=16,times=16)
Cl <- rep(c("K", "Y","M","C"), each= 384, times=4)
In <- rep(c("A", "S"), each=3072)
Sa <- rep(c(1,2), each=1536)
Data <- rnorm(6144)
DataFrame <- cbind.data.frame(Do,Cl,In,Sa,Data); head(DataFrame)
rm(Do,Cl,In,Sa,Data)
attach(DataFrame)

DFSplit <- split(DataFrame[ , "Data"], list(Do, Cl, In, Sa))

The function 'ids' is a helper function that identifies the list names:

ids <- function(Do, Cl, In, Sa){
    grep( paste( "^" , Do, "\\.",
                Cl, "\\.",
                In,
                "\\.", Sa,sep=""),
         names(DFSplit), value = TRUE)}

mapply(ids, Do, Cl, In, Sa, SIMPLIFY = FALSE)

The above mapply call produces 6144 lists. If you look at the output, you will notice that there are only 384 unique list names, but each one is repeated 16 times (384 * 16 = 6144).

  • How can I change the 'ids' function so that mapply doesn't repeat the same name 16 times?

As an ugly and highly costly solution I used unique; I need a better fundamental solution.

unique(mapply(ids, Do, Cl, In, Sa, SIMPLIFY = FALSE))

The dG function is the one that I want to apply to each of the 'DFSplit' lists. It has the same issue as the ids function above, which it uses as an input.

dG <- function(Do,Cl, In, Sa){
    dg <- 100*
                (1-10^-( DFSplit[[ids(Do,  Cl, In, Sa)]] - DFSplit[[ids(0, Cl, In, Sa)]])) /
                (1-10^-( DFSplit[[ids(100, Cl, In, Sa)]] - DFSplit[[ids(0, Cl, In, Sa)]])) - Do
    dg}

I tried to use dG as follows, but the result is not what I want.

dG(Do,Cl, In, Sa)

It evaluated only the last part of the dG function (- Do) and gave this warning:

In grep(paste("^", unique(Do), "\.", unique(Cl), "\.", unique(In), : argument 'pattern' has length > 1 and only the first element will be used

  • Can you suggest a modification to the dG function?

Then I tried mapply:

mapply(dG, Do, Cl, In, Sa, SIMPLIFY = FALSE)

mapply evaluated the function correctly with my data, but it produces 6144 lists. You will notice that the output is basically 384 unique lists, each repeated 16 times (384 * 16 = 6144).

  • How can I modify the dG function to get rid of the useless and time consuming repetition?

My thought would be:

  1. Eliminate the repetition in my first function 'ids', which I do not know how to do.
  2. Change the arguments of the second function so that the arguments' lengths would be 384, maybe by using the names of the lists as an input argument, which I do not know how to do.
  3. Change the formula in dG so that it does not use the (Do, Cl, In, Sa) arguments, since each one has a length of 6144.

1 solution

#1


5  

UPDATE:

The comment you made to @Roland was all you had to put in each of your previous related questions, this one included.

The entirety of your process can be handled in one line of code:

library(data.table)
myDT <- data.table(DataFrame)

myDT[ , "TVI" :=  100 * (1 - 10^-(Data - Data[Do==0])) / (1 - 10^-(Data[Do==100] - Data[Do==0])) 
      , by=list(Cl, In, Sa)]

# this is your Tonal Value Increase
myDT$TVI
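
As a rough cross-check of what the grouped expression computes, the same formula can be sketched in base R on a tiny made-up data set. The name small_df and its values are assumptions for illustration only, and a single grouping column Cl stands in for the Cl, In, Sa combination:

```r
# Tiny deterministic stand-in for DataFrame (hypothetical values):
# one reading per Do level in each of two groups.
small_df <- data.frame(
  Do   = rep(c(0, 50, 100), times = 2),
  Cl   = rep(c("K", "C"), each = 3),
  Data = c(0.1, 0.6, 1.5,    # group K
           0.2, 0.7, 1.2),   # group C
  stringsAsFactors = FALSE
)

small_df$TVI <- NA_real_

# Walk over the row indices of each group and apply the formula using
# that group's own Do == 0 and Do == 100 readings.
for (rows in split(seq_len(nrow(small_df)), small_df$Cl)) {
  g    <- small_df[rows, ]
  d0   <- g$Data[g$Do == 0]
  d100 <- g$Data[g$Do == 100]
  small_df$TVI[rows] <- 100 * (1 - 10^-(g$Data - d0)) / (1 - 10^-(d100 - d0))
}

small_df
```

By construction, the Do == 0 rows come out as 0 and the Do == 100 rows as 100 within each group, which is a quick sanity check that the grouping picked up the right reference rows.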


original answer:

It's still awfully unclear what you are trying to accomplish. However, here are two concepts that should be able to save you a world of headaches.

First, you do not need your ids function. You can get more mileage out of expand.grid:

myIDs <- expand.grid(unique(Do), unique(Cl), unique(In), unique(Sa))

# You can then use something like 
do.call(paste, c(myIDs, sep="."))
# to get the same results.  Or whatever other function suits
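
A minimal sketch of that idea on small stand-in vectors follows. The names Do_s, Cl_s, In_s, val, and parts are hypothetical and only mirror the roles of the question's columns. It builds one name per combination and compares the result with the names that split() generates:

```r
# Small stand-in vectors (hypothetical values, same roles as in the question).
Do_s <- rep(c(0, 50, 100), times = 4)
Cl_s <- rep(c("K", "C"), each = 3, times = 2)
In_s <- rep(c("A", "S"), each = 6)
val  <- seq_along(Do_s)

# split() names each piece "<Do>.<Cl>.<In>", one piece per combination.
parts <- split(val, list(Do_s, Cl_s, In_s))

# Build every combination once, then paste the columns together row by row.
ids_grid <- expand.grid(unique(Do_s), unique(Cl_s), unique(In_s),
                        stringsAsFactors = FALSE)
id_names <- do.call(paste, c(ids_grid, sep = "."))

length(id_names)                  # one name per combination, no repeats
setequal(id_names, names(parts))  # same set of names as split() produced
```

The order of id_names can differ from names(parts), because split() sorts the factor levels while unique() keeps first-appearance order, so the comparison here is on the set of names rather than on position.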

However, even that is not necessary.


Here is the equivalent of your dG function using data.table.

Notice there is no need for any of the splitting or ids or anything like that.
Everything is handled by the by argument in data.table.

library(data.table)
myDT <- data.table(DataFrame)

myDT

# Group by Cl, In, Sa only: each group must contain its own Do == 0 and
# Do == 100 rows, so Do cannot be one of the grouping columns.
dG_DT <- myDT[ , list(Do,
                      dG = 100 *
                           (1 - 10^-(Data - Data[Do==0])) /
                           (1 - 10^-(Data[Do==100] - Data[Do==0])) - Do)
               , by=list(Cl, In, Sa)]

dG_DT
