Efficiently transform multiple columns of a data frame

Time: 2021-10-13 15:45:53

I have a data frame, and I want to apply a transformation (say, take the log) to every column whose name matches a certain pattern. In the example below, I want to take the log of X.1 and X.2, but not of Y or Z.1.

df <- data.frame(
  Y = sample(0:1, 10, replace = TRUE),
  X.1 = sample(1:10),
  X.2 = sample(1:10),
  Z.1 = sample(151:160)
)

# option 1, won't work for dozens of fields
df$X.1 <- log(df$X.1)
df$X.2 <- log(df$X.2)

Is there a good, efficient way to do this when the data frame is several gigabytes?

3 Solutions

#1


22  

In the case of functions that will return a data.frame:

cols <- c("X.1","X.2")
df[cols] <- log(df[cols])

Otherwise, for functions that operate on a vector rather than a data.frame, you will need to use lapply or a loop over the columns. These solutions will be slower than the solution above, so only use them if you must.

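# Either apply the function to each column with lapply ...
df[cols] <- lapply(df[cols], function(x) c(NA, diff(x)))

# ... or loop over the columns; use [[ ]] so diff() receives a vector
for (col in cols) {
  df[[col]] <- c(NA, diff(df[[col]]))
}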
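
When looping over columns, it matters that `df[col]` returns a one-column data frame while `df[[col]]` returns the underlying vector, and `diff` expects the vector. A minimal sketch of the difference, using a small made-up data frame (the values are illustrative only, not from the question):

```r
# Hypothetical toy data, chosen so the differences are easy to read
df <- data.frame(X.1 = c(1, 4, 9), X.2 = c(2, 3, 5))

class(df["X.1"])    # "data.frame": single brackets keep the data frame wrapper
class(df[["X.1"]])  # "numeric": double brackets extract the column vector

for (col in c("X.1", "X.2")) {
  # diff() returns n - 1 values, so pad with NA to keep the column length
  df[[col]] <- c(NA, diff(df[[col]]))
}
```

After the loop, each column holds the successive differences of its original values, with NA in the first row.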

#2


6  

vars <- c("X.1", "X.2")

df[vars] <- lapply(df[vars], log)
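
The question asks for columns selected by name rather than listed by hand. One way to build the name vector, sketched here under the assumption that the target columns all start with "X." (the pattern and the `x_cols` name are illustrative), is to use `grep` on `names(df)`:

```r
# Example data with the same layout as in the question
set.seed(1)
df <- data.frame(
  Y   = sample(0:1, 10, replace = TRUE),
  X.1 = sample(1:10),
  X.2 = sample(1:10),
  Z.1 = sample(151:160)
)

# Names of all columns that start with "X." (the dot is escaped so it is literal)
x_cols <- grep("^X\\.", names(df), value = TRUE)

# Transform only the matched columns; Y and Z.1 are left untouched
df[x_cols] <- lapply(df[x_cols], log)
```

Because the pattern is anchored with `^`, a column such as Z.1 is not matched even though it also ends in ".1".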

#3


1  

df <- data.frame(
  Y = sample(0:1, 10, replace = TRUE),
  X.1 = sample(1:10),
  X.2 = sample(1:10),
  Z.1 = sample(151:160)
)
df

Assuming you know which variables need converting in the real data frame (2 and 3 refer to the 2nd and 3rd variables in df, which are X.1 and X.2):

df2 <- log10(df[2:3])
df2

If the variables are far apart in the data frame, you can select them with something like c(1, 3, 6, 8:10, 13) for the 1st, 3rd, 6th, 8th through 10th, and 13th. This works only for numeric variables.
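
Since selecting by position works only for numeric columns, it may help to filter the chosen positions first. The following is a rough sketch, with a made-up data frame and hypothetical names (`idx`, `num_idx`), that drops non-numeric columns from the selection and writes the result back into df instead of into a separate df2:

```r
# Hypothetical data: the fourth column is character, not numeric
df <- data.frame(
  Y     = c(0, 1, 1),
  X.1   = c(1, 10, 100),
  X.2   = c(10, 100, 1000),
  label = c("a", "b", "c")
)

idx <- c(2, 3, 4)  # positions requested for transformation

# Keep only the requested positions whose columns are numeric
num_idx <- idx[vapply(df[idx], is.numeric, logical(1))]

# log10() accepts a data frame of numeric columns and returns a data frame
df[num_idx] <- log10(df[num_idx])
```

Here position 4 is skipped, so the label column is left as it was.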
