How do I convert an entire data frame to numeric while preserving decimals?

Date: 2022-09-27 22:56:25

I have a mixed-class data frame (numeric and factor columns) that I am trying to convert entirely to numeric. The following illustrates the type of data I am working with and the problem I am encountering:

> a = as.factor(c(0.01,0.02,0.03,0.04))
> b = c(2,4,5,7)
> df1 = data.frame(a,b)
> class(df1$a)
[1] "factor"
> class(df1$b)
[1] "numeric"

When I try to convert the entire data frame to numeric, the decimal values in column a are altered. For example:

> df2 = as.data.frame(sapply(df1, as.numeric))
> class(df2$a)
[1] "numeric"
> df2
  a b
1 1 2
2 2 4
3 3 5
4 4 7

Previous posts on this site suggest using as.numeric(as.character(df1$a)), which works great for one column. However, I need to apply this approach to a dataframe that may contain hundreds of columns.

What are my options for converting an entire dataframe from factor to numeric, while preserving the numeric decimal values?

The following is the output I would like to produce where a and b are numeric:

     a b
1 0.01 2
2 0.02 4
3 0.03 5
4 0.04 7

I have read the following related posts, although none of them apply directly to this case:

  1. "How to convert a factor variable to numeric while preserving the numbers in R". This references a single column in a data frame.
  2. "Converting from a character to a numeric data frame". This post does not take decimal values into account.
  3. "How can I convert a factor column that contains decimal numbers to numeric?". This applies to only one column in a data frame.

4 Solutions

#1


10  

You might need to do some checking. You cannot convert a factor straight to numeric; as.character must be applied first, otherwise the factor is converted to its integer level codes rather than to the values shown in its labels. I would check each column with is.factor, then coerce to numeric as necessary.

df1[] <- lapply(df1, function(x) {
    if(is.factor(x)) as.numeric(as.character(x)) else x
})
sapply(df1, class)
#         a         b 
# "numeric" "numeric" 
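
To make the point about storage values concrete, here is a minimal sketch on a single factor. The variable names (f, codes, values, values2) are only for illustration and are not part of the answer above. It compares as.numeric applied directly with as.numeric applied after as.character, and also shows the commonly cited alternative of converting the levels once and indexing them by the factor.

```r
# A factor stores integer level codes plus a vector of character labels.
f <- factor(c("0.01", "0.02", "0.03", "0.04"))

# Direct conversion returns the level codes, not the labels.
codes <- as.numeric(f)

# Going through as.character parses the labels as numbers.
values <- as.numeric(as.character(f))

# Alternative: convert the levels once, then index them by the factor.
values2 <- as.numeric(levels(f))[f]

print(codes)
print(values)
print(identical(values, values2))
```

The second form does the same job as as.numeric(as.character(f)) but only parses each distinct level once, which may matter for long columns with few levels.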

#2


3  

Using dplyr (a bit like sapply..)

library(dplyr)
df2 <- mutate_all(df1, function(x) as.numeric(as.character(x)))

which gives:

glimpse(df2)
Observations: 4
Variables: 2
$ a <dbl> 0.01, 0.02, 0.03, 0.04
$ b <dbl> 2, 4, 5, 7

from your df1 which was:

glimpse(df1)
Observations: 4
Variables: 2
$ a <fctr> 0.01, 0.02, 0.03, 0.04
$ b <dbl> 2, 4, 5, 7
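
In dplyr 1.0.0 and later, mutate_all is marked as superseded in favour of across. The following is a sketch of the same idea in that style, assuming dplyr >= 1.0.0 is installed; it is not part of the original answer. Unlike mutate_all, it touches only the factor columns, so existing numeric columns are passed through unchanged.

```r
# Sketch only: needs dplyr >= 1.0.0; the block is skipped if dplyr is missing.
if (requireNamespace("dplyr", quietly = TRUE)) {
  library(dplyr)

  df1 <- data.frame(a = factor(c(0.01, 0.02, 0.03, 0.04)), b = c(2, 4, 5, 7))

  # Convert only the factor columns, via their character labels.
  df2 <- mutate(df1, across(where(is.factor), ~ as.numeric(as.character(.x))))

  print(sapply(df2, class))
}
```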

#3


1  

> df2 <- data.frame(sapply(df1, function(x) as.numeric(as.character(x))))
> df2
     a b
1 0.01 2
2 0.02 4
3 0.03 5
4 0.04 7
> sapply(df2, class)
        a         b 
"numeric" "numeric" 

#4


1  

df2 <- data.frame(apply(df1, 2, function(x) as.numeric(as.character(x))))
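
The one-liners above convert every column, so a factor whose labels are not numbers would turn into NA with a coercion warning. For a data frame with hundreds of columns, a more defensive variant could look like the sketch below. The helper name to_numeric_safely and the extra column g are hypothetical and used only for illustration; the rule it applies (keep a column as-is if conversion would introduce new NAs) is one possible design choice, not something stated in the answers.

```r
# Hypothetical helper: convert factor and character columns to numeric
# only when every non-missing value parses as a number.
to_numeric_safely <- function(df) {
  df[] <- lapply(df, function(x) {
    # Leave columns that are neither factor nor character untouched.
    if (!is.factor(x) && !is.character(x)) return(x)
    chr <- as.character(x)
    num <- suppressWarnings(as.numeric(chr))
    # Keep the original column if conversion would introduce new NAs.
    if (any(is.na(num) & !is.na(chr))) x else num
  })
  df
}

df1 <- data.frame(
  a = factor(c(0.01, 0.02, 0.03, 0.04)),
  b = c(2, 4, 5, 7),
  g = factor(c("x", "y", "x", "y"))  # non-numeric labels
)
df2 <- to_numeric_safely(df1)
print(sapply(df2, class))
```

Assigning into df[] keeps the result a data frame with the original column names, the same trick used in answer #1.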
