dplyr join warning: joining factors with different levels

Time: 2022-03-21 14:57:40

When using the join function in the dplyr package, I get this warning:

Warning message:
In left_join_impl(x, y, by$x, by$y) :
  joining factors with different levels, coercing to character vector

There is not a lot of information online about this. Any idea what it could be? Thanks!

2 answers

#1

That's not an error, it's a warning. It is telling you that one of the columns you joined by is a factor, and that factor has different levels in the two data frames. So that no information is lost, the factors are converted to character vectors. For example:

library(dplyr)
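# stringsAsFactors = TRUE is needed on R >= 4.0.0, where it is no longer the default
x <- data.frame(a = letters[1:7], stringsAsFactors = TRUE)
y <- data.frame(a = letters[4:10], stringsAsFactors = TRUE)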

class(x$a) 
# [1] "factor"

# NOTE these are different
levels(x$a)
# [1] "a" "b" "c" "d" "e" "f" "g"
levels(y$a)
# [1] "d" "e" "f" "g" "h" "i" "j"

m <- left_join(x,y)
# Joining by: "a"
# Warning message:
# joining factors with different levels, coercing to character vector 

class(m$a)
# [1] "character"
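To see why keeping either factor would lose information, try re-encoding `y`'s values with `x`'s levels: every value that is not one of `x`'s levels becomes `NA`. A minimal base-R sketch, rebuilding the same `x` and `y` so it runs on its own; the name `forced` is only for illustration:

```r
x <- data.frame(a = letters[1:7], stringsAsFactors = TRUE)
y <- data.frame(a = letters[4:10], stringsAsFactors = TRUE)

# Force y's values onto x's levels: "h", "i" and "j" are not levels of x$a,
# so they cannot be represented and turn into NA.
forced <- factor(as.character(y$a), levels = levels(x$a))
sum(is.na(forced))
# [1] 3

# As character vectors, every value from both keys survives.
union(as.character(x$a), as.character(y$a))
# [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j"
```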

To keep the key as a factor, make sure both factors have the same levels before joining:

combined <- sort(union(levels(x$a), levels(y$a)))
n <- left_join(mutate(x, a=factor(a, levels=combined)),
    mutate(y, a=factor(a, levels=combined)))
# Joining by: "a"
class(n$a)
#[1] "factor"
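The level-harmonizing step above can be wrapped in a small function so it works for any key column. A minimal base-R sketch; `with_shared_levels()` is a hypothetical helper, not part of dplyr, and it assumes the key column is a factor in both data frames:

```r
# Hypothetical helper (not a dplyr function). Assumes x[[col]] and y[[col]]
# are both factors; re-encodes them over the sorted union of their levels.
with_shared_levels <- function(x, y, col) {
  combined <- sort(union(levels(x[[col]]), levels(y[[col]])))
  x[[col]] <- factor(x[[col]], levels = combined)
  y[[col]] <- factor(y[[col]], levels = combined)
  list(x = x, y = y)
}

x <- data.frame(a = letters[1:7], stringsAsFactors = TRUE)
y <- data.frame(a = letters[4:10], stringsAsFactors = TRUE)

h <- with_shared_levels(x, y, "a")
identical(levels(h$x$a), levels(h$y$a))
# [1] TRUE
```

The two data frames in `h` can then be passed to `left_join(h$x, h$y, by = "a")`, which, as in the answer above, keeps `a` as a factor.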

#2

The same warning also appears when the join columns in the two tables have the same levels but in a different order:

> library(dplyr)
> library(forcats)  # fct_relevel() comes from forcats, not dplyr
> tb1 <- tibble(a = c("a", "b", "c")) %>% mutate(a = as.factor(a))
> # tb2 has the same values, but level "c" is moved to the front
> tb2 <- tb1 %>% mutate(a = fct_relevel(a, "c"))
> tb1$a %>% class()
[1] "factor"
> tb2$a %>% class()
[1] "factor"
> tb1$a %>% levels()
[1] "a" "b" "c"
> tb2$a %>% levels()
[1] "c" "a" "b"
> tb1 %>% left_join(tb2)
Joining, by = "a"
Column `a` joining factors with different levels, coercing to character vector
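You can detect this case before joining and fix it by re-encoding one key with the other's level order. A minimal base-R sketch; here `factor(..., levels = ...)` stands in for `fct_relevel()` so the example needs no extra packages:

```r
tb1 <- data.frame(a = factor(c("a", "b", "c")))
# Same values as tb1, but level "c" moved to the front (like fct_relevel(a, "c"))
tb2 <- data.frame(a = factor(c("a", "b", "c"), levels = c("c", "a", "b")))

setequal(levels(tb1$a), levels(tb2$a))   # same set of levels...
# [1] TRUE
identical(levels(tb1$a), levels(tb2$a))  # ...but not the same order
# [1] FALSE

# Re-encode tb2's key using tb1's level order; the values are unchanged
tb2$a <- factor(tb2$a, levels = levels(tb1$a))
identical(levels(tb1$a), levels(tb2$a))
# [1] TRUE
```

`identical()` compares both the set and the order of the levels, so the same check catches both situations described in this thread.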
