There seems to be a difference between levels and labels of a factor in R. Up to now, I always thought that levels were the 'real' name of factor levels, and labels were the names used for output (such as tables and plots). Obviously, this is not the case, as the following example shows:
df <- data.frame(v=c(1,2,3),f=c('a','b','c'), stringsAsFactors=TRUE)  # stringsAsFactors=TRUE is needed in R >= 4.0 to get a factor column
str(df)
'data.frame': 3 obs. of 2 variables:
$ v: num 1 2 3
$ f: Factor w/ 3 levels "a","b","c": 1 2 3
df$f <- factor(df$f, levels=c('a','b','c'),
labels=c('Treatment A: XYZ','Treatment B: YZX','Treatment C: ZYX'))
levels(df$f)
[1] "Treatment A: XYZ" "Treatment B: YZX" "Treatment C: ZYX"
I thought that the levels ('a','b','c') could somehow still be accessed when scripting, but this doesn't work:
> df$f=='a'
[1] FALSE FALSE FALSE
But this does:
> df$f=='Treatment A: XYZ'
[1] TRUE FALSE FALSE
So, my question consists of two parts:
- What's the difference between levels and labels?
- Is it possible to have different names for factor levels for scripting and output?
Background: For longer scripts, scripting with short factor levels seems to be much easier. However, for reports and plots, these short factor levels may not be adequate and should be replaced with more precise names.
2 Answers
#1
94
Very short: levels are the input, labels are the output in the factor() function. A factor has only a levels attribute, which is set by the labels argument in the factor() function. This is different from the concept of labels in statistical packages like SPSS, and can be confusing in the beginning.
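As a small illustration of "levels are the input, labels are the output", here is a minimal sketch. The vector x and its extra value 'd' are made up for this example and are not part of the question's data.

```r
# Hypothetical input vector; 'd' is deliberately not listed in levels
x <- c('a', 'b', 'c', 'd')

# levels says which input values to look for,
# labels says what those values are called in the resulting factor
f <- factor(x, levels = c('a', 'b', 'c'),
            labels = c('Treatment A: XYZ', 'Treatment B: YZX', 'Treatment C: ZYX'))

levels(f)   # only the labels are stored; 'a', 'b', 'c' are gone
is.na(f)    # input values not found in levels become NA
```

After the call, the original values 'a', 'b' and 'c' are no longer stored anywhere in f, which is why df$f=='a' returns FALSE in the question.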
What you do in this line of code
df$f <- factor(df$f, levels=c('a','b','c'),
labels=c('Treatment A: XYZ','Treatment B: YZX','Treatment C: ZYX'))
is telling R that there is a vector df$f
- which you want to transform into a factor,
- in which the different levels are coded as a, b, and c
- and for which you want the levels to be labeled as Treatment A etc.
The factor function will look for the values a, b and c, convert them to internal integer codes, and store the label values in the levels attribute of the factor. This attribute is used to convert the internal integer codes back to the correct labels. But as you see, there is no labels attribute.
> df <- data.frame(v=c(1,2,3),f=c('a','b','c'), stringsAsFactors=TRUE)  # stringsAsFactors=TRUE needed in R >= 4.0
> attributes(df$f)
$levels
[1] "a" "b" "c"
$class
[1] "factor"
> df$f <- factor(df$f, levels=c('a','b','c'),
+ labels=c('Treatment A: XYZ','Treatment B: YZX','Treatment C: ZYX'))
> attributes(df$f)
$levels
[1] "Treatment A: XYZ" "Treatment B: YZX" "Treatment C: ZYX"
$class
[1] "factor"
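The internal coding described above can also be inspected directly. This is only a sketch on a made-up vector (the order b, a, c, a is an assumption chosen so the codes are not simply 1 2 3):

```r
# Hypothetical data, using the same levels and labels as in the question
f <- factor(c('b', 'a', 'c', 'a'), levels = c('a', 'b', 'c'),
            labels = c('Treatment A: XYZ', 'Treatment B: YZX', 'Treatment C: ZYX'))

# The factor itself only stores integer codes ...
codes <- as.integer(f)
codes

# ... and the levels attribute maps those codes back to the label strings
levels(f)[codes]
identical(levels(f)[codes], as.character(f))
```

So a comparison such as f == 'Treatment A: XYZ' is a comparison against the strings held in the levels attribute, not against the original input values.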
#2
7
I wrote a package "lfactors" that allows you to refer to either levels or labels.
# packages
install.packages("lfactors")
require(lfactors)
flips <- lfactor(c(0,1,1,0,0,1), levels=0:1, labels=c("Tails", "Heads"))
# Tails can now be referred to as either "Tails" or 0
# These two lines return the same result
flips == "Tails"
#[1] TRUE FALSE FALSE TRUE TRUE FALSE
flips == 0
#[1] TRUE FALSE FALSE TRUE TRUE FALSE
Note that an lfactor requires that the levels be numeric so that they cannot be confused with the labels.
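Without an extra package, one possible base R approach to the second part of the question is to keep the short levels in the data and apply the long names only when producing output. The sketch below is just one way to do this; the names treatment_labels and f_display are made up for illustration.

```r
# stringsAsFactors = TRUE so that f is a factor in R >= 4.0 as well
df <- data.frame(v = c(1, 2, 3), f = c('a', 'b', 'c'), stringsAsFactors = TRUE)

# Hypothetical lookup vector: short level -> long display name
treatment_labels <- c(a = 'Treatment A: XYZ',
                      b = 'Treatment B: YZX',
                      c = 'Treatment C: ZYX')

# Scripting keeps using the short levels
df[df$f == 'a', ]

# Relabel a copy only at output time; df$f itself is left unchanged
f_display <- factor(df$f, levels = names(treatment_labels),
                    labels = unname(treatment_labels))
table(f_display)
```

The trade-off is that the display copy has to be created wherever output is produced, whereas the data itself stays easy to script against.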