Make a frequency histogram for a factor variable

Time: 2022-08-05 14:55:34

I am very new to R, so I apologize for such a basic question. I spent an hour googling this issue, but couldn't find a solution.

Say I have some categorical data in my data set about common pet types. I input it as a character vector in R that contains the names of different types of animals. I created it like this:

animals <- c("cat", "dog",  "dog", "dog", "dog", "dog", "dog", "dog", "cat", "cat", "bird")

I turn it into a factor for use with other vectors in my data frame:

animalFactor <- as.factor(animals)

I now want to create a histogram that shows the frequency of each variable on the y-axis, the name of each factor on the x-axis, and contains one bar for each factor. I attempt this code:

hist(table(animalFactor), freq=TRUE, xlab = levels(animalFactor), ylab = "Frequencies")

The output is absolutely nothing like I'd expect. Labeling problems aside, I can't seem to figure out how to create a simple frequency histogram by category.

5 solutions

#1


54  

It seems like you want barplot(prop.table(table(animals))).

However, this is not a histogram.
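
If counts rather than proportions are wanted on the y-axis, which is what the question describes, dropping prop.table() should be enough. A minimal sketch, reusing the animals vector from the question (the names counts and props are only illustrative):

```r
animals <- c("cat", "dog", "dog", "dog", "dog", "dog", "dog", "dog",
             "cat", "cat", "bird")

# table() tallies each category; the names become the bar labels
counts <- table(animals)
props  <- prop.table(counts)   # same tally, rescaled to sum to 1

# One bar per category, raw counts on the y-axis
barplot(counts, xlab = "Animal", ylab = "Frequency")

# Same bars, proportions on the y-axis
barplot(props, xlab = "Animal", ylab = "Proportion")
```

Either call draws a bar chart with gaps between the bars, which is the usual way to show a categorical variable.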

#2


14  

The reason you are getting the unexpected result is that hist(...) calculates the distribution from a numeric vector. In your code, table(animalFactor) behaves like a numeric vector with three elements: 1, 3, 7. So hist(...) plots the number of 1's (1), the number of 3's (1), and the number of 7's (1). @Roland's solution is the simplest.
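
This can be checked by looking at what hist() actually receives. A small sketch, assuming the animalFactor from the question; h is an illustrative name, and plot = FALSE returns the binning without drawing anything:

```r
animals <- c("cat", "dog", "dog", "dog", "dog", "dog", "dog", "dog",
             "cat", "cat", "bird")
animalFactor <- as.factor(animals)

counts <- table(animalFactor)
print(counts)                   # the three tallies that hist() is given

# hist() bins those three numbers; it never sees the 11 animals
h <- hist(as.vector(counts), plot = FALSE)
sum(h$counts)                   # total observations binned: 3, not 11
```

In other words, the histogram describes the distribution of the tallies themselves, not of the pets.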

Here's a way to do this using ggplot:

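library(ggplot2)
ggp <- ggplot(data.frame(animals), aes(x = animals))
# geom_bar() is used here because geom_histogram() errors on a discrete x
# in ggplot2 >= 2.0.0; the original answer called geom_histogram().
# counts
ggp + geom_bar(fill = "lightgreen")
# proportions (after_stat() needs ggplot2 >= 3.3.0; older versions use ..count..)
ggp + geom_bar(fill = "lightblue", aes(y = after_stat(count / sum(count))))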

You would get precisely the same result using animalFactor instead of animals in the code above.

#3


12  

If you'd like to do this in ggplot, note that an API change to geom_histogram() makes it throw an error on categorical data like this: https://github.com/hadley/ggplot2/issues/1465

To get around this, use geom_bar():

animals <- c("cat", "dog",  "dog", "dog", "dog", "dog", "dog", "dog", "cat", "cat", "bird")

library(ggplot2)
# counts
ggplot(data.frame(animals), aes(x=animals)) +
  geom_bar()

#4


2  

In my case, Country is a categorical variable and I want to see how many occurrences of each country exist in the data set. In other words, how many records/attendees are from each Country:

barplot(summary(df$Country))
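
Note that summary() returns per-level counts only when the column is a factor; for a character column it reports length, class and mode instead, so table() may be the safer call. A sketch with a made-up data frame (df and its Country values are hypothetical, not from the original post):

```r
# Hypothetical data; stringsAsFactors = FALSE keeps Country as character
df <- data.frame(Country = c("US", "DE", "US", "FR", "US", "DE"),
                 stringsAsFactors = FALSE)

by_factor <- summary(factor(df$Country))  # named counts per level
by_table  <- table(df$Country)            # works for character or factor

# One bar per country, number of records on the y-axis
barplot(by_table, ylab = "Records")
```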

#5


1  

A factor can be passed directly to the plot() function.

An answer to a similar question has been given here: https://stat.ethz.ch/pipermail/r-help/2010-December/261873.html

x <- sample(c("Richard", "Minnie", "Albert", "Helen", "Joe", "Kingston"),
            50, replace = TRUE)
x <- as.factor(x)
plot(x)
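
Applied to the data from the question, the same approach might look like the following. plot() on a factor draws a bar chart of level counts, and extra arguments such as ylab are passed along to it. The reordering by frequency is an optional extra, not part of the linked answer, and ordered_levels is an illustrative name:

```r
animals <- c("cat", "dog", "dog", "dog", "dog", "dog", "dog", "dog",
             "cat", "cat", "bird")
animalFactor <- as.factor(animals)

# plot() dispatches to plot.factor(), which draws one bar per level
plot(animalFactor, xlab = "Animal", ylab = "Frequency")

# Optional: order the bars from most to least frequent
ordered_levels <- names(sort(table(animalFactor), decreasing = TRUE))
plot(factor(animals, levels = ordered_levels), ylab = "Frequency")
```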
