Comparing the values of a single category to all (including the category) in R

Time: 2022-08-07 10:46:53

I am trying to use R to create a bar chart that compares the frequency of one category with that of the entire dataset. I created some mock data, similar to the real data, along with my expected output. The mock data includes three fruits (apple, orange, banana), each with a corresponding eating frequency (1-2 times, 3-4 times, > 4 times). Mock data:

ID  Fruit   frequency
1   apple   1-2 times
2   apple   3-4 times
3   apple   1-2 times
4   apple   3-4 times
5   apple   1-2 times
6   apple   > 4 times
7   orange  3-4 times
8   orange  3-4 times
9   orange  1-2 times
10  orange  1-2 times
11  orange  1-2 times
12  banana  1-2 times
13  banana  3-4 times
14  banana  > 4 times
15  banana  > 4 times
16  banana  1-2 times
17  banana  3-4 times
18  banana  > 4 times
19  banana  1-2 times

The expected output is a bar chart with three groups of eating frequency (1-2 times, 3-4 times, > 4 times). Within each group there will be two columns: one representing "apple" and the other representing the entire dataset.

I can create the frequency bar chart for each category (such as apple), but I don't know how to add the entire dataset's data for comparison.

Any suggestions on which code to use or which approach to take (subsetting "apple", maybe?) would be much appreciated!

2 solutions

#1


First I calculated both percentages (i.e. within each fruit and in total) and then converted the data into a plot-friendly format. Hope this helps!

library(ggplot2)
library(dplyr)
library(tidyr)

df %>%
  group_by(fruit) %>%
  mutate(countF = n()) %>%
  group_by(freq, .add = TRUE) %>%  # dplyr >= 1.0.0; older versions used add = TRUE
  # frequency percentage within fruit
  mutate(freq_perc_within_fruit = round(n() / countF * 100)) %>%
  group_by(freq) %>%
  # frequency percentage in total
  mutate(freq_perc_in_total = round(n() / nrow(.) * 100)) %>%
  ungroup() %>%
  select(fruit, freq, freq_perc_within_fruit, freq_perc_in_total) %>%
  distinct() %>%  # keep one row per fruit/freq pair so bars and labels are drawn once
  gather(Percentage, value, -fruit, -freq) %>%
  # plot
  ggplot(aes(x = freq, y = value, fill = Percentage)) +
    geom_bar(position = "dodge", stat = "identity") +
    facet_grid(fruit ~ .) +
    geom_text(aes(label = paste0(value, "%")), position = position_dodge(.9), vjust = 0)

Output plot: [image]


Sample data:

df<- structure(list(ID = 1:19, fruit = c("apple", "apple", "apple", 
"apple", "apple", "apple", "orange", "orange", "orange", "orange", 
"orange", "banana", "banana", "banana", "banana", "banana", "banana", 
"banana", "banana"), freq = c("1-2 times", "3-4 times", "1-2 times", 
"3-4 times", "1-2 times", "> 4 times", "3-4 times", "3-4 times", 
"1-2 times", "1-2 times", "1-2 times", "1-2 times", "3-4 times", 
"> 4 times", "> 4 times", "1-2 times", "3-4 times", "> 4 times", 
"1-2 times")), .Names = c("ID", "fruit", "freq"), class = "data.frame", row.names = c(NA, 
-19L))
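
As a hedged variation on the pipeline above, the same two percentages can be computed as one summary row per fruit/frequency pair with `count()` and `pivot_longer()` (the current tidyr replacement for `gather()`). This is only a sketch: the names `total`, `summary_tbl`, `within_fruit` and `in_total` are made up for illustration, and the data frame is rebuilt here so the block runs on its own.

```r
library(dplyr)
library(tidyr)

# Mock data from the question, rebuilt compactly (same 19 rows as df above)
df <- data.frame(
  ID = 1:19,
  fruit = rep(c("apple", "orange", "banana"), times = c(6, 5, 8)),
  freq = c("1-2 times", "3-4 times", "1-2 times", "3-4 times", "1-2 times", "> 4 times",
           "3-4 times", "3-4 times", "1-2 times", "1-2 times", "1-2 times",
           "1-2 times", "3-4 times", "> 4 times", "> 4 times", "1-2 times",
           "3-4 times", "> 4 times", "1-2 times")
)

# Share of each frequency level in the whole dataset
total <- df %>%
  count(freq, name = "n_total") %>%
  mutate(in_total = round(n_total / sum(n_total) * 100))

# Share of each frequency level within each fruit, joined to the overall share
summary_tbl <- df %>%
  count(fruit, freq) %>%
  group_by(fruit) %>%
  mutate(within_fruit = round(n / sum(n) * 100)) %>%
  ungroup() %>%
  left_join(select(total, freq, in_total), by = "freq") %>%
  select(fruit, freq, within_fruit, in_total) %>%
  pivot_longer(c(within_fruit, in_total), names_to = "Percentage", values_to = "value")
```

`summary_tbl` has the same long shape the plot code expects (`fruit`, `freq`, `Percentage`, `value`), so it could be passed to the same `ggplot()` call.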

#2


Here's a simple solution:

set.seed(1)  # fix the random draw so the example is reproducible
data <- data.frame(
  fruit = sample(c("apple", "orange", "banana"), size = 20, replace = TRUE),
  frequency = factor(
    sample(c("1-2 times", "3-4 times", "> 4 times"), size = 20, replace = TRUE),
    levels = c("1-2 times", "3-4 times", "> 4 times")
  )
)

apple.freq <- with(subset(data, fruit == "apple"), prop.table(table(frequency)))
overall.freq <- with(data, prop.table(table(frequency)))
freq.mat <- rbind(apple.freq, overall.freq)

barplot(freq.mat, beside = TRUE, col = c("red", "blue"))

You'll need to add the legend and axis labels and such, but that should get you started.
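
One possible way to add the legend and axis labels, sketched on the question's mock data rather than random data. The object name `fruit_df`, the label text and the `ylim` value are assumptions chosen for illustration; `legend.text` and `args.legend` are standard `barplot()` arguments.

```r
# Mock data from the question (fruit_df is a hypothetical name)
fruit_df <- data.frame(
  fruit = rep(c("apple", "orange", "banana"), times = c(6, 5, 8)),
  frequency = factor(
    c("1-2 times", "3-4 times", "1-2 times", "3-4 times", "1-2 times", "> 4 times",
      "3-4 times", "3-4 times", "1-2 times", "1-2 times", "1-2 times",
      "1-2 times", "3-4 times", "> 4 times", "> 4 times", "1-2 times",
      "3-4 times", "> 4 times", "1-2 times"),
    levels = c("1-2 times", "3-4 times", "> 4 times")
  )
)

# Proportions for apple only and for the entire dataset
apple.freq <- prop.table(table(fruit_df$frequency[fruit_df$fruit == "apple"]))
overall.freq <- prop.table(table(fruit_df$frequency))
freq.mat <- rbind(apple = apple.freq, all = overall.freq)

# Grouped bars in percent, with axis labels and a legend
barplot(freq.mat * 100, beside = TRUE, col = c("red", "blue"),
        xlab = "Eating frequency", ylab = "Percentage",
        ylim = c(0, 60),
        legend.text = c("apple", "entire dataset"),
        args.legend = list(x = "topright", bty = "n"))
```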

You can get a lot fancier using ggplot2 (a variation of this: Easily add an '(all)' facet to facet_wrap in ggplot2?, for example) but this is a simple solution in base R.
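
A rough sketch of what that ggplot2 variation might look like, assuming the "(all)" idea means stacking the data with a copy whose fruit is relabelled "(all)". The names `plot_df` and `pct` are invented for this example, and the linked answer may do it differently.

```r
library(dplyr)
library(ggplot2)

# Mock data from the question
df <- data.frame(
  fruit = rep(c("apple", "orange", "banana"), times = c(6, 5, 8)),
  freq = factor(
    c("1-2 times", "3-4 times", "1-2 times", "3-4 times", "1-2 times", "> 4 times",
      "3-4 times", "3-4 times", "1-2 times", "1-2 times", "1-2 times",
      "1-2 times", "3-4 times", "> 4 times", "> 4 times", "1-2 times",
      "3-4 times", "> 4 times", "1-2 times"),
    levels = c("1-2 times", "3-4 times", "> 4 times")
  )
)

# Stack the data with a copy relabelled "(all)", then keep apple and "(all)"
plot_df <- bind_rows(df, mutate(df, fruit = "(all)")) %>%
  filter(fruit %in% c("apple", "(all)")) %>%
  count(fruit, freq) %>%
  group_by(fruit) %>%
  mutate(pct = n / sum(n) * 100) %>%
  ungroup()

# Two dodged bars per frequency group: apple vs. the entire dataset
p <- ggplot(plot_df, aes(x = freq, y = pct, fill = fruit)) +
  geom_col(position = "dodge") +
  labs(x = "Eating frequency", y = "Percentage", fill = NULL)
print(p)
```

Changing the `filter()` line to another fruit, or dropping it and faceting by fruit, would extend the same comparison to the other categories.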
