How to summarize data and create a new column?

Time: 2021-01-12 13:13:40

I am having trouble summarizing my data the way I want it. I was wondering if someone could point out where I am going wrong. Below is a subset of my data. It comes from the General Social Survey, and my data set is 33500 x 2 (33,500 rows, 2 columns).

              class  owngun
32997  Middle Class      No
8246  Working Class      No
13613  Middle Class     Yes
31553  Middle Class      No
31316 Working Class      No
20083  Middle Class     Yes
26289  Middle Class      No
29363  Middle Class      No
25821 Working Class Refused
4996   Middle Class     Yes
14641  Middle Class     Yes
15523  Middle Class     Yes
27361 Working Class     Yes
29035 Working Class     Yes
25330  Middle Class      No
16424   Lower Class     Yes
17535 Working Class      No
2841  Working Class      No
18465  Middle Class      No
16629  Middle Class     Yes 

When I generate a table for my dataset:

               owngun
class            Yes   No Refused
  Lower Class    480 1254      21
  Working Class 6519 8752     142
  Middle Class  6216 8915     124
  Upper Class    391  678       7
  No Class         0    1       0

I like these values, but what I'm really interested in is the proportion of "Yes" answers within each social class. How do I generate a new column holding the proportion of "Yes" for each social class?

I have been trying to use dplyr to do this. Can anyone suggest a way to proceed?

Thank you

3 answers

#1

You can create a new column using dplyr's mutate function. Because table() returns a contingency table rather than a data frame, first convert it into a data frame with one column per answer (No, Refused, Yes), called owngun here. Then:

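owngun <- as.data.frame.matrix(table(df$class, df$owngun))  # one column per answer, one row per class
owngun <- dplyr::mutate(owngun, Yes_percent = Yes / (Yes + No + Refused))  # a proportion (0 to 1), not a percentage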
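
If you start from the raw rows instead of the count table, a minimal dplyr sketch can compute the proportion per class directly with group_by() and summarise(). It assumes df has the two columns class and owngun as in the question; to stay self-contained it rebuilds the 20-row subset from the question with rep() (row order does not matter for these summaries). The names yes_by_class, n_yes and Yes_prop are only illustrative.

```r
library(dplyr)

# The 20-row subset from the question, rebuilt compactly with rep()
df <- data.frame(
  class  = rep(c("Lower Class", "Middle Class", "Working Class"), times = c(1, 12, 7)),
  owngun = c("Yes",
             rep(c("No", "Yes"), times = c(6, 6)),
             rep(c("No", "Refused", "Yes"), times = c(4, 1, 2)))
)

yes_by_class <- df %>%
  group_by(class) %>%
  summarise(
    n        = n(),                  # respondents in this class
    n_yes    = sum(owngun == "Yes"), # how many of them answered "Yes"
    Yes_prop = n_yes / n             # share of "Yes"; "Refused" stays in the denominator
  )

print(yes_by_class)
```

Because "Refused" answers are kept in the denominator, this matches Yes / (Yes + No + Refused) above; filter them out before summarising if they should be excluded.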

#2

Using the bit of data that you provided:

# divide each row of counts by that class's total (the totals vector is recycled down the columns)
table(df$class, df$owngun)/as.vector(table(df$class))

                       No   Refused       Yes
  Lower Class   0.0000000 0.0000000 1.0000000
  Middle Class  0.5000000 0.0000000 0.5000000
  Working Class 0.5714286 0.1428571 0.2857143

Data

### Your data
df = read.table(text="class  owngun
32997  'Middle Class'      No
8246  'Working Class'      No
13613  'Middle Class'     Yes
31553  'Middle Class'      No
31316 'Working Class'      No
20083  'Middle Class'     Yes
26289  'Middle Class'      No
29363  'Middle Class'      No
25821 'Working Class' Refused
4996  'Middle Class'     Yes
14641  'Middle Class'     Yes
15523  'Middle Class'     Yes
27361 'Working Class'     Yes
29035 'Working Class'     Yes
25330 'Middle Class'      No
16424  'Lower Class'     Yes
17535 'Working Class'      No
2841  'Working Class'      No
18465  'Middle Class'      No
16629  'Middle Class'     Yes",
header=TRUE)
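
The division in this answer works because R recycles the vector of class totals down each column of the class-by-answer table, so row i is divided by the i-th total; it relies on both tables listing the classes in the same (sorted) order. A small sketch, again rebuilding the 20-row subset with rep(), that checks this against the more explicit sweep() call (counts, totals, by_recycling and by_sweep are illustrative names):

```r
# Same 20-row subset as above, rebuilt compactly (row order does not matter here)
df <- data.frame(
  class  = rep(c("Lower Class", "Middle Class", "Working Class"), times = c(1, 12, 7)),
  owngun = c("Yes",
             rep(c("No", "Yes"), times = c(6, 6)),
             rep(c("No", "Refused", "Yes"), times = c(4, 1, 2)))
)

counts <- table(df$class, df$owngun)   # rows: class, columns: No / Refused / Yes
totals <- as.vector(table(df$class))   # one total per class, in the same row order

by_recycling <- counts / totals                # this answer's approach: vector recycled down columns
by_sweep     <- sweep(counts, 1, totals, "/")  # explicit version: divide row i by totals[i]

print(round(by_recycling, 3))
```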

#3

This solution doesn't use dplyr, but how about:

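tab <- table(df)              # cross-tabulate class (rows) by owngun (columns)
prop.table(tab, margin = 1)   # margin = 1: proportions within each row, so each class sums to 1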
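
To actually attach the proportion as a new column, the "Yes" column of the prop.table() result can be bound onto the counts with cbind(). A minimal sketch, assuming the full counts printed in the question (typed in by hand with as.table(); the names tab, props and with_prop are illustrative):

```r
# Full counts copied from the table in the question
tab <- as.table(matrix(
  c( 480, 1254,  21,
    6519, 8752, 142,
    6216, 8915, 124,
     391,  678,   7,
       0,    1,   0),
  nrow = 5, byrow = TRUE,
  dimnames = list(
    class  = c("Lower Class", "Working Class", "Middle Class", "Upper Class", "No Class"),
    owngun = c("Yes", "No", "Refused")
  )
))

props <- prop.table(tab, margin = 1)                # row proportions: Yes / (Yes + No + Refused)
with_prop <- cbind(tab, Yes_prop = props[, "Yes"])  # counts plus the new proportion column

print(round(with_prop, 3))
```

Note that the "No Class" row rests on a single respondent, so its proportion of 0 carries very little information.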
