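What is the best way to summarize the distribution of data by a categorical variable? I'm trying to look at call volume by day of the week.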

Time: 2022-01-30 15:01:58

I have a dataframe that contains Date, Day-of-the-Week (categorical), and number of calls (numeric). I'm trying to analyze how call volume is distributed by day of the week. Using the lattice package I was able to create a bar chart, but I still need some additional help:

  1. How can I order/sort the Day-of-the-Week variables that appear in my barchart (I'd like it to start on Sat, then Sun, then Mon,...)?

  2. I'd also like to transpose the distribution so that the bar charts are vertical as opposed to horizontal.

  3. Finally, how can I add a box-and-whisker plot? Should I still use tapply for this?

Thanks!

Here's what I've done so far:

LatinoDRTVdata <- read.csv("//dishfs1/Marketing/Mktg_Analytics/Team Member folders/Ryan_Chase/Ad Hoc/Latino DRTV Normalized Calls.csv")

#look at the first 6 rows (head() shows 6 by default)
head(LatinoDRTVdata)

#look at the full dataset
LatinoDRTVdata

#look at the column names
colnames(LatinoDRTVdata)

#check the class of the Normalized.Latino.DRTV.call.volume column
class(LatinoDRTVdata$Normalized.Latino.DRTV.call.volume)

##make  the call volume a numeric vector 
LatinoDRTVdata$Normalized.Latino.DRTV.call.volume <- as.numeric(LatinoDRTVdata$Normalized.Latino.DRTV.call.volume)

#now check the class again
class(LatinoDRTVdata$Normalized.Latino.DRTV.call.volume)
(LatinoDRTVdata$Normalized.Latino.DRTV.call.volume)

#194 calls is the mean volume regardless of the day
mean(LatinoDRTVdata$Normalized.Latino.DRTV.call.volume)

#Day of the week is a factor
class(LatinoDRTVdata$Day.of.the.Week)

summary(LatinoDRTVdata)

str(LatinoDRTVdata)

#histogram of daily Latino DRTV call volume
hist(LatinoDRTVdata$Normalized.Latino.DRTV.call.volume)

#find the mean of each day
Daily.Latino.DRTV.Distribution<- tapply(LatinoDRTVdata$Normalized.Latino.DRTV.call.volume,LatinoDRTVdata$Day.of.the.Week,mean)

Daily.Latino.DRTV.Distribution

#tapply() returns a named array, so index it by day name rather than with $
Daily.Latino.DRTV.Distribution["Friday"]

##check that a new object has been added
ls()

str(Daily.Latino.DRTV.Distribution)

#make sure you install the lattice package for the graphics
#load the lattice package
library(lattice)
barchart(Daily.Latino.DRTV.Distribution)

Here are the first 6 rows of my data:

> head(LatinoDRTVdata)
      Date Day.of.the.Week Normalized.Latino.DRTV.call.volume
1 3/1/2013          Friday                                384
2 3/2/2013        Saturday                                277
3 3/3/2013          Sunday                                178
4 3/4/2013          Monday                                400
5 3/5/2013         Tuesday                                410
6 3/6/2013       Wednesday                                404
> 

1 Solution

#1


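  1. Set the day-of-the-week factor order by redeclaring the factor with explicit levels. (Assigning to levels() directly would rename the existing levels rather than reorder them.)

    LatinoDRTVdata$Day.of.the.Week <- factor(LatinoDRTVdata$Day.of.the.Week, levels = c("Saturday", "Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday"))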
    
  2. Use the ggplot2 package to draw the bars. They are vertical by default; adding coord_flip() turns them horizontal.

    library(ggplot2)
    ggplot(LatinoDRTVdata, aes(x=Day.of.the.Week, y=Normalized.Latino.DRTV.call.volume)) + geom_bar(stat="summary", fun=mean)

    Your data has one row per date rather than one summarized value per day of the week, so stat="summary" with fun=mean (ggplot2 3.3.0 or later) plots the mean call volume for each day. If the data were already summarized to one value per day, you would use geom_bar(stat='identity') instead. Append + coord_flip() if you want horizontal bars.

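  3. For the box-and-whisker plot, make a slight modification of 2: swap the bar geom for geom_boxplot(). It works on the raw rows, so tapply is not needed.

    ggplot(LatinoDRTVdata, aes(x=Day.of.the.Week, y=Normalized.Latino.DRTV.call.volume)) + geom_boxplot()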
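
To see what the factor ordering does before plotting anything, here is a minimal base-R sketch. The `calls` data frame, its column names, and its values are made up for illustration and only stand in for the real call log.

```r
# Toy data standing in for the real call log (values are made up).
calls <- data.frame(
  day    = c("Friday", "Saturday", "Sunday", "Monday", "Friday", "Saturday"),
  volume = c(384, 277, 178, 400, 390, 283)
)

week_order <- c("Saturday", "Sunday", "Monday", "Tuesday",
                "Wednesday", "Thursday", "Friday")

# Reorder: each row keeps its own day, only the level order changes.
calls$day <- factor(calls$day, levels = week_order)

# tapply() reports the means in level order; days with no rows come back as NA.
daily_mean <- tapply(calls$volume, calls$day, mean)
print(daily_mean)

# Contrast: assigning to levels() renames the existing (alphabetical) levels,
# so the "Friday" value below silently becomes "Saturday".
relabeled <- factor(c("Friday", "Saturday"))
levels(relabeled) <- c("Saturday", "Sunday")
print(as.character(relabeled))

# Base-R box-and-whisker plot, one slot per day, in the same order.
boxplot(volume ~ day, data = calls)
```

Any plot that reads the factor, whether base, lattice, or ggplot2, should pick up the same Saturday-first order.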
    

A good resource for ggplot: Cookbook for R
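
If you would rather stay with lattice, the same two plots can probably be had without switching packages. This is a sketch under assumptions: the `calls` data frame and its values are invented, and only three days are included to keep it short.

```r
library(lattice)

# Toy data (made up); the factor levels already carry the desired order.
calls <- data.frame(
  day    = factor(c("Friday", "Saturday", "Sunday",
                    "Friday", "Saturday", "Sunday"),
                  levels = c("Saturday", "Sunday", "Friday")),
  volume = c(384, 277, 178, 390, 283, 170)
)

# One mean per day, as a data frame that the formula interface accepts.
daily <- aggregate(volume ~ day, data = calls, FUN = mean)

# With the factor on the right of the formula the bars are vertical;
# horizontal = FALSE states that explicitly.
bars <- barchart(volume ~ day, data = daily,
                 horizontal = FALSE, ylab = "Mean calls")

# bwplot() draws box-and-whisker plots from the raw rows; no tapply() needed.
boxes <- bwplot(volume ~ day, data = calls, horizontal = FALSE)

print(bars)
print(boxes)
```

The formula interface is used here instead of passing the tapply() result directly, because it makes the orientation and the source of each axis explicit.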
