I have a data frame df that looks something like this:
Date Company MarketCap
2000-01-31 Company one 1000
2000-02-28 Company one 2000
2000-03-31 Company one 3000
2000-01-31 Company two 2500
2000-02-28 Company two 3000
2000-03-31 Company two 3500
2000-01-31 Company three 1500
2000-02-28 Company three 1800
2000-03-31 Company three 1100
I need an if-statement that does the following:
if (df$MarketCap >= median(df$MarketCap)) {
  BigCap <- df[all the rows that have a market cap >= median(df$MarketCap), ]
}
In words: for each row of df, I want to check whether the market cap is greater than or equal to the median of df$MarketCap. All rows with a market cap greater than or equal to that median should make up a new data frame, BigCap.
The new data frame BigCap should thus look like this:
Date Company MarketCap
2000-02-28 Company one 2000
2000-03-31 Company one 3000
2000-01-31 Company two 2500
2000-02-28 Company two 3000
2000-03-31 Company two 3500
I feel like this should be easy to achieve using an if-statement, but I haven't had any success so far (nor by looking at similar questions on SO). I appreciate all the help I can get.
Note that my real df is much larger than the example provided here: it has 360 dates and over 2000 companies.
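An if-statement is not a natural fit here, because if() expects a single TRUE or FALSE while df$MarketCap >= median(df$MarketCap) produces one value per row. A minimal sketch of the subset described in the question, assuming the column names from the example table and the overall median across all rows (the names overall_median and is_big are illustrative, not from the question):

```r
# Example data matching the table in the question
df <- data.frame(
  Date = rep(c("2000-01-31", "2000-02-28", "2000-03-31"), times = 3),
  Company = rep(c("Company one", "Company two", "Company three"), each = 3),
  MarketCap = c(1000, 2000, 3000, 2500, 3000, 3500, 1500, 1800, 1100)
)

# Median across all rows of the example data
overall_median <- median(df$MarketCap)

# Logical vector: TRUE where the row's market cap is at or above the median
is_big <- df$MarketCap >= overall_median

# Keep only those rows; the trailing comma selects all columns
BigCap <- df[is_big, ]
```

For the example data this keeps the five rows listed in the expected BigCap.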
2 solutions
#1
2
I like CPak's answer, but if you need separate data.frames, this works:
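# Example data: three dates, three companies
df <- data.frame(date = rep(Sys.Date() - c(60, 30, 0), 3),
                 comp = rep(1:3, each = 3),
                 cap = c(1000, 2000, 3000, 2500, 3000, 3500, 1500, 1800, 1100))
for (i in unique(as.character(df$date))) {
  # median market cap for this date only
  med <- median(df$cap[df$date == i])
  # creates objects such as smallCapOct and bigCapOct, named by abbreviated month;
  # dates that share a month name overwrite each other
  assign(paste0("smallCap", format(as.Date(i), "%b")),
         df[df$date == i & df$cap < med, ])
  assign(paste0("bigCap", format(as.Date(i), "%b")),
         df[df$date == i & df$cap >= med, ])
}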
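With many dates, such as the 360 mentioned in the question, a named list may be easier to handle than hundreds of objects created with assign(). A rough sketch of that variation in base R, using fixed dates taken from the question instead of Sys.Date() so that the list names are predictable; the names by_date, bigCap, and smallCap are illustrative only:

```r
# Same shape as the example above, but with fixed dates
df <- data.frame(
  date = rep(as.Date(c("2000-01-31", "2000-02-28", "2000-03-31")), 3),
  comp = rep(1:3, each = 3),
  cap  = c(1000, 2000, 3000, 2500, 3000, 3500, 1500, 1800, 1100)
)

# One data frame per date; list names are the dates as text
by_date <- split(df, df$date)

# Within each date, keep rows at or above (or below) that date's median
bigCap   <- lapply(by_date, function(d) d[d$cap >= median(d$cap), ])
smallCap <- lapply(by_date, function(d) d[d$cap <  median(d$cap), ])

# Look up a single date by name
bigCap[["2000-01-31"]]
```

Because every date gets its own list entry, nothing is overwritten when the same month appears in several years.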
EDIT: in comments, OP asked for a data frame for a specific month.
For a given month in a specific year, say Oct 2017:
# first calculate median
med <- median(df$cap[format(df$date, "%Y-%m") == "2017-10"])
# subset df
BigCapOct <- df[format(df$date, "%Y-%m") == "2017-10" & df$cap >= med, ]
For the month of October across all years:
med <- median(df$cap[format(df$date, "%m") == "10"])
BigCapOct <- df[format(df$date, "%m") == "10" & df$cap >= med, ]
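The two subsets above differ only in the date format and the value it is compared against, so they could be wrapped in a small function. This is a sketch only: big_cap_for is a hypothetical helper, and the five-row df is made-up illustration data with the same columns as the example.

```r
# Made-up illustration data (not from the question)
df <- data.frame(
  date = as.Date(c("2017-09-30", "2017-10-31", "2017-10-31",
                   "2017-10-31", "2016-10-31")),
  comp = c(1, 1, 2, 3, 1),
  cap  = c(900, 1000, 2500, 1500, 4000)
)

# Hypothetical helper: rows whose formatted date equals `key` and whose
# cap is at or above the median cap of those same rows
big_cap_for <- function(data, key, fmt = "%Y-%m") {
  in_period <- format(data$date, fmt) == key
  med <- median(data$cap[in_period])
  data[in_period & data$cap >= med, ]
}

BigCapOct2017 <- big_cap_for(df, "2017-10")        # October 2017 only
BigCapOct     <- big_cap_for(df, "10", fmt = "%m") # October across all years
```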
#2
2
I created SmallCap and LargeCap, which are lists of data.frames that contain the observations that are < median(MarketCap) or >= median(MarketCap), respectively, with the median taken within each Date. Each entry of a list is a separate Date.
library(dplyr)
SmallCap <- df %>%
group_by(Date) %>%
filter(MarketCap < median(MarketCap)) %>%
split(.$Date)
# $`1`
# # A tibble: 1 x 3
# # Groups: Date [1]
# Date Company MarketCap
# <fctr> <fctr> <int>
# 1 2000-01-31 Company_one 1000
# $`2`
# # A tibble: 1 x 3
# # Groups: Date [1]
# Date Company MarketCap
# <fctr> <fctr> <int>
# 1 2000-02-28 Company_three 1800
# $`3`
# # A tibble: 1 x 3
# # Groups: Date [1]
# Date Company MarketCap
# <fctr> <fctr> <int>
# 1 2000-03-31 Company_three 1100
LargeCap <- df %>%
group_by(Date) %>%
filter(MarketCap >= median(MarketCap)) %>%
split(.$Date)
# $`2000-01-31`
# # A tibble: 2 x 3
# # Groups: Date [1]
# Date Company MarketCap
# <fctr> <fctr> <int>
# 1 2000-01-31 Company_two 2500
# 2 2000-01-31 Company_three 1500
# $`2000-02-28`
# # A tibble: 2 x 3
# # Groups: Date [1]
# Date Company MarketCap
# <fctr> <fctr> <int>
# 1 2000-02-28 Company_one 2000
# 2 2000-02-28 Company_two 3000
# $`2000-03-31`
# # A tibble: 2 x 3
# # Groups: Date [1]
# Date Company MarketCap
# <fctr> <fctr> <int>
# 1 2000-03-31 Company_one 3000
# 2 2000-03-31 Company_two 3500
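For comparison, a base R sketch of the same per-Date filter that avoids the dplyr dependency. It assumes the column names from the question; ave() returns, for every row, the median of that row's Date group, which can then be compared against MarketCap directly. The names date_median, LargeCapAll, and SmallCapAll are illustrative.

```r
# Example data matching the table in the question
df <- data.frame(
  Date = rep(c("2000-01-31", "2000-02-28", "2000-03-31"), times = 3),
  Company = rep(c("Company one", "Company two", "Company three"), each = 3),
  MarketCap = c(1000, 2000, 3000, 2500, 3000, 3500, 1500, 1800, 1100)
)

# For each row, the median market cap of all rows sharing its Date
date_median <- ave(df$MarketCap, df$Date, FUN = median)

# Single data frames with the per-Date split applied
LargeCapAll <- df[df$MarketCap >= date_median, ]
SmallCapAll <- df[df$MarketCap <  date_median, ]

# Optional: one list entry per Date, as in the dplyr version
LargeCap <- split(LargeCapAll, LargeCapAll$Date)
SmallCap <- split(SmallCapAll, SmallCapAll$Date)
```

Keeping LargeCapAll as one data frame may be more convenient than a list when the result feeds into further grouped calculations.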