I know how to draw histograms and other frequency/percentage plots. What I want to know now is how to get those frequency values into a table that I can use afterwards.
I have a massive dataset, and I have drawn a histogram of it with a fixed binwidth. I want to extract the frequency value (i.e. the value on the y-axis) that corresponds to each bin and save it somewhere.
Can someone please help me with this? Thank you!
3 solutions
#1
37
The hist function has a return value (an object of class histogram):
R> res <- hist(rnorm(100))
R> res
$breaks
[1] -4 -3 -2 -1 0 1 2 3 4
$counts
[1] 1 2 17 27 34 16 2 1
$intensities
[1] 0.01 0.02 0.17 0.27 0.34 0.16 0.02 0.01
$density
[1] 0.01 0.02 0.17 0.27 0.34 0.16 0.02 0.01
$mids
[1] -3.5 -2.5 -1.5 -0.5 0.5 1.5 2.5 3.5
$xname
[1] "rnorm(100)"
$equidist
[1] TRUE
attr(,"class")
[1] "histogram"
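To turn that return value into a table that can be saved, one option is to collect the breaks and counts into a data frame. The following is a minimal sketch, not the answerer's code: the sample data, the bin width `bw`, the column names, and the output file name are all assumptions made for illustration. Passing `plot = FALSE` skips the drawing step.

```r
# Illustrative data; replace with your own numeric vector
set.seed(42)
x <- rnorm(100)

# Fixed bin width (assumed value); the breaks must cover the full range of x
bw <- 1
breaks <- seq(floor(min(x)), ceiling(max(x)), by = bw)

# plot = FALSE returns the "histogram" object without drawing it
res <- hist(x, breaks = breaks, plot = FALSE)

# One row per bin: bin limits, midpoint, and frequency (count)
freq <- data.frame(
  lower = head(res$breaks, -1),
  upper = res$breaks[-1],
  mid   = res$mids,
  count = res$counts
)

# Save the table for later use (hypothetical file name, in a temp directory)
out_file <- file.path(tempdir(), "hist_counts.csv")
write.csv(freq, out_file, row.names = FALSE)
```

With a bin width other than 1, the breaks would need to be built so that they still span the whole range of the data, since hist stops with an error when values fall outside the breaks.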
#2
19
From ?hist, section Value:
an object of class "histogram" which is a list with components:
- breaks: the n+1 cell boundaries (= breaks if that was a vector). These are the nominal breaks, not with the boundary fuzz.
- counts: n integers; for each cell, the number of x[] inside.
- density: values f^(x[i]), as estimated density values. If all(diff(breaks) == 1), they are the relative frequencies counts/n and in general satisfy sum[i; f^(x[i]) (b[i+1]-b[i])] = 1, where b[i] = breaks[i].
- intensities: same as density. Deprecated, but retained for compatibility.
- mids: the n cell midpoints.
- xname: a character string with the actual x argument name.
- equidist: logical, indicating if the distances between breaks are all the same.
breaks and density provide just about all you need:
histrv<-hist(x)
histrv$breaks
histrv$density
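Since the question asks for frequencies, it may help to see how density relates to counts. From the definition quoted above, each count should equal density times bin width times n. The sketch below checks that on made-up data (the sample vector is an assumption, and `length(x)` stands in for n only because the data contain no missing values).

```r
# Illustrative data; replace with your own numeric vector
set.seed(1)
x <- rnorm(200)

# Compute the histogram without drawing it
histrv <- hist(x, plot = FALSE)

# Bin widths from consecutive breaks
widths <- diff(histrv$breaks)

# counts = density * bin width * n
recovered <- histrv$density * widths * length(x)

# TRUE when the recovered values match the stored counts (up to rounding)
matches <- isTRUE(all.equal(recovered, as.numeric(histrv$counts)))

# The densities integrate to 1 over the bins
total_area <- sum(histrv$density * widths)
```

In practice histrv$counts already holds the frequencies directly, so the conversion is only needed when nothing but the density is at hand.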
#3
2
Just in case someone hits this question with ggplot's geom_histogram in mind, note that there is a way to extract the data from a ggplot object.
The following convenience function outputs a data frame with the lower limit of each bin (xmin), the upper limit of each bin (xmax), the midpoint of each bin (x), and the frequency value (y).
library(ggplot2)

## Convenience function: pull the computed bin data out of a ggplot object
get_hist <- function(p) {
  d <- ggplot_build(p)$data[[1]]
  data.frame(x = d$x, xmin = d$xmin, xmax = d$xmax, y = d$y)
}

# make a dataframe for ggplot
set.seed(1)
x <- runif(100, 0, 10)
y <- cumsum(x)
df <- data.frame(x = sort(x), y = y)

# make geom_histogram with cumulative counts on the y-axis
# (after_stat(count) replaces the older ..count.. notation; needs ggplot2 >= 3.3.0)
p <- ggplot(data = df, aes(x = x)) +
  geom_histogram(aes(y = cumsum(after_stat(count))), binwidth = 1, boundary = 0,
                 color = "black", fill = "white")
Illustration:
hist = get_hist(p)
head(hist$x)
## [1] 0.5 1.5 2.5 3.5 4.5 5.5
head(hist$y)
## [1] 7 13 24 38 52 57
head(hist$xmax)
## [1] 1 2 3 4 5 6
head(hist$xmin)
## [1] 0 1 2 3 4 5
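Note that y in the output above is cumulative, because the plot maps y to a running sum of the counts. For plain per-bin frequencies, a possible variant is to read the count column that ggplot2 computes for a histogram layer. This is a sketch under assumptions: it uses ggplot2's layer_data() helper instead of the convenience function, and the data and column selection are chosen only for illustration.

```r
library(ggplot2)

# Illustrative data, 100 uniform values between 0 and 10
set.seed(1)
df <- data.frame(x = runif(100, 0, 10))

# Ordinary (non-cumulative) histogram with a fixed bin width
p <- ggplot(df, aes(x = x)) +
  geom_histogram(binwidth = 1, boundary = 0)

# layer_data() returns the data computed for layer 1 of the plot
bins <- layer_data(p, 1)

# Keep the bin limits, midpoints, and per-bin frequencies
freq <- bins[, c("xmin", "xmax", "x", "count")]
```

The resulting data frame can then be written out with write.csv like any other table.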
A related question I answered here (Cumulative histogram with ggplot2).