ggplot2: extracting the data computed by stats

Date: 2023-01-08 19:16:48

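When plotting with ggplot2, we only need to supply the raw data: ggplot2 has many built-in computation (stat) functions that calculate the values that are actually drawn.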

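The most typical example is geom_boxplot: we supply only the raw data, and ggplot2 automatically computes everything needed to draw each box, namely the median, the lower and upper quartiles, and the whisker ends (by default the most extreme values within 1.5 × IQR of the box, which are not necessarily the minimum and maximum).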

So how do we extract these computed values? Let's use the box plot as an example.

The plotting code:

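library(ggplot2)
pdf("a.pdf")
p <- ggplot(mpg, aes(class, hwy)) + geom_boxplot()
print(p)                 # draw the box plots into a.pdf
# ggplot2 >= 3.0.0: print() returns the plot itself, not the built data,
# so use ggplot_build() (older versions: temp <- print(p))
temp <- ggplot_build(p)  # data computed by the stat, used for drawing
dev.off()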

The resulting figure:

[Figure: box plots of hwy for each class in the mpg data]

The object temp holds the computed data used to draw the box plots.

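Let's look at the structure of temp. (The output below was produced by an older ggplot2 release; newer releases add and rename some fields, for example linewidth in place of size.)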

> str(temp)
List of 3
$ data :List of 1
..$ :'data.frame':7 obs. of 22 variables:
.. ..$ ymin : num [1:7] 23 23 23 21 15 20 14
.. ..$ lower : num [1:7] 24 26 26 22 16 24.5 17
.. ..$ middle : num [1:7] 25 27 27 23 17 26 17.5
.. ..$ upper : num [1:7] 26 29 29 24 18 30.5 19
.. ..$ ymax : num [1:7] 26 33 32 24 20 36 22
.. ..$ outliers :List of 7
.. .. ..$ : num(0)
.. .. ..$ : num [1:4] 35 37 35 44
.. .. ..$ : num(0)
.. .. ..$ : num 17
.. .. ..$ : num [1:4] 12 12 12 22
.. .. ..$ : num [1:2] 44 41
.. .. ..$ : num [1:8] 12 12 25 24 27 25 26 23
.. ..$ notchupper: num [1:7] 26.4 27.7 27.7 24 17.6 ...
.. ..$ notchlower: num [1:7] 23.6 26.3 26.3 22 16.4 ...
.. ..$ x : num [1:7] 1 2 3 4 5 6 7
.. ..$ PANEL : int [1:7] 1 1 1 1 1 1 1
.. ..$ group : int [1:7] 1 2 3 4 5 6 7
.. ..$ ymin_final: num [1:7] 23 23 23 17 12 20 12
.. ..$ ymax_final: num [1:7] 26 44 32 24 22 44 27
.. ..$ xmin : num [1:7] 0.625 1.625 2.625 3.625 4.625 ...
.. ..$ xmax : num [1:7] 1.38 2.38 3.38 4.38 5.38 ...
.. ..$ weight : num [1:7] 1 1 1 1 1 1 1
.. ..$ colour : chr [1:7] "grey20" "grey20" "grey20" "grey20" ...
.. ..$ fill : chr [1:7] "white" "white" "white" "white" ...
.. ..$ size : num [1:7] 0.5 0.5 0.5 0.5 0.5 0.5 0.5
.. ..$ alpha : logi [1:7] NA NA NA NA NA NA ...
.. ..$ shape : num [1:7] 19 19 19 19 19 19 19
.. ..$ linetype : chr [1:7] "solid" "solid" "solid" "solid" ...
$ layout:Classes 'Layout', 'ggproto' <ggproto object: Class Layout>
facet: <ggproto object: Class FacetNull, Facet>
compute_layout: function
draw_back: function
draw_front: function
draw_labels: function
draw_panels: function
finish_data: function
init_scales: function
map: function
map_data: function
params: list
render_back: function
render_front: function
render_panels: function
setup_data: function
setup_params: function
shrink: TRUE
train: function
train_positions: function
train_scales: function
vars: function
super: <ggproto object: Class FacetNull, Facet>
finish_data: function
get_scales: function
map: function
map_position: function
panel_layout: data.frame
panel_ranges: list
panel_scales: list
render: function
render_labels: function
reset_scales: function
setup: function
train_position: function
train_ranges: function
xlabel: function
ylabel: function
super: <ggproto object: Class Layout>
$ plot :List of 9
..$ data :Classes ‘tbl_df’, ‘tbl’ and 'data.frame':234 obs. of 11 variables:
.. ..$ manufacturer: chr [1:234] "audi" "audi" "audi" "audi" ...
.. ..$ model : chr [1:234] "a4" "a4" "a4" "a4" ...
.. ..$ displ : num [1:234] 1.8 1.8 2 2 2.8 2.8 3.1 1.8 1.8 2 ...
.. ..$ year : int [1:234] 1999 1999 2008 2008 1999 1999 2008 1999 1999 2008 ...
.. ..$ cyl : int [1:234] 4 4 4 4 6 6 6 4 4 4 ...
.. ..$ trans : chr [1:234] "auto(l5)" "manual(m5)" "manual(m6)" "auto(av)" ...
.. ..$ drv : chr [1:234] "f" "f" "f" "f" ...
.. ..$ cty : int [1:234] 18 21 20 21 16 18 18 18 16 20 ...
.. ..$ hwy : int [1:234] 29 29 31 30 26 26 27 26 25 28 ...
.. ..$ fl : chr [1:234] "p" "p" "p" "p" ...
.. ..$ class : chr [1:234] "compact" "compact" "compact" "compact" ...
..$ layers :List of 1
.. ..$ :Classes 'LayerInstance', 'Layer', 'ggproto' <ggproto object: Class LayerInstance, Layer>
aes_params: list
compute_aesthetics: function
compute_geom_1: function
compute_geom_2: function
compute_position: function
compute_statistic: function
data: waiver
draw_geom: function
finish_statistics: function
geom: <ggproto object: Class GeomBoxplot, Geom>
aesthetics: function
default_aes: uneval
draw_group: function
draw_key: function
draw_layer: function
draw_panel: function
extra_params: na.rm
handle_na: function
non_missing_aes:
optional_aes:
parameters: function
required_aes: x lower upper middle ymin ymax
setup_data: function
use_defaults: function
super: <ggproto object: Class Geom>
geom_params: list
inherit.aes: TRUE
layer_data: function
mapping: NULL
map_statistic: function
position: <ggproto object: Class PositionDodge, Position>
compute_layer: function
compute_panel: function
required_aes: x
setup_data: function
setup_params: function
width: NULL
super: <ggproto object: Class Position>
print: function
show.legend: NA
stat: <ggproto object: Class StatBoxplot, Stat>
aesthetics: function
compute_group: function
compute_layer: function
compute_panel: function
default_aes: uneval
extra_params: na.rm
finish_layer: function
non_missing_aes: weight
parameters: function
required_aes: x y
retransform: TRUE
setup_data: function
setup_params: function
super: <ggproto object: Class Stat>
stat_params: list
subset: NULL
super: <ggproto object: Class Layer>
..$ scales :Classes 'ScalesList', 'ggproto' <ggproto object: Class ScalesList>
add: function
clone: function
find: function
get_scales: function
has_scale: function
input: function
n: function
non_position_scales: function
scales: list
super: <ggproto object: Class ScalesList>
..$ mapping :List of 2
.. ..$ x: symbol class
.. ..$ y: symbol hwy
..$ theme : list()
..$ coordinates:Classes 'CoordCartesian', 'Coord', 'ggproto' <ggproto object: Class CoordCartesian, Coord>
aspect: function
distance: function
expand: TRUE
is_linear: function
labels: function
limits: list
range: function
render_axis_h: function
render_axis_v: function
render_bg: function
render_fg: function
train: function
transform: function
super: <ggproto object: Class CoordCartesian, Coord>
..$ facet :Classes 'FacetNull', 'Facet', 'ggproto' <ggproto object: Class FacetNull, Facet>
compute_layout: function
draw_back: function
draw_front: function
draw_labels: function
draw_panels: function
finish_data: function
init_scales: function
map: function
map_data: function
params: list
render_back: function
render_front: function
render_panels: function
setup_data: function
setup_params: function
shrink: TRUE
train: function
train_positions: function
train_scales: function
vars: function
super: <ggproto object: Class FacetNull, Facet>
..$ plot_env :<environment: R_GlobalEnv>
..$ labels :List of 2
.. ..$ x: chr "class"
.. ..$ y: chr "hwy"
..- attr(*, "class")= chr [1:2] "gg" "ggplot"

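As the output shows, temp is a list of three elements: data holds the data used for drawing, layout describes the panel layout, and plot is the plot object itself, with its raw data, layers, scales, mapping, theme, coordinate system, facet and labels.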
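A minimal sketch for inspecting the three components directly, assuming ggplot2 3.0.0 or later (so the object comes from ggplot_build() rather than print()). The variable name built is illustrative, and layout$layout is an internal field of the Layout object that may differ between versions (the older output above shows panel_layout instead):

```r
library(ggplot2)

p <- ggplot(mpg, aes(class, hwy)) + geom_boxplot()
built <- ggplot_build(p)  # runs the stats and positions without drawing anything

# data: a list holding one computed data frame per layer
length(built$data)        # a single layer here

# layout: the panel layout, one row per panel (one panel, since there are no facets)
built$layout$layout

# plot: the original ggplot object, raw mpg data included
dim(built$plot$data)
```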

Of these, data is exactly the part we want:

> temp$data[[1]]
  ymin lower middle upper ymax                       outliers notchupper
1   23  24.0   25.0  26.0   26                                  26.41319
2   23  26.0   27.0  29.0   33                 35, 37, 35, 44   27.69140
3   23  26.0   27.0  29.0   32                                  27.74026
4   21  22.0   23.0  24.0   24                             17   23.95278
5   15  16.0   17.0  18.0   20                 12, 12, 12, 22   17.55009
6   20  24.5   26.0  30.5   36                         44, 41   27.60241
7   14  17.0   17.5  19.0   22 12, 12, 25, 24, 27, 25, 26, 23   17.90132
  notchlower x PANEL group ymin_final ymax_final  xmin  xmax weight colour
1   23.58681 1     1     1         23         26 0.625 1.375      1 grey20
2   26.30860 2     1     2         23         44 1.625 2.375      1 grey20
3   26.25974 3     1     3         23         32 2.625 3.375      1 grey20
4   22.04722 4     1     4         17         24 3.625 4.375      1 grey20
5   16.44991 5     1     5         12         22 4.625 5.375      1 grey20
6   24.39759 6     1     6         20         44 5.625 6.375      1 grey20
7   17.09868 7     1     7         12         27 6.625 7.375      1 grey20
   fill size alpha shape linetype
1 white  0.5    NA    19    solid
2 white  0.5    NA    19    solid
3 white  0.5    NA    19    solid
4 white  0.5    NA    19    solid
5 white  0.5    NA    19    solid
6 white  0.5    NA    19    solid
7 white  0.5    NA    19    solid

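temp$data is a list with one element per layer; this plot has a single layer, so it holds one data frame recording the values for each box. The data frame has 7 rows, one per box in the figure (the classes appear in alphabetical order), and its columns give ymin, lower, middle, upper, ymax and so on: lower, middle and upper are the first quartile, the median and the third quartile, ymin and ymax are the whisker ends, outliers lists the points drawn beyond the whiskers, and ymin_final/ymax_final give the full data range including outliers.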
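To check that these columns are ordinary summary statistics, here is a sketch that recomputes the quartiles of hwy per class with base R's quantile() and compares them with the stat output. It assumes stat_boxplot uses quantile()'s default method (type 7) and that the boxes follow the alphabetical order of class, as in the table above; box_stats and q are illustrative names:

```r
library(ggplot2)

p <- ggplot(mpg, aes(class, hwy)) + geom_boxplot()
box_stats <- layer_data(p, 1)  # same data frame as ggplot_build(p)$data[[1]]

# Quartiles of hwy within each class; split() orders the classes alphabetically,
# matching the order of the boxes on the discrete x axis
q <- sapply(split(mpg$hwy, mpg$class), quantile, probs = c(0.25, 0.5, 0.75))

all.equal(box_stats$lower,  unname(q[1, ]))
all.equal(box_stats$middle, unname(q[2, ]))
all.equal(box_stats$upper,  unname(q[3, ]))

# ymax_final spans the full data range, outliers included
all.equal(box_stats$ymax_final, as.numeric(tapply(mpg$hwy, mpg$class, max)))
```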

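The same approach extracts the intermediate data of any geom layer: each layer contributes one data frame to temp$data, in the order the layers were added.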
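A sketch with two layers, assuming ggplot2 3.0.0 or later. It also uses layer_data(plot, i), which ggplot2 exports as a shortcut for the i-th element of ggplot_build(plot)$data; the plot p2 and the lm smoother are only an illustration:

```r
library(ggplot2)

# Two layers: raw points (identity stat) and a fitted line (stat_smooth)
p2 <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)

built2 <- ggplot_build(p2)
length(built2$data)               # one data frame per layer

# layer_data() returns the data frame of the requested layer
smooth_df <- layer_data(p2, i = 2)
all.equal(smooth_df, built2$data[[2]])

# Fitted line and confidence band computed by stat_smooth (80 x values by default)
head(smooth_df[, c("x", "y", "ymin", "ymax", "se")])
```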