R tips: Writing ggplot layers

生信菜鸟团

Published 2023-09-08 12:26:24

This article is included in the column: 生信菜鸟团

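In everyday use, ggplot layers are created with functions whose names start with geom or stat. If you look at the body of these layer functions, however, you will find that each of them simply wraps a call to the layer function.

# Point layer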
geom_point <- function(mapping = NULL, data = NULL,
                       stat = "identity", position = "identity",
                       ...,
                       na.rm = FALSE,
                       show.legend = NA,
                       inherit.aes = TRUE) {
  layer(
    data = data,
    mapping = mapping,
    stat = stat,
    geom = GeomPoint,
    position = position,
    show.legend = show.legend,
    inherit.aes = inherit.aes,
    params = list(
      na.rm = na.rm,
      ...
    )
  )
}

# chull layer
stat_chull <- function(mapping = NULL,
                       data = NULL,
                       geom = "polygon",
                       position = "identity",
                       na.rm = FALSE,
                       show.legend = FALSE,
                       inherit.aes = TRUE,
                       fill = "WhiteSmoke",
                       color = NA,
                       ...) {
  layer(
    stat = StatChull,
    data = data,
    mapping = mapping,
    geom = geom,
    position = position,
    show.legend = show.legend,
    inherit.aes = inherit.aes,
    params = list(
      na.rm = na.rm,
      fill = fill,
      color = color,
      ...
    )
  )
}

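Now let us look more closely at the layer function; its abridged code is shown below.

The layer function first resolves the layer's geom and stat objects, then splits the supplied parameters into aesthetics and layer parameters, and finally returns a ggproto object that describes everything about the layer.

This leads to the following summary.

A layer definition (a function starting with geom or stat) must contain both a geom object and a stat object. The layer function ties the two together and returns a ggproto object. That ggproto object inherits from the Layer parent class, which is a container holding a Geom and a Stat. These can be the default Stat and Geom or customized ones, such as GeomBoxplot or StatChull.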

layer <- function(geom = NULL, stat = NULL,
                  data = NULL, mapping = NULL,
                  position = NULL, params = list(),
                  inherit.aes = TRUE, check.aes = TRUE, check.param = TRUE,
                  show.legend = NA, key_glyph = NULL, layer_class = Layer) {
# ... some code omitted ...

  data <- fortify(data)

  geom <- check_subclass(geom, "Geom", env = parent.frame())
  stat <- check_subclass(stat, "Stat", env = parent.frame())
  position <- check_subclass(position, "Position", env = parent.frame())

# ... some code omitted ...

  # Split up params between aesthetics, geom, and stat
  params <- rename_aes(params)
  aes_params  <- params[intersect(names(params), geom$aesthetics())]
  geom_params <- params[intersect(names(params), geom$parameters(TRUE))]
  stat_params <- params[intersect(names(params), stat$parameters(TRUE))]

# ... some code omitted ...

  ggproto("LayerInstance", layer_class,
    geom = geom,
    geom_params = geom_params,
    stat = stat,
    stat_params = stat_params,
    data = data,
    mapping = mapping,
    aes_params = aes_params,
    position = position,
    inherit.aes = inherit.aes,
    show.legend = show.legend
  )
}

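The second argument of the ggproto function is the parent class of the new ggproto class. Here it is layer_class, an argument of the layer function whose default value is the Layer class predefined by ggplot2.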
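As a quick check of this summary, the sketch below inspects the objects returned by two built-in layer functions. The variable names pt and bx are only illustrative; the calls themselves are standard ggplot2 and base R.

```r
library(ggplot2)

# geom_point() is a thin wrapper: what it returns is the layer object itself.
pt <- geom_point()

inherits(pt, "ggproto")  # the layer is a ggproto object
inherits(pt, "Layer")    # ... whose parent is ggplot2's Layer class

# The layer is a container holding one Geom and one Stat.
class(pt$geom)[1]        # "GeomPoint"
class(pt$stat)[1]        # "StatIdentity"

# A layer with a non-trivial Stat for comparison.
bx <- geom_boxplot()
class(bx$geom)[1]        # "GeomBoxplot"
class(bx$stat)[1]        # "StatBoxplot"
```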

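How a ggplot object is rendered

Rendering a ggplot2 plot takes two steps:

(1) ggplot_build applies the data and coordinate transformations to the ggplot object and produces a ggplot_built object;

(2) ggplot_gtable takes the ggplot_built object as input, generates the drawing objects, and returns a gtable object.

The gtable is a graphical object that can be rendered directly.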
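The two steps can be run by hand, which is roughly what printing a plot does for you. This is a minimal sketch on the built-in mtcars data; the variable names are illustrative.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(wt, mpg)) + geom_point()

# Step 1: data and coordinate transformations.
built <- ggplot_build(p)

# Step 2: turn the built plot into a gtable of grobs.
gt <- ggplot_gtable(built)
inherits(gt, "gtable")

# The gtable can be drawn directly with grid.
grid::grid.newpage()
grid::grid.draw(gt)
```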

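ggplot_build proceeds as follows:

layer: ggplot_build first calls each Layer's setup_layer method. This normally leaves the data untouched; it is only a hook.
layout: Create the layout. The layout calls the facet's setup… methods and the coord's setup… methods.
layer: The layer's compute_aesthetics evaluates the variables in aes(), adds the PANEL column, and uses add_group to add the group column.
layout: Layout adjustments: train_position and map_position.
Stat: The layer's compute_statistic calls the Stat's setup_params, setup_data and compute_layer.
layer: The layer's map_statistic performs the aesthetic-related evaluation on the stat output, for example for color and fill.
Geom: The layer's compute_geom_1 calls the Geom's setup_params and setup_data.
position: The layer's compute_position calls the Position's setup_params, setup_data and compute_layer.
layout: Second adjustment: reset_scales, train_position and map_position.
Geom: The layer's compute_geom_2 calls the Geom's use_defaults, which mainly fills in the default aesthetics.
Stat: The layer's finish_statistics calls the Stat's finish_layer, which does nothing by default.
At this point the ggplot_built object is complete.
Geom: The first step of ggplot_gtable is to call each layer's draw_geom, which in turn calls the Geom's draw_layer to create the grob objects.
The remaining steps are omitted.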
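One way to watch part of this sequence is to subclass a Stat and a Geom so that a few of their hooks log when they run. The sketch below is only illustrative: StatTrace, GeomTrace, trace_log and note are hypothetical names introduced here, not part of ggplot2, and the hooks simply pass the data through unchanged.

```r
library(ggplot2)

# Hypothetical helper (not part of ggplot2): records the order in which hooks run.
trace_log <- new.env()
assign("calls", character(), envir = trace_log)
note <- function(what) {
  assign("calls", c(get("calls", envir = trace_log), what), envir = trace_log)
}

# Illustrative subclasses that only log the call and return the data untouched.
StatTrace <- ggproto("StatTrace", StatIdentity,
  setup_data   = function(data, params) { note("Stat$setup_data"); data },
  finish_layer = function(self, data, params) { note("Stat$finish_layer"); data }
)
GeomTrace <- ggproto("GeomTrace", GeomPoint,
  setup_data = function(data, params) { note("Geom$setup_data"); data }
)

p <- ggplot(mtcars, aes(wt, mpg)) +
  layer(geom = GeomTrace, stat = StatTrace, position = "identity")

# Building the plot triggers the hooks; drawing is not needed for this.
invisible(ggplot_build(p))
get("calls", envir = trace_log)
```

The recorded order should match the list: the Stat prepares its data first, the Geom second, and the Stat's finish_layer runs last.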

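In short, rendering a layer starts by generating a layout. The Stat then transforms the data, and the Geom uses the transformed data to prepare what it needs for drawing. The layout is then adjusted a second time. Finally, the Stat's finish_layer method can apply a further transformation if needed. At this point the ggplot_built object has been created.

The first step of ggplot_gtable is to call the Geom's drawing methods to generate the grob objects.

So a layer's Stat and Geom are responsible for data transformation and for drawing, respectively.

A boxplot illustrates the roles of the two objects. We pass in the complete data, but the graphical elements of a boxplot are not the raw values. They are summary statistics such as Q1, the median, Q3, the maximum, the minimum and the outliers.

Computing those statistics from the input data is the job of the Stat. Drawing the graphics from the Stat's output (that is, creating the points, lines and polygons) is the job of the Geom.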
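To see what the Stat hands to the Geom in the boxplot case, layer_data() returns a layer's data after the build step. A minimal sketch using the built-in iris data; the column names are the ones StatBoxplot produces.

```r
library(ggplot2)

p <- ggplot(iris, aes(Species, Sepal.Width)) + geom_boxplot()

# layer_data() returns the layer's data after the Stat and the rest of the build have run.
box_stats <- layer_data(p)
box_stats[, c("x", "ymin", "lower", "middle", "upper", "ymax")]

# 150 input rows have been reduced to one row of summary statistics per species.
nrow(iris)
nrow(box_stats)
```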

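Why, then, does geom_point start with geom? Its stat is the default "identity" Stat, which performs no data transformation, while its Geom is the dedicated GeomPoint object. The emphasis is on what the Geom does, hence the geom prefix.

The stat_chull layer, by contrast, uses StatChull as its stat and a polygon-drawing object as its geom. Its emphasis is on the data transformation, hence the stat prefix.

It must be stressed, though, that the geom or stat prefix is only a recommended naming convention, not an enforced rule.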
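The sketch below illustrates that the prefix is only a convention: the same Geom and Stat pairing can be reached from either side. It assumes a ggplot2 version whose stat_summary accepts the fun argument (3.3.0 or later); by_geom and by_stat are illustrative names.

```r
library(ggplot2)

# The same layer, once written "geom first" and once "stat first".
by_geom <- geom_point(stat = "summary", fun = mean)
by_stat <- stat_summary(geom = "point", fun = mean)

class(by_geom$geom)[1]; class(by_stat$geom)[1]
class(by_geom$stat)[1]; class(by_stat$stat)[1]

# Both layers compute the same per-group means.
base <- ggplot(mtcars, aes(factor(cyl), mpg))
all.equal(layer_data(base + by_geom)$y, layer_data(base + by_stat)$y)
```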

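A worked example: geom_whisker

ggplot2's boxplot draws its whiskers without end caps. We can write a layer that adds caps to both whiskers; the original post compares the two results in a figure.

Start with the layer skeleton. Because the caps must line up with the boxplot's whiskers, the Stat can simply reuse the StatBoxplot object used by geom_boxplot. The Geom, however, has to be redefined as our own GeomWhisker object.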

library(tidyverse)

geom_whisker <-
  function (mapping = NULL,
            data = NULL,
            stat = "boxplot", # <--- stat keeps the boxplot default ---
            position = "dodge2",
            whisker_width = 0.2,
            ...,
            na.rm = FALSE,
            orientation = NA,
            show.legend = NA,
            inherit.aes = TRUE){ 
    if (is.character(position)) {
      position <- position_dodge2(preserve = "single")
    }

    layer(
      data = data,
      mapping = mapping,
      stat = stat,
      geom = GeomWhisker, # <--- geom uses the custom GeomWhisker ---
      position = position,
      show.legend = show.legend,
      inherit.aes = inherit.aes,
      params = list(
        na.rm = na.rm,
        orientation = orientation,
        whisker_width = whisker_width,
        ...
      )
    )
  }

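GeomWhisker is written as follows. It draws line segments. The main thing to define is the Geom's draw_group method, whose return value is a segments grob created with grid::segmentsGrob. The positions of the segments are determined by the whisker minimum and maximum (ymin and ymax) computed by StatBoxplot. Before drawing, the transformed data must be passed through the coord's transform function.

The Geom's setup_data method can preprocess the data before drawing. The minimum and maximum only define the y coordinates of the segments, not their x coordinates, so the x extent is derived from the width value by splitting it evenly on either side of the boxplot's x position, giving xmin and xmax. The draw_group method shown here then places the caps at x plus and minus whisker_width.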

GeomWhisker <- ggproto(
  "GeomWhisker",
  Geom,

  extra_params = c("na.rm", "width", "orientation"),

  setup_params = function(data, params) {
    params$flipped_aes <- has_flipped_aes(data, params)
    params$whisker_width <- params$whisker_width/length(unique(data$group))

    params
  },
  setup_data = function(data, params){
    data$flipped_aes <- params$flipped_aes

    data$xmin <- data$x - data$width / 2
    data$xmax <- data$x + data$width / 2

    flip_data(data, params$flipped_aes)
  },
  draw_group = function(data, panel_params, coord, whisker_width = 0.2,  flipped_aes = FALSE) {
    data <- flip_data(data, flipped_aes)
    if (nrow(data) != 1) {
      rlang::abort("Can't draw more than one boxplot per group. Did you forget aes(group = ...)?")
    }
    coords <-
      coord$transform(data, panel_params)
    common <- list(
      colour = data$colour,
      size = data$size,
      linetype = data$linetype,
      fill = alpha(data$fill, data$alpha),
      group = data$group
    )

    new_data_frame <- function (x = list(), n = NULL) {
      if (length(x) != 0 && is.null(names(x))) {
        rlang::abort("Elements must be named")
      }
      lengths <- vapply(x, length, integer(1))
      if (is.null(n)) {
        n <- if (length(x) == 0 || min(lengths) == 0) 0 else max(lengths)
      }
      for (i in seq_along(x)) {
        if (lengths[i] == n) next
        if (lengths[i] != 1) {
          rlang::abort("Elements must equal the number of rows or 1")
        }
        x[[i]] <- rep(x[[i]], n)
      }
      class(x) <- "data.frame"
      attr(x, "row.names") <- .set_row_names(n)
      x
    }

    whiskers <- new_data_frame(c(list(
      x = c(coords$x - whisker_width, coords$x - whisker_width),
      xend =  c(coords$x + whisker_width, coords$x + whisker_width),
      y = c(coords$ymin, coords$ymax),
      yend =  c(coords$ymin, coords$ymax),
      alpha = c(NA_real_, NA_real_)
    ),
    common), n = 2)
    whiskers <-
      flip_data(whiskers, flipped_aes)

    # print(whiskers)
    seg_grob <-
      grid::segmentsGrob(
        x0 = whiskers$x,
        y0 = whiskers$y,
        x1 = whiskers$xend,
        y1 = whiskers$yend,
        default.units = "npc",
        gp = grid::gpar(
          col = whiskers$colour,
          fill = alpha(whiskers$fill, whiskers$alpha),
          lwd = whiskers$size,
          lty = whiskers$linetype
        )
      )
    ggplot2:::ggname("geom_whisker", grid::grobTree(seg_grob))
  },

  default_aes = aes(
    colour = "grey20",
    fill = "white",
    size = 0.5,
    alpha = NA,
    shape = 19,
    linetype = "solid"
  ),
  draw_key = draw_key_path,
  required_aes = c(
  )
)
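To see the kind of grob that draw_group returns in isolation, the sketch below builds two horizontal caps directly with grid. The coordinates are made-up npc values chosen only for illustration; in GeomWhisker they come from the transformed boxplot data.

```r
library(grid)

# Two horizontal caps at y = 0.2 and y = 0.8, each spanning x = 0.4 to 0.6 (npc units).
caps <- segmentsGrob(
  x0 = c(0.4, 0.4), y0 = c(0.2, 0.8),
  x1 = c(0.6, 0.6), y1 = c(0.2, 0.8),
  default.units = "npc",
  gp = gpar(col = "grey20", lwd = 2, lty = "solid")
)

inherits(caps, "grob")

# Draw the two segments on a blank page.
grid.newpage()
grid.draw(caps)
```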

Using it is straightforward. It also supports custom line width, line type, cap width and color.

p <-
  iris %>%
  mutate(group = as.character(sample(1:3, 150, replace = TRUE))) %>%
  ggplot(aes(x = Species, y = Sepal.Width, fill = Species, color = group)) +
  geom_boxplot() +
  geom_whisker(
    aes(color = group),
    whisker_width = 0.3, 
    size = 5, 
    linetype = 1, 
    show.legend = FALSE
  )
p
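For comparison, a similar look can usually be obtained without a custom Geom by pairing StatBoxplot with the built-in errorbar geom. This is offered as an alternative sketch rather than a replacement for the exercise above; caps_layer and p2 are illustrative names.

```r
library(ggplot2)

# StatBoxplot supplies ymin/ymax; the errorbar geom draws the whisker with end caps.
caps_layer <- stat_boxplot(geom = "errorbar", width = 0.3)

p2 <- ggplot(iris, aes(x = Species, y = Sepal.Width, fill = Species)) +
  caps_layer +
  geom_boxplot()
p2
```

Writing GeomWhisker by hand remains useful when the caps need behavior the errorbar geom does not offer, and as a template for other custom layers.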