首页
学习
活动
专区
圈层
工具
发布
社区首页 >问答首页 >龙卷风/R中的双侧水平条形图,图表轴线在给定值处交叉(而不是在零处交叉)

龙卷风/R中的双侧水平条形图,图表轴线在给定值处交叉(而不是在零处交叉)
EN

Stack Overflow用户
提问于 2019-04-19 02:36:04
回答 1查看 4.3K关注 0票数 3

我想在R中绘制一个龙卷风图(双面水平条形图),用于确定性灵敏度分析,我已经尝试了几个代码,但没有得到所需的输出。

我希望达到:-

  1. 绘图应该按照敏感参数的降序排列(即最宽的区间应该表示在图表的顶部-为了获得敏感性,我们首先计算下界值和上界值的绝对差,我在我的数据框代码中将其命名为"UL_Difference“)。
  2. 中心不应该在零,而应该在给定值,我们称之为“基本情况”或我的结果表的核心/最终结果(在这个表上,我们想要检查使用参数的下界和上限值改变固定参数的影响,并为下界和上界值生成核心结果)。Excel中的示例代码是
  3. 该图应具有标题“药物A与药物P的龙卷风图”。

我试过很多代码。下面是一个例子,它给了我一个龙卷风图,但并不完全是我想要从R生成的。

代码语言:javascript
复制
Base_Result <- results.table[5,4] # Base/Core result (which I have not used in my codes below yet)

Drug_AP <- seq(1, 48, 4)
D_AP <- data.frame(OWSA[Drug_AP,]) # OWSA[] is a 10x3 matrix with 'Lower_Bound', 'Upper_Bound' and Absolute Difference of the LB and UB termed as 'UL_Difference' (row names are parameters)
DSA_Drug_AP <- D_AP[order(D_AP$UL_Difference, decreasing = T),] # Ordering the data.frame above in Descending order of 'UL_Difference'
cat("DSA Table: Drug A vs P \n")
library(formattable)
print(accounting(as.matrix(DSA_Drug_AP), digits = 0, format = "f", big.mark = ","), right = T) # Just printing the above data.frame

我尝试了下面的代码来绘制龙卷风:

(我不确定是否应该制作下面的数据框,也许这是我没有得到所需输出的原因之一)

代码语言:javascript
复制
dat <- data.frame(Group = c(rep("Lower_Bound", 12), rep("Upper_Bound", 12)), 
                  Parameters = rep(rownames(DSA_Drug_AP), 2), 
                  UL = c(-DSA_Drug_AP[,1], DSA_Drug_AP[,2]))

(最后,我使用"ggplot“绘制了上面的数据框,如下所示)

代码语言:javascript
复制
library(ggplot2)
ggplot(dat, aes(x = Parameters, y = UL, fill = Group)) + 
    coord_flip() + 
    geom_bar(stat = "identity", position = "identity", width = 0.525) +
    theme(legend.position="top", axis.text.x = element_text(angle = 0, hjust = 0.5, vjust = 0.5, size = 10))

并如下所示获取输出:

下面是我想要实现的输出(第1点和第2点已实现;图表由excel生成)。

代码语言:javascript
复制
# Also, the data I'm using is shown below: -

Base_Result <- 9,504  # Value of results.table[5,4] on which I get 'lower' and 'upper' limit values below (and want tornado with the origin at this base_result).

# My data.frame "D_AP" will look like (I just renamed my parameters to 1(to)12)

           Lower_Bound  Upper_Bound UL_Difference
Parameter_01     8,074      11,181   3,108 
Parameter_02     8,177      11,007   2,831 
Parameter_03     8,879      10,188   1,308 
Parameter_04     4,358      18,697   14,339 
Parameter_05     9,073      10,087   1,013 
Parameter_06     12,034      7,572   4,462 
Parameter_07     11,357      7,933   3,423 
Parameter_08     9,769       9,202   567 
Parameter_09     8,833      10,403   1,570 
Parameter_10     13,450      4,219   9,231 
Parameter_11     10,691      7,915   2,776 
Parameter_12     10,036      8,792   1,244 

# Once, I did sort in descending order then it will be data.frame "DSA_Drug_AP" as below: -

            Lower_Bound Upper_Bound UL_Difference
Parameter_04     4,358      18,697   14,339 
Parameter_10     13,450      4,219   9,231 
Parameter_06     12,034      7,572   4,462 
Parameter_07     11,357      7,933   3,423 
Parameter_01     8,074      11,181   3,108 
Parameter_02     8,177      11,007   2,831 
Parameter_11     10,691      7,915   2,776 
Parameter_09     8,833      10,403   1,570 
Parameter_03     8,879      10,188   1,308 
Parameter_12     10,036      8,792   1,244 
Parameter_05     9,073      10,087   1,013 
Parameter_08     9,769       9,202   567 

# Please note that I need to plot the 1st and 2nd column of values 
# (shown in above table in order of 3rd column as a tornado plot).
# The parameter-## names will come to the left vertical line of plot.

提前谢谢你!

EN

回答 1

Stack Overflow用户

发布于 2019-05-20 04:58:04

下面是一个使用ggplot2::geom_col()生成龙卷风图的函数,以及一些使用示例。希望这能帮上忙。

代码语言:javascript
复制
# Tornado Plot using ggplot2(), 2019/05/19.
# See Wikipedia: ["Tornado diagram"](https://en.wikipedia.org/wiki/Tornado_diagram).

library( magrittr )
library( tidyverse )

# Function tornado_plot() produces a "tornado plot" given the sensitivity
# analysis results in data_frame df. It plots green bars indicating the levels
# of the response variable when each **x input variable** is moved to its maximum
# level while holding all other variables constant.  Similarly, the red bars are
# the outputs when each **x input variable** is moved to its minimum value while
# holding all other variables constant. The input variable to which the output
# is most sensitive is shown at the top of the plot. And the bars are stacked
# from most sensitive to least sensitive, fancifully yielding the shape of a
# tornado.
tornado_plot <-
  function(
    df,
    var_names_col,
    min_level_col,
    min_output_col,
    max_level_col,
    max_output_col,
    base_level_col,
    baseline_output,
    title_str      = "Tornado Plot",
    subtitle_str   = "",
    caption_str    = "",
    ylab_str       = "output",
    baseline_label = "",
    scale_breaks   = NULL,
    limits         = NULL
  ) {
    # + The argument df must be a tidyverse::tibble with columns referred to by all of the
    #   other arguments having "col" in their names.
    # + The var_names_col argument must be an unquoted column name that contains characters
    #    naming the variables that were varied in the sensitivity analysis.
    # + The level column arguments -- min_level_col, max_level_col and
    #   base_level_col -- must be unquoted column names that contain characters to be
    #   used in forming labels for each variable bar of the plot.
    # + The output column arguments -- min_output_col and max_output_col -- must
    #   be unquoted column names that contain numerical values to be plotted as the
    #   extents of the bars in the plot.
    # + The baseline_output argument is the numeric value of the output (response) variable
    #   produced by setting all of the variables to their base levels.

    var_names_col  <- enquo( var_names_col )
    min_level_col  <- enquo( min_level_col )
    max_level_col  <- enquo( max_level_col )
    base_level_col <- enquo( base_level_col )
    min_output_col <- enquo( min_output_col )
    max_output_col <- enquo( max_output_col )

    have_custom_y_breaks <- !any( is.null(scale_breaks) )

    # Create a generic tibble as the data source for plotting.
    # Sorts variables from the one to which the output was least sensitive
    # to the one to which the output was most sensitive.
    # Then creates labels for each variable capturing the min, base, and max
    # levels of that variable.
    # Finally, it centers all outputs around the baseline output so thta the
    # ggplot2::geom_col() function can still work with zero-based bars.
    plt_df <- df %>% 
      mutate(del = abs(!!max_output_col - !!min_output_col) ) %>% 
      arrange(del) %>% 
      mutate(
        names = sprintf(
          "%s\n(min=%s; base=%s; max=%s)",
          !!var_names_col,
          !!min_level_col,
          !!base_level_col,
          !!max_level_col
        ),
        names = factor(names,names),
        min   = !!min_output_col,
        max   = !!max_output_col
      ) %>% 
      dplyr::select(names,min,max) %>% 
      gather( key = Level, value = output, -names) %>% 
      mutate( output = output - baseline_output, Level = factor(Level,c("min","max")) ) #%T>% print()

    # Generate the tornado plot.
    plt <- plt_df %>% 
      {
        ggplot(., aes( fill = Level, x = names, y = output )) + 
          geom_hline(yintercept = 0, linetype = 1, size = 2, color = "darkgray") +
          geom_col( alpha = 0.4, width = 0.98) + 
          coord_flip() + #*** NOTE THE COORDINATE FLIP ***
          geom_text(aes(y = 0, label = names), size = 4, fontface = "bold" ) +
          scale_x_discrete( expand = expand_scale(add = 1 ) ) +
          scale_fill_manual(values = c(min = "red", max = "green") ) +
          ylab( ylab_str ) +
          theme( # **Hmmm, references the ACTUAL plotted (post-flipped) x-y axes. **
            axis.ticks.y = element_blank(),
            axis.text.y  = element_blank(),
            axis.title.y = element_blank(),
            panel.grid.major.y = element_blank(), # Remove horizontal grid lines
            panel.grid.minor.y = element_blank(),
            axis.text.x  = element_text( size = 14 ),
            axis.title.x = element_text( size = 16 ),
            title        = element_text( size = 18 ),
            legend.position = "bottom"
          ) +
          labs( title = title_str, subtitle = subtitle_str, caption = caption_str )
      }
    # Set the pre-flipped y-axis (which gets flipped to be the x-axis in the final plot).
    if( !is.null(limits) ){
      y_limits = limits
    } else {
      y_limits = c(-max(abs(plt_df$output)),max(abs(plt_df$output)))
    }
    if( have_custom_y_breaks ){
      plt <- plt + scale_y_continuous(
        limits = y_limits,
        breaks = scale_breaks,
        labels = names(scale_breaks)
      )
    } else {
      plt <- plt + scale_y_continuous(
        limits = y_limits,
        labels = function(x) baseline_output + x
      )
    }
    # Add the baseline output label, if any
    if(baseline_label != ""){
      return(
        plt + 
          geom_label(
            data = tibble( x = 0.25, y = 0, label = baseline_label),
            mapping = aes( x = x, y = y, label = label),
            fontface = "bold",
            show.legend = FALSE,
            inherit.aes = FALSE
          )
      )
    } else {
      return( plt )
    }
  }

#--------------------------------------------------------------------------------     

# USAGE EXAMPLE:
# Hypothetical Investment Strategy Analysis:
# These are data from a sensitivity analysis on an investment strategy that invests in an
# an S&P 500 index fund and a "safety" value-store (a 0%-real-return investment); 
# protecting winnings from market with transfer to safety when strategy criteria are met. 
# Disregards taxes and fees. Real values (i.e., inflation-adjusted).
sensitivity_df <- tribble(
  ~variable,                            ~min,  ~base,   ~max, ~Total_at_min, ~Total_base, ~Total_at_max,             ~Time_period,
  "Start Value",                           0,   2000, 100000,        239600,      245900,        554800, "start: 1980, end: 2005",
  "Monthly Investment",                    0,    500,   1000,          6300,      245900,        485600, "start: 1980, end: 2005",
  "Allocation to Safety",                  0,    0.3,    0.5,        277800,      245900,        224700, "start: 1980, end: 2005",
  "Annual Increase in Mo. Investment",     0,   0.01,   0.03,        222700,      245900,        303800, "start: 1980, end: 2005",
  "Protection Rate",                       0, 0.0025,   0.03,        310300,      245900,        199500, "start: 1980, end: 2005",

  "Start Value",                           0,   2000, 100000,        174300,      175900,        253300, "start: 1910, end: 1935",
  "Monthly Investment",                    0,    500,   1000,          1600,      175900,        350100, "start: 1910, end: 1935",
  "Allocation to Safety",                  0,    0.3,    0.5,        177700,      175900,        174600, "start: 1910, end: 1935",
  "Annual Increase in Mo. Investment",     0,   0.01,   0.03,        155600,      175900,        227100, "start: 1910, end: 1935",
  "Protection Rate",                       0, 0.0025,   0.03,        171800,      175900,        176000, "start: 1910, end: 1935"
) %>%  # Add x-input level labels (overwriting reals min, base, max with character values through mutate_at()).
  mutate_at(vars(contains("Total")), ~{100*round(./100)}) %>%
  mutate_at(
    vars( min, base, max), 
    ~ { 
      ifelse(
        abs(.) >= 1000,
        paste0("$",formatC(.,big.mark = ",",format = "f",digits = 0)),
        sprintf(
          c( "$%.0f", "$%.0f", "%.0f%%", "%.1f%%", "%.2f%%" ), 
          . * c(1,1,100,100,100)
        )
      )
    } 
  )

# Generate the tornado plot with generic labeling and axis.
sensitivity_df %>%
  filter( grepl("1980.+2005", Time_period ) ) %>%
  tornado_plot(
    var_names_col   = variable,
    min_level_col   = min,
    min_output_col  = Total_at_min,
    max_level_col   = max,
    max_output_col  = Total_at_max,
    base_level_col  = base,
    baseline_output = .$Total_base[[1]]
  ) %>% print()


# Generate the tornado plot with customized labeling and axis.
scl_limits = c(0, 6.0e5 )
sensitivity_df %>%
  filter( grepl("1980.+2005", Time_period ) ) %>%
  tornado_plot(
    var_names_col   = variable,
    min_level_col   = min,
    min_output_col  = Total_at_min,
    max_level_col   = max,
    max_output_col  = Total_at_max,
    base_level_col  = base,
    baseline_output = .$Total_base[[1]],
    title_str       = "Sensitivity of Total Value to Strategy Variables",
    subtitle_str    = sprintf( "Time period %s", .$Time_period[[1]] ),
    caption_str     = "Assuming S&P 500 index & 0%-real-return 'safe harbor'",
    ylab_str        = "Total Value",
    baseline_label  = paste0("Base Case:\n$",format(100*round(.$Total_base[[1]]/100,0),big.mark = ",")),
    scale_breaks    = setNames(
      seq(min(scl_limits), max(scl_limits), 1e5) - .$Total_base[[1]], 
      paste0("$",formatC(
        seq(min(scl_limits), max(scl_limits), 1e5),big.mark = ",",format = "f",digits = 0)
      )
    ),
    limits          = scl_limits - .$Total_base[[1]]
  ) %>% print()

# Generate the tornado plot for another time period, with scaling
# to be comparable with the first time period.
sensitivity_df %>%
  filter( grepl("1910.+1935", Time_period ) ) %>%
  tornado_plot(
    var_names_col   = variable,
    min_level_col   = min,
    min_output_col  = Total_at_min,
    max_level_col   = max,
    max_output_col  = Total_at_max,
    base_level_col  = base,
    baseline_output = .$Total_base[[1]],
    title_str       = "Sensitivity of Total Value to Strategy Variables",
    subtitle_str    = sprintf( "Time period %s", .$Time_period[[1]] ),
    caption_str     = "Assuming S&P 500 index & 0%-real-return 'safe harbor'",
    ylab_str        = "Total Value",
    baseline_label  = paste0("Base Case:\n$",format(100*round(.$Total_base[[1]]/100,0),big.mark = ",")),
    scale_breaks    = setNames(
      seq(min(scl_limits), max(scl_limits), 1e5) - .$Total_base[[1]], 
      paste0("$",formatC(
        seq(min(scl_limits), max(scl_limits), 1e5),big.mark = ",",format = "f",digits = 0)
      )
    ),
    limits          = scl_limits - .$Total_base[[1]]
  ) %>% print()

Generic tornado plot

Tornado plot with custom labels, axis

Tornado plot w/custom labels and on same scale as previous one

票数 2
EN
页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://stackoverflow.com/questions/55751978

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档