The survsup package provides straightforward R functions for plotting Kaplan-Meier curves with the inclusion of numbers at risk tables. Although other methods exist, they either do not fit my workflow well or produce quite ugly plots.

This package is based upon ggplot2, which comes in handy in case you'd wish to tweak anything in the plot, since a ggplot object is returned.

Furthermore, this package is very easy to use using the pipe operator, see examples below.

We will here give some short introduction on how to use this package.

Note that this package is under active development; additional features may be added. Check at the project's GitHub site for the latest version (http://github.com/dlindholm/survsup/).

Basic usage

Let's start with the most basic thing, plotting a plain old Kaplan-Meier curve:

library(survsup)
#> Registered S3 methods overwritten by 'ggplot2':
#>   method         from 
#>   [.quosures     rlang
#>   c.quosures     rlang
#>   print.quosures rlang
library(ggplot2)
library(survival)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union


fit <- survfit(Surv(time, status) ~ 1, data = lung)
plot_survfit(fit)

plot of chunk unnamed-chunk-1

As you will see later, it is even more convenient to use the pipe operator (%>%):

lung %>%
  survfit(Surv(time, status) ~ 1, data = .) %>%
  plot_survfit()

plot of chunk unnamed-chunk-2

See that the plot by default plots the cumulative incidence? (this is the most common practice in my area of work, hence it being the default) Let's try to plot survival instead:

lung %>%
  survfit(Surv(time, status) ~ 1, data = .) %>%
  plot_survfit(cuminc = FALSE)

plot of chunk unnamed-chunk-3

Now let's see if the survival differs between men and women:

lung %>%
  survfit(Surv(time, status) ~ sex, data = .) %>%
  plot_survfit(cuminc = FALSE)

plot of chunk unnamed-chunk-4

Well, maybe, but let's add the 95% confidence intervals!

lung %>%
  survfit(Surv(time, status) ~ sex, data = .) %>%
  plot_survfit(cuminc = FALSE, ci = TRUE)

plot of chunk unnamed-chunk-5 Now let's change the axis labels:

lung %>%
  survfit(Surv(time, status) ~ sex, data = .) %>%
  plot_survfit(cuminc = FALSE, ci = TRUE) + # <--- NOTE!
  labs(x = "Time (days)", y = "Survival (%)")

plot of chunk unnamed-chunk-6

What happened here? Well, as the plotting function actually returns a ggplot object, then it's easy to just add some ggplot stuff to the plot, in this case using the function labs to add some labels. Note the plus sign! (this is common practice in ggplot2)

Numbers at risk table

If we wish to add a number at risk table, we just put the function nar into the pipeline:

lung %>%
  survfit(Surv(time, status) ~ sex, data = .) %>%
  plot_survfit(cuminc = FALSE) %>%
  nar()

plot of chunk unnamed-chunk-7

Perhaps we want to change the font size in the numbers at risk table? Then we just provide the size argument:

lung %>%
  survfit(Surv(time, status) ~ sex, data = .) %>%
  plot_survfit(cuminc = FALSE) %>%
  nar(size = 3)

plot of chunk unnamed-chunk-8

Changing plot appearance

Colors

Since we're dealing with ggplot objects it is very straightforward to change the color scale using standard ggplot2 syntax. In this case we're looking at another example (namely effects of different treatment strategies for colon cancer):

colon %>%
  survfit(Surv(time, status) ~ rx, data = .) %>%
  plot_survfit() %>%
  nar() +
  scale_color_manual(values = c("darkorange", "steelblue", "darkred"))

plot of chunk unnamed-chunk-9

We've also included a few convenience functions to get commonly used presets of color schemes. For some reason, the color guides used in skiing slopes are popular also in Kaplan-Meier curves (i.e. green, blue, red, black):

colon %>%
  survfit(Surv(time, status) ~ extent, data = .) %>%
  plot_survfit() %>%
  nar() %>%
  skislopes()

plot of chunk unnamed-chunk-10

You could, if you wanted to, reverse the order of the colors thus:

colon %>%
  survfit(Surv(time, status) ~ extent, data = .) %>%
  plot_survfit() %>%
  nar() %>%
  skislopes(reverse = TRUE)

plot of chunk unnamed-chunk-11

Another common color scheme is similar to the skislopes but with orange/yellow instead of black, called cat4 in this package:

colon %>%
  survfit(Surv(time, status) ~ extent, data = .) %>%
  plot_survfit() %>%
  nar() %>%
  cat4()

plot of chunk unnamed-chunk-12

However, you'd probably don't want any color stick out too much. The colorspace package (https://CRAN.r-project.org/package=colorspace) by Zeileis, Hornik, and Murrell provides a nice qualitative color palettes based on the HCL colors. We provide a convenience function, that provides colors for the number of strata based on the rainbow_hcl function. You have the options to tweak parameters such as chroma and luminance to taste. The convenience function is called hcl_rainbow:

colon %>%
  survfit(Surv(time, status) ~ extent, data = .) %>%
  plot_survfit() %>%
  nar() %>%
  hcl_rainbow()

plot of chunk unnamed-chunk-13

Like in the previous convenience functions, you could also easily reverse the order of the colors:

colon %>%
  survfit(Surv(time, status) ~ extent, data = .) %>%
  plot_survfit() %>%
  nar() %>%
  hcl_rainbow(reverse = TRUE)

plot of chunk unnamed-chunk-14

Line width

You could also tweak the width of the lines easily using the lwd argument:

colon %>%
  survfit(Surv(time, status) ~ extent, data = .) %>%
  plot_survfit(lwd = 2)

plot of chunk unnamed-chunk-15

colon %>%
  survfit(Surv(time, status) ~ extent, data = .) %>%
  plot_survfit(lwd = 0.5)

plot of chunk unnamed-chunk-16

Legend title

If you want to change the legend title to something more informative, use the legend.title argument:

colon %>%
  survfit(Surv(time, status) ~ extent, data = .) %>%
  plot_survfit(legend.title = "Extent of disease")

plot of chunk unnamed-chunk-17

Axes and breaks

By default, axis and axis breaks are determined by ggplot2 to represent the data optimally. However, you may have other demands, such that for instance locking the Y axis so that multiple plots will have the same range to allow for comparison, or you might be interested only in a shorter period of time than the full period under study. Furthermore, you might wish more or fewer timepoints at which the number at risk figures are shown. In the following passage, we will go through these tweaks.

Changing Y axis limits

Say that you want your Y scale to go all the way from 0 to 100. This is easily achieved by the ylim argument:

colon %>%
  survfit(Surv(time, status) ~ extent, data = .) %>%
  plot_survfit(ylim = c(0, 100)) %>%
  nar()

plot of chunk unnamed-chunk-18

Setting X axis upper limit and breaks

In this example, say that you're interested only up to the timepoint 2000:

colon %>%
  survfit(Surv(time, status) ~ extent, data = .) %>%
  plot_survfit(xmax = 2000) %>%
  nar()

plot of chunk unnamed-chunk-19

Notice how the number at risk table was expanded to include numbers at risk at the new X axis breaks selected by ggplot. Perhaps you only want numbers at risk at, say, 0, 1000, 2000. Then we can use the xbreaks argument:

colon %>%
  survfit(Surv(time, status) ~ extent, data = .) %>%
  plot_survfit(xmax = 2000, xbreaks = c(0, 1000, 2000)) %>%
  nar()

plot of chunk unnamed-chunk-20

Formatting the numbers at risk table

Flipping row order

Consider this figure:

colon %>%
  survfit(Surv(time, status) ~ sex, data = .) %>%
  plot_survfit() %>%
  nar()

plot of chunk unnamed-chunk-21

Here you see that the red line is (ever so slightly) above the green line, whereas the green row in the numbers at risk table is above the red row. If you want to fix this, you can use the flip argument as follows:

colon %>%
  survfit(Surv(time, status) ~ sex, data = .) %>%
  plot_survfit() %>%
  nar(flip = TRUE)

plot of chunk unnamed-chunk-22

Voila!

Setting font size

Use the size argument to change font size:

colon %>%
  survfit(Surv(time, status) ~ sex, data = .) %>%
  plot_survfit() %>%
  nar(size = 5, flip = TRUE)

plot of chunk unnamed-chunk-23

colon %>%
  survfit(Surv(time, status) ~ sex, data = .) %>%
  plot_survfit() %>%
  nar(size = 2, flip = TRUE)

plot of chunk unnamed-chunk-24

Alter row spacing

The argument y_offset determines how far below the main plot the numbers at risk table will be. By default, this takes the value 0.05; but could be changed to taste:

colon %>%
  survfit(Surv(time, status) ~ sex, data = .) %>%
  plot_survfit() %>%
  nar(y_offset = 0.1, flip = TRUE)

plot of chunk unnamed-chunk-25

colon %>%
  survfit(Surv(time, status) ~ sex, data = .) %>%
  plot_survfit() %>%
  nar(y_offset = 0.03, flip = TRUE)

plot of chunk unnamed-chunk-26

Separator line

By default, a thin horizontal line separates the main plotting area from the numbers at risk table. This could be turned off:

colon %>%
  survfit(Surv(time, status) ~ sex, data = .) %>%
  plot_survfit() %>%
  nar(separator = FALSE)

plot of chunk unnamed-chunk-27

You could also change some characteristics of the line, such as color and line width:

colon %>%
  survfit(Surv(time, status) ~ sex, data = .) %>%
  plot_survfit() %>%
  nar(sep_color = "grey90", sep_lwd = 1.5)

plot of chunk unnamed-chunk-28

Combining multiple plots

Perhaps you'd want to combine multiple plots. Enter gridExtra (https://CRAN.r-project.org/package=gridExtra)!

library(gridExtra)
#> 
#> Attaching package: 'gridExtra'
#> The following object is masked from 'package:dplyr':
#> 
#>     combine

Let's combine two plots:

p <- list(
  p1 = colon %>%
    survfit(Surv(time, status) ~ sex, data = .) %>%
    plot_survfit(ylim = c(0, 100)) %>%
    nar() +
    labs(tag = "A"),

  p2 = colon %>%
    survfit(Surv(time, status) ~ node4, data = .) %>%
    plot_survfit(ylim = c(0, 100)) %>%
    nar() +
    labs(tag = "B")
)

grid.arrange(grobs = p, ncol = 2)

plot of chunk unnamed-chunk-30

gridExtra provides ways to create even more complex plots:

# Store plots in a list
p <- list(
  p1 = colon %>%
    survfit(Surv(time, status) ~ 1, data = .) %>%
    plot_survfit(ylim = c(0, 100)) +
    labs(tag = "A"),

  p2 = colon %>%
    survfit(Surv(time, status) ~ rx, data = .) %>%
    plot_survfit(ylim = c(0, 100)) %>%
    nar(2, separator = FALSE) +
    labs(tag = "B"),

  p3 = colon %>%
    survfit(Surv(time, status) ~ extent, data = .) %>%
    plot_survfit(ylim = c(0, 100)) %>%
    nar(2, separator = FALSE) +
    labs(tag = "C"),

  p4 = colon %>%
    survfit(Surv(time, status) ~ sex, data = .) %>%
    plot_survfit(ylim = c(0, 100)) %>%
    nar(2, separator = FALSE) +
    labs(tag = "D"),

  p5 = colon %>%
    survfit(Surv(time, status) ~ node4, data = .) %>%
    plot_survfit(ylim = c(0, 100)) %>%
    nar(2, separator = FALSE) +
    labs(tag = "E"),

  p6 = colon %>%
    survfit(Surv(time, status) ~ surg, data = .) %>%
    plot_survfit(ylim = c(0, 100)) %>%
    nar(2, separator = FALSE) +
    labs(tag = "F")

)

#Define layout matrix
lay <- rbind(c(1,1,2),
             c(1,1,3),
             c(4,5,6))

#Plot it all!
grid.arrange(grobs = p, layout_matrix = lay)