6 Summary Tables

6.1 Publication-ready summary tables

6.1.1 Rationale

Routinely re-typing results from statistical software into writing and communication documents, be it an article, a manuscript, a dissertation, or a thesis, is time-consuming, frustrating, and error-prone.

Furthermore, most courses and tutorials on data analytics using R teach a collection of R functions but do not lead to the actual goal: finished, analysed tables.

In R, certain packages create publication-ready tables that can be incorporated into research dissemination documents directly or with minor modifications. This saves a lot of mundane, unnecessary work and leaves more time for interpretation and domain-specific work.

We shall use the gtsummary package, which follows tidy principles and creates presentation-ready tables of summary statistics, regression models, and more. The code to create the tables is concise and highly customizable.
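The examples below use a data frame named df that this section never defines. Its contents appear to match the low birth weight data set birthwt from the MASS package (189 births, 59 of them low birth weight, 74 to mothers who smoked), with the 0/1 codes of low and smoke turned into labelled factors. The following is a minimal sketch under that assumption: it is our reconstruction, not the authors' code, and it uses only base R and MASS.

```r
# Hypothetical reconstruction of df: MASS::birthwt with labelled factors.
# MASS normally ships with R as a recommended package.
df <- within(MASS::birthwt, {
  low   <- factor(low,   levels = c(0, 1),
                  labels = c("Normal", "Low Birth Weight"))
  smoke <- factor(smoke, levels = c(0, 1),
                  labels = c("Non Smoker", "Smoker"))
})

# Check the columns used in this chapter
str(df[, c("low", "smoke", "bwt", "lwt")])
```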

6.2 Introduction to publication-ready tables

The tbl_summary() function calculates descriptive statistics for continuous, categorical, and dichotomous variables in R and presents the results in a customizable summary table ready for publication. We first show its most basic behaviour, which already produces a complete table, and then examine in detail how to adjust it and build more tailored tables. By default, tbl_summary() takes the columns you provide and summarizes them in one command, choosing statistics appropriate to each column's class: median and interquartile range (shown as Q1, Q3) for numeric columns, and counts (%) for categorical columns. Missing values are counted in a separate row labelled ‘Unknown’.

Illustrative example: A researcher wants basic descriptive statistics for two variables in the low birth weight data: low birth weight status (low) and birth weight (bwt).

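library(gtsummary)  # tbl_summary(), add_p(), as_hux_table()
library(dplyr)      # select()

df |>
  select(low, bwt) |>
  tbl_summary() |>
  as_hux_table()    # requires the huxtable package to be installed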

Characteristic	N = 189
low
    Normal	130 (69%)
    Low Birth Weight	59 (31%)
bwt	2,977 (2,414, 3,487)
n (%); Median (Q1, Q3)

Note

This basic usage relies on sensible defaults, each of which can be customized. Variable types are detected automatically so that appropriate descriptive statistics are calculated. Label attributes in the data set are printed automatically. Missing values are listed as “Unknown” in the table. Variable levels are indented, and footnotes describing the statistics are added.
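To see what the default statistics are, the numbers in the first table can be recomputed in base R. This sketch assumes df is MASS::birthwt with low recoded as a factor (a hypothetical reconstruction). gtsummary may use a different quantile definition from base R's default, so quartiles can differ slightly in other data sets.

```r
# Hypothetical reconstruction of df (assumption): MASS::birthwt with low as a factor
df <- within(MASS::birthwt, {
  low <- factor(low, levels = c(0, 1), labels = c("Normal", "Low Birth Weight"))
})

# Categorical column: count and percentage of each level, the "n (%)" rows
n_low   <- table(df$low)
pct_low <- round(100 * prop.table(n_low))
paste0(names(n_low), ": ", n_low, " (", pct_low, "%)")

# Numeric column: median and quartiles, the "Median (Q1, Q3)" row
quantile(df$bwt, probs = c(0.50, 0.25, 0.75))
```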

6.3 Adjustments

6.3.1 Stratified tables.

You can stratify the table by a column (e.g. by outcome) by passing that column to the by = argument of tbl_summary(). The result is a two-way table with one column of statistics per group.

df |>  
  select(smoke, low) |> 
  tbl_summary(by = low) |>  as_hux_table()

Characteristic	Normal N = 130	Low Birth Weight N = 59
smoke
    Non Smoker	86 (66%)	29 (49%)
    Smoker	44 (34%)	30 (51%)
n (%)
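In a stratified table the percentages are computed within each column, so 86 (66%) means that 66% of the 130 normal-weight births had non-smoking mothers. A base-R cross-tabulation makes this explicit; as before, df is assumed (hypothetically) to be MASS::birthwt with recoded factors.

```r
df <- within(MASS::birthwt, {
  low   <- factor(low,   levels = c(0, 1), labels = c("Normal", "Low Birth Weight"))
  smoke <- factor(smoke, levels = c(0, 1), labels = c("Non Smoker", "Smoker"))
})

# Rows: smoking history; columns: the stratifying variable low
tab <- table(smoke = df$smoke, low = df$low)
tab

# margin = 2 divides each cell by its column total, i.e. percentages within each group
col_pct <- round(100 * prop.table(tab, margin = 2))
col_pct
```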

6.3.2 Customizing output of selected variables.

Use a formula to specify which statistics to show and how to display them. The formula has two sides separated by a tilde (~): the left side names the column(s) to which the display applies, and the right side, in quotes, gives the desired display, with each statistic written in curly braces, such as {mean} or {sd}.

df |> 
  select(bwt, low) |>
  tbl_summary(
    by = low,
    statistic = bwt~"{mean}"
  ) |> 
  as_hux_table()

Characteristic	Normal N = 130	Low Birth Weight N = 59
bwt	3,329	2,097
Mean

df |> 
  select(bwt, low) |>
  tbl_summary(
    by = low,
    statistic = bwt~"{mean}, {sd}"
  ) |> 
  as_hux_table()

Characteristic	Normal N = 130	Low Birth Weight N = 59
bwt	3,329, 478	2,097, 391
Mean, SD
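Each placeholder in braces is replaced by the corresponding statistic computed within each group. The base-R sketch below computes the same group means and standard deviations and pastes them into a "{mean}, {sd}"-style string with a thousands separator, as gtsummary displays them. It is an illustration only, not how gtsummary works internally, and df is again a hypothetical reconstruction from MASS::birthwt.

```r
df <- within(MASS::birthwt, {
  low <- factor(low, levels = c(0, 1), labels = c("Normal", "Low Birth Weight"))
})

# Mean and SD of birth weight within each level of low
grp_mean <- tapply(df$bwt, df$low, mean)
grp_sd   <- tapply(df$bwt, df$low, sd)

# Build a "{mean}, {sd}" display, rounded to whole numbers with a thousands separator
display <- paste0(prettyNum(round(grp_mean), big.mark = ","), ", ",
                  prettyNum(round(grp_sd),   big.mark = ","))
setNames(display, names(grp_mean))
```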

6.3.3 Changing label of a single variable.

The label = argument of tbl_summary() changes how a variable's name is displayed. Provide the column name and its desired label separated by a tilde. By default, the column name (or its label attribute, if one exists) is shown.

df |> 
  select(bwt, low) |>
  tbl_summary(
    by = low,
    statistic = bwt~"{mean}, {sd}",
    label = bwt ~ "Birth Weight"
  ) |> as_hux_table()

Characteristic	Normal N = 130	Low Birth Weight N = 59
Birth Weight	3,329, 478	2,097, 391
Mean, SD

6.3.4 Changing labels of multiple variables.

You can change labels of multiple variables by providing the labels as a list to the label argument.

df |> 
  select(bwt, low, smoke) |>
  tbl_summary(
    by = low,
    statistic = bwt~"{mean}, {sd}",
    label = list(bwt ~ "Birth Weight",
                 smoke ~ "Smoking history")) |> as_hux_table()

Characteristic	Normal N = 130	Low Birth Weight N = 59
Birth Weight	3,329, 478	2,097, 391
Smoking history
    Non Smoker	86 (66%)	29 (49%)
    Smoker	44 (34%)	30 (51%)
Mean, SD; n (%)
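As the Note in 6.2 points out, tbl_summary() prints label attributes stored in the data, so instead of repeating label = in every call you can attach labels to the columns once. The sketch below does this in base R with attr(), using the labels from the example above; df is again a hypothetical reconstruction from MASS::birthwt. The gtsummary step itself is not run here, so the comment about what tbl_summary() would print is an expectation based on that Note.

```r
df <- within(MASS::birthwt, {
  smoke <- factor(smoke, levels = c(0, 1), labels = c("Non Smoker", "Smoker"))
})

# Attach display labels as "label" attributes on individual columns
attr(df$bwt, "label")   <- "Birth Weight"
attr(df$smoke, "label") <- "Smoking history"

# Inspect them; NULL means no label, so the column name would be displayed.
# tbl_summary() is expected to pick these labels up without a label = argument.
lapply(df[c("bwt", "smoke", "low")], attr, which = "label")
```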

Can a list also be passed to the statistic argument to customize the output of several variables at once? Try it!

6.3.5 Multiline output for a single variable.

To print several lines of statistics for a continuous variable, set its type = to “continuous2” and pass statistic = a character vector with one display string per line. This lets you combine all of the previously shown elements in one table.

df |> 
  select(bwt, low, smoke) |>
  tbl_summary(
    by = low,
    type = bwt ~ "continuous2",
    statistic = bwt~c(
      "{mean}, {sd}",
      "{median}, ({p25}, {p75})"),
    label = list(bwt ~ "Birth Weight",
                 smoke ~ "Smoking history")) |> as_hux_table()

Characteristic	Normal N = 130	Low Birth Weight N = 59
Birth Weight
    Mean, SD	3,329, 478	2,097, 391
    Median, (Q1, Q3)	3,267, (2,948, 3,651)	2,211, (1,928, 2,410)
Smoking history
    Non Smoker	86 (66%)	29 (49%)
    Smoker	44 (34%)	30 (51%)
n (%)
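With “continuous2”, each element of the statistic vector becomes its own row under the variable. The base-R sketch below builds the equivalent grid of statistics for bwt, one row per statistic and one column per group. df is a hypothetical reconstruction from MASS::birthwt, and base R's default quantile() rule may give quartiles that differ slightly from gtsummary's.

```r
df <- within(MASS::birthwt, {
  low <- factor(low, levels = c(0, 1), labels = c("Normal", "Low Birth Weight"))
})

# split() gives one vector of birth weights per group; sapply() returns a
# matrix with one row per statistic and one column per group
stats_by_group <- sapply(split(df$bwt, df$low), function(x) {
  q <- quantile(x, probs = c(0.25, 0.75), names = FALSE)
  c(mean = mean(x), sd = sd(x), median = median(x), q1 = q[1], q3 = q[2])
})
round(stats_by_group)
```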

6.3.6 Multiline output for all continuous variables.

To print multiline output for every continuous variable, use the selector all_continuous() on the left side of the type = and statistic = formulas instead of naming each variable with “continuous2”.

df |> 
  select(bwt, low, smoke, lwt) |>
  tbl_summary(
    by = low,
    type = all_continuous() ~ "continuous2",
    statistic = all_continuous()~c(
      "{mean}, {sd}",
      "{median}, ({p25}, {p75})"),
    label = list(bwt ~ "Birth Weight",
                 smoke ~ "Smoking history")) |> as_hux_table()

Characteristic	Normal N = 130	Low Birth Weight N = 59
Birth Weight
    Mean, SD	3,329, 478	2,097, 391
    Median, (Q1, Q3)	3,267, (2,948, 3,651)	2,211, (1,928, 2,410)
Smoking history
    Non Smoker	86 (66%)	29 (49%)
    Smoker	44 (34%)	30 (51%)
lwt
    Mean, SD	133, 32	122, 27
    Median, (Q1, Q3)	124, (113, 147)	120, (103, 130)
n (%)
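all_continuous() selects every variable that tbl_summary() treats as continuous. A rough base-R analogue is to pick the numeric columns and summarize them all in one call. The threshold of 10 distinct values below is our assumption, standing in for gtsummary's own type detection, which has more rules; df is a hypothetical reconstruction from MASS::birthwt.

```r
df <- within(MASS::birthwt, {
  low <- factor(low, levels = c(0, 1), labels = c("Normal", "Low Birth Weight"))
})

# Assumed rule: numeric columns with at least 10 distinct values count as continuous
is_cont   <- vapply(df, function(x) is.numeric(x) && length(unique(x)) >= 10,
                    logical(1))
cont_vars <- names(df)[is_cont]
cont_vars

# Group means of every selected column in one call, one row per group of low
aggregate(df[cont_vars], by = list(low = df$low), FUN = mean)
```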

6.3.7 Customizing continuous and categorical variables together.

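The type = and statistic = arguments of tbl_summary() are optional and accept selectors such as all_continuous() and all_categorical(), so each type of variable can get its own display. Several formulas are combined in a list(), and the digits = argument sets the number of decimal places.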

df |>
  select(bwt, low, smoke) |>
  tbl_summary(
    type = all_continuous() ~ "continuous2",
    statistic = list(
      all_continuous() ~ c("{mean} ({sd})", "{median} ({p25}, {p75})"),
      all_categorical() ~ "{n} ({p}%)"
    ),
    digits = all_continuous() ~ 1  # number of decimal places
  ) |>
  as_hux_table()

Characteristic	N = 189
bwt
    Mean (SD)	2,944.6 (729.2)
    Median (Q1, Q3)	2,977.0 (2,414.0, 3,487.0)
low
    Normal	130 (69%)
    Low Birth Weight	59 (31%)
smoke
    Non Smoker	115 (61%)
    Smoker	74 (39%)
n (%)
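digits = all_continuous() ~ 1 rounds the continuous statistics to one decimal place, and the table also shows a thousands separator, giving values such as 2,944.6. A small base-R formatter built on formatC() reproduces that display. The helper name fmt1() is hypothetical, and df is again assumed to hold the MASS::birthwt birth weights.

```r
# Hypothetical helper: one decimal place plus a comma as thousands separator
fmt1 <- function(x) formatC(x, format = "f", digits = 1, big.mark = ",")

df <- MASS::birthwt  # assumption: same bwt values as the chapter's df

# "{mean} ({sd})" display for birth weight
paste0(fmt1(mean(df$bwt)), " (", fmt1(sd(df$bwt)), ")")
```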

Tip

Reproducible code works wonders! Try adapting this code to your own tidy dataset. Voilà! You will have publication-ready tables.

6.4 Inferential statistics and publication-ready tables.

The add_p() function from the gtsummary package adds p-values to a gtsummary table. For instance, to compare the means of a continuous variable between two groups, request a t-test for that variable through the test = argument.

Illustrative example: t-test

df |>  
  select(bwt, smoke) |>  
  tbl_summary(by = smoke) |>  
  add_p(test = bwt ~ "t.test") |>
  as_hux_table()

Characteristic	Non Smoker N = 115	Smoker N = 74	p-value
bwt	3,100 (2,495, 3,629)	2,776 (2,367, 3,260)	0.007
Median (Q1, Q3)
Welch Two Sample t-test
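The footnote shows that the "t.test" option ran a Welch two-sample t-test, which does not assume equal variances in the two groups. Base R's t.test() uses the same Welch form by default, so the sketch below should give a p-value close to the one in the table; df is a hypothetical reconstruction from MASS::birthwt.

```r
df <- within(MASS::birthwt, {
  smoke <- factor(smoke, levels = c(0, 1), labels = c("Non Smoker", "Smoker"))
})

# Welch two-sample t-test of birth weight by smoking history
# (var.equal = FALSE is t.test()'s default)
tt <- t.test(bwt ~ smoke, data = df)
tt$estimate            # group means
tt$method
round(tt$p.value, 3)
```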

What happens if no test is specified in add_p()? Try it!

6.4.1 Statistical tests/ methods available in `add_p()` function.

To see the tests available internally within gtsummary, type ?gtsummary::tests in your console. What do you see? The help page lists the tests available for tbl_summary() with add_p() as well as those for add_difference(). Refer to the gtsummary vignettes at https://cran.r-project.org/web/packages/gtsummary/index.html for more details.

6.4.2 Automated inferential statistics with publication-ready tables

Illustrative example: A researcher wants to know whether mean birth weight, and the proportion of low birth weight babies, differ significantly between mothers who smoked during pregnancy and mothers who did not.

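To answer this question, the summary statistics should be grouped by smoking history with the by = argument. Adding add_p() then compares the groups: it detects each variable's type and applies an appropriate test, here the Wilcoxon rank sum test for bwt and Pearson's chi-squared test for low, as the footnotes show. Note that the Wilcoxon test compares the distributions of the two groups rather than their means; to compare mean birth weight as the question asks, specify test = bwt ~ "t.test" as in the previous example.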

df |> 
  select(smoke, bwt, low) |>  
  tbl_summary(by = smoke) |>  
  add_p() |>  
  as_hux_table()

Characteristic	Non Smoker N = 115	Smoker N = 74	p-value
bwt	3,100 (2,495, 3,629)	2,776 (2,367, 3,260)	0.007
low			0.026
    Normal	86 (75%)	44 (59%)
    Low Birth Weight	29 (25%)	30 (41%)
Median (Q1, Q3); n (%)
Wilcoxon rank sum test; Pearson's Chi-squared test
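The two tests named in the footnotes can be run directly in base R to see what add_p() chose. This sketch assumes df is MASS::birthwt with recoded factors (hypothetical) and uses correct = FALSE because the footnote reads "Pearson's Chi-squared test" rather than the Yates-corrected version. With these settings the chi-squared p-value should agree with the 0.026 in the table, while the Wilcoxon p-value may differ slightly depending on how ties are handled.

```r
df <- within(MASS::birthwt, {
  low   <- factor(low,   levels = c(0, 1), labels = c("Normal", "Low Birth Weight"))
  smoke <- factor(smoke, levels = c(0, 1), labels = c("Non Smoker", "Smoker"))
})

# Wilcoxon rank sum test for bwt; exact = FALSE requests the normal approximation
wt <- wilcox.test(bwt ~ smoke, data = df, exact = FALSE)

# Pearson's chi-squared test on the smoke x low table, without Yates' correction
ct <- chisq.test(table(df$smoke, df$low), correct = FALSE)

round(c(wilcoxon = wt$p.value, chisq = ct$p.value), 3)
```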

Way forward

We have introduced you to one of the most powerful packages currently used to create publication-ready tables. At first it might seem that there is a lot to learn, and the syntax may even feel unnecessary. However, if you create tables regularly in your professional career, this syntax is well worth the time invested. We are confident that your initial effort will save a lot of time later and make your work more efficient and accurate. Best wishes!