Retyping results from statistical software into an article, manuscript, dissertation, or thesis is time consuming, frustrating, and error prone.
Further, most courses and tutorials on data analytics using R teach a collection of R functions but stop short of the outcome we actually need, which is a finished table of analyzed results.
R has packages that create publication-ready tables which can be incorporated into research dissemination documents directly or with minor modifications. This saves a lot of mundane, unnecessary work and leaves more time for interpretation and domain-specific work.
We shall be using the gtsummary package, which follows tidyverse principles and creates presentation-ready summary tables, regression model tables, and more. The code to create the tables is concise and highly customizable.

The tbl_summary() function calculates descriptive statistics for continuous, categorical, and dichotomous variables in R, and presents the results in a customizable summary table ready for publication. To introduce tbl_summary() we will first show its most basic behaviour, which already produces a complete table. Then we will examine in detail how to make adjustments and build more tailored tables.
By default, tbl_summary() takes the columns you provide and creates a summary table in one command. The function prints statistics appropriate to the column class: median and interquartile range (Q1, Q3) for numeric columns, and counts (%) for categorical columns. Missing values are reported in a row labelled ‘Unknown’.
Illustrative example: A researcher wants a basic descriptive analysis of two variables in the low birth weight data: low birth weight status (low) and birth weight (bwt).
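The examples in this section use a data frame called df whose construction is not shown. The sketch below is one way it might be prepared. It assumes the data are MASS::birthwt (189 births) and that the 0/1 columns low and smoke were recoded as factors with the labels that appear in the tables; both the data source and the recoding step are assumptions, not the author's confirmed setup.

```r
library(dplyr)
library(gtsummary)

# Assumption: the tutorial's `df` comes from MASS::birthwt.
# MASS is not attached with library() because MASS::select()
# would mask dplyr::select().
df <- MASS::birthwt |>
  mutate(
    # 0/1 indicator of birth weight below 2.5 kg
    low = factor(low, levels = c(0, 1),
                 labels = c("Normal", "Low Birth Weight")),
    # 0/1 indicator of smoking during pregnancy
    smoke = factor(smoke, levels = c(0, 1),
                   labels = c("Non Smoker", "Smoker"))
  )

# Quick check of the recoded columns
count(df, low)
```

The native pipe |> used throughout requires R 4.1 or later.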
df |>
  select(low, bwt) |>
  tbl_summary() |>
  as_hux_table()

| Characteristic | N = 189 |
|---|---|
| low | |
| Normal | 130 (69%) |
| Low Birth Weight | 59 (31%) |
| bwt | 2,977 (2,414, 3,487) |
| n (%); Median (Q1, Q3) | |
Note
These defaults are sensible for basic usage, and each one can be customized. Variable types are detected automatically so that appropriate descriptive statistics are calculated. Label attributes from the data set are printed automatically. Missing values are listed as “Unknown” in the table. Variable levels are indented and footnotes are added.
You can stratify your table by a column (e.g. by outcome), creating a two-way table, by using the by = argument of the tbl_summary() function.
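As a small illustration of customizing one of these defaults, the sketch below changes the label of the missing-value row using the missing and missing_text arguments of tbl_summary(). It assumes MASS::birthwt as a stand-in for the tutorial's data; because that data set has no missing values, five birth weights are set to NA purely for demonstration. The names df_na and tbl_missing are hypothetical.

```r
library(dplyr)
library(gtsummary)

# Assumption: MASS::birthwt stands in for the tutorial's `df`
df_na <- MASS::birthwt |>
  mutate(low = factor(low, levels = c(0, 1),
                      labels = c("Normal", "Low Birth Weight")))

# Introduce five artificial missing values for illustration only
df_na$bwt[1:5] <- NA

tbl_missing <- df_na |>
  select(low, bwt) |>
  tbl_summary(
    missing = "ifany",        # add a missing row only when NAs are present
    missing_text = "Missing"  # label used instead of the default "Unknown"
  )

tbl_missing
```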
df |>
  select(smoke, low) |>
  tbl_summary(by = low) |>
  as_hux_table()

| Characteristic | Normal | Low Birth Weight |
|---|---|---|
| smoke | ||
| Non Smoker | 86 (66%) | 29 (49%) |
| Smoker | 44 (34%) | 30 (51%) |
| n (%) | ||
Use a formula to specify which statistics to show and how to display them. A formula has two sides separated by a tilde (~). On the left are the columns to which the display applies; on the right, in quotes, is the desired statistical display, with each statistic named inside curly braces.
df |>
  select(bwt, low) |>
  tbl_summary(
    by = low,
    statistic = bwt ~ "{mean}"
  ) |>
  as_hux_table()

| Characteristic | Normal | Low Birth Weight |
|---|---|---|
| bwt | 3,329 | 2,097 |
| Mean | ||
df |>
  select(bwt, low) |>
  tbl_summary(
    by = low,
    statistic = bwt ~ "{mean}, {sd}"
  ) |>
  as_hux_table()

| Characteristic | Normal | Low Birth Weight |
|---|---|---|
| bwt | 3,329, 478 | 2,097, 391 |
| Mean, SD | ||
You can adjust how a column name is displayed with the label = argument of tbl_summary(). Provide the column name and its desired label separated by a tilde. By default the variable's label attribute is shown, or the column name when no label is set.
df |>
  select(bwt, low) |>
  tbl_summary(
    by = low,
    statistic = bwt ~ "{mean}, {sd}",
    label = bwt ~ "Birth Weight"
  ) |>
  as_hux_table()

| Characteristic | Normal | Low Birth Weight |
|---|---|---|
| Birth Weight | 3,329, 478 | 2,097, 391 |
| Mean, SD | ||
You can change labels of multiple variables by providing the labels as a list to the label argument.
df |>
  select(bwt, low, smoke) |>
  tbl_summary(
    by = low,
    statistic = bwt ~ "{mean}, {sd}",
    label = list(
      bwt ~ "Birth Weight",
      smoke ~ "Smoking history"
    )
  ) |>
  as_hux_table()

| Characteristic | Normal | Low Birth Weight |
|---|---|---|
| Birth Weight | 3,329, 478 | 2,097, 391 |
| Smoking history | ||
| Non Smoker | 86 (66%) | 29 (49%) |
| Smoker | 44 (34%) | 30 (51%) |
| Mean, SD; n (%) | ||
Can we provide a list to the statistic argument also for customizing statistical output? Try it!
If you want to print several lines of statistics for a variable, set its type = to “continuous2”. With this type, the statistic argument accepts a character vector, and each element is printed on its own row, so you can combine the previously shown statistics in one table.
df |>
  select(bwt, low, smoke) |>
  tbl_summary(
    by = low,
    type = bwt ~ "continuous2",
    statistic = bwt ~ c(
      "{mean}, {sd}",
      "{median}, ({p25}, {p75})"
    ),
    label = list(
      bwt ~ "Birth Weight",
      smoke ~ "Smoking history"
    )
  ) |>
  as_hux_table()

| Characteristic | Normal | Low Birth Weight |
|---|---|---|
| Birth Weight | ||
| Mean, SD | 3,329, 478 | 2,097, 391 |
| Median, (Q1, Q3) | 3,267, (2,948, 3,651) | 2,211, (1,928, 2,410) |
| Smoking history | ||
| Non Smoker | 86 (66%) | 29 (49%) |
| Smoker | 44 (34%) | 30 (51%) |
| n (%) | ||
If you wish to print multi-line output for all continuous variables, use the all_continuous() selector in the type and statistic arguments instead of naming each variable.
df |>
  select(bwt, low, smoke, lwt) |>
  tbl_summary(
    by = low,
    type = all_continuous() ~ "continuous2",
    statistic = all_continuous() ~ c(
      "{mean}, {sd}",
      "{median}, ({p25}, {p75})"
    ),
    label = list(
      bwt ~ "Birth Weight",
      smoke ~ "Smoking history"
    )
  ) |>
  as_hux_table()

| Characteristic | Normal | Low Birth Weight |
|---|---|---|
| Birth Weight | ||
| Mean, SD | 3,329, 478 | 2,097, 391 |
| Median, (Q1, Q3) | 3,267, (2,948, 3,651) | 2,211, (1,928, 2,410) |
| Smoking history | ||
| Non Smoker | 86 (66%) | 29 (49%) |
| Smoker | 44 (34%) | 30 (51%) |
| lwt | ||
| Mean, SD | 133, 32 | 122, 27 |
| Median, (Q1, Q3) | 124, (113, 147) | 120, (103, 130) |
| n (%) | ||
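The type argument of tbl_summary() is optional. It sets the summary type of each variable, which in turn determines how that variable is displayed. It can be combined with a list passed to statistic, giving one display per variable type, and with the digits argument, which sets the number of decimal places.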
df |>
  select(bwt, low, smoke) |>
  tbl_summary(
    type = all_continuous() ~ "continuous2",
    statistic = list(
      all_continuous() ~ c(
        "{mean} ({sd})",
        "{median} ({p25}, {p75})"
      ),
      all_categorical() ~ "{n} ({p}%)"
    ),
    digits = all_continuous() ~ 1 # number of decimal places
  ) |>
  as_hux_table()

| Characteristic | N = 189 |
|---|---|
| bwt | |
| Mean (SD) | 2,944.6 (729.2) |
| Median (Q1, Q3) | 2,977.0 (2,414.0, 3,487.0) |
| low | |
| Normal | 130 (69%) |
| Low Birth Weight | 59 (31%) |
| smoke | |
| Non Smoker | 115 (61%) |
| Smoker | 74 (39%) |
| n (%) | |
Tip
Reproducible code works wonders! Try writing this code for your own tidy dataset. Voila! You will have publication-ready tables.
To compare the mean of a continuous variable between two groups, use the add_p() function from the gtsummary package, which adds p-values to a gtsummary table.
Illustrative example: t-test
df |>
  select(bwt, smoke) |>
  tbl_summary(by = smoke) |>
  add_p(test = bwt ~ "t.test") |>
  as_hux_table()

| Characteristic | Non Smoker | Smoker | p-value |
|---|---|---|---|
| bwt | 3,100 (2,495, 3,629) | 2,776 (2,367, 3,260) | 0.007 |
| Median (Q1, Q3) | |||
| Welch Two Sample t-test | |||
What happens if we do not pass any argument to the add_p() function? Try it!
To find the list of tests available internally within gtsummary, type ?gtsummary::tests in your console. What do you see? There are tests for use with tbl_summary() as well as with add_difference(). Refer to the gtsummary vignettes available at https://cran.r-project.org/web/packages/gtsummary/index.html for more details.
Illustrative example: A researcher wants to know whether mean birth weight and the proportion of low birth weight babies differ significantly between mothers who smoked during pregnancy and mothers who did not.
To answer this question, the summary statistics should be grouped by smoking history, which is done with the by = argument. To compare two or more groups, add add_p() to the pipeline; it detects each variable's type and applies an appropriate default statistical test.
df |>
  select(smoke, bwt, low) |>
  tbl_summary(by = smoke) |>
  add_p() |>
  as_hux_table()

| Characteristic | Non Smoker | Smoker | p-value |
|---|---|---|---|
| bwt | 3,100 (2,495, 3,629) | 2,776 (2,367, 3,260) | 0.007 |
| low | | | 0.026 |
| Normal | 86 (75%) | 44 (59%) | |
| Low Birth Weight | 29 (25%) | 30 (41%) | |
| Median (Q1, Q3); n (%) | |||
| Wilcoxon rank sum test; Pearson's Chi-squared test | |||
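The table above relies on the default tests. If different tests are preferred, the test argument of add_p() also accepts a list of formulas, in the same way as statistic and label. The sketch below is illustrative only: it assumes MASS::birthwt recoded as a stand-in for df, and the choice of "t.test" and "fisher.test" is an example rather than a recommendation. The name tbl_tests is hypothetical.

```r
library(dplyr)
library(gtsummary)

# Assumption: MASS::birthwt, recoded, stands in for the tutorial's `df`
df <- MASS::birthwt |>
  mutate(
    low = factor(low, levels = c(0, 1),
                 labels = c("Normal", "Low Birth Weight")),
    smoke = factor(smoke, levels = c(0, 1),
                   labels = c("Non Smoker", "Smoker"))
  )

tbl_tests <- df |>
  select(smoke, bwt, low) |>
  tbl_summary(by = smoke) |>
  add_p(
    test = list(
      all_continuous() ~ "t.test",       # t-test for continuous variables
      all_categorical() ~ "fisher.test"  # Fisher's exact test for categorical ones
    )
  )

tbl_tests
```

Any test name used here should be one of those listed under ?gtsummary::tests.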
We have introduced you to one of the most powerful packages currently used to develop publication-ready tables. It might seem that there is a lot to learn before you can make publication-ready tables in R, and at first learning this syntax may feel unnecessary. However, if creating tables is a serious part of your professional work, the time invested in learning it is worthwhile. We are confident your initial effort will save a lot of time later and make you more efficient and accurate. Best wishes!