library(dplyr)
library(tibble)
library(ggplot2)
library(knitr)
Welcome
This vignette documents the suppress() function for
hierarchical cell suppression and release-friendly display values.
The function is designed for tables where the same information appears at multiple reporting levels, such as:
- total counts,
- one-way margins like sex or race,
- two-way breakdowns like sex within race or race within sex,
- counts paired with proportions.
The core idea is simple:
- identify cells that must be hidden because they are too small;
- add secondary suppressions when necessary so the hidden values cannot be backed out from margins;
- propagate suppression downward when an ancestor level is already hidden;
- return display-ready columns for counts and proportions, along with labels and suppression status indicators.
Overview
What the function does
suppress() takes a data frame and a spec
object describing the reporting hierarchy.
For each reporting level, it can:
- apply primary suppression to small cells;
- apply secondary suppression to prevent recovery by subtraction;
- propagate suppression from broader levels to finer levels;
- produce numeric display columns and character label columns;
- optionally overwrite the original count and proportion columns with the released display values.
This makes it useful for both:
- analytic checking, because the returned data still contains suppression metadata; and
- direct reporting, because it returns columns like
display_* and label_* that can be fed straight into tables or ggplot2.
What counts as a “level”
A level is one reporting granularity, identified by its
by variables.
Examples:
- total: by = NULL
- sex: by = "sex"
- race: by = "race"
- sex within race: by = c("sex", "race")
Each level must have a unique by signature.
Function signature
suppress(
df,
spec,
min_n = 10,
upper_bound = TRUE,
overwrite = FALSE,
suppress_zeros = FALSE,
secondary_suppression = TRUE,
secondary_method = c("auto", "exact", "greedy"),
secondary_prefer_positive = TRUE,
secondary_exact_max_cells = 20,
count_big_mark = ",",
prop_digits = 1
)
The spec object
Basic grammar
spec is a named list with one entry per reporting
level.
A level can be given in two ways.
Shorthand form
Use this when the level is just a count column.
spec <- list(
total = "n_total"
)
Full form
Use this when the level also has a proportion, denominator, and grouping variables.
spec <- list(
sex = list(
count = "n_sex",
prop = "prop_sex",
denom = "n_total",
by = "sex"
)
)
Required and optional components
For each level:
- count is required.
- prop is optional.
- denom is optional, but required if prop is supplied.
- by is optional; if omitted or NULL, the level is treated as total.
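The component rules above can be written as a small check. This is a minimal sketch, not the validation that suppress() runs internally; normalize_level() is a hypothetical helper name introduced here only for illustration.

```r
# Hypothetical helper: expand a shorthand entry and check the component rules.
normalize_level <- function(level) {
  # Shorthand form: a single string naming the count column.
  if (is.character(level) && length(level) == 1) {
    level <- list(count = level)
  }
  if (!is.list(level)) {
    stop("A level must be a single string or a list.")
  }
  if (is.null(level$count)) {
    stop("`count` is required.")
  }
  if (!is.null(level$prop) && is.null(level$denom)) {
    stop("`denom` is required when `prop` is supplied.")
  }
  # A missing `by` stays NULL, which marks the level as the total.
  list(
    count = level$count,
    prop  = level$prop,
    denom = level$denom,
    by    = level$by
  )
}

spec <- list(
  total = "n_total",
  sex = list(count = "n_sex", prop = "prop_sex", denom = "n_total", by = "sex")
)
normalized <- lapply(spec, normalize_level)
str(normalized$total)
```

After normalization, every level has the same four components, so later steps do not need to distinguish the shorthand form from the full form.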
Important rule
Each level must have a distinct by definition.
So this is valid:
spec <- list(
total = "n_total",
sex = list(count = "n_sex", by = "sex"),
race = list(count = "n_race", by = "race"),
sex_x_race = list(count = "n_sex_x_race", by = c("sex", "race"))
)
But two different levels cannot both use the same
by.
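One way to catch a clash before calling suppress() is to compare the by definitions directly. The sketch below is an illustration under two assumptions: the helper names by_signature() and find_duplicate_by() are hypothetical, and the order of the by variables is treated as irrelevant, which may not match how the function itself compares signatures.

```r
# Hypothetical helper: reduce a level to a comparable `by` signature.
by_signature <- function(level) {
  by <- if (is.list(level)) level$by else NULL
  # Assumption: variable order does not matter, so sort before pasting.
  if (is.null(by)) "<total>" else paste(sort(by), collapse = " + ")
}

# Return the names of all levels whose signature is shared with another level.
find_duplicate_by <- function(spec) {
  signatures <- vapply(spec, by_signature, character(1))
  dup <- duplicated(signatures) | duplicated(signatures, fromLast = TRUE)
  names(signatures)[dup]
}

valid_spec <- list(
  total = "n_total",
  sex = list(count = "n_sex", by = "sex"),
  race = list(count = "n_race", by = "race"),
  sex_x_race = list(count = "n_sex_x_race", by = c("sex", "race"))
)
find_duplicate_by(valid_spec)
# character(0)

clashing_spec <- list(
  a = list(count = "n1", by = "sex"),
  b = list(count = "n2", by = "sex")
)
find_duplicate_by(clashing_spec)
# "a" "b"
```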
Arguments
Core arguments
| Argument | Meaning |
|---|---|
| df | Input data frame containing counts, proportions, denominators, and grouping variables. |
| spec | Named list describing each reporting level. |
| min_n | Threshold for primary suppression. |
| upper_bound | Whether suppressed cells should keep an upper-bound style numeric display value when a denominator is available. |
| overwrite | Whether to overwrite the original count / proportion columns with display values. |
| suppress_zeros | Whether zero cells should be suppressed when they fall below min_n. |
Secondary suppression controls
| Argument | Meaning |
|---|---|
| secondary_suppression | Whether to add secondary suppressions. |
| secondary_method | "auto", "exact", or "greedy". |
| secondary_prefer_positive | If TRUE, zero cells are less preferred as secondary suppressions. |
| secondary_exact_max_cells | Maximum candidate set size for the exact search. |
Formatting controls
| Argument | Meaning |
|---|---|
| count_big_mark | Thousands separator for count labels. |
| prop_digits | Number of decimal places in percentage labels. |
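To see what the two formatting controls mean for a released label, the sketch below formats a count and a proportion with base R. The helpers format_count() and format_prop() are hypothetical and only approximate the labels shown in this vignette; the function's own formatting code may differ.

```r
# Hypothetical helper: count label with a thousands separator.
format_count <- function(n, big_mark = ",") {
  formatC(n, format = "d", big.mark = big_mark)
}

# Hypothetical helper: percentage label with a fixed number of decimals.
format_prop <- function(p, digits = 1) {
  paste0(formatC(100 * p, format = "f", digits = digits), "%")
}

format_count(12345)
# "12,345"
format_prop(0.5)
# "50.0%"
format_prop(42 / 77, digits = 1)
# "54.5%"
```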
Suppression logic
Primary suppression
Primary suppression is applied to cells with counts below
min_n.
With suppress_zeros = FALSE, the rule is:
- suppress counts where
0 < n < min_n
With suppress_zeros = TRUE, the rule is:
- suppress counts where
0 <= n < min_n
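The two rules can be written as one vectorised predicate. This is a sketch of the rule as stated above, using a hypothetical helper name, is_primary(), rather than the function's internal code.

```r
# Hypothetical helper: TRUE where a count falls under the primary rule.
is_primary <- function(n, min_n = 10, suppress_zeros = FALSE) {
  # Zeros count as small cells only when suppress_zeros = TRUE.
  above_floor <- if (suppress_zeros) n >= 0 else n > 0
  above_floor & n < min_n
}

counts <- c(42, 8, 0, 35, 6, 9)

is_primary(counts)
# FALSE  TRUE FALSE FALSE  TRUE  TRUE
is_primary(counts, suppress_zeros = TRUE)
# FALSE  TRUE  TRUE FALSE  TRUE  TRUE
```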
Secondary suppression
Secondary suppression is used when one suppressed cell could be recovered from the remaining released cells in a partition.
For example, if a row total is known and exactly one cell in that row is suppressed, then that hidden cell can be recovered by subtraction. In that case, the function suppresses at least one additional cell.
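The subtraction risk is easy to reproduce by hand. The sketch below uses the Female row from Example 2 (counts 42, 8, and 0 with a row total of 50) in plain base R. It only illustrates why a lone hidden cell is recoverable; it is not the search that suppress() performs.

```r
# Female row from Example 2: counts by race and the known row total.
counts <- c(White = 42, Black = 8, Asian = 0)
row_total <- 50

# Primary suppression alone hides only the Black cell (0 < 8 < 10).
hidden <- c(White = FALSE, Black = TRUE, Asian = FALSE)

# With exactly one hidden cell, subtraction recovers it exactly.
recovered <- row_total - sum(counts[!hidden])
recovered
# 8

# Hiding a second cell leaves only the sum of the hidden cells recoverable.
hidden["Asian"] <- TRUE
hidden_sum <- row_total - sum(counts[!hidden])
hidden_sum
# 8, now shared between the Black and Asian cells
```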
Methods
- "exact" searches over candidate suppressions and finds a minimum-cost valid set.
- "greedy" adds suppressions iteratively using a coverage-then-cost rule.
- "auto" tries exact first and falls back to greedy when the candidate set is too large.
Propagated suppression
If a broader level is suppressed, finer levels descending from that level can also be masked.
For example, if a race total is suppressed, then cells nested under that race category may also need to be hidden.
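Propagation amounts to a lookup from each fine cell to its parent category. The sketch below assumes a hypothetical set of parent-level flags and placeholder categories "A", "B", and "C"; it does not reproduce how suppress() resolves a cell that qualifies for more than one status.

```r
# Hypothetical parent-level flags: one entry per race category.
# "A", "B", and "C" are placeholder categories, not data from this vignette.
race_suppressed <- c(A = FALSE, B = TRUE, C = FALSE)

# Finer-level cells: sex within race.
cells <- data.frame(
  sex  = c("Female", "Male", "Female", "Male", "Female", "Male"),
  race = c("A", "A", "B", "B", "C", "C"),
  stringsAsFactors = FALSE
)

# A cell inherits suppression when its parent race category is suppressed.
cells$propagated <- unname(race_suppressed[cells$race])
cells
```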
Upper-bound display values
When upper_bound = TRUE and a denominator exists, the
function does not reveal the true suppressed count.
Instead, it computes the residual:
denominator - sum(released counts in the same partition)
That residual is inserted into the numeric display_*
column, while the label column remains "*".
This is useful when you want plots to preserve approximate bar heights while still marking the released value as suppressed.
When upper_bound = FALSE, suppressed numeric displays
are set to NA.
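The residual rule can be traced with the counts from Example 2, where the partitions are the two sexes and the denominator is 50 in each. The sketch below takes the suppression flags as given (copied from the Example 2 output) and computes only the display values in base R. The column names are illustrative and this is not the function's implementation.

```r
cells <- data.frame(
  sex  = c("Female", "Female", "Female", "Male", "Male", "Male"),
  race = c("White", "Black", "Asian", "White", "Black", "Asian"),
  n    = c(42, 8, 0, 35, 6, 9),
  denom = 50,
  suppressed = c(FALSE, TRUE, TRUE, FALSE, TRUE, TRUE),
  stringsAsFactors = FALSE
)

# Sum of released counts within each partition (here: each sex).
released_sum <- ave(ifelse(cells$suppressed, 0, cells$n), cells$sex, FUN = sum)

# Residual: denominator minus the released counts in the same partition.
residual <- cells$denom - released_sum

# upper_bound = TRUE: suppressed cells show the residual, labels show "*".
cells$display_n    <- ifelse(cells$suppressed, residual, cells$n)
cells$display_prop <- cells$display_n / cells$denom
cells$label_n      <- ifelse(cells$suppressed, "*", as.character(cells$n))

# upper_bound = FALSE: suppressed cells become NA instead.
cells$display_n_na <- ifelse(cells$suppressed, NA_real_, cells$n)

cells[, c("sex", "race", "display_n", "display_prop", "label_n")]
```

The Female cells display 8 (50 - 42) and the Male cells display 15 (50 - 35), which agrees with the Example 2 table.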
Returned columns
For each level lvl, the function adds some or all of the
following:
| Column | Meaning |
|---|---|
| display_n_<lvl> | Numeric display count. |
| label_n_<lvl> | Character label for count. |
| display_prop_<lvl> | Numeric display proportion, if prop is defined. |
| label_prop_<lvl> | Character label for proportion, if prop is defined. |
| status_<lvl> | One of "reported", "primary", "secondary", or "propagated". |
| suppressed_<lvl> | Logical indicator for suppression at that level. |
Status meanings
| Status | Meaning |
|---|---|
| reported | Released without suppression. |
| primary | Suppressed because the cell itself is below threshold. |
| secondary | Suppressed to prevent recovery of another hidden cell. |
| propagated | Suppressed because an ancestor level was suppressed. |
Minimal example
This first example shows only totals and one-way margins.
df_min <- tibble(
sex = c("Female", "Male"),
race = c("All", "All"),
n_total = c(100, 100),
n_sex = c(50, 50),
prop_sex = c(0.50, 0.50)
)
spec_min <- list(
total = "n_total",
sex = list(
count = "n_sex",
prop = "prop_sex",
denom = "n_total",
by = "sex"
)
)
out_min <- suppress(
df = df_min,
spec = spec_min,
min_n = 10,
upper_bound = TRUE
)
out_min |>
distinct(
sex,
display_n_sex,
label_n_sex,
display_prop_sex,
label_prop_sex,
status_sex,
suppressed_sex
) |>
kable()
| sex | display_n_sex | label_n_sex | display_prop_sex | label_prop_sex | status_sex | suppressed_sex |
|---|---|---|---|---|---|---|
| Female | 50 | 50 | 0.5 | 50.0% | reported | FALSE |
| Male | 50 | 50 | 0.5 | 50.0% | reported | FALSE |
Example 1: sex within race
This example treats the two-way table as sex within
race, so the denominator for the two-way proportion is
n_race.
Data
df1 <- tibble(
sex = c(
"Female", "Female", "Female",
"Male", "Male", "Male"
),
race = c(
"White", "Black", "Asian",
"White", "Black", "Asian"
),
n_sex_x_race = c(42, 8, 0, 35, 6, 9),
n_sex = c(50, 50, 50, 50, 50, 50),
n_race = c(77, 14, 9, 77, 14, 9),
n_total = c(100, 100, 100, 100, 100, 100),
prop_sex_x_race = c(42 / 77, 8 / 14, 0 / 9, 35 / 77, 6 / 14, 9 / 9),
prop_sex = c(50 / 100, 50 / 100, 50 / 100, 50 / 100, 50 / 100, 50 / 100),
prop_race = c(77 / 100, 14 / 100, 9 / 100, 77 / 100, 14 / 100, 9 / 100)
)
spec1 <- list(
total = "n_total",
sex = list(
count = "n_sex",
prop = "prop_sex",
denom = "n_total",
by = "sex"
),
race = list(
count = "n_race",
prop = "prop_race",
denom = "n_total",
by = "race"
),
sex_x_race = list(
count = "n_sex_x_race",
prop = "prop_sex_x_race",
denom = "n_race",
by = c("sex", "race")
)
)
Apply suppression
out1 <- suppress(
df = df1,
spec = spec1,
min_n = 10,
upper_bound = TRUE,
suppress_zeros = FALSE
)
Inspect released values
out1 |>
distinct(
sex,
race,
display_n_sex_x_race,
label_n_sex_x_race,
display_prop_sex_x_race,
label_prop_sex_x_race,
status_sex_x_race,
suppressed_sex_x_race
) |>
arrange(race, sex) |>
kable()
| sex | race | display_n_sex_x_race | label_n_sex_x_race | display_prop_sex_x_race | label_prop_sex_x_race | status_sex_x_race | suppressed_sex_x_race |
|---|---|---|---|---|---|---|---|
| Female | Asian | 9 | * | 1.0000000 | * | secondary | TRUE |
| Male | Asian | 9 | * | 1.0000000 | * | primary | TRUE |
| Female | Black | 14 | * | 1.0000000 | * | primary | TRUE |
| Male | Black | 14 | * | 1.0000000 | * | primary | TRUE |
| Female | White | 42 | 42 | 0.5454545 | 54.5% | reported | FALSE |
| Male | White | 35 | 35 | 0.4545455 | 45.5% | reported | FALSE |
Plot: one-way race
out1 |>
distinct(race, display_prop_race, label_prop_race) |>
mutate(
text_y = if_else(
is.na(display_prop_race),
0.015,
pmax(display_prop_race + 0.015, 0.015)
)
) |>
ggplot(aes(x = race, y = display_prop_race)) +
geom_col(width = 0.8, na.rm = TRUE) +
geom_text(aes(y = text_y, label = label_prop_race), na.rm = TRUE) +
scale_y_continuous(
labels = function(x) paste0(round(100 * x), "%"),
expand = expansion(mult = c(0, 0.08))
) +
labs(
title = "Example 1: Race",
x = "Race",
y = "Percent of total"
) +
theme_minimal()
Plot: one-way sex
out1 |>
distinct(sex, display_prop_sex, label_prop_sex) |>
mutate(
text_y = if_else(
is.na(display_prop_sex),
0.015,
pmax(display_prop_sex + 0.015, 0.015)
)
) |>
ggplot(aes(x = sex, y = display_prop_sex)) +
geom_col(width = 0.8, na.rm = TRUE) +
geom_text(aes(y = text_y, label = label_prop_sex), na.rm = TRUE) +
scale_y_continuous(
labels = function(x) paste0(round(100 * x), "%"),
expand = expansion(mult = c(0, 0.08))
) +
labs(
title = "Example 1: Sex",
x = "Sex",
y = "Percent of total"
) +
theme_minimal()
Plot: sex within race
out1 |>
distinct(race, sex, display_prop_sex_x_race, label_prop_sex_x_race) |>
mutate(
text_y = if_else(
is.na(display_prop_sex_x_race),
0.015,
pmax(display_prop_sex_x_race + 0.015, 0.015)
)
) |>
ggplot(aes(x = race, y = display_prop_sex_x_race, fill = sex)) +
geom_col(position = position_dodge(width = 0.9), width = 0.8, na.rm = TRUE) +
geom_text(
aes(y = text_y, label = label_prop_sex_x_race),
position = position_dodge(width = 0.9),
na.rm = TRUE
) +
scale_y_continuous(
labels = function(x) paste0(round(100 * x), "%"),
expand = expansion(mult = c(0, 0.08))
) +
labs(
title = "Example 1: Sex within Race",
x = "Race",
y = "Percent within race",
fill = "Sex"
) +
theme_minimal()
What is happening here
In this example:
- 8, 6, and 9 are below min_n = 10, so they trigger primary suppression;
- because those cells sit inside partitions with known denominators, additional suppression may be required to prevent recovery by subtraction;
- since upper_bound = TRUE, the numeric display columns may still show residual values, but the labels remain "*".
Example 2: race within sex
This example uses the same counts but changes the meaning of the
two-way proportion. Now the denominator is n_sex, so the
two-way table is interpreted as race within sex.
Data
df2 <- tibble(
sex = c(
"Female", "Female", "Female",
"Male", "Male", "Male"
),
race = c(
"White", "Black", "Asian",
"White", "Black", "Asian"
),
n_sex_x_race = c(42, 8, 0, 35, 6, 9),
n_sex = c(50, 50, 50, 50, 50, 50),
n_race = c(77, 14, 9, 77, 14, 9),
n_total = c(100, 100, 100, 100, 100, 100),
prop_sex_x_race = c(42 / 50, 8 / 50, 0 / 50, 35 / 50, 6 / 50, 9 / 50),
prop_sex = c(50 / 100, 50 / 100, 50 / 100, 50 / 100, 50 / 100, 50 / 100),
prop_race = c(77 / 100, 14 / 100, 9 / 100, 77 / 100, 14 / 100, 9 / 100)
)
spec2 <- list(
total = "n_total",
sex = list(
count = "n_sex",
prop = "prop_sex",
denom = "n_total",
by = "sex"
),
race = list(
count = "n_race",
prop = "prop_race",
denom = "n_total",
by = "race"
),
sex_x_race = list(
count = "n_sex_x_race",
prop = "prop_sex_x_race",
denom = "n_sex",
by = c("sex", "race")
)
)
Apply suppression
out2 <- suppress(
df = df2,
spec = spec2,
min_n = 10,
upper_bound = TRUE,
suppress_zeros = FALSE
)
Inspect released values
out2 |>
distinct(
sex,
race,
display_n_sex_x_race,
label_n_sex_x_race,
display_prop_sex_x_race,
label_prop_sex_x_race,
status_sex_x_race,
suppressed_sex_x_race
) |>
arrange(sex, race) |>
kable()
| sex | race | display_n_sex_x_race | label_n_sex_x_race | display_prop_sex_x_race | label_prop_sex_x_race | status_sex_x_race | suppressed_sex_x_race |
|---|---|---|---|---|---|---|---|
| Female | Asian | 8 | * | 0.16 | * | secondary | TRUE |
| Female | Black | 8 | * | 0.16 | * | primary | TRUE |
| Female | White | 42 | 42 | 0.84 | 84.0% | reported | FALSE |
| Male | Asian | 15 | * | 0.30 | * | primary | TRUE |
| Male | Black | 15 | * | 0.30 | * | primary | TRUE |
| Male | White | 35 | 35 | 0.70 | 70.0% | reported | FALSE |
Plot: race within sex
out2 |>
distinct(sex, race, display_prop_sex_x_race, label_prop_sex_x_race) |>
mutate(
text_y = if_else(
is.na(display_prop_sex_x_race),
0.015,
pmax(display_prop_sex_x_race + 0.015, 0.015)
)
) |>
ggplot(aes(x = sex, y = display_prop_sex_x_race, fill = race)) +
geom_col(position = position_dodge(width = 0.9), width = 0.8, na.rm = TRUE) +
geom_text(
aes(y = text_y, label = label_prop_sex_x_race),
position = position_dodge(width = 0.9),
na.rm = TRUE
) +
scale_y_continuous(
labels = function(x) paste0(round(100 * x), "%"),
expand = expansion(mult = c(0, 0.08))
) +
labs(
title = "Example 2: Race within Sex",
x = "Sex",
y = "Percent within sex",
fill = "Race"
) +
theme_minimal()
Practical notes
When to use upper_bound = TRUE
Use upper_bound = TRUE when you want:
- bars to retain approximate height in plots,
- tables to carry a numeric display value for layout or ordering,
- labels to remain "*" so readers know the cell is suppressed.
Use upper_bound = FALSE when you want suppressed cells
to disappear numerically.
When to use overwrite = TRUE
Set overwrite = TRUE when downstream code expects the
original columns to already contain released values.
For example:
released_df <- suppress(
df = df1,
spec = spec1,
overwrite = TRUE
)
Then released_df$n_sex_x_race and
released_df$prop_sex_x_race are replaced by display
values.
When to suppress zeros
Set suppress_zeros = TRUE when zeros themselves are
considered sensitive.
Leave it as FALSE when structural or observed zeros are
acceptable to release.
Exact versus greedy secondary suppression
Use:
- "exact" for smaller problems when you want the cleanest minimum-cost secondary set;
- "greedy" for larger problems when speed matters;
- "auto" as the default compromise.
Common pitfalls
1. prop without denom
This is invalid.
bad_spec <- list(
sex = list(
count = "n_sex",
prop = "prop_sex",
by = "sex"
)
)
If you define prop, you must also define
denom.
2. Duplicate by signatures
This is also invalid.
bad_spec <- list(
a = list(count = "n1", by = "sex"),
b = list(count = "n2", by = "sex")
)
Each level must correspond to a unique reporting granularity.
3. Non-unique denominators within a partition
When upper_bound = TRUE, the denominator must be
uniquely defined within each partition used for that level. Otherwise
the function cannot compute a residual bound consistently.
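A quick pre-check can flag this before suppress() is called. The sketch below uses a hypothetical helper, denom_is_unique(), and assumes you already know which columns define the partition for the level in question (for example, race when the denominator is n_race).

```r
# Hypothetical helper: TRUE when `denom` takes a single value in every partition.
denom_is_unique <- function(df, denom, partition) {
  n_values <- tapply(
    df[[denom]],
    df[partition],
    function(x) length(unique(x))
  )
  # Empty combinations of partition variables come back as NA; ignore them.
  all(n_values == 1, na.rm = TRUE)
}

ok_df <- data.frame(
  race   = c("White", "White", "Black", "Black"),
  n_race = c(77, 77, 14, 14)
)
conflicting_df <- data.frame(
  race   = c("White", "White", "Black", "Black"),
  n_race = c(77, 70, 14, 14)
)

denom_is_unique(ok_df, "n_race", "race")
# TRUE
denom_is_unique(conflicting_df, "n_race", "race")
# FALSE
```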
Recommended workflow
A good pattern is:
- build your reporting data frame;
- define
spec; - run
suppress(); - use
display_*for numeric plotting; - use
label_*for text labels; - use
status_*andsuppressed_*for QA.
Example:
out <- suppress(df, spec)
plot_df <- out |>
distinct(group_var, display_prop_some_level, label_prop_some_level)
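For the QA step, a few assertions on the status and label columns go a long way. The sketch below hand-builds a data frame that mirrors the Example 2 output table, so it runs without the package. The three checks are suggestions based on the column meanings described in this vignette, not guarantees documented by the function.

```r
# Stand-in for suppress() output, copied from the Example 2 table.
out <- data.frame(
  sex  = rep(c("Female", "Male"), each = 3),
  race = rep(c("Asian", "Black", "White"), times = 2),
  label_n_sex_x_race = c("*", "*", "42", "*", "*", "35"),
  status_sex_x_race = c(
    "secondary", "primary", "reported",
    "primary", "primary", "reported"
  ),
  suppressed_sex_x_race = c(TRUE, TRUE, FALSE, TRUE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

# 1. Every suppressed cell should carry the "*" label.
labels_ok <- all(out$label_n_sex_x_race[out$suppressed_sex_x_race] == "*")

# 2. The status column and the logical flag should agree.
status_ok <- all(
  (out$status_sex_x_race != "reported") == out$suppressed_sex_x_race
)

# 3. No partition (here: each sex) should contain exactly one hidden cell,
#    because a lone hidden cell can be recovered by subtraction.
n_hidden <- tapply(out$suppressed_sex_x_race, out$sex, sum)
partition_ok <- all(n_hidden != 1)

stopifnot(labels_ok, status_ok, partition_ok)
```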
Summary
suppress() is a hierarchical suppression helper that
combines:
- primary suppression for small cells,
- secondary suppression for inference protection,
- propagation across reporting levels,
- display-ready outputs for counts and proportions.
Its main strength is that it separates:
- the true underlying values,
- the released numeric display values, and
- the human-facing labels.
That separation makes it suitable for publication pipelines,
dashboards, and ggplot2 workflows where disclosure control
and presentation need to coexist.