library(dplyr)
library(tibble)
library(ggplot2)
library(knitr)

Welcome

This vignette documents the suppress() function for hierarchical cell suppression and release-friendly display values.

The function is designed for tables where the same information appears at multiple reporting levels, such as:

total counts,
one-way margins like sex or race,
two-way breakdowns like sex within race or race within sex,
counts paired with proportions.

The core idea is simple:

identify cells that must be hidden because they are too small;
add secondary suppressions when necessary so the hidden values cannot be backed out from margins;
propagate suppression downward when an ancestor level is already hidden;
return display-ready columns for counts and proportions, along with labels and suppression status indicators.

Overview

What the function does

suppress() takes a data frame and a spec object describing the reporting hierarchy.

For each reporting level, it can:

apply primary suppression to small cells;
apply secondary suppression to prevent recovery by subtraction;
propagate suppression from broader levels to finer levels;
produce numeric display columns and character label columns;
optionally overwrite the original count and proportion columns with the released display values.

This makes it useful for both:

analytic checking, because the returned data still contains suppression metadata; and
direct reporting, because it returns columns like display_* and label_* that can be fed straight into tables or ggplot2.

What counts as a “level”

A level is one reporting granularity, identified by its by variables.

Examples:

total: by = NULL
sex: by = "sex"
race: by = "race"
sex within race: by = c("sex", "race")

Each level must have a unique by signature.

Function signature

suppress(
  df,
  spec,
  min_n = 10,
  upper_bound = TRUE,
  overwrite = FALSE,
  suppress_zeros = FALSE,
  secondary_suppression = TRUE,
  secondary_method = c("auto", "exact", "greedy"),
  secondary_prefer_positive = TRUE,
  secondary_exact_max_cells = 20,
  count_big_mark = ",",
  prop_digits = 1
)

The `spec` object

Basic grammar

spec is a named list with one entry per reporting level.

A level can be given in two ways.

Shorthand form

Use this when the level is just a count column.

spec <- list(
  total = "n_total"
)

Full form

Use this when the level also has a proportion, denominator, and grouping variables.

spec <- list(
  sex = list(
    count = "n_sex",
    prop = "prop_sex",
    denom = "n_total",
    by = "sex"
  )
)

Required and optional components

For each level:

count is required.
prop is optional.
denom is optional, but required if prop is supplied.
by is optional; if omitted or NULL, the level is treated as total.

Important rule

Each level must have a distinct by definition.

So this is valid:

spec <- list(
  total = "n_total",
  sex = list(count = "n_sex", by = "sex"),
  race = list(count = "n_race", by = "race"),
  sex_x_race = list(count = "n_sex_x_race", by = c("sex", "race"))
)

But two different levels cannot both use the same by.
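
The grammar rules above can be checked before calling suppress(). The sketch below is a hypothetical pre-flight helper, not part of the function's API: the names normalize_level() and check_spec() are invented for illustration, and it assumes that two by vectors listing the same variables in a different order count as the same signature.

```r
# Hypothetical helpers (not part of suppress()): expand shorthand entries
# and check the spec rules described above.
normalize_level <- function(level) {
  # Shorthand form: a single string naming the count column.
  if (is.character(level) && length(level) == 1) {
    level <- list(count = level)
  }
  list(
    count = level$count,
    prop  = level$prop,
    denom = level$denom,
    by    = level$by
  )
}

check_spec <- function(spec) {
  levels <- lapply(spec, normalize_level)
  for (nm in names(levels)) {
    lvl <- levels[[nm]]
    if (is.null(lvl$count)) {
      stop("Level '", nm, "' has no count column.")
    }
    if (!is.null(lvl$prop) && is.null(lvl$denom)) {
      stop("Level '", nm, "' defines prop without denom.")
    }
  }
  # Assumption: the order of the by variables does not matter.
  signatures <- vapply(
    levels,
    function(lvl) paste(sort(as.character(lvl$by)), collapse = "|"),
    character(1)
  )
  if (anyDuplicated(signatures) > 0) {
    stop("Two levels share the same by signature.")
  }
  invisible(levels)
}

checked <- check_spec(list(
  total = "n_total",
  sex = list(count = "n_sex", by = "sex"),
  sex_x_race = list(count = "n_sex_x_race", by = c("sex", "race"))
))
```

Running a check like this first turns a malformed spec into an early, readable error instead of a failure deep inside the suppression step.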

Arguments

Core arguments

Argument	Meaning
`df`	Input data frame containing counts, proportions, denominators, and grouping variables.
`spec`	Named list describing each reporting level.
`min_n`	Threshold for primary suppression.
`upper_bound`	Whether suppressed cells should keep an upper-bound style numeric display value when a denominator is available.
`overwrite`	Whether to overwrite the original count / proportion columns with display values.
`suppress_zeros`	Whether zero counts are suppressed along with other counts below `min_n`. By default, zeros are released.

Secondary suppression controls

Argument	Meaning
`secondary_suppression`	Whether to add secondary suppressions.
`secondary_method`	`"auto"`, `"exact"`, or `"greedy"`.
`secondary_prefer_positive`	If `TRUE`, zero cells are less preferred as secondary suppressions.
`secondary_exact_max_cells`	Maximum candidate set size for the exact search.

Formatting controls

Argument	Meaning
`count_big_mark`	Thousands separator for count labels.
`prop_digits`	Number of decimal places in percentage labels.
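
As a rough illustration of what the two formatting arguments control, the sketch below builds count and percentage labels with base R. The helper names format_count() and format_prop() are hypothetical, and the actual formatting code inside suppress() may differ.

```r
# Sketch: count and percentage labels built with base R formatting.
# format_count() and format_prop() are illustrative names only.
format_count <- function(n, big_mark = ",") {
  format(n, big.mark = big_mark, scientific = FALSE, trim = TRUE)
}

format_prop <- function(p, digits = 1) {
  paste0(formatC(100 * p, format = "f", digits = digits), "%")
}

count_labels <- format_count(c(50, 12500))
prop_labels <- format_prop(c(0.5, 0.5454545))

count_labels  # "50" "12,500"
prop_labels   # "50.0%" "54.5%"
```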

Suppression logic

Primary suppression

Primary suppression is applied to cells with counts below min_n.

With suppress_zeros = FALSE, the rule is:

suppress counts where 0 < n < min_n

With suppress_zeros = TRUE, the rule is:

suppress counts where 0 <= n < min_n
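
The two rules can be written as a single vectorised predicate. This is a minimal sketch of the rule as stated above, using a hypothetical helper name is_primary(); it ignores missing counts, which the real function may treat differently.

```r
# Sketch of the primary suppression rule; is_primary() is an illustrative name.
is_primary <- function(n, min_n = 10, suppress_zeros = FALSE) {
  below <- n < min_n
  if (suppress_zeros) {
    below & n >= 0   # 0 <= n < min_n
  } else {
    below & n > 0    # 0 < n < min_n
  }
}

counts <- c(42, 8, 0, 35, 6, 9)
keep_zeros <- is_primary(counts)
hide_zeros <- is_primary(counts, suppress_zeros = TRUE)

keep_zeros  # FALSE TRUE FALSE FALSE TRUE TRUE
hide_zeros  # FALSE TRUE TRUE FALSE TRUE TRUE
```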

Secondary suppression

Secondary suppression is used when one suppressed cell could be recovered from the remaining released cells in a partition.

For example, if a row total is known and exactly one cell in that row is suppressed, then that hidden cell can be recovered by subtraction. In that case, the function suppresses at least one additional cell.
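
The subtraction risk can be detected by counting hidden cells per partition. The sketch below uses base R on a small hand-typed table and flags partitions with exactly one hidden cell; it only illustrates the detection step, not how suppress() chooses which extra cell to hide.

```r
# Sketch: find partitions where a single hidden cell could be recovered
# by subtracting the released cells from the known total.
cells <- data.frame(
  race = c("White", "White", "Black", "Black", "Asian", "Asian"),
  sex  = c("Female", "Male", "Female", "Male", "Female", "Male"),
  n    = c(42, 35, 8, 6, 0, 9)
)

# Primary suppression with min_n = 10 and zeros released.
cells$primary <- cells$n > 0 & cells$n < 10

# Number of hidden cells in each race partition, repeated on every row.
cells$n_hidden <- ave(as.integer(cells$primary), cells$race, FUN = sum)

# Exactly one hidden cell means the partition needs a secondary suppression.
cells$needs_secondary <- cells$n_hidden == 1
at_risk <- unique(cells$race[cells$needs_secondary])

at_risk  # "Asian"
```

In this table the Black partition has two hidden cells, so only their sum can be recovered, while the Asian partition has one hidden cell and needs a second.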

Methods

"exact" searches over candidate suppressions and finds a minimum-cost valid set.
"greedy" adds suppressions iteratively using a coverage-then-cost rule.
"auto" tries exact first and falls back to greedy when the candidate set is too large.

Propagated suppression

If a broader level is suppressed, finer levels descending from that level can also be masked.

For example, if a race total is suppressed, then cells nested under that race category may also need to be hidden.
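
A minimal sketch of downward propagation follows. It looks up each finer cell's parent and marks it as propagated when the parent is hidden. The age_group column and the data are invented for illustration, and the sketch assumes that a cell already suppressed for its own reasons keeps its existing status; the precedence rules inside suppress() are not documented here.

```r
# Sketch: propagate suppression from a race total to the cells nested under it.
parents <- data.frame(
  race   = c("White", "Asian"),
  n_race = c(77, 9)
)
parents$suppressed_race <- parents$n_race > 0 & parents$n_race < 10

children <- data.frame(
  race      = c("White", "White", "White", "Asian", "Asian", "Asian"),
  age_group = rep(c("0-17", "18-64", "65+"), times = 2),  # hypothetical variable
  n         = c(20, 40, 17, 0, 4, 5)
)
children$status <- ifelse(children$n > 0 & children$n < 10, "primary", "reported")

# Look up the parent flag for every child row.
parent_hidden <- parents$suppressed_race[match(children$race, parents$race)]

# Assumption: only cells that would otherwise be released become "propagated".
children$status <- ifelse(
  parent_hidden & children$status == "reported",
  "propagated",
  children$status
)

children$status
# "reported" "reported" "reported" "propagated" "primary" "primary"
```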

Upper-bound display values

When upper_bound = TRUE and a denominator exists, the function does not reveal the true suppressed count. Instead, it computes the residual:

denominator - sum(released counts in the same partition)

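That residual is inserted into the numeric display_* column, while the label column remains "*".

Every suppressed cell in a partition receives the same residual, so the display value is an upper bound on each hidden count rather than an estimate, and the display values in a partition can sum to more than the denominator.

This is useful when you want plots to keep a bar of bounded height for each suppressed cell while still marking the released value as suppressed.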

When upper_bound = FALSE, suppressed numeric displays are set to NA.
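
The residual arithmetic for one partition can be sketched in a few lines. The helper name display_count() is hypothetical, and the code follows the formula above under the assumption that the suppression flags have already been decided.

```r
# Sketch: numeric display values for one partition that shares a denominator.
display_count <- function(n, suppressed, denom, upper_bound = TRUE) {
  # Residual = denominator minus the counts that are actually released.
  residual <- denom - sum(n[!suppressed])
  hidden_value <- if (upper_bound) residual else NA_real_
  ifelse(suppressed, hidden_value, n)
}

n <- c(42, 8, 0)                    # true counts in the partition
suppressed <- c(FALSE, TRUE, TRUE)  # flags after primary and secondary steps

with_bound <- display_count(n, suppressed, denom = 50)
without_bound <- display_count(n, suppressed, denom = 50, upper_bound = FALSE)

with_bound       # 42 8 8
with_bound / 50  # 0.84 0.16 0.16
without_bound    # 42 NA NA
```

Both hidden cells show 8, which is the most either of them could be given the released 42 and the denominator of 50.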

Returned columns

For each level lvl, the function adds some or all of the following:

Column	Meaning
`display_n_<lvl>`	Numeric display count.
`label_n_<lvl>`	Character label for count.
`display_prop_<lvl>`	Numeric display proportion, if `prop` is defined.
`label_prop_<lvl>`	Character label for proportion, if `prop` is defined.
`status_<lvl>`	One of `"reported"`, `"primary"`, `"secondary"`, or `"propagated"`.
`suppressed_<lvl>`	Logical indicator for suppression at that level.

Status meanings

Status	Meaning
`reported`	Released without suppression.
`primary`	Suppressed because the cell itself is below threshold.
`secondary`	Suppressed to prevent recovery of another hidden cell.
`propagated`	Suppressed because an ancestor level was suppressed.

Minimal example

This first example shows only totals and one-way margins.

df_min <- tibble(
  sex = c("Female", "Male"),
  race = c("All", "All"),
  n_total = c(100, 100),
  n_sex = c(50, 50),
  prop_sex = c(0.50, 0.50)
)

spec_min <- list(
  total = "n_total",
  sex = list(
    count = "n_sex",
    prop = "prop_sex",
    denom = "n_total",
    by = "sex"
  )
)

out_min <- suppress(
  df = df_min,
  spec = spec_min,
  min_n = 10,
  upper_bound = TRUE
)

out_min |>
  distinct(
    sex,
    display_n_sex,
    label_n_sex,
    display_prop_sex,
    label_prop_sex,
    status_sex,
    suppressed_sex
  ) |>
  kable()

sex	display_n_sex	label_n_sex	display_prop_sex	label_prop_sex	status_sex	suppressed_sex
Female	50	50	0.5	50.0%	reported	FALSE
Male	50	50	0.5	50.0%	reported	FALSE

Example 1: sex within race

This example treats the two-way table as sex within race, so the denominator for the two-way proportion is n_race.

Data

df1 <- tibble(
  sex = c(
    "Female", "Female", "Female",
    "Male",   "Male",   "Male"
  ),
  race = c(
    "White", "Black", "Asian",
    "White", "Black", "Asian"
  ),
  n_sex_x_race = c(42, 8, 0, 35, 6, 9),
  n_sex = c(50, 50, 50, 50, 50, 50),
  n_race = c(77, 14, 9, 77, 14, 9),
  n_total = c(100, 100, 100, 100, 100, 100),
  prop_sex_x_race = c(42 / 77, 8 / 14, 0 / 9, 35 / 77, 6 / 14, 9 / 9),
  prop_sex = c(50 / 100, 50 / 100, 50 / 100, 50 / 100, 50 / 100, 50 / 100),
  prop_race = c(77 / 100, 14 / 100, 9 / 100, 77 / 100, 14 / 100, 9 / 100)
)

spec1 <- list(
  total = "n_total",
  sex = list(
    count = "n_sex",
    prop = "prop_sex",
    denom = "n_total",
    by = "sex"
  ),
  race = list(
    count = "n_race",
    prop = "prop_race",
    denom = "n_total",
    by = "race"
  ),
  sex_x_race = list(
    count = "n_sex_x_race",
    prop = "prop_sex_x_race",
    denom = "n_race",
    by = c("sex", "race")
  )
)

Apply suppression

out1 <- suppress(
  df = df1,
  spec = spec1,
  min_n = 10,
  upper_bound = TRUE,
  suppress_zeros = FALSE
)

Inspect released values

out1 |>
  distinct(
    sex,
    race,
    display_n_sex_x_race,
    label_n_sex_x_race,
    display_prop_sex_x_race,
    label_prop_sex_x_race,
    status_sex_x_race,
    suppressed_sex_x_race
  ) |>
  arrange(race, sex) |>
  kable()

sex	race	display_n_sex_x_race	label_n_sex_x_race	display_prop_sex_x_race	label_prop_sex_x_race	status_sex_x_race	suppressed_sex_x_race
Female	Asian	9	*	1.0000000	*	secondary	TRUE
Male	Asian	9	*	1.0000000	*	primary	TRUE
Female	Black	14	*	1.0000000	*	primary	TRUE
Male	Black	14	*	1.0000000	*	primary	TRUE
Female	White	42	42	0.5454545	54.5%	reported	FALSE
Male	White	35	35	0.4545455	45.5%	reported	FALSE

Plot: one-way race

out1 |>
  distinct(race, display_prop_race, label_prop_race) |>
  mutate(
    text_y = if_else(
      is.na(display_prop_race),
      0.015,
      pmax(display_prop_race + 0.015, 0.015)
    )
  ) |>
  ggplot(aes(x = race, y = display_prop_race)) +
  geom_col(width = 0.8, na.rm = TRUE) +
  geom_text(aes(y = text_y, label = label_prop_race), na.rm = TRUE) +
  scale_y_continuous(
    labels = function(x) paste0(round(100 * x), "%"),
    expand = expansion(mult = c(0, 0.08))
  ) +
  labs(
    title = "Example 1: Race",
    x = "Race",
    y = "Percent of total"
  ) +
  theme_minimal()

Plot: one-way sex

out1 |>
  distinct(sex, display_prop_sex, label_prop_sex) |>
  mutate(
    text_y = if_else(
      is.na(display_prop_sex),
      0.015,
      pmax(display_prop_sex + 0.015, 0.015)
    )
  ) |>
  ggplot(aes(x = sex, y = display_prop_sex)) +
  geom_col(width = 0.8, na.rm = TRUE) +
  geom_text(aes(y = text_y, label = label_prop_sex), na.rm = TRUE) +
  scale_y_continuous(
    labels = function(x) paste0(round(100 * x), "%"),
    expand = expansion(mult = c(0, 0.08))
  ) +
  labs(
    title = "Example 1: Sex",
    x = "Sex",
    y = "Percent of total"
  ) +
  theme_minimal()

Plot: sex within race

out1 |>
  distinct(race, sex, display_prop_sex_x_race, label_prop_sex_x_race) |>
  mutate(
    text_y = if_else(
      is.na(display_prop_sex_x_race),
      0.015,
      pmax(display_prop_sex_x_race + 0.015, 0.015)
    )
  ) |>
  ggplot(aes(x = race, y = display_prop_sex_x_race, fill = sex)) +
  geom_col(position = position_dodge(width = 0.9), width = 0.8, na.rm = TRUE) +
  geom_text(
    aes(y = text_y, label = label_prop_sex_x_race),
    position = position_dodge(width = 0.9),
    na.rm = TRUE
  ) +
  scale_y_continuous(
    labels = function(x) paste0(round(100 * x), "%"),
    expand = expansion(mult = c(0, 0.08))
  ) +
  labs(
    title = "Example 1: Sex within Race",
    x = "Race",
    y = "Percent within race",
    fill = "Sex"
  ) +
  theme_minimal()

What is happening here

In this example:

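The counts 8, 6, and 9 are below min_n = 10, so they trigger primary suppression.
In the Asian partition, the 9 is the only primary suppression, so it could be recovered from the denominator by subtraction; the Female zero is therefore suppressed as well, with status secondary. In the Black partition both cells are already hidden, so no secondary suppression is added.
Since upper_bound = TRUE, the numeric display columns show the residual for each partition (14 for Black, 9 for Asian), while the labels remain "*".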

Example 2: race within sex

This example uses the same counts but changes the meaning of the two-way proportion. Now the denominator is n_sex, so the two-way table is interpreted as race within sex.

Data

df2 <- tibble(
  sex = c(
    "Female", "Female", "Female",
    "Male",   "Male",   "Male"
  ),
  race = c(
    "White", "Black", "Asian",
    "White", "Black", "Asian"
  ),
  n_sex_x_race = c(42, 8, 0, 35, 6, 9),
  n_sex = c(50, 50, 50, 50, 50, 50),
  n_race = c(77, 14, 9, 77, 14, 9),
  n_total = c(100, 100, 100, 100, 100, 100),
  prop_sex_x_race = c(42 / 50, 8 / 50, 0 / 50, 35 / 50, 6 / 50, 9 / 50),
  prop_sex = c(50 / 100, 50 / 100, 50 / 100, 50 / 100, 50 / 100, 50 / 100),
  prop_race = c(77 / 100, 14 / 100, 9 / 100, 77 / 100, 14 / 100, 9 / 100)
)

spec2 <- list(
  total = "n_total",
  sex = list(
    count = "n_sex",
    prop = "prop_sex",
    denom = "n_total",
    by = "sex"
  ),
  race = list(
    count = "n_race",
    prop = "prop_race",
    denom = "n_total",
    by = "race"
  ),
  sex_x_race = list(
    count = "n_sex_x_race",
    prop = "prop_sex_x_race",
    denom = "n_sex",
    by = c("sex", "race")
  )
)

Apply suppression

out2 <- suppress(
  df = df2,
  spec = spec2,
  min_n = 10,
  upper_bound = TRUE,
  suppress_zeros = FALSE
)

Inspect released values

out2 |>
  distinct(
    sex,
    race,
    display_n_sex_x_race,
    label_n_sex_x_race,
    display_prop_sex_x_race,
    label_prop_sex_x_race,
    status_sex_x_race,
    suppressed_sex_x_race
  ) |>
  arrange(sex, race) |>
  kable()

sex	race	display_n_sex_x_race	label_n_sex_x_race	display_prop_sex_x_race	label_prop_sex_x_race	status_sex_x_race	suppressed_sex_x_race
Female	Asian	8	*	0.16	*	secondary	TRUE
Female	Black	8	*	0.16	*	primary	TRUE
Female	White	42	42	0.84	84.0%	reported	FALSE
Male	Asian	15	*	0.30	*	primary	TRUE
Male	Black	15	*	0.30	*	primary	TRUE
Male	White	35	35	0.70	70.0%	reported	FALSE

Plot: race within sex

out2 |>
  distinct(sex, race, display_prop_sex_x_race, label_prop_sex_x_race) |>
  mutate(
    text_y = if_else(
      is.na(display_prop_sex_x_race),
      0.015,
      pmax(display_prop_sex_x_race + 0.015, 0.015)
    )
  ) |>
  ggplot(aes(x = sex, y = display_prop_sex_x_race, fill = race)) +
  geom_col(position = position_dodge(width = 0.9), width = 0.8, na.rm = TRUE) +
  geom_text(
    aes(y = text_y, label = label_prop_sex_x_race),
    position = position_dodge(width = 0.9),
    na.rm = TRUE
  ) +
  scale_y_continuous(
    labels = function(x) paste0(round(100 * x), "%"),
    expand = expansion(mult = c(0, 0.08))
  ) +
  labs(
    title = "Example 2: Race within Sex",
    x = "Sex",
    y = "Percent within sex",
    fill = "Race"
  ) +
  theme_minimal()

Practical notes

When to use `upper_bound = TRUE`

Use upper_bound = TRUE when you want:

bars to retain approximate height in plots,
tables to carry a numeric display value for layout or ordering,
labels to remain "*" so readers know the cell is suppressed.

Use upper_bound = FALSE when you want suppressed cells to disappear numerically.

When to use `overwrite = TRUE`

Set overwrite = TRUE when downstream code expects the original columns to already contain released values.

For example:

released_df <- suppress(
  df = df1,
  spec = spec1,
  overwrite = TRUE
)

Then released_df$n_sex_x_race and released_df$prop_sex_x_race hold the display values, so the true counts and proportions are no longer present in those columns.

When to suppress zeros

Set suppress_zeros = TRUE when zeros themselves are considered sensitive.

Leave it as FALSE when structural or observed zeros are acceptable to release.

Exact versus greedy secondary suppression

Use:

"exact" for smaller problems when you want a minimum-cost secondary set;
"greedy" for larger problems when speed matters more than minimising the number of extra suppressions;
"auto" as the default compromise.

Common pitfalls

1. `prop` without `denom`

This is invalid.

bad_spec <- list(
  sex = list(
    count = "n_sex",
    prop = "prop_sex",
    by = "sex"
  )
)

If you define prop, you must also define denom.

2. Duplicate `by` signatures

This is also invalid.

bad_spec <- list(
  a = list(count = "n1", by = "sex"),
  b = list(count = "n2", by = "sex")
)

Each level must correspond to a unique reporting granularity.

3. Non-unique denominators within a partition

When upper_bound = TRUE, the denominator must be uniquely defined within each partition used for that level. Otherwise the function cannot compute a residual bound consistently.
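
One way to catch this before calling suppress() is to count distinct denominator values per partition. The sketch below does so with base R on a deliberately inconsistent table; it assumes the partition for a sex-within-race level is the set of rows sharing the same race, which is an interpretation rather than documented behaviour.

```r
# Sketch: check that the denominator is constant within each partition.
df_check <- data.frame(
  race   = c("White", "White", "Black", "Black"),
  sex    = c("Female", "Male", "Female", "Male"),
  n_race = c(77, 77, 14, 15)  # the two Black rows disagree on purpose
)

# Number of distinct denominator values per race.
n_denoms <- tapply(
  df_check$n_race,
  df_check$race,
  function(x) length(unique(x))
)

inconsistent <- names(n_denoms)[n_denoms > 1]
inconsistent  # "Black"
```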

Recommended workflow

A good pattern is:

build your reporting data frame;
define spec;
run suppress();
use display_* for numeric plotting;
use label_* for text labels;
use status_* and suppressed_* for QA.

Example:

out <- suppress(df, spec)

plot_df <- out |>
  distinct(group_var, display_prop_some_level, label_prop_some_level)
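
For the QA step, a few assertions on the status and label columns can catch inconsistent output early. The rows below are typed by hand in the shape of the returned columns, and the checks assume that suppressed_* is TRUE exactly when status_* is not "reported", which is inferred from the column descriptions rather than guaranteed.

```r
# Sketch: QA checks on output-shaped data (rows typed by hand for illustration).
out_demo <- data.frame(
  race                  = c("Asian", "Asian", "White"),
  sex                   = c("Female", "Male", "Female"),
  label_n_sex_x_race    = c("*", "*", "42"),
  status_sex_x_race     = c("secondary", "primary", "reported"),
  suppressed_sex_x_race = c(TRUE, TRUE, FALSE)
)

# How many cells fall under each status.
status_counts <- table(out_demo$status_sex_x_race)

# Every suppressed row should carry the "*" label.
labels_ok <- all(
  out_demo$label_n_sex_x_race[out_demo$suppressed_sex_x_race] == "*"
)

# The logical flag should agree with the status column.
flags_ok <- all(
  (out_demo$status_sex_x_race != "reported") == out_demo$suppressed_sex_x_race
)

stopifnot(labels_ok, flags_ok)
```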

Summary

suppress() is a hierarchical suppression helper that combines:

primary suppression for small cells,
secondary suppression for inference protection,
propagation across reporting levels,
display-ready outputs for counts and proportions.

Its main strength is that it separates:

the true underlying values,
the released numeric display values, and
the human-facing labels.

That separation makes it suitable for publication pipelines, dashboards, and ggplot2 workflows where disclosure control and presentation need to coexist.
