Suppression with suppress()

    library(dplyr)
    library(tibble)
    library(ggplot2)
    library(knitr)
    

    Welcome

    This vignette documents the suppress() function for hierarchical cell suppression and release-friendly display values.

    The function is designed for tables where the same information appears at multiple reporting levels, such as:

    • total counts,
    • one-way margins like sex or race,
    • two-way breakdowns like sex within race or race within sex,
    • counts paired with proportions.

    The core idea is simple:

    1. identify cells that must be hidden because they are too small;
    2. add secondary suppressions when necessary so the hidden values cannot be backed out from margins;
    3. propagate suppression downward when an ancestor level is already hidden;
    4. return display-ready columns for counts and proportions, along with labels and suppression status indicators.

    Overview

    What the function does

    suppress() takes a data frame and a spec object describing the reporting hierarchy.

    For each reporting level, it can:

    • apply primary suppression to small cells;
    • apply secondary suppression to prevent recovery by subtraction;
    • propagate suppression from broader levels to finer levels;
    • produce numeric display columns and character label columns;
    • optionally overwrite the original count and proportion columns with the released display values.

    This makes it useful for both:

    • analytic checking, because the returned data still contains suppression metadata; and
    • direct reporting, because it returns columns like display_* and label_* that can be fed straight into tables or ggplot2.

    What counts as a “level”

    A level is one reporting granularity, identified by its by variables.

    Examples:

    • total: by = NULL
    • sex: by = "sex"
    • race: by = "race"
    • sex within race: by = c("sex", "race")

    Each level must have a unique by signature.

    Function signature

    suppress(
      df,
      spec,
      min_n = 10,
      upper_bound = TRUE,
      overwrite = FALSE,
      suppress_zeros = FALSE,
      secondary_suppression = TRUE,
      secondary_method = c("auto", "exact", "greedy"),
      secondary_prefer_positive = TRUE,
      secondary_exact_max_cells = 20,
      count_big_mark = ",",
      prop_digits = 1
    )

    The spec object

    Basic grammar

    spec is a named list with one entry per reporting level.

    A level can be given in two ways.

    Shorthand form

    Use this when the level is just a count column.

    spec <- list(
      total = "n_total"
    )

    Full form

    Use this when the level also has a proportion, denominator, and grouping variables.

    spec <- list(
      sex = list(
        count = "n_sex",
        prop = "prop_sex",
        denom = "n_total",
        by = "sex"
      )
    )

    Required and optional components

    For each level:

    • count is required.
    • prop is optional.
    • denom is optional, but required if prop is supplied.
    • by is optional; if omitted or NULL, the level is treated as total.

    Important rule

    Each level must have a distinct by definition.

    So this is valid:

    spec <- list(
      total = "n_total",
      sex = list(count = "n_sex", by = "sex"),
      race = list(count = "n_race", by = "race"),
      sex_x_race = list(count = "n_sex_x_race", by = c("sex", "race"))
    )

    But two different levels cannot both use the same by.
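
    The rules in this section (count is required, prop needs denom, and by signatures must be unique) can be checked before any suppression is attempted. The sketch below uses a hypothetical helper, check_spec(), written only for illustration; it is not part of the function's interface. Treating c("sex", "race") and c("race", "sex") as the same signature is an assumption.

```r
# Hypothetical validator for a spec list; illustrative only.
check_spec <- function(spec) {
  stopifnot(is.list(spec), !is.null(names(spec)), all(nzchar(names(spec))))

  # Expand the shorthand form: a bare string is read as the count column.
  full <- lapply(spec, function(lvl) {
    if (is.character(lvl)) list(count = lvl) else lvl
  })

  for (nm in names(full)) {
    lvl <- full[[nm]]
    if (is.null(lvl[["count"]])) {
      stop("level '", nm, "': count is required")
    }
    if (!is.null(lvl[["prop"]]) && is.null(lvl[["denom"]])) {
      stop("level '", nm, "': prop requires denom")
    }
  }

  # A by signature is the sorted set of grouping variables; "" stands for total.
  # (Assumption: the order of the by variables does not matter.)
  sig <- vapply(
    full,
    function(lvl) paste(sort(as.character(lvl[["by"]])), collapse = "|"),
    character(1)
  )
  if (anyDuplicated(sig) > 0) {
    stop("two levels share the same by signature")
  }

  full
}

ok <- check_spec(list(
  total = "n_total",
  sex = list(count = "n_sex", by = "sex")
))
str(ok$total)
```

    Running the same helper on a spec with two levels that both use by = "sex" stops with an error, which is the behaviour the rule above asks for.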

    Arguments

    Core arguments

    Argument        Meaning
    df              Input data frame containing counts, proportions, denominators, and grouping variables.
    spec            Named list describing each reporting level.
    min_n           Threshold for primary suppression: counts below min_n are suppressed.
    upper_bound     Whether suppressed cells should keep an upper-bound style numeric display value when a denominator is available.
    overwrite       Whether to overwrite the original count / proportion columns with display values.
    suppress_zeros  Whether zero counts are suppressed along with the other counts below min_n.

    Secondary suppression controls

    Argument                   Meaning
    secondary_suppression      Whether to add secondary suppressions.
    secondary_method           "auto", "exact", or "greedy".
    secondary_prefer_positive  If TRUE, zero cells are less preferred as secondary suppressions.
    secondary_exact_max_cells  Maximum candidate set size for the exact search.

    Formatting controls

    Argument        Meaning
    count_big_mark  Thousands separator for count labels.
    prop_digits     Number of decimal places in percentage labels.
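
    To make the two formatting controls concrete, here is a minimal sketch of how count and percentage labels of this kind can be built in base R. The helpers format_count() and format_pct() are hypothetical names used for illustration and are not taken from the function's source; the use of "*" for suppressed cells follows the labels shown in the examples of this vignette.

```r
# Hypothetical label helpers; illustrative only.
format_count <- function(n, big_mark = ",") {
  formatC(n, format = "d", big.mark = big_mark)
}

format_pct <- function(p, digits = 1) {
  paste0(formatC(100 * p, format = "f", digits = digits), "%")
}

n <- c(1250, 8)
p <- c(0.5, 0.16)
suppressed <- c(FALSE, TRUE)

# Suppressed cells get "*"; released cells get a formatted label.
label_n <- ifelse(suppressed, "*", format_count(n))
label_prop <- ifelse(suppressed, "*", format_pct(p))

label_n     # "1,250" "*"
label_prop  # "50.0%" "*"
```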

    Suppression logic

    Primary suppression

    Primary suppression is applied to cells with counts below min_n.

    With suppress_zeros = FALSE, the rule is:

    • suppress counts where 0 < n < min_n

    With suppress_zeros = TRUE, the rule is:

    • suppress counts where 0 <= n < min_n
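
    The two rules can be written as a single predicate. The sketch below uses a hypothetical helper, is_primary(), to show which of the two-way counts used later in this vignette would be flagged under each setting; it illustrates the rule as stated and is not the function's own code.

```r
# Hypothetical predicate for the primary suppression rule.
is_primary <- function(n, min_n = 10, suppress_zeros = FALSE) {
  above_floor <- if (suppress_zeros) n >= 0 else n > 0
  above_floor & n < min_n
}

n <- c(42, 8, 0, 35, 6, 9)

is_primary(n)
# FALSE TRUE FALSE FALSE TRUE TRUE

is_primary(n, suppress_zeros = TRUE)
# FALSE TRUE TRUE FALSE TRUE TRUE
```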

    Secondary suppression

    Secondary suppression is used when one suppressed cell could be recovered from the remaining released cells in a partition.

    For example, if a row total is known and exactly one cell in that row is suppressed, then that hidden cell can be recovered by subtraction. In that case, the function suppresses at least one additional cell.
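
    The subtraction attack is easy to reproduce by hand. The sketch below takes the Female counts from the examples later in this vignette (42, 8, 0 with a sex total of 50) and shows that hiding only the 8 protects nothing, while hiding a second cell leaves only the sum of the two hidden cells recoverable. The variable names are illustrative.

```r
female <- data.frame(
  race = c("White", "Black", "Asian"),
  n = c(42, 8, 0),
  suppressed = c(FALSE, TRUE, FALSE),
  stringsAsFactors = FALSE
)
n_sex <- 50  # published margin for this partition

# With one hidden cell, the margin minus the released cells is the hidden value.
recovered <- n_sex - sum(female$n[!female$suppressed])
recovered
# 8

# Add a secondary suppression on a second cell in the same partition.
female$suppressed[female$race == "Asian"] <- TRUE

# Now subtraction only yields the sum of the two hidden cells.
hidden_sum <- n_sex - sum(female$n[!female$suppressed])
hidden_sum
# 8
```

    After the second suppression a reader can still conclude that the two hidden cells sum to 8, but can no longer tell how that sum is split between them.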

    Methods

    • "exact" searches over candidate suppressions and finds a minimum-cost valid set.
    • "greedy" adds suppressions iteratively using a coverage-then-cost rule.
    • "auto" tries exact first and falls back to greedy when the candidate set is too large.
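
    As a rough illustration of what an exact search means, the sketch below enumerates every subset of the unsuppressed cells in a single partition and keeps the cheapest subset that leaves the partition safe. Everything here is an assumption made for illustration: the helper name exact_secondary(), the cost (the cell count, plus a small penalty for zero cells), and the validity rule (a partition must have either no hidden cell or at least two). The function's actual cost and its handling of several partitions at once are not documented here.

```r
# Toy exhaustive search within one partition; not the function's algorithm.
exact_secondary <- function(n, primary, zero_penalty = 1) {
  candidates <- which(!primary)

  # Assumed cost: hiding a cell costs its count; zeros carry a small penalty.
  cost <- n + ifelse(n == 0, zero_penalty, 0)

  # Assumed validity rule: zero hidden cells, or at least two.
  is_valid <- function(extra) (sum(primary) + length(extra)) != 1

  best <- NULL
  best_cost <- Inf
  for (k in 0:length(candidates)) {
    subsets <- if (k == 0) {
      list(integer(0))
    } else {
      combn(length(candidates), k,
            FUN = function(i) candidates[i], simplify = FALSE)
    }
    for (s in subsets) {
      if (is_valid(s) && sum(cost[s]) < best_cost) {
        best <- s
        best_cost <- sum(cost[s])
      }
    }
  }
  best  # indices of cells to add as secondary suppressions
}

# Female partition from the examples: only the 8 is primary.
exact_secondary(n = c(42, 8, 0), primary = c(FALSE, TRUE, FALSE))
# 3
```

    The number of subsets doubles with every candidate cell, which is why a cap such as secondary_exact_max_cells and a greedy fallback are needed for larger tables.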

    Propagated suppression

    If a broader level is suppressed, finer levels descending from that level can also be masked.

    For example, if a race total is suppressed, then cells nested under that race category may also need to be hidden.
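
    A minimal sketch of the propagation step, using made-up categories "A" and "B" rather than data from this vignette: each two-way cell looks up whether its parent margin is hidden and, if so, is marked as propagated. Keeping an existing "primary" or "secondary" status rather than overwriting it is an assumption of this sketch.

```r
# Hypothetical parent level: category B is suppressed at the one-way level.
parent <- data.frame(
  race = c("A", "B"),
  suppressed_race = c(FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Hypothetical child level with statuses from earlier steps.
cells <- data.frame(
  race = c("A", "A", "B", "B"),
  sex = c("Female", "Male", "Female", "Male"),
  status = c("reported", "reported", "reported", "primary"),
  stringsAsFactors = FALSE
)

# Look up the parent flag for every child cell.
parent_hidden <- parent$suppressed_race[match(cells$race, parent$race)]

# Only cells that would otherwise be released change status (assumption).
cells$status <- ifelse(
  parent_hidden & cells$status == "reported",
  "propagated",
  cells$status
)
cells$suppressed <- cells$status != "reported"

cells
```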

    Upper-bound display values

    When upper_bound = TRUE and a denominator exists, the function does not reveal the true suppressed count. Instead, it computes the residual:

    denominator - sum(released counts in the same partition)

    That residual is inserted into the numeric display_* column, while the label column remains "*".

    This is useful when you want plots to preserve approximate bar heights while still marking the released value as suppressed.

    When upper_bound = FALSE, suppressed numeric displays are set to NA.
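
    The residual can be reproduced by hand. The sketch below applies the formula to the sex-within-race counts from Example 1, taking the suppression flags from that example's released table as given; it uses base R and illustrative variable names, and it is a reconstruction of the arithmetic rather than the function's own code.

```r
cells <- data.frame(
  sex = rep(c("Female", "Male"), each = 3),
  race = rep(c("White", "Black", "Asian"), times = 2),
  n = c(42, 8, 0, 35, 6, 9),
  n_race = c(77, 14, 9, 77, 14, 9),
  suppressed = c(FALSE, TRUE, TRUE, FALSE, TRUE, TRUE),
  stringsAsFactors = FALSE
)

# Sum of released counts within each race partition.
released_sum <- ave(ifelse(cells$suppressed, 0, cells$n), cells$race, FUN = sum)

# Residual: denominator minus what has been released in the same partition.
residual <- cells$n_race - released_sum

# upper_bound = TRUE: suppressed cells display the residual.
cells$display_n <- ifelse(cells$suppressed, residual, cells$n)
cells$display_prop <- cells$display_n / cells$n_race

# upper_bound = FALSE: suppressed cells display NA instead.
cells$display_n_na <- ifelse(cells$suppressed, NA, cells$n)

cells[order(cells$race, cells$sex), c("sex", "race", "display_n", "display_prop")]
```

    The Black cells come out at 14 and the Asian cells at 9, each with a display proportion of 1, which matches the released table in Example 1.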

    Returned columns

    For each level lvl, the function adds some or all of the following:

    Column              Meaning
    display_n_<lvl>     Numeric display count.
    label_n_<lvl>       Character label for count.
    display_prop_<lvl>  Numeric display proportion, if prop is defined.
    label_prop_<lvl>    Character label for proportion, if prop is defined.
    status_<lvl>        One of "reported", "primary", "secondary", or "propagated".
    suppressed_<lvl>    Logical indicator for suppression at that level.

    Status meanings

    Status      Meaning
    reported    Released without suppression.
    primary     Suppressed because the cell itself is below threshold.
    secondary   Suppressed to prevent recovery of another hidden cell.
    propagated  Suppressed because an ancestor level was suppressed.

    Minimal example

    This first example shows only totals and one-way margins.

    df_min <- tibble(
      sex = c("Female", "Male"),
      race = c("All", "All"),
      n_total = c(100, 100),
      n_sex = c(50, 50),
      prop_sex = c(0.50, 0.50)
    )
    
    spec_min <- list(
      total = "n_total",
      sex = list(
        count = "n_sex",
        prop = "prop_sex",
        denom = "n_total",
        by = "sex"
      )
    )
    
    out_min <- suppress(
      df = df_min,
      spec = spec_min,
      min_n = 10,
      upper_bound = TRUE
    )
    
    out_min |>
      distinct(
        sex,
        display_n_sex,
        label_n_sex,
        display_prop_sex,
        label_prop_sex,
        status_sex,
        suppressed_sex
      ) |>
      kable()
    sex     display_n_sex  label_n_sex  display_prop_sex  label_prop_sex  status_sex  suppressed_sex
    Female  50             50           0.5               50.0%           reported    FALSE
    Male    50             50           0.5               50.0%           reported    FALSE

    Example 1: sex within race

    This example treats the two-way table as sex within race, so the denominator for the two-way proportion is n_race.

    Data

    df1 <- tibble(
      sex = c(
        "Female", "Female", "Female",
        "Male",   "Male",   "Male"
      ),
      race = c(
        "White", "Black", "Asian",
        "White", "Black", "Asian"
      ),
      n_sex_x_race = c(42, 8, 0, 35, 6, 9),
      n_sex = c(50, 50, 50, 50, 50, 50),
      n_race = c(77, 14, 9, 77, 14, 9),
      n_total = c(100, 100, 100, 100, 100, 100),
      prop_sex_x_race = c(42 / 77, 8 / 14, 0 / 9, 35 / 77, 6 / 14, 9 / 9),
      prop_sex = c(50 / 100, 50 / 100, 50 / 100, 50 / 100, 50 / 100, 50 / 100),
      prop_race = c(77 / 100, 14 / 100, 9 / 100, 77 / 100, 14 / 100, 9 / 100)
    )
    
    spec1 <- list(
      total = "n_total",
      sex = list(
        count = "n_sex",
        prop = "prop_sex",
        denom = "n_total",
        by = "sex"
      ),
      race = list(
        count = "n_race",
        prop = "prop_race",
        denom = "n_total",
        by = "race"
      ),
      sex_x_race = list(
        count = "n_sex_x_race",
        prop = "prop_sex_x_race",
        denom = "n_race",
        by = c("sex", "race")
      )
    )

    Apply suppression

    out1 <- suppress(
      df = df1,
      spec = spec1,
      min_n = 10,
      upper_bound = TRUE,
      suppress_zeros = FALSE
    )

    Inspect released values

    out1 |>
      distinct(
        sex,
        race,
        display_n_sex_x_race,
        label_n_sex_x_race,
        display_prop_sex_x_race,
        label_prop_sex_x_race,
        status_sex_x_race,
        suppressed_sex_x_race
      ) |>
      arrange(race, sex) |>
      kable()
    sex     race   display_n_sex_x_race  label_n_sex_x_race  display_prop_sex_x_race  label_prop_sex_x_race  status_sex_x_race  suppressed_sex_x_race
    Female  Asian  9                     *                   1.0000000                *                      secondary          TRUE
    Male    Asian  9                     *                   1.0000000                *                      primary            TRUE
    Female  Black  14                    *                   1.0000000                *                      primary            TRUE
    Male    Black  14                    *                   1.0000000                *                      primary            TRUE
    Female  White  42                    42                  0.5454545                54.5%                  reported           FALSE
    Male    White  35                    35                  0.4545455                45.5%                  reported           FALSE

    Plot: one-way race

    out1 |>
      distinct(race, display_prop_race, label_prop_race) |>
      mutate(
        text_y = if_else(
          is.na(display_prop_race),
          0.015,
          pmax(display_prop_race + 0.015, 0.015)
        )
      ) |>
      ggplot(aes(x = race, y = display_prop_race)) +
      geom_col(width = 0.8, na.rm = TRUE) +
      geom_text(aes(y = text_y, label = label_prop_race), na.rm = TRUE) +
      scale_y_continuous(
        labels = function(x) paste0(round(100 * x), "%"),
        expand = expansion(mult = c(0, 0.08))
      ) +
      labs(
        title = "Example 1: Race",
        x = "Race",
        y = "Percent of total"
      ) +
      theme_minimal()

    Plot: one-way sex

    out1 |>
      distinct(sex, display_prop_sex, label_prop_sex) |>
      mutate(
        text_y = if_else(
          is.na(display_prop_sex),
          0.015,
          pmax(display_prop_sex + 0.015, 0.015)
        )
      ) |>
      ggplot(aes(x = sex, y = display_prop_sex)) +
      geom_col(width = 0.8, na.rm = TRUE) +
      geom_text(aes(y = text_y, label = label_prop_sex), na.rm = TRUE) +
      scale_y_continuous(
        labels = function(x) paste0(round(100 * x), "%"),
        expand = expansion(mult = c(0, 0.08))
      ) +
      labs(
        title = "Example 1: Sex",
        x = "Sex",
        y = "Percent of total"
      ) +
      theme_minimal()

    Plot: sex within race

    out1 |>
      distinct(race, sex, display_prop_sex_x_race, label_prop_sex_x_race) |>
      mutate(
        text_y = if_else(
          is.na(display_prop_sex_x_race),
          0.015,
          pmax(display_prop_sex_x_race + 0.015, 0.015)
        )
      ) |>
      ggplot(aes(x = race, y = display_prop_sex_x_race, fill = sex)) +
      geom_col(position = position_dodge(width = 0.9), width = 0.8, na.rm = TRUE) +
      geom_text(
        aes(y = text_y, label = label_prop_sex_x_race),
        position = position_dodge(width = 0.9),
        na.rm = TRUE
      ) +
      scale_y_continuous(
        labels = function(x) paste0(round(100 * x), "%"),
        expand = expansion(mult = c(0, 0.08))
      ) +
      labs(
        title = "Example 1: Sex within Race",
        x = "Race",
        y = "Percent within race",
        fill = "Sex"
      ) +
      theme_minimal()

    What is happening here

    In this example:

    • 8, 6, and 9 are below min_n = 10, so they trigger primary suppression;
    • because those cells sit inside partitions with known denominators, a further cell (the Female Asian zero) is suppressed as secondary so that the Male Asian count cannot be recovered by subtraction;
    • since upper_bound = TRUE, the numeric display columns show the residual bounds (14 for the Black cells, 9 for the Asian cells), while the labels remain "*".

    Example 2: race within sex

    This example uses the same counts but changes the meaning of the two-way proportion. Now the denominator is n_sex, so the two-way table is interpreted as race within sex.

    Data

    df2 <- tibble(
      sex = c(
        "Female", "Female", "Female",
        "Male",   "Male",   "Male"
      ),
      race = c(
        "White", "Black", "Asian",
        "White", "Black", "Asian"
      ),
      n_sex_x_race = c(42, 8, 0, 35, 6, 9),
      n_sex = c(50, 50, 50, 50, 50, 50),
      n_race = c(77, 14, 9, 77, 14, 9),
      n_total = c(100, 100, 100, 100, 100, 100),
      prop_sex_x_race = c(42 / 50, 8 / 50, 0 / 50, 35 / 50, 6 / 50, 9 / 50),
      prop_sex = c(50 / 100, 50 / 100, 50 / 100, 50 / 100, 50 / 100, 50 / 100),
      prop_race = c(77 / 100, 14 / 100, 9 / 100, 77 / 100, 14 / 100, 9 / 100)
    )
    
    spec2 <- list(
      total = "n_total",
      sex = list(
        count = "n_sex",
        prop = "prop_sex",
        denom = "n_total",
        by = "sex"
      ),
      race = list(
        count = "n_race",
        prop = "prop_race",
        denom = "n_total",
        by = "race"
      ),
      sex_x_race = list(
        count = "n_sex_x_race",
        prop = "prop_sex_x_race",
        denom = "n_sex",
        by = c("sex", "race")
      )
    )

    Apply suppression

    out2 <- suppress(
      df = df2,
      spec = spec2,
      min_n = 10,
      upper_bound = TRUE,
      suppress_zeros = FALSE
    )

    Inspect released values

    out2 |>
      distinct(
        sex,
        race,
        display_n_sex_x_race,
        label_n_sex_x_race,
        display_prop_sex_x_race,
        label_prop_sex_x_race,
        status_sex_x_race,
        suppressed_sex_x_race
      ) |>
      arrange(sex, race) |>
      kable()
    sex     race   display_n_sex_x_race  label_n_sex_x_race  display_prop_sex_x_race  label_prop_sex_x_race  status_sex_x_race  suppressed_sex_x_race
    Female  Asian  8                     *                   0.16                     *                      secondary          TRUE
    Female  Black  8                     *                   0.16                     *                      primary            TRUE
    Female  White  42                    42                  0.84                     84.0%                  reported           FALSE
    Male    Asian  15                    *                   0.30                     *                      primary            TRUE
    Male    Black  15                    *                   0.30                     *                      primary            TRUE
    Male    White  35                    35                  0.70                     70.0%                  reported           FALSE

    Plot: race within sex

    out2 |>
      distinct(sex, race, display_prop_sex_x_race, label_prop_sex_x_race) |>
      mutate(
        text_y = if_else(
          is.na(display_prop_sex_x_race),
          0.015,
          pmax(display_prop_sex_x_race + 0.015, 0.015)
        )
      ) |>
      ggplot(aes(x = sex, y = display_prop_sex_x_race, fill = race)) +
      geom_col(position = position_dodge(width = 0.9), width = 0.8, na.rm = TRUE) +
      geom_text(
        aes(y = text_y, label = label_prop_sex_x_race),
        position = position_dodge(width = 0.9),
        na.rm = TRUE
      ) +
      scale_y_continuous(
        labels = function(x) paste0(round(100 * x), "%"),
        expand = expansion(mult = c(0, 0.08))
      ) +
      labs(
        title = "Example 2: Race within Sex",
        x = "Sex",
        y = "Percent within sex",
        fill = "Race"
      ) +
      theme_minimal()

    Practical notes

    When to use upper_bound = TRUE

    Use upper_bound = TRUE when you want:

    • bars to retain approximate height in plots,
    • tables to carry a numeric display value for layout or ordering,
    • labels to remain "*" so readers know the cell is suppressed.

    Use upper_bound = FALSE when you want suppressed cells to disappear numerically.

    When to use overwrite = TRUE

    Set overwrite = TRUE when downstream code expects the original columns to already contain released values.

    For example:

    released_df <- suppress(
      df = df1,
      spec = spec1,
      overwrite = TRUE
    )

    After this call, released_df$n_sex_x_race and released_df$prop_sex_x_race hold the display values instead of the original counts and proportions.

    When to suppress zeros

    Set suppress_zeros = TRUE when zeros themselves are considered sensitive.

    Leave it as FALSE when structural or observed zeros are acceptable to release.

    Exact versus greedy secondary suppression

    Use:

    • "exact" for smaller problems when you want the cleanest minimum-cost secondary set;
    • "greedy" for larger problems when speed matters;
    • "auto" as the default compromise.

    Common pitfalls

    1. prop without denom

    This is invalid.

    bad_spec <- list(
      sex = list(
        count = "n_sex",
        prop = "prop_sex",
        by = "sex"
      )
    )

    If you define prop, you must also define denom.

    2. Duplicate by signatures

    This is also invalid.

    bad_spec <- list(
      a = list(count = "n1", by = "sex"),
      b = list(count = "n2", by = "sex")
    )

    Each level must correspond to a unique reporting granularity.

    3. Non-unique denominators within a partition

    When upper_bound = TRUE, the denominator must be uniquely defined within each partition used for that level. Otherwise the function cannot compute a residual bound consistently.
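
    A quick way to catch this before calling the function is to count distinct denominator values per partition. The sketch below does this in base R for a sex-within-race level, where the partition is race and the denominator is n_race; the data frame df_bad is made up to show a conflict.

```r
# Made-up input where the White partition carries two different denominators.
df_bad <- data.frame(
  sex = c("Female", "Male", "Female", "Male"),
  race = c("White", "White", "Black", "Black"),
  n_race = c(77, 78, 14, 14),
  stringsAsFactors = FALSE
)

# Number of distinct denominator values in each partition.
n_denoms <- tapply(df_bad$n_race, df_bad$race, function(x) length(unique(x)))

# Partitions with more than one value cannot yield a consistent residual.
conflicts <- names(n_denoms)[n_denoms > 1]
conflicts
# "White"
```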

    Summary

    suppress() is a hierarchical suppression helper that combines:

    • primary suppression for small cells,
    • secondary suppression for inference protection,
    • propagation across reporting levels,
    • display-ready outputs for counts and proportions.

    Its main strength is that it separates:

    • the true underlying values,
    • the released numeric display values, and
    • the human-facing labels.

    That separation makes it suitable for publication pipelines, dashboards, and ggplot2 workflows where disclosure control and presentation need to coexist.

    Updated on April 21, 2026
