Targets Scripts

    Moving to {targets} for Data Workflows

    As of early 2026, the NSWERS research team is transitioning to the {targets} R package for all research scripts.

    This change reflects a shift toward more structured, reliable, and scalable analytical workflows.


    Why this change?

    As our projects have grown in complexity, so have the challenges of maintaining clear, consistent, and efficient workflows. Traditional script-based approaches make it difficult to:

    • Track dependencies between steps
    • Rerun only what has changed
    • Ensure consistent results across environments
    • Collaborate effectively on large projects

    {targets} addresses these issues by organizing analyses as pipelines rather than linear scripts.


    How {targets} works at a high level

    At its core, {targets} turns your analysis into a directed acyclic graph (DAG) of steps.

    • Each node in the graph is a “target” – a named R object you define, such as a dataset, model, or plot
    • Each edge represents a dependency between targets

    This structure can be visualized directly using tar_visnetwork(), which shows exactly how every piece of the pipeline depends on the others.

    This is not just a visualization tool. It reflects how {targets} actually executes your workflow.
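    As a quick sketch, the graph can be rendered at any point during development (the arguments shown are optional):

```r
# Render an interactive dependency graph of the pipeline.
# Run from the project root, where _targets.R lives.
library(targets)

tar_visnetwork(
  targets_only = TRUE,  # hide helper functions, show only targets
  label = "branches"    # annotate dynamic branching, if any
)
```

    Outdated nodes are highlighted, so the same view doubles as a status check before running the pipeline.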

    Each target is tracked using a checksum based on its code and inputs. When something changes, {targets} detects it and determines exactly which downstream targets in the DAG are affected.

    When you run tar_make():

    • Targets with unchanged checksums are skipped
    • Only targets affected by the change are recomputed
    • All downstream dependencies are updated automatically

    This is what enables efficient recomputation. Instead of rerunning an entire script, {targets} runs only the minimal portion of the workflow required to bring everything up to date.
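    You can preview what would be rebuilt, without computing anything, using tar_outdated():

```r
library(targets)

# Returns the names of targets whose checksums no longer match,
# plus everything downstream of them in the DAG.
tar_outdated()
```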


    Key benefits

    Reproducibility by design
    Because every dependency is explicitly encoded in the DAG, the entire workflow can be reproduced reliably across sessions and users.

    Efficient recomputation
    Changes propagate through the graph, and only affected targets are rerun. This is especially valuable for long-running pipelines.

    Collaboration and clarity
    The DAG makes dependencies explicit and visible. This improves readability, simplifies debugging, and allows multiple contributors to work independently without conflicts.

    Scalability
    Pipelines grow naturally with project size. {targets} supports parallel execution and integrates with distributed computing tools.


    What this means in practice

    • Analyses are organized as pipelines rather than standalone scripts
    • Dependencies are explicit and visualizable
    • Changes trigger only the necessary updates
    • Outputs are easier to reproduce, validate, and share

    How {targets} is structured

    Every pipeline is defined by a _targets.R file in the project root. This file describes the full workflow and is what gets executed when the pipeline runs.

    Importantly, {targets} does not execute this file in the same way as a typical script. Instead, it evaluates targets in isolated environments. This is what ensures reproducibility, but it also introduces a few patterns that require explanation.


    Packages are loaded twice

    In a {targets} pipeline, packages are typically loaded in two places:

    • At the top of _targets.R using library()
    • Inside tar_option_set(packages = ...)

    These serve different roles.

    Loading packages for pipeline definition
    The library() calls make functions available while the pipeline is being defined, allowing R to construct the DAG.

    Declaring packages for pipeline execution
    The packages argument ensures those libraries are available when targets run inside their isolated environments.

    Because targets do not inherit your interactive session, all runtime dependencies must be declared explicitly. This ensures consistent behavior across users and systems.

    library(tidyverse)
    library(targets)
    library(tarchetypes)

    tar_option_set(
      packages = c("tidyverse"),
      controller = crew::crew_controller_local(workers = 3)
    )

    Defining targets

    Targets are defined as a list of objects using tar_target(name, command).

    list(
      tar_target(
        adelie_penguins_lcl,
        penguins |> dplyr::filter(species == "Adelie")
      )
    )

    Each target becomes a node in the DAG. {targets} automatically tracks its dependencies and determines when it needs to be recomputed.
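    Dependencies are inferred from the code itself: because the second target below refers to adelie_penguins_lcl by name, {targets} draws an edge between the two. (The summary target and its column names are illustrative, not part of our pipelines.)

```r
list(
  tar_target(
    adelie_penguins_lcl,
    penguins |> dplyr::filter(species == "Adelie")
  ),
  # Depends on adelie_penguins_lcl, so it is recomputed whenever
  # the filter above (or the penguins data) changes.
  tar_target(
    adelie_summary_lcl,
    adelie_penguins_lcl |>
      dplyr::summarize(mean_mass = mean(body_mass_g, na.rm = TRUE))
  )
)
```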


    Organizing larger pipelines

    To keep pipelines manageable, targets are often grouped into lists defined in separate scripts under an R/ directory and brought in with:

    tar_source(here::here("R"))

    The main pipeline then combines these groups:

    list(
      students_targets,
      wf_targets,
      summary_industry_targets,
      reports_targets
    )
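    Each grouped list is an ordinary R object. As a sketch, a hypothetical R/students.R might define students_targets like this (all names here – students_lcl, k12_students_tbl, school_year – are illustrative):

```r
# R/students.R -- sourced by tar_source() before the pipeline list is built.
students_targets <- list(
  tar_target(
    students_lcl,
    k12_students_tbl |> dplyr::collect()  # pull the table locally
  ),
  tar_target(
    students_by_year_lcl,
    students_lcl |> dplyr::count(school_year)  # one row per school year
  )
)
```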

    Working with database tables

    Because {targets} relies on checksums, database tables require special handling. The {NSWERSutils} function make_table_hash_target() converts tables into trackable targets by computing hashes.

    load_rdb_tables <- make_table_hash_target(
      c("k12_students", "k12_enrollment"),
      db_con_fun = env_con,
      .cue_mode.table_targets = "thorough",
      .cue_mode.tbl_targets = "thorough"
    )

    This allows database changes to propagate correctly through the DAG.


    Handling figures

    Plots follow a standardized pattern:

    • Plot objects begin with plot_
    • They are saved using save_chart()
    • Outputs are written to figures/png, figures/pdf, and figures/svg

    These outputs are also tracked as targets, so any upstream change triggers regeneration.


    Running and inspecting a pipeline

    Once a pipeline is defined, the primary way to execute it is with:

    tar_make()

    This function builds the pipeline by traversing the DAG.

    Under the hood, {targets}:

    • Evaluates each target in the correct dependency order
    • Checks whether each target is up to date using its checksum
    • Skips targets that have not changed
    • Recomputes only those that are outdated, along with anything that depends on them

    In practice, this means you can rerun tar_make() at any time without worrying about wasted computation. The pipeline will only do the work that is necessary to bring everything up to date.
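    tar_make() can also be restricted to part of the DAG. The names argument takes tidyselect syntax and builds only the requested targets, along with whatever they depend on (the target name below comes from the example pipeline later in this document):

```r
library(targets)

# Build everything that is out of date.
tar_make()

# Build only one target (and its upstream dependencies).
tar_make(names = plot_prop_gender)
```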


    Inspecting results during development

    Because targets are stored in an internal data store rather than your global environment, interacting with them during development is slightly different from standard R workflows.

    There are two primary functions for accessing target outputs:

    tar_read() – inspect without loading
    tar_read(target_name) returns the value of a target without placing it in your global environment. This is useful for quickly checking results or debugging without modifying your workspace.

    tar_load() – bring into your environment
    tar_load(target_name) loads the target into your global environment under its original name. This is useful when you want to work with an object interactively, for example when exploring results or building downstream analyses.
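    A quick sketch of both calls, using target names from the example pipeline in this document:

```r
library(targets)

# Peek at a target's value without touching the global environment.
tar_read(prop_gender_lcl)

# Load the target into the global environment under its own name.
tar_load(prop_gender_lcl)
head(prop_gender_lcl)

# tar_load() also accepts tidyselect helpers, e.g. all plot targets:
tar_load(starts_with("plot_"))
```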


    Typical workflow

    A common development loop looks like:

    • Modify code for a target or its dependencies
    • Run tar_make() to update the pipeline
    • Use tar_read() to quickly inspect outputs
    • Use tar_load() when deeper interactive work is needed

    This workflow keeps the pipeline as the source of truth while still allowing flexible exploration during development.

    Example pipeline

    A simplified _targets.R file might look like this:

    library(tidyverse)
    library(NSWERSutils)
    library(NSWERSthemes)
    library(scales)
    library(targets)
    library(tarchetypes)

    here::i_am("_targets.R")

    tar_source(here::here("R"))

    tar_option_set(
      packages = c("tidyverse", "NSWERSutils"),
      controller = crew::crew_controller_local(workers = 3)
    )

    load_rdb_tables <- make_table_hash_target(
      c("wh_student"),
      db_con_fun = dev_con
    )

    list(

      load_rdb_tables,

      # compute the proportion of each gender among distinct students
      tar_target(
        prop_gender_lcl,
        wh_student_tbl |>
          distinct(nswers_id, gender) |>
          filter(gender %in% c("M", "F")) |>
          group_by(gender) |>
          summarize(n_gender = as.numeric(n()), .groups = "drop") |>
          mutate(total_n = sum(n_gender)) |>
          mutate(prop_gender = n_gender / total_n) |>
          collect()
      ),

      # build the gender-distribution chart
      tar_target(
        plot_prop_gender,
        prop_gender_lcl |>
          ggplot(aes(x = gender, y = prop_gender, fill = gender)) +
          geom_col(width = 0.6) +
          geom_text(
            aes(label = percent(prop_gender, accuracy = 0.1)),
            vjust = -0.3
          ) +
          scale_y_continuous(labels = percent_format(accuracy = 1)) +
          labs(
            title = "Gender Distribution of Students",
            x = NULL,
            y = "Percent of Students"
          ) +
          theme_nswers_default()
      ),

      # save the plot as png, pdf, and svg
      save_chart(plot_prop_gender, here::here("figures"))

    )

    Summary

    {targets} replaces script-based workflows with a dependency-aware pipeline system:

    • Work is organized as a DAG of targets
    • Changes are detected via checksums
    • Only affected components are recomputed
    • Dependencies are transparent and visualizable

    This approach improves reproducibility, efficiency, and collaboration as our work continues to scale.

    Updated on March 17, 2026
