Moving to {targets} for Data Workflows
As of early 2026, the NSWERS research team is transitioning to the targets package in R for all research scripts.
This change reflects a shift toward more structured, reliable, and scalable analytical workflows.
Why this change?
As our projects have grown in complexity, so have the challenges of maintaining clear, consistent, and efficient workflows. Traditional script-based approaches make it difficult to:
- Track dependencies between steps
- Rerun only what has changed
- Ensure consistent results across environments
- Collaborate effectively on large projects
{targets} addresses these issues by organizing analyses as pipelines rather than linear scripts.
How {targets} works at a high level
At its core, {targets} turns your analysis into a directed acyclic graph (DAG) of steps.
- Each node in the graph is a “target” – an object you define
- Each edge represents a dependency between targets
This structure can be visualized directly using tar_visnetwork(), which shows exactly how every piece of the pipeline depends on the others.
This is not just a visualization tool. It reflects how {targets} actually executes your workflow.
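As a sketch, inspecting the graph from the project root looks like this (assuming a _targets.R file has already been defined):

```r
library(targets)

# Render the full dependency graph, including functions,
# with outdated targets highlighted:
tar_visnetwork()

# A lighter view showing only the targets themselves:
tar_glimpse()
```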
Each target is tracked using a checksum based on its code and inputs. When something changes, {targets} detects it and determines exactly which downstream targets are affected in the DAG.
When you run tar_make():
- Targets with unchanged checksums are skipped
- Only targets affected by the change are recomputed
- All downstream dependencies are updated automatically
This is what enables efficient recomputation. Instead of rerunning an entire script, {targets} runs only the minimal portion of the workflow required to bring everything up to date.
Key benefits
Reproducibility by design
Because every dependency is explicitly encoded in the DAG, the entire workflow can be reproduced reliably across sessions and users.
Efficient recomputation
Changes propagate through the graph, and only affected targets are rerun. This is especially valuable for long-running pipelines.
Collaboration and clarity
The DAG makes dependencies explicit and visible. This improves readability, simplifies debugging, and allows multiple contributors to work independently without conflicts.
Scalability
Pipelines grow naturally with project size. {targets} supports parallel execution and integrates with distributed computing tools.
What this means in practice
- Analyses are organized as pipelines rather than standalone scripts
- Dependencies are explicit and visualizable
- Changes trigger only the necessary updates
- Outputs are easier to reproduce, validate, and share
How {targets} is structured
Every pipeline is defined by a _targets.R file in the project root. This file describes the full workflow and is what gets executed when the pipeline runs.
Importantly, {targets} does not execute this file in the same way as a typical script. Instead, it evaluates targets in isolated environments. This is what ensures reproducibility, but it also introduces a few patterns that require explanation.
Packages are loaded twice
In a {targets} pipeline, packages are typically loaded in two places:
- At the top of _targets.R using library()
- Inside tar_option_set(packages = ...)
These serve different roles.
Loading packages for pipeline definition
The library() calls make functions available while the pipeline is being defined, allowing R to construct the DAG.
Declaring packages for pipeline execution
The packages argument ensures those libraries are available when targets run inside their isolated environments.
Because targets do not inherit your interactive session, all runtime dependencies must be declared explicitly. This ensures consistent behavior across users and systems.
library(tidyverse)
library(targets)
library(tarchetypes)

tar_option_set(
  packages = c("tidyverse"),
  controller = crew::crew_controller_local(workers = 3)
)
Defining targets
Targets are defined as a list of objects using tar_target(name, command).
list(
  tar_target(
    adelie_penguins_lcl,
    penguins |> dplyr::filter(species == "Adelie")
  )
)
Each target becomes a node in the DAG. {targets} automatically tracks its dependencies and determines when it needs to be recomputed.
Organizing larger pipelines
To keep pipelines manageable, targets are often grouped into lists defined in separate scripts under an R/ directory and brought in with:
tar_source(here::here("R"))
The main pipeline then combines these groups:
list(
  students_targets,
  wf_targets,
  summary_industry_targets,
  reports_targets
)
Working with database tables
Because {targets} relies on checksums, database tables require special handling. The {NSWERSutils} function make_table_hash_target() converts tables into trackable targets by computing hashes.
load_rdb_tables <- make_table_hash_target(
  c("k12_students", "k12_enrollment"),
  db_con_fun = env_con,
  .cue_mode.table_targets = "thorough",
  .cue_mode.tbl_targets = "thorough"
)
This allows database changes to propagate correctly through the DAG.
Handling figures
Plots follow a standardized pattern:
- Plot objects begin with plot_
- They are saved using save_chart()
- Outputs are written to figures/png, figures/pdf, and figures/svg
These outputs are also tracked as targets, so any upstream change triggers regeneration.
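Following that convention, a figure entry pairs a plot_-prefixed target with a save_chart() call. In this sketch, plot_enrollment_trend and enrollment_lcl are hypothetical names, and the save_chart() signature is assumed from the example pipeline below:

```r
list(
  tar_target(
    plot_enrollment_trend,          # hypothetical target; note the plot_ prefix
    enrollment_lcl |>               # hypothetical upstream data target
      ggplot(aes(year, n_students)) +
      geom_line()
  ),
  # save_chart() writes png, pdf, and svg copies under figures/
  save_chart(plot_enrollment_trend, here::here("figures"))
)
```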
Running and inspecting a pipeline
Once a pipeline is defined, the primary way to execute it is with:
tar_make()
This function builds the pipeline by traversing the DAG.
Under the hood, {targets}:
- Evaluates each target in the correct dependency order
- Checks whether each target is up to date using its checksum
- Skips targets that have not changed
- Recomputes only those that are outdated, along with anything that depends on them
In practice, this means you can rerun tar_make() at any time without worrying about wasted computation. The pipeline will only do the work that is necessary to bring everything up to date.
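A few helper functions from {targets} let you see what a run would do before committing to it:

```r
library(targets)

# Names of targets that would be rebuilt by the next tar_make():
tar_outdated()

# Build status of each target (built, skipped, errored)
# from the most recent run:
tar_progress()
```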
Inspecting results during development
Because targets are stored in an internal data store rather than your global environment, interacting with them during development is slightly different from standard R workflows.
There are two primary functions for accessing target outputs:
tar_read() – inspect without loading
tar_read(target_name) returns the value of a target without placing it in your global environment. This is useful for quickly checking results or debugging without modifying your workspace.
tar_load() – bring into your environment
tar_load(target_name) loads the target into your global environment under its original name. This is useful when you want to work with an object interactively, for example when exploring results or building downstream analyses.
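Using the target names from the example pipeline later in this post, a quick inspection session might look like:

```r
library(targets)

# Peek at a result without touching the global environment:
tar_read(prop_gender_lcl)

# Bring a target into the session under its own name:
tar_load(plot_prop_gender)
plot_prop_gender

# Several targets can be loaded at once:
tar_load(c(prop_gender_lcl, plot_prop_gender))
```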
Typical workflow
A common development loop looks like:
- Modify code for a target or its dependencies
- Run tar_make() to update the pipeline
- Use tar_read() to quickly inspect outputs
- Use tar_load() when deeper interactive work is needed
This workflow keeps the pipeline as the source of truth while still allowing flexible exploration during development.
Example pipeline
A simplified _targets.R file might look like this:
library(tidyverse)
library(NSWERSutils)
library(NSWERSthemes)
library(scales)
library(targets)
library(tarchetypes)
here::i_am("_targets.R")
tar_source(here::here("R"))
tar_option_set(
  packages = c("tidyverse", "NSWERSutils"),
  controller = crew::crew_controller_local(workers = 3)
)

load_rdb_tables <- make_table_hash_target(
  c("wh_student"),
  db_con_fun = dev_con
)
list(
  load_rdb_tables,
  # create target for proportion of gender in db
  tar_target(
    prop_gender_lcl,
    wh_student_tbl |>
      distinct(nswers_id, gender) |>
      filter(gender %in% c("M", "F")) |>
      group_by(gender) |>
      summarize(n_gender = as.numeric(n()), .groups = "drop") |>
      mutate(total_n = sum(n_gender)) |>
      mutate(prop_gender = n_gender / total_n) |>
      collect()
  ),
  # create target for plot
  tar_target(
    plot_prop_gender,
    prop_gender_lcl |>
      ggplot(aes(x = gender, y = prop_gender, fill = gender)) +
      geom_col(width = 0.6) +
      geom_text(
        aes(label = percent(prop_gender, accuracy = 0.1)),
        vjust = -0.3
      ) +
      scale_y_continuous(labels = percent_format(accuracy = 1)) +
      labs(
        title = "Gender Distribution of Students",
        x = NULL,
        y = "Percent of Students"
      ) +
      theme_nswers_default()
  ),
  # save plot as png, pdf, svg
  save_chart(plot_prop_gender, here::here("figures"))
)
Summary
{targets} replaces script-based workflows with a dependency-aware pipeline system:
- Work is organized as a DAG of targets
- Changes are detected via checksums
- Only affected components are recomputed
- Dependencies are transparent and visualizable
This approach improves reproducibility, efficiency, and collaboration as our work continues to scale.