Moving to {targets} for Data Workflows
As of early 2026, the NSWERS research team is transitioning to the targets package in R for all research scripts.
This change reflects a shift toward more structured, reliable, and scalable analytical workflows.
Why this change?
As our projects have grown in complexity, so have the challenges of maintaining clear, consistent, and efficient workflows. Traditional script-based approaches make it difficult to:
- Track dependencies between steps
- Rerun only what has changed
- Ensure consistent results across environments
- Collaborate effectively on large projects
{targets} addresses these issues by organizing analyses as pipelines rather than linear scripts.
How {targets} works at a high level
At its core, {targets} turns your analysis into a directed acyclic graph (DAG) of steps.
- Each node in the graph is a “target” – an object you define
- Each edge represents a dependency between targets
This structure can be visualized directly using tar_visnetwork(), which shows exactly how every piece of the pipeline depends on the others.
This is not just a visualization tool. It reflects how {targets} actually executes your workflow.
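As a sketch, inspecting the graph from the project root looks like this (assuming a _targets.R file has already been defined):

```r
library(targets)

# Render the full dependency graph, including functions,
# with outdated targets highlighted:
tar_visnetwork()

# A lighter view showing only the targets themselves:
tar_glimpse()
```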
Each target is tracked using a checksum based on its code and inputs. When something changes, {targets} detects it and determines exactly which downstream targets are affected in the DAG.
When you run tar_make():
- Targets with unchanged checksums are skipped
- Only targets affected by the change are recomputed
- All downstream dependencies are updated automatically
This is what enables efficient recomputation. Instead of rerunning an entire script, {targets} runs only the minimal portion of the workflow required to bring everything up to date.
Key benefits
Reproducibility by design
Because every dependency is explicitly encoded in the DAG, the entire workflow can be reproduced reliably across sessions and users.
Efficient recomputation
Changes propagate through the graph, and only affected targets are rerun. This is especially valuable for long-running pipelines.
Collaboration and clarity
The DAG makes dependencies explicit and visible. This improves readability, simplifies debugging, and allows multiple contributors to work independently without conflicts.
Scalability
Pipelines grow naturally with project size. {targets} supports parallel execution and integrates with distributed computing tools.
What this means in practice
- Analyses are organized as pipelines rather than standalone scripts
- Dependencies are explicit and visualizable
- Changes trigger only the necessary updates
- Outputs are easier to reproduce, validate, and share
How {targets} is structured
Every pipeline is defined by a _targets.R file in the project root. This file describes the full workflow and is what gets executed when the pipeline runs.
Importantly, {targets} does not execute this file in the same way as a typical script. Instead, it evaluates targets in isolated environments. This is what ensures reproducibility, but it also introduces a few patterns that require explanation.
Packages are loaded twice
In a {targets} pipeline, packages are typically loaded in two places:
- At the top of _targets.R using library()
- Inside tar_option_set(packages = ...)
These serve different roles.
Loading packages for pipeline definition
The library() calls make functions available while the pipeline is being defined, allowing R to construct the DAG.
Declaring packages for pipeline execution
The packages argument ensures those libraries are available when targets run inside their isolated environments.
Because targets do not inherit your interactive session, all runtime dependencies must be declared explicitly. This ensures consistent behavior across users and systems.
library(tidyverse)
library(targets)
library(tarchetypes)

tar_option_set(
  packages = c("tidyverse"),
  controller = crew::crew_controller_local(workers = 3)
)
Defining targets
Targets are defined as a list of objects using tar_target(name, command).
list(
  tar_target(
    adelie_penguins_lcl,
    penguins |> dplyr::filter(species == "Adelie")
  )
)
Each target becomes a node in the DAG. {targets} automatically tracks its dependencies and determines when it needs to be recomputed.
Organizing larger pipelines
To keep pipelines manageable, targets are often grouped into lists defined in separate scripts under an R/ directory and brought in with:
tar_source(here::here("R"))
The main pipeline then combines these groups:
list(
  students_targets,
  wf_targets,
  summary_industry_targets,
  reports_targets
)
Working with database tables
Because {targets} relies on checksums, database tables require special handling. The {NSWERSutils} function make_table_hash_target() converts tables into trackable targets by computing hashes.
load_rdb_tables <- make_table_hash_target(
  c("k12_students", "k12_enrollment"),
  db_con_fun = env_con,
  .cue_mode.table_targets = "thorough",
  .cue_mode.tbl_targets = "thorough"
)
This allows database changes to propagate correctly through the DAG.
Handling figures
Plots follow a standardized pattern:
- Plot objects begin with plot_
- They are saved using save_chart()
- Outputs are written to figures/png, figures/pdf, and figures/svg
These outputs are also tracked as targets, so any upstream change triggers regeneration.
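Following that convention, a figure entry pairs a plot_-prefixed target with a save_chart() call. In this sketch, plot_enrollment_trend and enrollment_lcl are hypothetical names, and the save_chart() signature is assumed from the example pipeline below:

```r
list(
  tar_target(
    plot_enrollment_trend,          # hypothetical target; note the plot_ prefix
    enrollment_lcl |>               # hypothetical upstream data target
      ggplot(aes(year, n_students)) +
      geom_line()
  ),
  # save_chart() writes png, pdf, and svg copies under figures/
  save_chart(plot_enrollment_trend, here::here("figures"))
)
```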
Running and inspecting a pipeline
Once a pipeline is defined, the primary way to execute it is with:
tar_make()
This function builds the pipeline by traversing the DAG.
Under the hood, {targets}:
- Evaluates each target in the correct dependency order
- Checks whether each target is up to date using its checksum
- Skips targets that have not changed
- Recomputes only those that are outdated, along with anything that depends on them
In practice, this means you can rerun tar_make() at any time without worrying about wasted computation. The pipeline will only do the work that is necessary to bring everything up to date.
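A few helper functions from {targets} let you see what a run would do before committing to it:

```r
library(targets)

# Names of targets that would be rebuilt by the next tar_make():
tar_outdated()

# Build status of each target (built, skipped, errored)
# from the most recent run:
tar_progress()
```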
Inspecting results during development
Because targets are stored in an internal data store rather than your global environment, interacting with them during development is slightly different from standard R workflows.
There are two primary functions for accessing target outputs:
tar_read() – inspect without loading
tar_read(target_name) returns the value of a target without placing it in your global environment. This is useful for quickly checking results or debugging without modifying your workspace.
tar_load() – bring into your environment
tar_load(target_name) loads the target into your global environment under its original name. This is useful when you want to work with an object interactively, for example when exploring results or building downstream analyses.
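Using the target names from the example pipeline later in this post, a quick inspection session might look like:

```r
library(targets)

# Peek at a result without touching the global environment:
tar_read(prop_gender_lcl)

# Bring a target into the session under its own name:
tar_load(plot_prop_gender)
plot_prop_gender

# Several targets can be loaded at once:
tar_load(c(prop_gender_lcl, plot_prop_gender))
```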
Typical workflow
A common development loop looks like:
- Modify code for a target or its dependencies
- Run tar_make() to update the pipeline
- Use tar_read() to quickly inspect outputs
- Use tar_load() when deeper interactive work is needed
This workflow keeps the pipeline as the source of truth while still allowing flexible exploration during development.
Example pipeline
A simplified _targets.R file might look like this:
library(tidyverse)
library(NSWERSutils)
library(NSWERSthemes)
library(scales)
library(targets)
library(tarchetypes)
here::i_am("_targets.R")
tar_source(here::here("R"))
tar_option_set(
  packages = c("tidyverse", "NSWERSutils"),
  controller = crew::crew_controller_local(workers = 3)
)

load_rdb_tables <- make_table_hash_target(
  c("wh_student"),
  db_con_fun = dev_con
)
list(
  load_rdb_tables,
  # create target for proportion of gender in db
  tar_target(
    prop_gender_lcl,
    wh_student_tbl |>
      distinct(nswers_id, gender) |>
      filter(gender %in% c("M", "F")) |>
      group_by(gender) |>
      summarize(n_gender = as.numeric(n()), .groups = "drop") |>
      mutate(total_n = sum(n_gender)) |>
      mutate(prop_gender = n_gender / total_n) |>
      collect()
  ),
  # create target for plot
  tar_target(
    plot_prop_gender,
    prop_gender_lcl |>
      ggplot(aes(x = gender, y = prop_gender, fill = gender)) +
      geom_col(width = 0.6) +
      geom_text(
        aes(label = percent(prop_gender, accuracy = 0.1)),
        vjust = -0.3
      ) +
      scale_y_continuous(labels = percent_format(accuracy = 1)) +
      labs(
        title = "Gender Distribution of Students",
        x = NULL,
        y = "Percent of Students"
      ) +
      theme_nswers_default()
  ),
  # save plot as png, pdf, svg
  save_chart(plot_prop_gender, here::here("figures"))
)
Summary
{targets} replaces script-based workflows with a dependency-aware pipeline system:
- Work is organized as a DAG of targets
- Changes are detected via checksums
- Only affected components are recomputed
- Dependencies are transparent and visualizable
This approach improves reproducibility, efficiency, and collaboration as our work continues to scale.