---
name: tidy-r
description: Modern tidyverse patterns, style guide, and migration guidance for R development. Use this skill when writing R code, reviewing tidyverse code, updating legacy R code, or enforcing consistent style. Covers native pipe usage, join_by() syntax, .by grouping, pick/across/reframe, filter_out/when_any/when_all, recode_values/replace_values/replace_when, tidyselect helpers, .data/.env pronouns, stringr, naming conventions, and readr.
license: CC-BY-4.0
metadata:
  author: Ulrich Atz
  r_version: ">= 4.5.0"
  tidyverse_version: ">= 2.0.0"
  dplyr_version: ">= 1.2.0"
allowed-tools: Read, Edit, Write, Grep, Glob, Bash, mcp__r-btw__*
---
Modern Tidyverse R Reference
Code from blog posts and StackOverflow often uses deprecated APIs, magrittr pipes, or base R patterns where a modern tidyverse function exists. This guide encodes the current recommended approach.
Reference files
Consult the appropriate reference file for detailed patterns and examples:
| Topic | Reference file | When to consult |
|---|---|---|
| Joins | joins.md | Merging data, *_join, join_by, matching rows, lookup tables |
| Grouping & columns | grouping.md | .by, group_by, across, pick, reframe, column operations |
| Recoding & replacing | recode-replace.md | recode_values, replace_values, replace_when, filter_out, when_any, when_all |
| Strings | stringr.md | String manipulation, regex, str_* functions, text processing |
| Tidy selection | tidyselect.md | Column selection helpers, where(), all_of(), any_of(), boolean ops, .data/.env pronouns |
| Style | tidyverse-style.md | Naming, formatting, spacing, error messages, cli::cli_abort |
| Migration | migration.md | Updating old code, base R conversion, deprecated functions |
For requests that span multiple topics (e.g., "rewrite this old code" touches migration + style), read multiple files.
Related skills
tidy-r is the default for in-memory tidyverse work. Reach for a companion skill when the task outgrows it -- the data-frame workflow and |> style carry over:
| Reach for | When |
|---|---|
| collapse-r | Performance matters on large or heavily-grouped in-memory data, or you need weighted statistics or panel/time-aware ops (lags, growth, between/within). f-prefixed verbs. |
| duckplyr-r | Local data too big for RAM, or reading Parquet/CSV/JSON, while keeping dplyr syntax (DuckDB engine, dplyr-identical results). |
| dbplyr-r | Data lives in a remote/connection database (Postgres, Snowflake, BigQuery, SQL Server); dplyr is translated to SQL and run server-side. |
| r-btw-cli | Look up R help/vignettes, run R CMD check / tests / document(), or search CRAN from the command line. |
Core principles
- Use modern tidyverse patterns -- Prioritize dplyr 1.2+ features, native pipe, and current APIs
- Write readable code first -- Optimize only when necessary
- Follow tidyverse style guide -- Consistent naming, spacing, and structure
Quick reference
Pipe and lambda
- Always `|>`, never `%>%`
- Use `_` placeholder for non-first arguments: `x |> f(1, y = _)`. The placeholder must be named and used exactly once.
- Always `\(x)`, never `function(x)` or `~` in map/keep/etc.
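
A minimal sketch of the two rules above, assuming purrr is installed and using the built-in `mtcars` data; the object names `fit` and `squares` are illustrative only:

```r
library(purrr)

# `_` placeholder: send the piped value to a named, non-first argument
fit <- mtcars |>
  lm(mpg ~ wt, data = _)

# `\(x)` lambda instead of function(x) or a ~ formula
squares <- map_dbl(1:3, \(x) x^2)
```

The placeholder only works with a named argument (`data = _`), which is why the rule above insists on naming it.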
Code organization
Use newspaper style: high-level logic first, helpers below. Don't define functions inside other functions unless they are very brief.
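
One way to picture newspaper style, as a sketch in base R. The function names (`summarise_lengths()`, `drop_missing()`, `describe()`) are hypothetical and exist only for this illustration:

```r
# High-level logic first: the entry point reads like a summary of the work
summarise_lengths <- function(x) {
  x |>
    drop_missing() |>
    describe()
}

# Helpers below. R looks them up when summarise_lengths() is called,
# so they can be defined after it in the file.
drop_missing <- function(x) {
  x[!is.na(x)]
}

describe <- function(x) {
  c(n = length(x), mean = mean(x))
}

result <- summarise_lengths(c(2, NA, 4))
```

A reader can stop after the first function and still know what the script does; the helpers are there for whoever needs the detail.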
Grouping
- Prefer `.by` for per-operation grouping; use `group_by()` when grouping must persist across multiple operations
- Never add `ungroup()` before or after `.by` -- it always returns ungrouped data
- Consolidate multiple `mutate(.by = x)` calls into one when they share the same `.by`; keep separate only when `.by` differs or a later column depends on an earlier one
- Place `.by` on its own line for readability
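
The grouping rules above might look like this in practice. This is a sketch assuming dplyr >= 1.1.0 (where `.by` was introduced); the `sales` data is made up for the example:

```r
library(dplyr)

sales <- tibble(
  region = c("north", "north", "south", "south"),
  units = c(10, 20, 5, 15)
)

# One mutate() with a shared .by instead of two separate calls
with_totals <- sales |>
  mutate(
    region_total = sum(units),
    region_mean = mean(units),
    .by = region
  )

# .by groups only for this call, so no ungroup() is needed afterwards
still_grouped <- is_grouped_df(with_totals)
```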
Joins
- Use `join_by()`, never `c("a" = "b")`
- Use `relationship`, `unmatched`, `na_matches` for quality control
- Use the `tidylog::` prefix for join verification
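
As a sketch of the join rules above, assuming dplyr >= 1.1.0; the `orders` and `customers` tables are invented for illustration:

```r
library(dplyr)

orders <- tibble(order_id = 1:3, cust = c("a", "b", "a"))
customers <- tibble(customer_id = c("a", "b"), city = c("Oslo", "Lima"))

orders_with_city <- orders |>
  left_join(
    customers,
    by = join_by(cust == customer_id),  # differently named keys
    relationship = "many-to-one",       # each order matches at most one customer
    unmatched = "error",                # error if a customers row has no matching order
    na_matches = "never"                # NA keys never match each other
  )
```

If tidylog is installed, writing `tidylog::left_join()` in place of `left_join()` should additionally print a summary of matched and unmatched rows.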
Recoding and replacing (dplyr >= 1.2.0)
| Task | Function |
|---|---|
| Recode values (new column) | recode_values() |
| Replace values in place | replace_values() |
| Conditional update in place | replace_when() |
| Complex conditional (new column) | case_when() |
| Drop rows (NA-safe) | filter_out() |
| OR conditions | when_any() |
| AND conditions | when_all() |
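
A rough sketch of three of the functions in the table, assuming dplyr >= 1.2.0 is installed and that `recode_values()` and `replace_when()` accept `old ~ new` and `condition ~ value` formulas as described in the dplyr 1.2.0 release notes. The `scores` data is invented; check the function help pages before relying on the exact signatures:

```r
library(dplyr)  # assumes dplyr >= 1.2.0

scores <- tibble(
  grade = c("a", "b", "a", NA),
  points = c(10, -1, NA, 5)
)

cleaned <- scores |>
  mutate(
    # New column built from a value mapping; unmatched values become NA
    grade_label = recode_values(grade, "a" ~ "excellent", "b" ~ "good"),
    # Update in place only where the condition is TRUE
    points = replace_when(points, points < 0 ~ 0)
  ) |>
  # Drop rows where the condition is TRUE; rows where it is NA are kept
  filter_out(points > 5)
```

The NA-safe behaviour is the point of `filter_out()`: the equivalent `filter(!(points > 5))` would also drop the row where `points` is `NA`.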
NA handling
- `if_else()` and `case_when()` accept plain `NA` (since dplyr 1.1.0) -- no need for `NA_character_`, `NA_real_`, etc.
- Load `tidyna` to make `mean`, `sum`, `sd`, etc. ignore NA by default. Avoid repetitive `na.rm = TRUE`.
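
A short sketch of the plain-`NA` rule, assuming dplyr >= 1.1.0; the `temps` vector is illustrative:

```r
library(dplyr)

temps <- c(-2, 0, 5)

# Plain NA is cast to the common type of the other branch (character here)
label <- if_else(temps > 0, "above zero", NA)

size <- case_when(
  temps > 3 ~ "warm",
  temps < 0 ~ NA,
  .default = "mild"
)
```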
Error handling
Use cli::cli_abort() with problem statement + bullets, never stop().
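
A sketch of the "problem statement + bullets" shape, assuming the cli package is installed. `check_positive()` is a hypothetical validator written for this example:

```r
library(cli)

# Hypothetical validator used only for illustration
check_positive <- function(x) {
  if (any(x <= 0)) {
    n_bad <- sum(x <= 0)
    cli_abort(c(
      "{.arg x} must contain only positive values.",       # problem statement
      "x" = "Found {n_bad} non-positive value{?s}.",        # what went wrong
      "i" = "Remove or correct these values before calling." # hint
    ))
  }
  invisible(x)
}

err <- tryCatch(check_positive(c(1, -2, 0)), error = \(e) e)
```

The unnamed first element is the headline; elements named `"x"` and `"i"` render as error and info bullets.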
R idioms
- `TRUE`/`FALSE`, never `T`/`F`
- `message()` for info, never `cat()`
- `map_*()` over `sapply()` for type stability
- `set.seed()` with date-time, never 42
- `qs2::qs_save()`/`qs2::qs_read()`, never `qs`
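
The type-stability point is easiest to see with empty input. A minimal sketch, assuming purrr is installed:

```r
library(purrr)

empty <- list()

# sapply() changes its return type with the input: a list here, not a vector
unstable <- sapply(empty, length)

# map_int() always returns an integer vector, or errors
stable <- map_int(empty, length)
```

Code downstream of `map_int()` can rely on getting an integer vector even when there is nothing to iterate over.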
Example
```r
library(tidyverse)

penguins <- penguins |>
  filter_out(is.na(sex)) |>
  mutate(size = case_when(
    body_mass > 4500 ~ "large",
    body_mass > 3500 ~ "medium",
    .default = "small"
  ))

# Island latitudes, used as a lookup table in the join below
island_coords <- tribble(
  ~island, ~latitude,
  "Biscoe", -65.5,
  "Dream", -64.7,
  "Torgersen", -64.8
)

island_summary <- penguins |>
  summarise(
    mean_flipper = mean(flipper_len),
    mean_mass = mean(body_mass),
    n = n(),
    .by = c(species, island)
  ) |>
  left_join(
    island_coords,
    by = join_by(island),
    unmatched = "error"
  ) |>
  arrange(species, island)
```
Best practices
- Name variables as nouns, functions as verbs in snake_case
- Explain "why" in comments, not "what"
- Place `.by` on its own line for readability
- Use `.unmatched = "error"` in `case_when()` and `recode_values()` for defensive programming
- Use `recode_values()` over `case_match()` (dplyr 1.2+ preferred API)
- Use `replace_when()` over `case_when()` with `.default` when updating a column in place
- Prefer `filter_out()` over negated `filter()` for NA-safe row removal
- Load tidyna early to eliminate `na.rm = TRUE` clutter
- Use tidylog:: for joins to verify row counts and match quality
- Use `qs2` for serialization with `.qs2` extension