Transfer REDCap data to a database in chunks and use it in R without exceeding available memory. Whether you have a large or small project, enjoy features such as labeling data, converting coded values, and hearing a “quack” sound on success. Compatible with all databases but optimized for DuckDB.
Problem and Solution
Is your project outgrowing your computer? Have you seen this error when using the REDCap API to retrieve data?
```
Error: vector memory limit of 16.0 GB reached, see mem.maxVSize()
```
What does it mean? Well, R objects are stored in your random access memory (RAM). When your data gets too big, you hit your memory limit. redquack’s solution to this error is to store the data out of memory in a local database for easy retrieval in R.
The solution:
- Request all record IDs in REDCap
- Split the record IDs into chunks of 1,000 (default)
- Request one chunk of the project data at a time
- Transfer the chunk to a database
- Remove the chunk from memory
API requests are handled by httr2, which automatically retries failed requests to ensure your data transfers successfully.
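Conceptually, the loop looks something like this minimal sketch. This is not redquack’s actual implementation: `fetch_record_ids()` and `fetch_chunk()` are hypothetical stand-ins for the REDCap API requests, and `conn` is a database connection like the one created in Basic Usage below.

```r
# Minimal sketch of the chunked transfer (not redquack's actual code);
# fetch_record_ids() and fetch_chunk() are hypothetical stand-ins for
# the REDCap API requests.
record_ids <- fetch_record_ids()

# Split the record IDs into chunks of 1,000 (the default chunk size)
chunks <- split(record_ids, ceiling(seq_along(record_ids) / 1000))

for (ids in chunks) {
  chunk <- fetch_chunk(ids)                             # request one chunk
  DBI::dbWriteTable(conn, "data", chunk, append = TRUE) # transfer to the database
  rm(chunk)                                             # remove the chunk from memory
}
```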
Installation
From CRAN:
```r
# install.packages("pak")
pak::pak("redquack")
```

From GitHub (development version):

```r
pak::pak("dylanpieper/redquack")
```

These packages are also used in the examples:
Setup API Token
Your API token allows R to interface with your REDCap instance and should be stored securely. I recommend the keyring package for storing it. For example:
```r
keyring::key_set("redcap", "test")
```

Use this token to run the examples: `9A81268476645C4E5F03428B8AC3AA7B`
Basic Usage
Data from REDCap is transferred to a database connection in chunks of record IDs:
```r
library(redquack)
library(dplyr)

conn <- use_duckdb()

result <- redcap_to_db(
  conn,
  url = "https://bbmc.ouhsc.edu/redcap/api/",
  token = keyring::key_get("redcap", "test")
)
```

`redcap_to_db()` returns a list of metadata with class `redcap_transfer_result`:
- `success`: Logical indicating whether the transfer completed with no failed processing
- `error_chunks`: Vector of chunk numbers that failed processing
- `time_s`: Numeric value for total seconds to process all data
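For example, you can inspect these fields to confirm the transfer before querying the data:

```r
# Check the transfer metadata returned by redcap_to_db()
if (result$success) {
  message("Transfer completed in ", round(result$time_s, 1), " seconds")
} else {
  message("Failed chunks: ", paste(result$error_chunks, collapse = ", "))
}
```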
Database Structure
The database created by redcap_to_db() contains up to four tables:
- `data`: Contains the raw REDCap records (`data <- tbl_redcap(conn) |> collect()`)
- `metadata`: Contains project metadata for labeling and coded value conversion (`meta <- metadata(conn)`)
- `redcap_log`: Contains REDCap audit logs from the past week by default (`redcap_log <- redcap_log(conn)`)
- `transfer_log`: Contains operation logs of the transfer process (`transfer_log <- transfer_log(conn)`)
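Assuming `use_duckdb()` returns a standard DBI connection (the duckdb driver’s usual behavior), you can also list these tables directly:

```r
# List the tables in the database (assumes conn is a DBI connection)
DBI::dbListTables(conn)
```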
Data Types
Data is initially inserted into the database as VARCHAR/TEXT for consistent handling across chunks.
For DuckDB, data types are optimized after transfer to improve query performance:
- INTEGER: Columns with only whole numbers
- DOUBLE: Columns with decimal numbers
- DATE: Columns with valid dates
- TIMESTAMP: Columns with valid timestamps
- VARCHAR/TEXT: All other columns remain as strings
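As a rough illustration of the kind of cast this optimization performs (this is not redquack’s internal code, and `age` is a hypothetical column name):

```r
# Illustration only, not redquack's internal code: cast a hypothetical
# "age" column from VARCHAR to INTEGER in DuckDB
DBI::dbExecute(conn, "ALTER TABLE data ALTER COLUMN age SET DATA TYPE INTEGER")
```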
You can also query the database to inspect the data types:
```r
inspect(conn)
```

Data Manipulation
Retrieve and manipulate your REDCap data with familiar dplyr verbs and tbl-like syntax. The key difference is that you reference the database table first using `tbl_redcap()`.
Working with Instruments
Retrieve data organized by REDCap instruments as a list:
```r
instruments <- tbl_redcap(conn) |>
  collect_list()
```

Assign instrument data frames to the global environment:
```r
tbl_redcap(conn) |>
  collect_list() |>
  list_to_env()
```

Filtering and Selecting Data
Use dplyr verbs to filter and select data before collecting:
```r
# Filter to specific participants
nutmouse_data <- tbl_redcap(conn) |>
  filter(name_last == "Nutmouse") |>
  collect_labeled_list()

# Select specific columns across all instruments
key_vars <- tbl_redcap(conn) |>
  select(record_id, email, sex, bmi) |>
  collect_list()

# Simple column selection
analysis_data <- tbl_redcap(conn) |>
  select(email, sex) |>
  collect_labeled()
```

Grouping and Aggregating
Perform complex data manipulation with grouping and filtering:
```r
# Group by sex and filter to below-average BMI
below_avg_bmi <- tbl_redcap(conn) |>
  select(record_id, sex, bmi) |>
  group_by(sex) |>
  filter(bmi < mean(bmi)) |>
  arrange(bmi) |>
  collect_list()
```

Working with Labels
Apply column labels and coded value labels (i.e., choice mappings) to your data. `collect_labeled()` and `collect_labeled_list()` give you control over how labels are applied and whether coded values are converted:
Full labeling and coded value conversion (default):
```r
data <- tbl_redcap(conn) |> collect_labeled()

data$sex
#> [1] "Female" "Male" "Male" "Female" "Male"

attr(data$sex, "label")
#> [1] "Gender"
```

Keep raw coded values:
```r
data <- tbl_redcap(conn) |> collect_labeled(convert = FALSE)

data$sex
#> <labelled<integer>[5]>: Gender
#> [1] 0 1 1 0 1
#>
#> Labels:
#>  value  label
#>      0 Female
#>      1   Male
```

Column labels only:
```r
data <- tbl_redcap(conn) |> collect_labeled(vals = FALSE)

data$sex
#> [1] 0 1 1 0 1

attr(data$sex, "label")
#> [1] "Gender"
```

Value conversion only:
```r
data <- tbl_redcap(conn) |> collect_labeled(cols = FALSE)

data$sex
#> [1] "Female" "Male" "Male" "Female" "Male"

attr(data$sex, "label")
#> NULL
```

Exporting Data
You can also write a Parquet file directly from DuckDB. A Parquet file will be much smaller than a DuckDB file and easy to share:
```r
save_parquet(conn, "redcap.parquet")
```
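To read the exported file back into R, one option is the arrow package (`arrow::read_parquet()` is independent of redquack):

```r
# Read the Parquet file back into a data frame (requires the arrow package)
df <- arrow::read_parquet("redcap.parquet")
```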
Cleanup
When you’re finished working with your data, clean up your connection:
```r
close_duckdb(conn)
```

Or, if you do not need to access the data again, remove/delete the file:
```r
remove_duckdb(conn)
```