A comprehensive R package for streamlined survey research data processing, from SurveyMonkey API integration to cleaned datasets ready for analysis.
🚧 MVP. More to come!
surveyUtils provides a complete workflow for survey research data processing. The package is built around an intelligent data codebook that serves as both documentation and processing configuration, enabling automated data cleaning, scoring of standardized psychological instruments (including subscales), and comprehensive quality control checks.
Built on production-level R practices, surveyUtils demonstrates data pipeline architecture with institutional knowledge management, making it particularly valuable for teams conducting repeated survey research studies.
- Faster project setup - From hours to minutes
- Consistent workflows - Standardized processes across team members, with an easy-to-reference parameters file
- End-to-end automation - Complete pipeline from raw SurveyMonkey extracts to analysis-ready datasets
- Intelligent codebook system - Self-documenting data processing with institutional memory through a maximal codebook library
- Standardized instrument scoring - Automated scale scoring, with subscale support
- Built-in quality control - Attention checks, straightlining detection, duration filtering, and response validation
- API integration - Direct SurveyMonkey downloads with caching
- Modular architecture - Reusable functions following tidyverse conventions and DRY principles
- Analysis-ready outputs - Wrangled data with comprehensive codebook documentation
- Secure data handling - Token-based authentication (with Box integration for sensitive research data planned)
```
surveyUtils/
├── external/
│   ├── surveymonkey_utils.R      # SurveyMonkey API communication & data download
│   └── box_utils.R               # Secure cloud storage integration [Coming Soon]
├── data/
│   ├── data_codebook_utils.R     # Intelligent codebook generation & management
│   ├── data_wrangling_utils.R    # Data cleaning & processing
│   └── survey_scoring_utils.R    # Standardized instrument scoring with subscales
├── visualization/
│   ├── plot_utils.R              # Presentation-ready visualizations
│   └── table_utils.R             # Presentation-ready tables [Coming Soon]
├── core/
│   └── workflow_utils.R          # High-level workflow orchestration [WIP]
└── config/
    └── parameters_template.R     # Parameters file for one-stop configuration
```
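The processing workflow in this README reads every setting from a `params` list created by `config/parameters.R`. Below is a minimal sketch of such a file, assuming it simply assigns a named list. The field names are the ones the workflow code references; every value (paths, survey name, dates, thresholds) is a hypothetical placeholder, the `utils_path` script name is invented for illustration, and the real `parameters_template.R` may be organized differently.

```r
# config/parameters.R -- hypothetical example; all values are placeholders
params <- list(
  # Paths
  utils_path            = "R/source_all_utils.R",   # hypothetical script that sources all utils
  creds_path            = "~/secure/sm_token.csv",  # CSV with a single access_token column
  raw_dir               = "data/raw",
  config_dir            = "config",
  processed_dir         = "data/processed",
  maximal_codebook_path = "config/maximal_codebook.csv",

  # Survey identification and download options
  survey_name        = "My Example Survey",
  fetch_new_metadata = TRUE,
  save_raw_csv       = TRUE,

  # Date filtering (drop test/QA responses before launch)
  start_date_col = "start_date",
  end_date_col   = "end_date",
  min_date       = as.Date("2024-01-15"),           # assumed to be a Date

  # Quality control
  export_attention_failures = TRUE,
  remove_attention_fails    = FALSE,
  straightline_threshold    = 0.9,                  # assumed: proportion of identical answers
  remove_straightliners     = FALSE,

  # Scoring
  score_surveys               = TRUE,
  update_codebook_with_scored = TRUE
)
```

Keeping every study-specific value in one list means the workflow script itself can stay identical across projects.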
The data codebook is the central organizing principle of surveyUtils, serving two critical functions:
- Documentation: Complete metadata for all variables including question text, response scales, and survey instrument information
- Processing Configuration: Machine-readable instructions for data cleaning, reverse coding, instrument scoring, and response format conversion
The codebook captures survey metadata via the SurveyMonkey API and integrates information from past studies to minimize manual completion.
| Column | Description |
|---|---|
| `col_num` | Column number for ordering |
| `question_text` | Full question text as presented to participants |
| `variable_name` | Original SurveyMonkey variable name (cleaned to snake_case) |
| `short_variable_name` | Concise, analysis-friendly variable name |
| `variable_name_R` | Variable name with `_R` suffix for reverse-coded items |
| `cat` | Variable category (`demo`, `attention_check`, `survey_items`, `computed_scores`) |
| `scale_abbrev` | Short scale name (`phq`, `gad`, `pss`, etc.) |
| `scale_full` | Complete standardized instrument name |
| `subscale` | Subscale designation for multi-factor instruments |
| `coding_direction` | `1` for forward coding, `-1` for reverse coding |
| `min` / `max` | Valid response range, used for validation and scoring |
| `response_choices` | Available response options (from the SurveyMonkey API) |
| `response_coding` | Numeric codes for the response choices |
| `response_format` | Target format for conversion (`"numeric"`, `"text"`, etc.) |
| `correct_response` | Correct answer for attention check items |
| `required` | Whether the question was required in the survey |
| `question_format` | Question type (`single_select`, `multi_select`, `matrix`, etc.) |
| `question_id` | SurveyMonkey question ID |
| `page_number` | Survey page number |
| `family` / `subtype` | SurveyMonkey question family and subtype |
| `matrix_row_id` / `matrix_row_text` | Matrix question row identifiers and row text |
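To illustrate how codebook columns double as processing configuration, here is a small base-R sketch of reverse coding driven by `coding_direction`, `min`, and `max`. It assumes the common rule that a reversed score is `(min + max) - x` and that reversed values go into a new column with the `_R` suffix. The helper `reverse_code_sketch()` and the toy data are hypothetical; this is not the package's `reverse_code_from_codebook()`.

```r
# Hypothetical mini-codebook using a few of the columns described above
codebook <- data.frame(
  short_variable_name = c("pss_1", "pss_2", "pss_3"),
  scale_abbrev        = c("pss", "pss", "pss"),
  coding_direction    = c(1, -1, 1),
  min                 = c(0, 0, 0),
  max                 = c(4, 4, 4),
  stringsAsFactors    = FALSE
)

# Reverse-code items with coding_direction == -1 as (min + max) - x,
# writing each result to a new column with an _R suffix
reverse_code_sketch <- function(df, codebook) {
  rev_items <- codebook[which(codebook$coding_direction == -1), , drop = FALSE]
  for (i in seq_len(nrow(rev_items))) {
    item <- rev_items$short_variable_name[i]
    if (!item %in% names(df)) next   # skip items not present in this dataset
    df[[paste0(item, "_R")]] <- (rev_items$min[i] + rev_items$max[i]) - df[[item]]
  }
  df
}

responses <- data.frame(pss_1 = c(0, 3), pss_2 = c(1, 4), pss_3 = c(2, NA))
scored <- reverse_code_sketch(responses, codebook)
```

With a 0 to 4 range, `pss_2` values `1` and `4` become `3` and `0` in `pss_2_R`, while forward-coded items are left untouched. Because the rule reads `min` and `max` per item, the same function works for scales with different ranges.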
- Maximal Codebook Library: Institutional knowledge system that automatically matches questions to known instruments using exact variable name matching and fuzzy text matching
- Intelligent Question Recognition: Two-stage matching process - exact matches first, then fuzzy matching for remaining questions
- Template Generation: Automated codebook creation from SurveyMonkey metadata
- Response Format Specification: Supports text-to-numeric conversion for standardized response scales
- Computed Score Tracking: Automatically adds total and subscale score variables to codebook after scoring
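The two-stage matching idea can be sketched in base R: look for an exact `variable_name` match first, then fall back to fuzzy matching on normalized question text. This sketch uses `utils::adist()` (generalized Levenshtein distance) divided by the longer string's length as the fuzzy score. The function name, the `max_dist` cutoff of 0.2, and the example questions are assumptions for illustration, and `match_questions_to_codebook()` may use a different similarity measure.

```r
match_two_stage_sketch <- function(new_qs, lib_cb, max_dist = 0.2) {
  # Normalize text: lower case, trimmed whitespace
  norm <- function(x) tolower(trimws(x))
  lib_text <- norm(lib_cb$question_text)
  matched <- rep(NA_character_, nrow(new_qs))
  method  <- rep("none", nrow(new_qs))

  for (i in seq_len(nrow(new_qs))) {
    # Stage 1: exact variable-name match
    hit <- match(new_qs$variable_name[i], lib_cb$variable_name)
    if (!is.na(hit)) {
      matched[i] <- lib_cb$variable_name[hit]
      method[i]  <- "exact"
      next
    }
    # Stage 2: fuzzy match on question text, distance scaled to [0, 1]
    q <- norm(new_qs$question_text[i])
    d <- utils::adist(q, lib_text)[1, ] / pmax(nchar(q), nchar(lib_text))
    best <- which.min(d)
    if (length(best) == 1 && d[best] <= max_dist) {
      matched[i] <- lib_cb$variable_name[best]
      method[i]  <- "fuzzy"
    }
  }
  data.frame(variable_name = new_qs$variable_name, matched_to = matched,
             method = method, stringsAsFactors = FALSE)
}

lib_cb <- data.frame(
  variable_name = c("phq_1", "gad_1"),
  question_text = c("Little interest or pleasure in doing things",
                    "Feeling nervous, anxious, or on edge"),
  stringsAsFactors = FALSE
)
new_qs <- data.frame(
  variable_name = c("phq_1", "q17", "q18"),
  question_text = c("Little interest or pleasure in doing things",
                    "Feeling nervous, anxious or on edge",
                    "What is your favorite color?"),
  stringsAsFactors = FALSE
)
res <- match_two_stage_sketch(new_qs, lib_cb)
```

Here `phq_1` matches exactly, `q17` is matched to `gad_1` despite a missing comma, and `q18` stays unmatched for manual review. Running the cheap exact stage first keeps the quadratic-cost string distance limited to the questions that actually need it.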
File: surveymonkey_utils.R
Handles secure communication with the SurveyMonkey API for data retrieval and survey structure analysis.
Key Functions:
- `load_sm_token()` - Secure OAuth token management
- `fetch_surveys_sm()` - Retrieve survey list with intelligent caching
- `download_responses_sm()` - Download survey responses with pagination handling
- `fetch_survey_structure_sm()` - Extract detailed survey metadata
- `flatten_survey_responses()` - Convert nested API data to tabular format
- `create_double_headers()` - Mirror the SurveyMonkey export format
- `process_survey_responses()` - Complete response processing pipeline
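Flattening nested API data is the step that turns per-response JSON into rows. A minimal base-R sketch follows, assuming responses have already been parsed into nested R lists shaped roughly like SurveyMonkey's bulk response objects (response → pages → questions → answers, with optional `choice_id`, `row_id`, and `text` fields). The helper names and toy data are hypothetical, and `flatten_survey_responses()` may produce a different layout.

```r
# Return NA instead of NULL for optional answer fields
or_na <- function(x) if (is.null(x)) NA_character_ else as.character(x)

# Walk response -> page -> question -> answer, emitting one row per answer
flatten_responses_sketch <- function(responses) {
  rows <- list()
  for (resp in responses) {
    for (page in resp$pages) {
      for (q in page$questions) {
        for (ans in q$answers) {
          rows[[length(rows) + 1]] <- data.frame(
            response_id = resp$id,
            page_id     = or_na(page$id),
            question_id = q$id,
            row_id      = or_na(ans$row_id),
            choice_id   = or_na(ans$choice_id),
            text        = or_na(ans$text),
            stringsAsFactors = FALSE
          )
        }
      }
    }
  }
  if (length(rows) == 0) return(data.frame())
  do.call(rbind, rows)
}

# Toy input: two respondents, one choice answer and one open-text answer
responses <- list(
  list(id = "r1", pages = list(list(id = "p1", questions = list(
    list(id = "q1", answers = list(list(choice_id = "c3"))),
    list(id = "q2", answers = list(list(text = "Fine, thanks")))
  )))),
  list(id = "r2", pages = list(list(id = "p1", questions = list(
    list(id = "q1", answers = list(list(choice_id = "c1")))
  ))))
)
flat <- flatten_responses_sketch(responses)
```

The result is a long table (one row per answer); it would then be reshaped to one row per respondent, as in a standard SurveyMonkey export.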
File: data_codebook_utils.R
Generates and maintains intelligent codebooks that combine API metadata with institutional knowledge.
Key Functions:
- `generate_basic_codebook_template()` - Create a codebook from SurveyMonkey metadata
- `generate_enhanced_codebook_template()` - Enhanced codebook with automatic scale matching against the maximal codebook library
- `load_maximal_codebook()` - Access the institutional knowledge library
- `match_questions_to_codebook()` - Fuzzy text matching for question identification
- `update_maximal_codebook()` - Continuously improve the knowledge base
- `setup_new_survey_project()` - Complete project initialization workflow
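One plausible policy for keeping the maximal codebook library current is to append rows from a newly completed codebook and keep a single entry per `variable_name`, letting the newer entry win. The sketch below assumes the library contains every column of the new codebook; `update_library_sketch()` is hypothetical, and `update_maximal_codebook()` may resolve conflicts differently.

```r
update_library_sketch <- function(library_cb, new_cb, key = "variable_name") {
  # Align columns to the new codebook, then stack new rows first
  combined <- rbind(new_cb, library_cb[, names(new_cb), drop = FALSE])
  # duplicated() flags later occurrences, so the first (newest) row per key is kept
  out <- combined[!duplicated(combined[[key]]), , drop = FALSE]
  rownames(out) <- NULL
  out
}

lib_cb <- data.frame(
  variable_name = c("phq_1", "gad_1"),
  scale_abbrev  = c("phq", "gad_old"),
  stringsAsFactors = FALSE
)
new_cb <- data.frame(
  variable_name = c("gad_1", "pss_1"),
  scale_abbrev  = c("gad", "pss"),
  stringsAsFactors = FALSE
)
updated <- update_library_sketch(lib_cb, new_cb)
```

Here `gad_1` takes its metadata from the new codebook, `pss_1` is added, and `phq_1` is carried over unchanged.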
File: data_wrangling_utils.R
Comprehensive data processing pipeline with quality control and validation.
Key Functions:
- `process_double_headers()` - Handle SurveyMonkey's dual-header format
- `apply_short_names()` - Apply concise variable names from the codebook
- `score_attention_checks()` - Automated attention check validation
- `filter_by_date_range()` - Remove test/QA responses
- `calculate_survey_duration()` - Compute completion times with validation
- `reverse_code_from_codebook()` - Automated reverse coding
- `create_multiselect_summary_cols()` - Generate readable multi-select summaries
- `process_survey_data()` - Complete processing pipeline orchestration
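Codebook-driven attention check scoring can be sketched as comparing each item whose `cat` is `attention_check` against its `correct_response`. This base-R version assumes responses and correct answers compare as strings and treats a missing answer as a failure; both choices, the helper name, and the added `attn_failures` / `attn_pass` columns are assumptions rather than the documented behavior of `score_attention_checks()`.

```r
score_attention_sketch <- function(df, codebook) {
  checks <- codebook[which(codebook$cat == "attention_check"), , drop = FALSE]
  # One column of pass/fail per attention check item
  passed <- matrix(NA, nrow = nrow(df), ncol = nrow(checks))
  for (i in seq_len(nrow(checks))) {
    ans <- as.character(df[[checks$short_variable_name[i]]])
    # A missing answer counts as a failure
    passed[, i] <- !is.na(ans) & ans == as.character(checks$correct_response[i])
  }
  df$attn_failures <- rowSums(!passed)
  df$attn_pass     <- df$attn_failures == 0
  df
}

cb <- data.frame(
  short_variable_name = c("attn_1", "attn_2", "pss_1"),
  cat                 = c("attention_check", "attention_check", "survey_items"),
  correct_response    = c("Strongly agree", "3", NA),
  stringsAsFactors    = FALSE
)
resp <- data.frame(
  attn_1 = c("Strongly agree", "Disagree", "Strongly agree"),
  attn_2 = c(3, 3, NA),
  stringsAsFactors = FALSE
)
checked <- score_attention_sketch(resp, cb)
```

Counting failures rather than returning a single flag leaves room for a lenient rule (for example, allowing one miss) before removing anyone.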
File: survey_scoring_utils.R
Automated scoring of standardized psychological instruments with validation.
Key Functions:
- `var_score()` - Core scoring function with prorating and validation
- `load_data_dictionary()` - Validate and load scoring configurations
- `get_survey_config()` - Extract instrument-specific scoring parameters
- `score_survey()` - Score an individual instrument
- `score_surveys()` - Batch scoring with error handling
- `generate_scoring_report()` - Comprehensive scoring validation report
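Prorated scoring is commonly done by averaging the items a respondent answered, scaling that mean back to the full item count for a sum score, and returning `NA` when too few items were answered. The sketch below implements that rule; the 80% completeness cutoff, the function name, and the toy data are assumptions, and `var_score()` may use different defaults or validation.

```r
var_score_sketch <- function(df, items, min_prop = 0.8, method = c("sum", "mean")) {
  method <- match.arg(method)
  vals <- as.matrix(df[, items, drop = FALSE])
  n_items    <- length(items)
  n_answered <- rowSums(!is.na(vals))
  item_mean  <- rowMeans(vals, na.rm = TRUE)   # NaN when nothing was answered
  # Prorated sum: mean of answered items scaled back to the full item count
  score <- if (method == "sum") item_mean * n_items else item_mean
  # Too few items answered: no score
  score[n_answered / n_items < min_prop] <- NA
  score
}

item_data <- data.frame(
  i1 = c(1, 2, 1), i2 = c(2, 2, NA), i3 = c(3, NA, NA), i4 = c(4, 2, 1), i5 = c(5, 2, 1)
)
scores <- var_score_sketch(item_data, items = paste0("i", 1:5))
```

The second respondent skipped one of five items (80% answered), so their sum is prorated to 2 * 5 = 10; the third answered only 60%, so their score is `NA`.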
File: workflow_utils.R
High-level functions that coordinate complex multi-step research workflows (work in progress).
Key Functions:
- `process_sm_survey_complete()` - Complete SurveyMonkey download and setup
- `setup_new_research_project()` - Create a standardized project structure
- `complete_survey_processing_pipeline()` - End-to-end processing coordination
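Creating a standardized project structure mostly comes down to creating a fixed set of folders. A minimal sketch, assuming a layout that mirrors the `config_dir`, `raw_dir`, and `processed_dir` parameters used in the workflow; the folder names (including `output`) and the helper are hypothetical and not necessarily what `setup_new_research_project()` creates.

```r
setup_project_sketch <- function(root,
                                 subdirs = c("config", "data/raw", "data/processed", "output")) {
  paths <- file.path(root, subdirs)
  for (p in paths) {
    # recursive = TRUE also creates intermediate folders such as data/
    dir.create(p, recursive = TRUE, showWarnings = FALSE)
  }
  invisible(paths)
}

project_root <- file.path(tempdir(), "example_survey_project")
created <- setup_project_sketch(project_root)
```

Because `showWarnings = FALSE` silences "already exists" warnings, the function can be re-run safely on an existing project.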
File: box_utils.R
Secure integration with Box for sensitive research data management (coming soon).
Key Functions:
- `upload_survey_to_box()` - Secure upload of processed datasets with versioning
File: plot_utils.R
Research-focused plotting functions with consistent styling for publications.
Key Functions:
- `make_hist()` - Publication-ready histograms
- `make_scatter()` - Scatterplots with regression lines
- `make_group_scatter()` - Group comparison visualizations
- `make_group_time_plot()` - Longitudinal data visualization
- Set up an app in your SurveyMonkey account: https://developer.surveymonkey.com
- Enable the following scopes:
  - View Surveys
  - View Collectors
  - View Responses
  - View Response Details
  - View Webhooks
- Save the access token to a CSV file with a single column named `access_token`. Keep this file and the token secure. Set `creds_path` in the parameters file to the path of this CSV.
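For reference, the credentials file is just a one-row CSV. The sketch below writes one to a temporary location and reads the token back with a basic sanity check. `read_token_sketch()` is a hypothetical stand-in, not the package's `load_sm_token()`, and in practice the file should live outside version control (for example, listed in `.gitignore`).

```r
# Write a one-row credentials CSV (placeholder token; temp path for illustration)
creds_path <- file.path(tempdir(), "sm_token.csv")
utils::write.csv(data.frame(access_token = "YOUR_ACCESS_TOKEN"), creds_path,
                 row.names = FALSE)

# Hypothetical reader with a basic sanity check (not load_sm_token())
read_token_sketch <- function(path) {
  creds <- utils::read.csv(path, stringsAsFactors = FALSE)
  if (!"access_token" %in% names(creds) || nrow(creds) < 1) {
    stop("Expected a CSV with an 'access_token' column and one row")
  }
  trimws(creds$access_token[1])
}

token <- read_token_sketch(creds_path)
```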
```r
# Load your survey-specific parameters (this should be the only line you need to edit)
source("config/parameters.R")
# Load utilities (sources all utils scripts)
source(params$utils_path)
# 1. Set up authentication
token <- load_sm_token(params$creds_path)
# 2. Pull response data from SurveyMonkey via the API
df_raw <- process_survey_responses(
  token,
  survey_name = params$survey_name,
  save_csv = params$save_raw_csv,
  output_path = params$raw_dir
)
# 3. Generate an enhanced codebook with institutional knowledge
codebook <- generate_enhanced_codebook_template(
  token,
  survey_name = params$survey_name,
  output_path = params$config_dir,
  maximal_codebook_path = params$maximal_codebook_path,
  fetch_new_metadata = params$fetch_new_metadata
)
# 4. Fill in any missing scale information in the generated codebook CSV
# Edit: config/[survey_name]_generated_enhanced_codebook.csv
# If `codebook` is an in-memory data frame, re-read the edited CSV
# (e.g., with read.csv()) so that step 5 uses your changes
# 5. Run the complete data processing pipeline
df_processed <- process_survey_data(
  df_raw,
  codebook,
  survey_name = params$survey_name,
  score_surveys = params$score_surveys,
  output_path = params$processed_dir,
  start_date_col = params$start_date_col,
  end_date_col = params$end_date_col,
  min_date = params$min_date,
  export_attention_failures = params$export_attention_failures,
  remove_attention_fails = params$remove_attention_fails,
  straightline_threshold = params$straightline_threshold,
  remove_straightliners = params$remove_straightliners,
  update_codebook_with_scored = params$update_codebook_with_scored
)
```

The `process_survey_data()` function orchestrates these steps automatically:
- Column name standardization - Convert to snake_case, strip HTML
- Short name application - Apply concise variable names from codebook
- Double header processing - Combine question text and response choices
- Multi-select summaries - Create readable summary columns
- Date filtering - Remove test/QA responses before launch date
- Duration calculation - Compute and validate survey completion times
- Attention check scoring - Validate and optionally remove inattentive responders
- Response text mapping - Convert text responses to numeric codes
- Numeric conversion - Convert to numeric data type with validation
- Straightlining detection - Identify and optionally remove straightliners
- Reverse coding - Apply reverse coding based on codebook
- Survey scoring - Score all standardized instruments with subscales
- Codebook updating - Add computed score variables to codebook
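Two of these quality-control steps are easy to illustrate. The sketch below flags straightliners by the share of a respondent's answered items that equal their most common answer, and computes completion time in minutes with `difftime()`. Interpreting `straightline_threshold` as that share (0.9 here), requiring at least two answered items, and both helper names are assumptions; `process_survey_data()` may define straightlining and duration differently.

```r
flag_straightliners_sketch <- function(df, items, threshold = 0.9) {
  vals <- as.matrix(df[, items, drop = FALSE])
  modal_share <- apply(vals, 1, function(x) {
    x <- x[!is.na(x)]
    if (length(x) < 2) return(NA_real_)   # too few answers to judge
    max(table(x)) / length(x)             # share of answers equal to the mode
  })
  !is.na(modal_share) & modal_share >= threshold
}

duration_minutes_sketch <- function(start, end) {
  as.numeric(difftime(end, start, units = "mins"))
}

responses <- data.frame(
  q1 = c(3, 1, 4), q2 = c(3, 2, 4), q3 = c(3, 3, 4), q4 = c(3, 4, 4), q5 = c(3, 5, 2)
)
flags <- flag_straightliners_sketch(responses, paste0("q", 1:5))
# flags: TRUE FALSE FALSE (the third respondent's modal share is 0.8)

t_start <- as.POSIXct("2024-03-01 10:00:00", tz = "UTC")
t_end   <- as.POSIXct("2024-03-01 10:12:30", tz = "UTC")
duration_minutes_sketch(t_start, t_end)   # 12.5
```

A modal-share rule catches near-straightlining that a strict "all answers identical" check would miss, at the cost of needing a sensible threshold per scale length.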
This package demonstrates production-level R development practices including modular architecture, comprehensive documentation, intelligent caching, institutional knowledge management, and secure credential handling suitable for sensitive data.