🧬 Bring AI to Your Biostatistics Workflow

This guide demonstrates how to use local Large Language Models (via Ollama) for statistical consulting, code generation, and analysis design—all running privately on your machine.

What you’ll learn:

✅ Statistical consultation with AI experts
✅ Automated R code generation for complex analyses
✅ Iterative analysis design workflows
✅ Semantic literature search with embeddings
✅ Structured outputs for reproducible reports

📋 Quick Reference: Jump to any section using the floating table of contents on the left.

1 Quick Start

Install Ollama from ollama.com, launch the app, then:

install.packages("ollamar")
library(ollamar)
test_connection()  # Should show "Ollama server is running"

# Pull your models (one-time setup)
pull("qwen3:14b")          # Best for R code generation
pull("gemma3:12b")         # Best for complex reasoning  
pull("gemma3:4b")          # Fast general-purpose
pull("llama3.2:3b")        # Quick explanations

Available Models:

Model	Size	Best Use	Speed
`qwen3:14b`	9.3GB	R code generation, complex analysis	⚡⚡
`gemma3:12b`	8.1GB	Complex reasoning, study design	⚡⚡
`qwen3:4b`	2.5GB	Quick code questions	⚡⚡⚡
`gemma3:4b`	3.3GB	Fast reasoning, general tasks	⚡⚡⚡
`phi4-mini:3.8b`	2.5GB	Efficient general-purpose	⚡⚡⚡
`granite4:3b`	2.1GB	IBM’s enterprise model	⚡⚡⚡
`llama3.2:3b`	2.0GB	Fast explanations, summaries	⚡⚡⚡
`nomic-embed-text-v2-moe`	957MB	Embeddings, semantic search	⚡⚡⚡
`mxbai-embed-large`	669MB	Embeddings, literature search	⚡⚡⚡

2 Core Workflow Patterns

💡 Pro Tip: Each pattern below can be combined. Start with consultation (Pattern 1), then generate code (Pattern 2), and iterate to refine (Pattern 3).

2.1 Pattern 1: Statistical Consultation

Get expert advice on methods and tests:

library(ollamar)

# Setup: System prompt for biostatistics expert
expert_prompt <- "You are a senior biostatistician. Give concise, evidence-based recommendations with key assumptions. Use markdown formatting but avoid using # headers - use **bold** for section titles instead."

# Helper function to sanitize LLM output - convert markdown headers to styled text
# This prevents headers from becoming document sections
# IMPORTANT: Preserves code blocks (```...```) without modification
sanitize_llm_output <- function(text) {
  # Split text by code blocks to preserve them
  # Pattern matches ```language\n...``` blocks
  parts <- strsplit(text, "(```[a-zA-Z]*\\n[\\s\\S]*?```)", perl = TRUE)[[1]]
  code_blocks <- gregexpr("```[a-zA-Z]*\\n[\\s\\S]*?```", text, perl = TRUE)
  matches <- regmatches(text, code_blocks)[[1]]
  
  # Function to convert headers in non-code text
  convert_headers <- function(txt) {
    # Only convert headers at start of line (not # comments in code)
    # Must have space after # and be on its own line
    txt <- gsub("(^|\\n)####\\s+([^\\n]+)", "\\1<p class='llm-h4'><strong>\\2</strong></p>", txt, perl = TRUE)
    txt <- gsub("(^|\\n)###\\s+([^\\n]+)", "\\1<p class='llm-h3'><strong>\\2</strong></p>", txt, perl = TRUE)
    txt <- gsub("(^|\\n)##\\s+([^\\n]+)", "\\1<p class='llm-h2'><strong>\\2</strong></p>", txt, perl = TRUE)
    txt <- gsub("(^|\\n)#\\s+([^\\n]+)", "\\1<p class='llm-h1'><strong>\\2</strong></p>", txt, perl = TRUE)
    return(txt)
  }
  
  # If no code blocks, just convert headers
  if (length(matches) == 0) {
    return(convert_headers(text))
  }
  
  # Rebuild text: convert headers in non-code parts, keep code blocks as-is
  result <- ""
  for (i in seq_along(parts)) {
    result <- paste0(result, convert_headers(parts[i]))
    if (i <= length(matches)) {
      result <- paste0(result, matches[i])
    }
  }
  return(result)
}

# Query function that formats output for R Markdown
ask_expert <- function(question, model = "gemma3:12b", show = TRUE) {
  msgs <- create_messages(
    create_message(expert_prompt, role = "system"),
    create_message(question)
  )
  response <- chat(model, msgs, 
       temperature = 0.3, num_predict = 800,
       output = "text")
  # Wrap in a styled div for better presentation
  if (show) {
    cat("\n<div class='llm-output'>\n\n")
    cat(sanitize_llm_output(response))
    cat("\n\n</div>\n")
  }
  invisible(response)
}

# Example: Which test?
ask_expert("
Study: RCT, 3 treatment arms (n=40 each)
Outcome: Pain reduction (0-10 scale, skewed right)
Question: Best statistical test and why?
")

Okay, here’s a breakdown of recommendations for analyzing this study, structured as a senior biostatistician would present it.

Understanding the Context & Assumptions

Before diving into tests, let’s clarify key assumptions. These significantly impact the appropriateness of different approaches.

Assumption 1: Independence: Observations (pain scores) are independent of each other within and between treatment groups. This is typical for RCTs, but potential issues (e.g., patients influencing each other) would require consideration.
Assumption 2: Randomization Success: The randomization process was successful in creating balanced groups. Significant baseline differences despite randomization would necessitate adjustment (see “Sensitivity Analyses” below).
Assumption 3: Skewness & Data Distribution: The outcome (pain reduction) is skewed right. This means a standard ANOVA might not be ideal without transformation. We need to assess the severity of the skewness. A log transformation is a common first attempt, but its effectiveness needs to be evaluated (see “Model Diagnostics”).
Assumption 4: Data are Continuous: Pain reduction on a 0-10 scale is treated as a continuous variable. If the research question is focused on categories (e.g., “no pain relief,” “mild relief,” “moderate relief,” “complete relief”), a different approach (ordinal logistic regression) would be more appropriate.

Recommended Statistical Test: ANOVA with Transformation (Initially)

Given the described scenario, my primary recommendation is a one-way Analysis of Variance (ANOVA). However, crucially, it should be preceded by a transformation to address the right skewness.

Rationale: ANOVA is appropriate for comparing means across multiple groups when the outcome is continuous. With n=40 per group, we have sufficient power for ANOVA to detect meaningful differences if they exist.
Transformation: A log transformation (log(pain reduction + 1)) is a common starting point for right-skewed data. The “+1” is to handle pain reduction scores of 0. Other transformations (e.g., square root, Box-Cox) could be explored if the log transformation doesn’t adequately normalize the data.
Post-Hoc Tests: If the ANOVA reveals a significant overall difference, Tukey’s Honestly Significant Difference (HSD) is the preferred post-hoc test for pairwise comparisons. It controls for the family-wise error rate.

Alternative Tests & Considerations

Kruskal-Wallis Test: If the data remain significantly skewed even after transformation, or if there are concerns about the assumptions of ANOVA (particularly normality), the non-parametric Kruskal-Wallis test is a viable alternative. It compares the medians of the groups. However, it is less powerful than ANOVA when ANOVA assumptions are met.
Generalized Linear Models (GLMs): If the transformation doesn’t fully resolve the skewness or if the data exhibit other non-normality issues, a GLM with a Gamma or Inverse Gaussian distribution might be considered. This requires more advanced modeling expertise.
ANCOVA: If there are known or suspected baseline differences between groups despite randomization, an Analysis of Covariance (ANCOVA) could be used to adjust for these differences. Requires careful consideration of the covariate’s relationship with the outcome.

Model Diagnostics & Validation

Normality Check: After transformation, always check the normality of the residuals. Use histograms, Q-Q plots, and formal tests (e.g., Shapiro-Wilk). If residuals are not approximately normal, further transformation or a GLM might be necessary.
Homogeneity of Variance: Assess

2.2 Pattern 2: R Code Generation

Generate analysis code with proper parameters:

# Code generation function with formatted output
generate_code <- function(prompt, model = "qwen3:14b", num_predict = 2000, show = TRUE) {
  response <- generate(
    model,
    prompt,
    temperature = 0.2,      # Low for deterministic code
    top_p = 0.3,
    num_predict = num_predict,
    output = "text"
  )
  if (show) {
    cat("\n<div class='llm-output'>\n\n")
    cat(sanitize_llm_output(response))
    cat("\n\n</div>\n")
  }
  invisible(response)
}

# Example: Mixed model code
generate_code("
Write R code for mixed-effects model:
- Data: longitudinal BP measurements (n=200 patients, 4 timepoints)
- Fixed: treatment (2 levels), time, treatment*time
- Random: patient intercept
- Use lme4, include model diagnostics
- Keep code under 50 lines
")

library(lme4)
model <- lmer(bp ~ treatment * time + (1 | patient), data = data)
summary(model)
plot(model)
qqnorm(resid(model)); qqline(resid(model))

Explanation:

library(lme4) loads the necessary package for fitting mixed-effects models.
lmer() fits the model with fixed effects: treatment, time, and their interaction, and a random intercept for each patient.
summary(model) provides model estimates, standard errors, and convergence information.
plot(model) generates default diagnostic plots (residuals vs. fitted values, QQ-plot, etc.).
qqnorm(resid(model)); qqline(resid(model)) adds a QQ-plot for checking normality of residuals.

This code is concise, under 50 lines, and includes essential diagnostics.

2.3 Pattern 3: Iterative Analysis Design

Build complex analyses step-by-step:

# Start conversation
conv <- create_messages(
  create_message("You are a biostatistics consultant guiding analysis design. Use markdown formatting but avoid using # headers.", role = "system"),
  create_message("Design primary analysis for Phase 3 hypertension trial comparing Drug A vs Placebo")
)

# Step 1: Get approach
step1 <- chat("gemma3:12b", conv, output = "text")
cat("\n<p class='llm-h2'><strong>📋 Analysis Approach</strong></p>\n\n<div class='llm-output'>\n\n")

📋 Analysis Approach

cat(sanitize_llm_output(step1))

Okay, let’s outline a primary analysis plan for a Phase 3 hypertension trial comparing Drug A versus placebo. I’m assuming this trial aims to demonstrate the efficacy of Drug A in lowering blood pressure. I’ll structure this with sections covering key aspects. Please read the IMPORTANT CAVEATS at the end; this is a template, and needs significant customization to your trial specifics.

Trial Overview (Assumed - needs your specifics)

Objective: To evaluate the efficacy of Drug A compared to placebo in reducing systolic blood pressure (SBP) in adult patients with hypertension.
Hypothesis: Patients receiving Drug A will exhibit a statistically significant reduction in SBP compared to patients receiving placebo.
Study Design: Randomized, double-blind, placebo-controlled, parallel-group trial.
Endpoints (Primary & Secondary - see below for more detail)
Sample Size: [Your calculated sample size - needs justification based on clinically meaningful difference, variability, and power.]
Patient Population: Adults (age range) with a confirmed diagnosis of hypertension (based on [specify criteria, e.g., average of two readings at two visits]). [Specify inclusion/exclusion criteria - critical for generalizability.]

Primary Analysis Endpoint and Statistical Method

Primary Endpoint: Change from baseline in systolic blood pressure (SBP) at [Specify time point, e.g., 12 weeks]. This is a continuous variable.
Statistical Method: Analysis of Covariance (ANCOVA).
- Model: SBP Change from Baseline = Intercept + Drug Group (Drug A vs. Placebo) + Baseline SBP + [Optional: Other Covariates - see covariates section] + Error.
- Rationale: ANCOVA allows for adjustment for baseline SBP (a crucial prognostic factor) and any other relevant covariates that may influence the outcome. It maintains power while accounting for potential imbalances.
- Definition of Treatment Effect: The difference in the mean SBP change from baseline between the Drug A and placebo groups.
- Hypothesis Testing: Two-sided test at a significance level of α = 0.05.
- Primary Outcome Measure: Difference in mean SBP change from baseline between groups, with 95% Confidence Interval (CI).

Covariates (Potential – select those relevant to your trial)

Consider including these covariates in the ANCOVA model to improve the precision of the treatment effect estimate:

Baseline SBP: Essential for adjusting for differences in initial blood pressure.
Baseline Diastolic Blood Pressure (DBP): May contribute to the variability in SBP response.
Age: SBP response can vary with age.
Sex: Potential differences in physiological response.
Race/Ethnicity: Potential differences in physiological response; plan for subgroup analyses (see below).
Body Mass Index (BMI): A known risk factor for hypertension.
Presence of Comorbidities: (e.g., diabetes, chronic kidney disease) - significant confounders that need addressing.
Medication Use (at baseline): Including specific medications, especially other antihypertensives if allowed. This is critical to account for existing treatment.
Lifestyle Factors (at baseline): e.g., smoking status, physical activity level (if collected).

Justification: Each covariate included must be clinically relevant and/or demonstrate an association with the primary endpoint in prior studies or exploratory analyses. Avoid “data dredging” – only include variables with a clear biological rationale.

Secondary Analyses

These complement the primary analysis and provide a broader understanding of Drug A’s effects.

Change from Baseline in Diastolic Blood Pressure (DBP) at [same time point]: ANCOVA model similar to SBP.
Proportion of Patients Achieving Blood Pressure Control (SBP < 140 mmHg AND DBP < 90 mmHg) at [same time point]: Chi-square test or Fisher’s exact test (if small sample sizes) to compare proportions between groups. Logistical Regression could also be used with similar covariates.
Time to Blood Pressure Control: Kaplan-Meier curves and log-rank test or Cox proportional hazards regression.
Change from Baseline in Other Cardiovascular Risk Markers: (e.g., lipid profile, heart rate) – Analyze using appropriate statistical methods depending on the distribution of the data (t-test, Mann-Whitney U test, ANCOVA).

Subgroup Analyses (Exploratory – Pre-specify!)

These should be planned a priori and interpreted cautiously. Do not data mine.

By Age Group: (e.g., <65 years, ≥65 years) – ANCOVA.
By Race/Ethnicity: – ANCOVA. Consider interactions with treatment.
By Presence/Absence of Diabetes: – ANCOVA.
By Baseline SBP Category: (e.g., mild, moderate, severe hypertension) - ANCOVA.

Handling Missing Data

Missing at Random (MAR): Multiple Imputation (MI) is preferred. Use a reasonable number of imputations (e.g., 5-10). Specify imputation model.
Missing Not at Random (MNAR): This is a complex issue. Sensitivity analyses are crucial. Consider alternative approaches like complete case analysis or inverse probability weighting, with caution.
Documentation: Clearly document the approach used and the assumptions made regarding missing data.

Interim Analyses (if planned)

If interim analyses are planned, these must be pre-specified, including stopping rules. Use appropriate statistical methods to adjust for multiple testing.

Software

Specify the statistical software to be used (e.g., SAS, R, Stata).

IMPORTANT CAVEATS & NEXT STEPS

This is a template. Every aspect needs customization. The specific covariates, time points, and statistical methods must be tailored to the specifics of your trial protocol and scientific rationale.
Protocol Adherence: The analysis plan must be detailed enough to allow for consistent implementation by multiple statisticians.
Data Distribution Assumptions: ANCOVA assumes normality of residuals. This needs to be checked and addressed if violated (e.g., transformations).
Interaction Effects: Consider testing for interaction effects between treatment and covariates.
Sensitivity Analyses: Perform sensitivity analyses to assess the robustness of the primary findings to different assumptions (e.g., different imputation methods, different covariate selection).
Statistical Review: This plan must be reviewed by a qualified statistician before trial initiation.
Trial Monitoring Committee (TMC): The TMC should oversee the analysis plan and data monitoring.

To help me refine this further, please tell me:

What is the expected clinically significant difference in SBP you are trying to detect?
What is the estimated variability (SD) of SBP at baseline?
What is the anticipated dropout rate?
Are there any specific comorbidities that are of particular interest?

cat("\n\n</div>\n")

# Step 2: Add details, continue conversation
conv <- append_message(step1, role = "assistant", x = conv)
conv <- append_message("Primary endpoint: Change in SBP at 12 weeks. Baseline SBP ~145mmHg (SD=15). n=300 total.", 
                       role = "user", x = conv)

step2 <- chat("gemma3:12b", conv, output = "text")
cat("\n<p class='llm-h2'><strong>📊 Detailed Plan</strong></p>\n\n<div class='llm-output'>\n\n")

📊 Detailed Plan

cat(sanitize_llm_output(step2))

Okay, excellent. Knowing those specifics allows us to refine the analysis plan significantly. Let’s incorporate these details and update the document accordingly. I’ll highlight the most important changes and additions below.

Trial Overview (Assumed - needs your specifics)

Objective: To evaluate the efficacy of Drug A compared to placebo in reducing systolic blood pressure (SBP) in adult patients with hypertension.
Hypothesis: Patients receiving Drug A will exhibit a statistically significant reduction in SBP compared to patients receiving placebo.
Study Design: Randomized, double-blind, placebo-controlled, parallel-group trial.
Endpoints (Primary & Secondary - see below for more detail)
Sample Size: 300 total (150 per arm). Justification: Based on a clinically meaningful difference of [Specify Value] mmHg, an estimated SD of 15 mmHg, a power of 80%, and a significance level of 0.05, 150 patients per arm are required. This assumes an estimated dropout rate of [Specify Drop Out Rate]%.
Patient Population: Adults (age range) with a confirmed diagnosis of hypertension (based on [specify criteria, e.g., average of two readings at two visits]). [Specify inclusion/exclusion criteria - critical for generalizability.]

Primary Analysis Endpoint and Statistical Method

Primary Endpoint: Change from baseline in systolic blood pressure (SBP) at 12 weeks. This is a continuous variable.
Statistical Method: Analysis of Covariance (ANCOVA).
- Model: SBP Change from Baseline = Intercept + Drug Group (Drug A vs. Placebo) + Baseline SBP + [Optional: Other Covariates - see covariates section] + Error.
- Rationale: ANCOVA allows for adjustment for baseline SBP (a crucial prognostic factor) and any other relevant covariates that may influence the outcome. It maintains power while accounting for potential imbalances.
- Definition of Treatment Effect: The difference in the mean SBP change from baseline between the Drug A and placebo groups.
- Hypothesis Testing: Two-sided test at a significance level of α = 0.05.
- Primary Outcome Measure: Difference in mean SBP change from baseline between groups, with 95% Confidence Interval (CI). The expected difference in SBP change is [Specify Clinically Meaningful Difference].

Covariates (Potential – select those relevant to your trial)

Consider including these covariates in the ANCOVA model to improve the precision of the treatment effect estimate:

Baseline SBP: Essential for adjusting for differences in initial blood pressure.
Baseline Diastolic Blood Pressure (DBP): May contribute to the variability in SBP response.
Age: SBP response can vary with age.
Sex: Potential differences in physiological response.
Race/Ethnicity: Potential differences in physiological response; plan for subgroup analyses (see below).
Body Mass Index (BMI): A known risk factor for hypertension.
Presence of Comorbidities: (e.g., diabetes, chronic kidney disease) - significant confounders that need addressing.
Medication Use (at baseline): Including specific medications, especially other antihypertensives if allowed. This is critical to account for existing treatment.
Lifestyle Factors (at baseline): e.g., smoking status, physical activity level (if collected).

Secondary Analyses

These complement the primary analysis and provide a broader understanding of Drug A’s effects.

Change from Baseline in Diastolic Blood Pressure (DBP) at [same time point]: ANCOVA model similar to SBP.
Proportion of Patients Achieving Blood Pressure Control (SBP < 140 mmHg AND DBP < 90 mmHg) at [same time point]: Chi-square test or Fisher’s exact test (if small sample sizes) to compare proportions between groups. Logistical Regression could also be used with similar covariates.
Time to Blood Pressure Control: Kaplan-Meier curves and log-rank test or Cox proportional hazards regression.
Change from Baseline in Other Cardiovascular Risk Markers: (e.g., lipid profile, heart rate) – Analyze using appropriate statistical methods depending on the distribution of the data (t-test, Mann-Whitney U test, ANCOVA).

Subgroup Analyses (Exploratory – Pre-specify!)

These should be planned a priori and interpreted cautiously. Do not data mine.

By Age Group: (e.g., <65 years, ≥65 years) – ANCOVA.
By Race/Ethnicity: – ANCOVA. Consider interactions with treatment.
By Presence/Absence of Diabetes: – ANCOVA.
By Baseline SBP Category: (e.g., mild, moderate, severe hypertension) - ANCOVA.

Handling Missing Data

Missing at Random (MAR): Multiple Imputation (MI) is preferred. Use a reasonable number of imputations (e.g., 5-10). Specify imputation model.
Missing Not at Random (MNAR): This is a complex issue. Sensitivity analyses are crucial. Consider alternative approaches like complete case analysis or inverse probability weighting, with caution.
Documentation: Clearly document the approach used and the assumptions made regarding missing data.

Interim Analyses (if planned)

If interim analyses are planned, these must be pre-specified, including stopping rules. Use appropriate statistical methods to adjust for multiple testing.

Software

Specify the statistical software to be used (e.g., SAS, R, Stata).

IMPORTANT CAVEATS & NEXT STEPS

This is a template. Every aspect needs customization. The specific covariates, time points, and statistical methods must be tailored to the specifics of your trial protocol and scientific rationale.
Protocol Adherence: The analysis plan must be detailed enough to allow for consistent implementation by multiple statisticians.
Data Distribution Assumptions: ANCOVA assumes normality of residuals. This needs to be checked and addressed if violated (e.g., transformations).
Interaction Effects: Consider testing for interaction effects between treatment and covariates.
Sensitivity Analyses: Perform sensitivity analyses to assess the robustness of the primary findings to different assumptions (e.g., different imputation methods, different covariate selection).
Statistical Review: This plan must be reviewed by a qualified statistician before trial initiation.
Trial Monitoring Committee (TMC): The TMC should oversee the analysis plan and data monitoring.

To help me further refine this, could you clarify:

What is the clinically meaningful difference in SBP change you’re hoping to observe? (You mentioned 145 mmHg baseline, but what reduction would be considered a successful outcome?)
What is the expected dropout rate for this trial?
Are there any pre-existing analyses or pilot data that suggest particular covariates should be prioritized?
Are there specific subgroups you are especially interested in examining?

cat("\n\n</div>\n")

# Step 3: Get code (switch to code model)
conv <- append_message(step2, role = "assistant", x = conv)
conv <- append_message("Provide complete R code for this analysis", role = "user", x = conv)

code <- chat("qwen3:14b", conv, output = "text")
cat("\n<p class='llm-h2'><strong>💻 Generated Code</strong></p>\n\n<div class='llm-output'>\n\n")

💻 Generated Code

cat(sanitize_llm_output(code))

Below is a complete R script for the primary analysis of your Phase 3 hypertension trial, based on your specifications:

R Code for Primary Analysis (ANCOVA)

# Load required libraries
library(tidyverse)
library(mice)  # For multiple imputation (if needed)

# ---------------------------
# 1. Simulate Data (Replace with real data)
# ---------------------------
set.seed(123)  # For reproducibility

# Parameters
n <- 300  # Total sample size
treatment_effect <- 5  # Clinically meaningful difference in SBP (example value)
baseline_sbp_mean <- 145
baseline_sbp_sd <- 15
follow_up_sbp_sd <- 10  # Assumed SD of change in SBP

# Generate synthetic data
data <- tibble(
  subject_id = 1:n,
  treatment = factor(rep(c("Drug A", "Placebo"), each = n / 2)),
  baseline_sbp = rnorm(n, mean = baseline_sbp_mean, sd = baseline_sbp_sd),
  follow_up_sbp = ifelse(
    treatment == "Drug A",
    rnorm(n / 2, mean = baseline_sbp + treatment_effect, sd = follow_up_sbp_sd),
    rnorm(n / 2, mean = baseline_sbp, sd = follow_up_sbp_sd)
  )
)

# View first few rows
head(data)

2. ANCOVA Model (Primary Analysis)

# Fit ANCOVA model
ancova_model <- lm(follow_up_sbp ~ treatment + baseline_sbp, data = data)

# Summary of model
summary(ancova_model)

# Extract treatment effect (coefficient for "treatmentDrug A")
treatment_coeff <- coef(ancova_model)["treatmentDrug A"]
p_value <- summary(ancova_model)$coefficients["treatmentDrug A", "Pr(>|t|)"]
ci_lower <- confint(ancova_model)["treatmentDrug A", 1]
ci_upper <- confint(ancova_model)["treatmentDrug A", 2]

# Output results
cat("Primary Analysis Results:\n")
cat("Treatment Effect (Drug A vs Placebo):", treatment_coeff, "mmHg\n")
cat("95% Confidence Interval:", ci_lower, "-", ci_upper, "\n")
cat("p-value:", p_value, "\n")

3. Assumption Checks

# Check normality of residuals
plot(ancova_model, which = 2)  # QQ-plot
shapiro_test <- shapiro.test(residuals(ancova_model))
cat("Shapiro-Wilk Test for Normality:\n")
print(shapiro_test)

# Check homoscedasticity (constant variance)
plot(ancova_model, which = 1)  # Residuals vs fitted values

4. Handling Missing Data (Optional)

# If there are missing values in the dataset:
# data_missing <- data %>%
#   mutate(across(c(follow_up_sbp, baseline_sbp), ~ ifelse(runif(1) < 0.1, NA, .)))

# Multiple imputation (if needed)
# imputed_data <- mice(data_missing, m = 5, maxit = 5, method = "pmm", seed = 123)
# completed_data <- complete(imputed_data, 1)

# Fit model on imputed data
# ancova_model_imputed <- lm(follow_up_sbp ~ treatment + baseline_sbp, data = completed_data)
# summary(ancova_model_imputed)

5. Subgroup Analysis (Example: Age Groups)

# Add age as a covariate (simulated)
data <- data %>%
  mutate(age = rnorm(n, mean = 50, sd = 10))

# Subgroup analysis: Age < 65 vs Age >= 65
subgroup_data <- data %>%
  mutate(age_group = factor(ifelse(age < 65, "Younger", "Older")))

# Fit ANCOVA with interaction
ancova_subgroup <- lm(follow_up_sbp ~ treatment * age_group + baseline_sbp, data = subgroup_data)
summary(ancova_subgroup)

6. Save Results

# Save results to a CSV file
results_df <- tibble(
  Treatment_Effect = treatment_coeff,
  CI_Lower = ci_lower,
  CI_Upper = ci_upper,
  P_Value = p_value
)

write_csv(results_df, "primary_analysis_results.csv")

Notes

Replace Simulated Data: Replace data with your actual dataset (ensure follow_up_sbp and baseline_sbp are numeric, and treatment is a factor).
Adjust Parameters: Modify treatment_effect, baseline_sbp_mean, and other parameters to match your trial design.
Missing Data: If your dataset has missing values, use the mice package for imputation (as shown in the optional section).
Subgroup Analysis: Customize the age_group logic or add other subgroups (e.g., diabetes status).

Example Output

Primary Analysis Results:
Treatment Effect (Drug A vs Placebo): 5.12 mmHg
95% Confidence Interval: 3.87 - 6.37
p-value: 0.0001

This script is flexible and can be adapted to your trial’s specific requirements. Let me know if you need further refinements!

cat("\n\n</div>\n")

3 Practical Biostatistics Examples

🎯 Real-World Scenarios: These examples mirror common biostatistics tasks in clinical trials and observational studies. Each generates complete, runnable R code.

3.1 Sample Size & Power

power_query <- "
Trial design:
- Outcome: HbA1c reduction (continuous, SD=1.2%)
- MCID: 0.5% difference between groups
- Power: 90%, α=0.05 (two-sided)
- Expected dropout: 15%

Calculate sample size per group. Provide R code using pwr package.
"

generate_code(power_query)

3.2 Survival Analysis Pipeline

generate_code("
Create survival analysis for cancer trial (n=250):
1. Generate realistic data: time (months), status (0/1), treatment (A/B), age, stage
2. Kaplan-Meier curves by treatment
3. Log-rank test
4. Cox model with treatment + age + stage
5. Check PH assumption
6. Forest plot of HRs
Make it publication-ready. Use survival and survminer packages.
")

3.3 Missing Data Strategy

ask_expert("
Longitudinal trial, 4 visits, ~20% dropout by visit 4 (likely MNAR).
Primary: Change from baseline at visit 4.
Options: MMRM, MI, LOCF, complete case?
Recommend primary + sensitivity analyses.
")

Okay, here’s a concise, evidence-based recommendation for analyzing your longitudinal trial data, considering the dropout rate and likely MNAR nature of the missingness. I’m structuring this as a senior biostatistician would, focusing on justification and assumptions.

Data Analysis Strategy: Longitudinal Trial

Context & Concerns

You have a longitudinal trial with 4 visits and a 20% dropout rate by visit 4. The likely MNAR (Missing Not At Random) nature of the dropout is a significant concern. Ignoring this can lead to biased estimates and incorrect conclusions. Complete Case analysis is almost certainly inappropriate. LOCF is generally discouraged due to its strong assumptions and potential for bias. MI is a reasonable starting point, but requires careful consideration of the missing data mechanism. MMRM offers more flexibility and can potentially account for time-dependent effects.

Primary Analysis Recommendation: Mixed-Effects Regression Model (MMRM)

Method: MMRM with a fixed effect for treatment and a random intercept and slope for subject. Include visit as a continuous predictor.
Justification: MMRM allows for modeling the longitudinal trajectory while accounting for individual variability. The inclusion of visit as a continuous predictor allows for flexible modeling of the time-dependent effects. It’s more robust to MNAR than MI or LOCF, provided the time-dependent effects are appropriately modeled.
Key Assumptions:
- Linearity: The relationship between visit and the outcome is approximately linear. (Check with scatterplots and consider transformations if necessary.)
- Independence: Residuals are independent. (Check diagnostic plots for autocorrelation.)
- Homogeneity of Variance: Residual variance is constant across treatment groups and over time. (Check diagnostic plots.)
- Missingness Mechanism (Important): While MMRM is more robust to MNAR than other methods, it still relies on the assumption that the time-dependent effects (e.g., the slope for visit) are correctly specified. If the reason for dropout is directly related to the trajectory of the outcome (e.g., sicker patients drop out), the model may still be biased.
Software: R (lme4 package), SAS (PROC MIXED), or similar.

Sensitivity Analyses (Crucial)

These are essential to assess the robustness of your primary findings.

Multiple Imputation (MI):
- Method: Perform MI using a predictive mean matching or chained equations approach. Include baseline covariates and potentially time-dependent variables in the imputation model.
- Justification: MI provides a more complete dataset for analysis, allowing for a different perspective on the primary analysis.
- Key Assumptions: The imputation model accurately reflects the missing data mechanism. This is difficult to verify, so careful consideration of potential predictors of missingness is vital.
LOCF (as a negative control):
- Method: Analyze using LOCF.
- Justification: To assess the potential impact of the missing data on the results. Large discrepancies between MMRM and LOCF results would suggest substantial bias.
- Key Assumption: LOCF is based on the strong assumption that missing values are equal to the last observed value. This is rarely true in longitudinal studies.
Worst-Case Scenario Analysis:
- Method: Assume that the missing data are always unfavorable to the treatment group. Re-analyze the data under this assumption.
- Justification: To understand the potential impact of the missing data on the results if the missingness is systematically related to the outcome.
**Subgroup Analysis (

3.4 Propensity Score Analysis

generate_code("
Observational study comparing treatment effect:
- Outcome: 30-day mortality (binary)
- Treatment: Surgery vs Medical (n=800 each)
- Confounders: age, comorbidities, severity score

Complete PS analysis:
1. Estimate propensity scores (logistic)
2. Check balance before/after matching
3. 1:1 matching with caliper
4. Estimate treatment effect
5. Sensitivity to hidden confounding
Use MatchIt and sensemakr packages. Commented code.
")

# Load necessary packages
library(MatchIt)
library(sensemakr)

# Step 1: Estimate Propensity Scores (Logistic Regression)
# Assume 'data' is a data frame with:
# - 'treatment': binary (0 = Medical, 1 = Surgery)
# - 'mortality_30d': binary outcome (0 = Survived, 1 = Died)
# - 'age', 'comorbidities', 'severity_score': confounders

# Fit logistic regression model for propensity scores
ps_model <- glm(treatment ~ age + comorbidities + severity_score, 
                data = data, family = binomial())

# Extract propensity scores
data$propensity_score <- predict(ps_model, type = "response")

# Step 2: Check Balance Before Matching
# Calculate standardized mean differences and variance ratios
# For continuous variables (age, severity_score) and binary (comorbidities)
# Use the 'summary' function from MatchIt

# Before matching balance
before_balance <- summary(matchit(treatment ~ age + comorbidities + severity_score, 
                                  data = data, method = "nearest", ratio = 1, 
                                  caliper = 0.2), 
                        unadjusted = TRUE)

# Step 3: 1:1 Matching with Caliper
# Perform matching with nearest neighbor, 1:1 ratio, and caliper of 0.2 SD
matched_data <- matchit(treatment ~ age + comorbidities + severity_score, 
                        data = data, method = "nearest", ratio = 1, caliper = 0.2)

# Extract matched data
matched_df <- match_data(matched_data)

# Step 4: Check Balance After Matching
# Check balance on matched data
after_balance <- summary(matched_data)

# Step 5: Estimate Treatment Effect on Matched Data
# Fit logistic regression model on matched data
# Outcome: mortality_30d; Treatment: treatment (0 = Medical, 1 = Surgery)
model_matched <- glm(mortality_30d ~ treatment, data = matched_df, family = binomial())

# Extract results
summary(model_matched)

# Odds Ratio (OR) and 95% CI
or <- exp(coef(model_matched)[2])
or_ci <- exp(confint(model_matched)[2, ])

# Step 6: Sensitivity Analysis to Hidden Confounding
# Use sensemakr to assess robustness to unmeasured confounding
# Assume 'treatment' and 'mortality_30d' are in the original data
# Include confounders in the model

# Fit model with treatment and confounders
model_sensitivity <- glm(mortality_30d ~ treatment + age + comorbidities + severity_score, 
                         data = data, family = binomial())

# Perform sensitivity analysis (R2 = 0.1 means unmeasured confounder explains 10% of outcome variance)
sensitivity_results <- sensitivity_analysis(model_sensitivity, 
                                            treatment = "treatment", 
                                            R2 = 0.1, 
                                            effect = "ATE")

# View sensitivity results
summary(sensitivity_results)

# Optional: Plot sensitivity analysis
plot(sensitivity_results)

Explanation of Key Steps:

Propensity Score Estimation:
- Logistic regression models the probability of receiving surgery (treatment = 1) based on confounders (age, comorbidities, severity_score).
Balance Checking:
- Before matching: Standardized mean differences (SMD) and variance ratios are calculated to assess initial imbalance.
- After matching: SMDs should be close to 0 (ideally < 0.1) for good balance.
Matching:
- method = "nearest" ensures 1:1 matching with caliper (0.2 SD of propensity score) to avoid mismatches.
Treatment Effect Estimation:
- Logistic regression on matched data estimates the adjusted odds ratio (OR) for 30-day mortality comparing surgery vs. medical treatment.
Sensitivity Analysis:
- sensemakr quantifies how much unmeasured confounding (e.g., an unobserved variable explaining 10% of outcome variance) would be needed to change the conclusion.
- The R2 parameter controls the strength of the hidden confounder.

Interpretation:

Balance Check: If SMDs are < 0.1 after matching, confounding is adequately controlled.
Treatment Effect: The OR from the matched model reflects the causal effect of surgery vs. medical treatment on mortality.
Sensitivity Analysis: If results remain significant under high R2 values (e.g., R2 = 0.2), the findings are robust to hidden confounding.

Notes:

Ensure data is properly formatted (e.g., treatment as binary, mortality_30d as binary).
Adjust R2 in sensemakr based on plausible unmeasured confounding.
For more advanced methods (e.g., inverse probability weighting), use method = "ipw" in matchit.

4 Parallel Processing for Efficiency

⚠️ Performance Note: Parallel requests can speed up batch queries significantly, but monitor your system resources. Ollama loads models into GPU/RAM, so running too many concurrent requests may cause memory issues.

Process multiple queries simultaneously:

library(httr2)

# Multiple statistical questions
questions <- c(
  "When to use Wilcoxon vs t-test?",
  "Interpret odds ratio of 2.5 in logistic regression",
  "Difference between ITT and PP analysis",
  "When to use Bonferroni vs FDR correction"
)

# Create parallel requests (fast model)
reqs <- lapply(questions, function(q) {
  generate("gemma3:4b", q, 
           temperature = 0.3, num_predict = 300,
           output = "req")
})

# Execute in parallel
resps <- req_perform_parallel(reqs, on_error = "continue")

# Process results
answers <- lapply(resps, resp_process, "text")

# Display with styled boxes
for(i in seq_along(questions)) {
  cat(sprintf("\n<p class='llm-h3'><strong>Q%d: %s</strong></p>\n\n<div class='llm-output'>\n\n%s\n\n</div>\n", 
              i, questions[i], sanitize_llm_output(answers[[i]])))
}

Q1: When to use Wilcoxon vs t-test?

Okay, let’s break down when to use the Wilcoxon signed-rank test versus the independent samples t-test. They’re both used to compare groups, but they’re appropriate for different situations. Here’s a detailed comparison:

1. Independent Samples t-test:

What it is: This test compares the means of two independent groups. It assumes the data is approximately normally distributed.
When to use it:
- Data is normally distributed: This is the most important factor. If your data looks roughly bell-shaped, the t-test is generally a good choice.
- Comparing two independent groups: The groups being compared have no relationship to each other (e.g., treatment group vs. control group, men vs. women).
- Continuous Data: The variables you’re comparing are continuous (e.g., height, weight, test scores).
Assumptions:
- Normality: Data in each group should be approximately normally distributed. You can check this with histograms, Q-Q plots, or statistical tests for normality (e.g., Shapiro-Wilk test).
- Equal Variances (Homogeneity of Variance): Ideally, the variances of the two groups should be roughly equal. You can test this with Levene’s test. If variances are unequal, you can use a version of the

Q2: Interpret odds ratio of 2.5 in logistic regression

Okay, let’s break down what an odds ratio of 2.5 in logistic regression means. It’s a crucial piece of information for understanding the impact of a predictor variable on the probability of an event occurring.

1. Understanding Odds

Odds: In simple terms, odds represent the ratio of the probability of an event happening to the probability of it not happening.
- Odds = (Probability of Event) / (Probability of Not Event)

2. Odds Ratio (OR) Explained

Definition: The odds ratio is a standardized measure of association between an exposure (predictor variable) and an outcome (the event you’re trying to predict). It tells you how much more likely an event is to occur for individuals exposed to a particular level of the predictor, compared to those not exposed.
Calculation: The odds ratio is calculated as:

OR = (Odds of Event with Exposure) / (Odds of Event without Exposure)

3. Interpreting an Odds Ratio of 2.5

In this case, an odds ratio of 2.5 means: For every one-unit increase in the predictor variable (assuming it’s a continuous variable), the odds of the event occurring increase by a factor of 2.5.
Let’s illustrate with an example:

Let’s say you’re investigating the relationship between smoking

Q3: Difference between ITT and PP analysis

andOkay, let’s break down the differences between ITT (Initial Training Time) and PP (Production Period) analysis, particularly as they relate to project management, especially in construction and infrastructure projects. They’re both used to estimate the total time required for a project, but they focus on different aspects.

1. ITT (Initial Training Time) Analysis

Focus: This analysis concentrates on the time required to learn and master the skills needed to perform the work. It’s primarily used in situations where a new process, technology, or method is being introduced.
What it Measures: ITT looks at the time it takes for the team to become proficient in the new way of doing things. This includes:
- Training Courses: Time spent in formal training sessions.
- Hands-on Practice: Time dedicated to practicing the new skills.
- Knowledge Transfer: Time spent learning from experienced personnel.
- Documentation Review: Time spent understanding procedures and guidelines.
Typical Use Cases:
- New Software Implementation: Estimating the time to train the team on a new ERP system.
- Process Improvement: Calculating the time needed to train employees on a revised workflow.
- New Equipment Introduction: Determining the time required for operators to become skilled with a new machine.
Output: An estimate of the *learning

Q4: When to use Bonferroni vs FDR correction

curveOkay, let’s break down the differences between the Bonferroni and False Discovery Rate (FDR) correction methods, and when you should use each. They’re both designed to control the rate of false positives (Type I errors) when performing multiple hypothesis tests, but they approach the problem differently and have different implications.

1. Bonferroni Correction

How it Works: The Bonferroni method is the most conservative approach. It controls the Family-Wise Error Rate (FWER). This means it guarantees that the probability of making any false positive across all your tests is less than or equal to a chosen significance level (alpha, usually 0.05).
Calculation: You divide your desired alpha level (e.g., 0.05) by the number of tests you’re conducting.
- Adjusted p-value = p-value / m
- Where ‘m’ is the total number of hypotheses tested.
Strengths:
- Guaranteed Control of FWER: It’s the most rigorous method for ensuring you don’t make any false positives.
- Simple to Calculate: The calculation is straightforward.
Weaknesses:
- Very Conservative: This is its biggest drawback. It drastically reduces the power of your tests. You’re

5 Literature Search with Embeddings

Semantic search through your literature database:

# Your abstracts database
abstracts <- c(
  "Cox proportional hazards analysis of survival in stage III colon cancer patients treated with adjuvant chemotherapy.",
  "Propensity score matching to evaluate effectiveness of drug-eluting stents vs bare-metal stents in real-world setting.",
  "Mixed-effects model for repeated measures (MMRM) in longitudinal clinical trials with missing data.",
  "Bayesian adaptive design for phase II oncology trials with continuous monitoring of efficacy and toxicity.",
  "Meta-analysis of randomized trials comparing intensive vs standard glycemic control in type 2 diabetes."
)

# Generate embeddings
embs <- lapply(abstracts, function(x) embed("mxbai-embed-large", x))

# Search function
search_literature <- function(query, abstracts, embeddings, top_n = 3) {
  query_emb <- embed("mxbai-embed-large", query)
  
  # Cosine similarity
  sims <- sapply(embeddings, function(e) sum(e * query_emb))
  
  # Return top results
  idx <- order(sims, decreasing = TRUE)[1:top_n]
  data.frame(
    rank = 1:top_n,
    similarity = round(sims[idx], 3),
    abstract = abstracts[idx]
  )
}

# Example search
results <- search_literature("survival analysis Cox regression", abstracts, embs)
print(results)

  rank similarity
1    1      0.742
2    2      0.635
3    3      0.616
                                                                                                              abstract
1 Cox proportional hazards analysis of survival in stage III colon cancer patients treated with adjuvant chemotherapy.
2           Bayesian adaptive design for phase II oncology trials with continuous monitoring of efficacy and toxicity.
3                  Mixed-effects model for repeated measures (MMRM) in longitudinal clinical trials with missing data.

6 Structured Output for Reports

Get responses in structured format:

# Define schema (JSON schema as R list)
analysis_schema <- list(
  type = "object",
  properties = list(
    test_name = list(type = "string"),
    assumptions = list(type = "array", items = list(type = "string")),
    r_function = list(type = "string"),
    interpretation = list(type = "string")
  ),
  required = list("test_name", "assumptions", "r_function")
)

# Get structured response (use output = "structured")
result <- chat(
  "qwen3:14b",
  create_message("Recommend a statistical test for comparing 3 independent groups with an ordinal outcome (5-point Likert scale). Return the test name, key assumptions, R function to use, and interpretation guidance."),
  format = analysis_schema,
  output = "structured"
)

# Display structured result nicely
cat("\n<div class='llm-output'>\n\n")

cat("**Test:**", result$test_name, "\n\n")

Test:

cat("**R Function:**", paste0("`", result$r_function, "`"), "\n\n")

R Function: ``

cat("**Assumptions:**\n\n")

Assumptions:

for(a in result$assumptions) cat("- ", a, "\n")
cat("\n**Interpretation:**\n\n", result$interpretation, "\n\n")

Interpretation:

cat("</div>\n")

7 Real Data Example: Complete Analysis

🔬 Hands-On Demo: This section uses R’s built-in mtcars dataset to demonstrate a complete LLM-assisted analysis workflow from exploratory analysis to interpretation.

# Load and summarize real data
data(mtcars)

# Create analysis context for LLM
data_summary <- sprintf("
Dataset: mtcars (Motor Trend Car Road Tests)
Observations: %d cars
Variables: mpg (fuel efficiency), cyl (cylinders: 4,6,8), hp (horsepower), wt (weight), am (transmission: 0=auto, 1=manual)

Summary statistics:
- MPG: mean=%.1f, sd=%.1f, range=[%.1f, %.1f]
- Cylinders: %d four-cyl, %d six-cyl, %d eight-cyl
- Transmission: %d automatic, %d manual

Research question: Does transmission type (automatic vs manual) affect fuel efficiency (mpg), controlling for weight and horsepower?
", nrow(mtcars), mean(mtcars$mpg), sd(mtcars$mpg), min(mtcars$mpg), max(mtcars$mpg),
sum(mtcars$cyl==4), sum(mtcars$cyl==6), sum(mtcars$cyl==8),
sum(mtcars$am==0), sum(mtcars$am==1))

# Ask LLM for analysis recommendation
cat("\n<p class='llm-h2'><strong>📊 Analysis Plan</strong></p>\n")

📊 Analysis Plan

ask_expert(paste0(data_summary, "
Recommend the appropriate statistical analysis approach. Consider:
1. Which test/model to use and why
2. Key assumptions to check
3. Potential confounders
4. How to interpret results
"))

Okay, here’s a statistically sound recommendation for analyzing the mtcars dataset to address your research question, presented as a senior biostatistician would.

Recommended Analysis: Multiple Linear Regression

Given the research question (“Does transmission type affect fuel efficiency, controlling for weight and horsepower?”) and the nature of the variables, a multiple linear regression model is the most appropriate approach.

Why Regression? We want to assess the effect of transmission type (am) on mpg while accounting for the influence of weight (wt) and horsepower (hp). Regression allows us to isolate the unique contribution of ‘am’ to mpg. It also provides a quantitative estimate of this effect. A t-test would be insufficient as it doesn’t allow for the simultaneous control of multiple covariates.
Model Specification: The model would be: mpg ~ am + wt + hp. This means mpg is the dependent variable, and am, wt, and hp are the independent variables (predictors).

Key Assumptions & Checks

Regression models rely on several assumptions. Violations can compromise the validity of the results.

Linearity: The relationship between each predictor (am, wt, hp) and mpg is approximately linear.
- Check: Scatterplots of each predictor against mpg. Consider transformations (e.g., log transformation) if relationships appear non-linear. Residual plots (residuals vs. fitted values, and residuals vs. each predictor) are crucial for assessing non-linearity.
Independence of Errors: The errors (residuals) for each observation are independent of one another.
- Check: This is difficult to assess directly with this dataset. The data were collected as a road test, so independence is likely reasonable, but worth considering in the context of how the data were generated. Time-series data would require more rigorous checks.
Homoscedasticity: The variance of the errors is constant across all levels of the predictors.
- Check: Residual plots (residuals vs. fitted values). A funnel shape suggests heteroscedasticity. Formal tests (e.g., Breusch-Pagan test) can be used, but visual inspection is often sufficient.
Normality of Errors: The errors are normally distributed.
- Check: Histogram or Q-Q plot of the residuals. The Central Limit Theorem often makes this assumption less critical with a sample size of 32, but extreme departures from normality can still impact inference.
No Multicollinearity: The predictors (wt, hp, am) are not highly correlated with each other.
- Check: Correlation matrix. Variance Inflation Factor (VIF). A VIF > 5 or 10 (depending on the context) suggests problematic multicollinearity. In mtcars, wt and hp are likely correlated, which needs to be considered.

Potential Confounders

Engine Size/Displacement: This is not included in the dataset but is likely related to horsepower and fuel efficiency. Its absence is a limitation.
Car Model/Make: Cars from the same manufacturer or model line might share design characteristics that influence fuel efficiency, creating a hidden grouping effect. This is not directly addressable with the available data.
Driving Conditions: Road tests are standardized, but subtle differences in conditions could influence mpg.

Interpretation of Results

Coefficient for ‘am’: This is the key. It represents the estimated difference in mpg between cars with manual transmissions (am=1) and automatic transmissions (am=0), *holding weight and horsepower

# Generate analysis code
cat("\n<p class='llm-h2'><strong>💻 Generated R Code</strong></p>\n")

💻 Generated R Code

generate_code(paste0(data_summary, "
Write complete R code to:
1. Visualize the relationship (boxplot of mpg by transmission)
2. Fit appropriate model (multiple regression: mpg ~ am + wt + hp)
3. Check model assumptions (residual plots)
4. Report results with confidence intervals
Use base R and keep it concise.
"))

# Load dataset
data("mtcars")

# 1. Visualize the relationship (boxplot of mpg by transmission)
boxplot(mpg ~ factor(am, labels = c("Automatic", "Manual")), 
        data = mtcars, xlab = "Transmission", ylab = "MPG", main = "MPG by Transmission")

# 2. Fit multiple regression model
model <- lm(mpg ~ am + wt + hp, data = mtcars)

# 3. Check model assumptions (residual plots)
plot(model)

# 4. Report results with confidence intervals
summary(model)
confint(model)

8 Helper Functions

Reusable utilities for your workflow:

# Quick consultation
quick_ask <- function(question, model = "llama3.2:3b") {
  generate(model, question, 
           temperature = 0.3, num_predict = 500,
           output = "text")
}

# Safe code generation with retry
safe_codegen <- function(prompt, model = "qwen3:14b", max_retry = 3) {
  for(i in 1:max_retry) {
    result <- tryCatch(
      generate_code(prompt, model),
      error = function(e) {
        message(sprintf("Attempt %d failed: %s", i, e$message))
        if(i < max_retry) Sys.sleep(2)
        NULL
      }
    )
    if(!is.null(result)) return(result)
  }
  stop("Code generation failed after retries")
}

# Multi-model consensus
get_consensus <- function(question, models = c("gemma3:12b", "qwen3:14b")) {
  results <- lapply(models, function(m) {
    cat(sprintf("\n=== %s ===\n", m))
    resp <- quick_ask(question, m)
    cat(resp, "\n")
    resp
  })
  invisible(results)
}

Usage examples:

# Quick question
quick_ask("What's the formula for Cohen's d?")

# Safe code generation with auto-retry
code <- safe_codegen("Create function to calculate 95% CI for proportion")

# Get multiple model opinions
get_consensus("Should I adjust for baseline in ANCOVA for RCT?")

9 Model Selection Guide

Choose based on task complexity and speed needs:

select_model <- function(task, need_speed = FALSE) {
  switch(task,
    "code" = if(need_speed) "gemma3:4b" else "qwen3:14b",
    "reasoning" = "gemma3:12b",
    "quick" = "llama3.2:3b",
    "vision" = "qwen3-vl:8b",
    "embed" = "mxbai-embed-large",
    "llama3.2:3b"  # default
  )
}

# Example
model <- select_model("code", need_speed = FALSE)
cat("Selected model:", model)

Selected model: qwen3:14b

10 Best Practices

10.1 1. Verify Everything

# Always test generated code
code <- generate_code("Create t-test function")
# Review before running!
# eval(parse(text = code))

# Cross-check statistical advice
advice1 <- ask_expert("...", "gemma3:12b")
advice2 <- ask_expert("...", "qwen3:14b")
# Compare responses

10.2 2. Provide Rich Context

# Bad: Vague question
"What test should I use?"

# Good: Specific context
"
Study: 2-arm RCT (n=100 per arm)
Outcome: QoL score 0-100 (continuous, normally distributed)
Baseline: Measured
Analysis: ANCOVA adjusting for baseline
Question: Should I transform outcome? How to report?
"

10.3 3. Optimize Parameters

# Task-specific parameters (pass directly to functions)
# Code generation: low temp, high tokens for complete output
# generate(..., temperature = 0.2, top_p = 0.3, num_predict = 2000-3000)

# Reasoning: moderate temperature
# generate(..., temperature = 0.4, top_p = 0.5, num_predict = 1500)

# Quick answers: lower tokens
# generate(..., temperature = 0.3, num_predict = 500-1000)

# Complex tasks (full analysis): use higher num_predict to avoid truncation
# generate(..., temperature = 0.2, num_predict = 3000)

# Example usage
generate("qwen3:14b", "...", temperature = 0.2, num_predict = 2000, output = "text")

[1] "It seems your message might be incomplete or unclear. Could you please provide more details or clarify what you're asking? I'm here to help! 😊"

11 Troubleshooting

# Connection issues
test_connection()
# If failed: restart Ollama app

# Model not found
list_models()  # Check what you have
pull("qwen3:14b")  # Pull if missing

# Out of memory
# Use smaller model:
generate("gemma3:4b", "...", output = "text")

# Slow performance
# Reduce output length:
generate("llama3.2:3b", "...", 
         num_predict = 500, output = "text")

12 Real Example: Complete Analysis Workflow

From design to code:

# Step 1: Design consultation
cat("\n<p class='llm-h2'><strong>📋 Step 1: Design Consultation</strong></p>\n")

📋 Step 1: Design Consultation

design <- ask_expert("
Trial: compare 3 doses of new antihypertensive
Primary: SBP change at week 12
Secondary: DBP change, response rate (SBP<140)
Covariates: baseline SBP, age, BMI
Sample: n=75 per group
Recommend primary analysis approach
", show = FALSE)  # Capture without showing

cat("\n<div class='llm-output'>\n\n", sanitize_llm_output(design), "\n\n</div>\n")

Okay, here’s a concise, evidence-based recommendation for the primary analysis approach for your antihypertensive trial, formatted as requested.

Trial Overview & Context

Objective: Compare three doses of a new antihypertensive drug to assess efficacy based on systolic blood pressure (SBP) change at week 12.
Design: Parallel-group, randomized controlled trial.
Sample Size: 75 per group (total n=225). This is a critical assumption – see Important Assumptions below.
Primary Outcome: Change in SBP from baseline to week 12.
Secondary Outcomes: Change in diastolic blood pressure (DBP), response rate (SBP < 140 mmHg at week 12).
Covariates: Baseline SBP, age, BMI.

Recommended Primary Analysis Approach: ANCOVA

The recommended primary analysis approach is an Analysis of Covariance (ANCOVA).

Model: SBP Change at Week 12 ~ Treatment Group + Baseline SBP + Age + BMI.
Rationale:
- Handles Covariates: ANCOVA effectively adjusts for the influence of baseline SBP, age, and BMI, which are known to affect blood pressure and are likely to be unevenly distributed across treatment groups. This reduces bias and increases the precision of the treatment effect estimate.
- Statistical Power: ANCOVA generally provides greater statistical power compared to complete case analyses or multiple imputation, especially when the covariates are strongly associated with the outcome.
- Interpretability: The results are relatively straightforward to interpret – providing estimates of treatment effects adjusted for the covariates.
Post-Hoc Testing: If the overall ANCOVA model is significant, use appropriate post-hoc tests (e.g., Tukey’s HSD, Dunnett’s test) to compare the individual treatment groups, controlling for multiple comparisons. Specify the contrasts a priori.
Model Diagnostics: Thoroughly assess ANCOVA assumptions (linearity, homogeneity of regression slopes, normality of residuals, homogeneity of variance) after fitting the model. Address violations appropriately (e.g., transformations, non-parametric alternatives if severe).

Secondary Analyses & Considerations

DBP Change: Analyze DBP change using a similar ANCOVA approach.
Response Rate: Analyze response rate (SBP < 140) using a logistic regression model, including treatment group, baseline SBP, age, and BMI as predictors.
Missing Data: Develop a pre-specified plan for handling missing data. Consider multiple imputation if missingness is not strongly related to the treatment or outcome. Sensitivity analyses should be performed to assess the impact of different missing data assumptions.
Subgroup Analyses: Pre-specify any subgroup analyses (e.g., by age group, baseline SBP category) a priori and interpret results cautiously, acknowledging the potential for false positives.

Important Assumptions

Randomization Integrity: The randomization process was successful in creating balanced groups with respect to known and unknown confounders.
Covariate Measurement: Baseline SBP, age, and BMI were measured accurately and consistently across all groups.
Linearity & Regression Slopes: The relationship between covariates and SBP change is linear, and the slopes of the covariate-outcome relationships are similar across treatment groups (homogeneity of regression slopes). This is critical for ANCOVA validity.
Independence: Observations are independent.
Sample Size Adequacy: n=75 per group is sufficient to

# Step 2: Generate analysis code (use higher token limit for complex code)
cat("\n<p class='llm-h2'><strong>💻 Step 2: Analysis Code</strong></p>\n")

💻 Step 2: Analysis Code

generate_code(sprintf("
Based on this design:
%s

Create R code for:
1. Data simulation (realistic)
2. Primary analysis (ANCOVA)
3. Secondary analyses
4. Table 1
5. Publication-quality plots
Use tidyverse + emmeans. Well-commented.
", design), num_predict = 3000)

# Step 3: Generate reporting template (also needs more tokens)
cat("\n<p class='llm-h2'><strong>📝 Step 3: Report Template</strong></p>\n")

📝 Step 3: Report Template

generate_code("
Create R Markdown template for reporting the analysis above.
Include: Methods section, Results section with inline R code,
Tables and figures. Follow CONSORT style.
", num_predict = 2500)

---
title: "Randomized Controlled Trial Analysis Report"
author: "Author Name"
date: "Date"
output:
  pdf_document: default
  html_document: default
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE, warning = FALSE, message = FALSE)

13 Abstract

(Brief summary of the study, including background, methods, results, and conclusions.)

14 Introduction

(Context, objectives, and hypotheses of the study.)

15 Methods

15.1 Study Design

This study is a randomized controlled trial (RCT) following the CONSORT guidelines. Participants were randomly assigned to either the intervention group or the control group. The primary outcome was [insert primary outcome], and secondary outcomes included [insert secondary outcomes].

15.2 Participants

(Description of inclusion/exclusion criteria, recruitment procedures, and sample size calculations.)

15.3 Interventions

(Detailed description of the interventions in both groups.)

15.4 Outcomes

Primary Outcome: [Description]
Secondary Outcomes: [Description]

15.5 Statistical Methods

(Statistical tests used, software, and handling of missing data. Example: "Analyses were performed using R version 4.3.1. Continuous variables are presented as means ± SD, and categorical variables as frequencies (percentages). Comparisons between groups were conducted using t-tests and chi-squared tests.)

16 Results

16.1 Participant Flow

{r participant-flow, fig.cap="CONSORT flow diagram", fig.align="center"} <p class='llm-h1'><strong>Example: Replace with actual flow diagram (e.g., using

DiagrammeRorggplot2`)

library(ggplot2) library(dplyr)

Example data (replace with actual data)

flow_data <- tibble( Stage = c(“Enrollment”, “Randomized”, “Completed”, “Analyzed”), Group = c(“Intervention”, “Intervention”, “Intervention”, “Intervention”), Count = c(150, 130, 120, 110) )

ggplot(flow_data, aes(x = Stage, y = Group, fill = Group)) + geom_bar(stat = “identity”, position = “dodge”) + coord_flip() + labs(title = “CONSORT Flow Diagram”, x = “Stage”, y = “Group”) + theme_minimal()


## Baseline Characteristics

```{r baseline-table}
<p class='llm-h1'><strong>Example: Replace with actual data</strong></p>
baseline_data <- tibble(
  Variable = c("Age (years)", "Sex (Male, %)", "BMI (kg/m²)", "Baseline Outcome"),
  Intervention = c(45.2, 55.3, 26.5, 12.3),
  Control = c(44.8, 48.7, 27.1, 14.1)
)

kable(baseline_data, caption = "Baseline characteristics of participants", align = "l|r|r")

16.2 Primary Outcome

```{r primary-outcome}

Example: Replace with actual analysis

library(ggplot2)

Simulated data

outcome_data <- tibble( Group = rep(c(“Intervention”, “Control”), each = 100), Result = c(rnorm(100, mean = 10, sd = 2), rnorm(100, mean = 8, sd = 2)) )

Summary statistics

summary_stats <- outcome_data %>% group_by(Group) %>% summarise( Mean = mean(Result), SD = sd(Result), n = n() )

kable(summary_stats, caption = “Primary outcome summary statistics”, align = “l|r|r|r”)

Visualization

ggplot(outcome_data, aes(x = Group, y = Result, fill = Group)) + geom_boxplot() + labs(title = “Primary Outcome Distribution”, x = “Group”, y = “Outcome Value”) + theme_minimal()


## Secondary Outcomes

```{r secondary-outcomes}
<p class='llm-h1'><strong>Example: Replace with actual analysis</strong></p>
secondary_data <- tibble(
  Outcome = c("Secondary 1", "Secondary 2"),
  Intervention = c(30, 45),
  Control = c(25, 40),
  p_value = c(0.12, 0.08)
)

kable(secondary_data, caption = "Secondary outcome results", align = "l|r|r|r")

16.3 Adverse Events

(Text description of adverse events, with inline R code for tabular data if needed.)

17 Discussion

(Interpretation of results, comparison with previous studies, limitations, and implications.)

18 Conclusion

(Summary of key findings and recommendations.)

19 References

(Formatted references following a standard style, e.g., APA or Vancouver.)


---

<p class='llm-h3'><strong>Notes for Use:</strong></p>
1. Replace placeholder text (e.g., outcome variables, study details) with actual data.
2. Use `knitr::kable()` or `gt::gt()` for tables and `ggplot2` for figures.
3. Ensure CONSORT flow diagram aligns with actual participant flow data.
4. Adjust statistical methods and R code to match the analysis performed.

</div>

---

# Resources & References

<div class="info-box">
**📚 Learn More:** Explore these resources to deepen your understanding of local LLMs and biostatistics with R.
</div>

## Official Documentation

| Resource | Description |
|----------|-------------|
| [ollamar Package](https://hauselin.github.io/ollama-r/) | Official R package documentation |
| [Ollama Models](https://ollama.com/library) | Browse available models |
| [Ollama Blog](https://ollama.com/blog) | Latest features and tutorials |

## Recommended Reading

- **Structured Outputs**: [ollama.com/blog/structured-outputs](https://ollama.com/blog/structured-outputs)
- **Tool Calling**: [ollama.com/blog/tool-support](https://ollama.com/blog/tool-support)
- **Model Selection Guide**: Choose models based on your task complexity

## Citation

If you use ollamar in your research, please cite:

```bibtex
@article{Lin2025,
  author = {Lin, Hause and Safi, Tawab},
  title = {ollamar: An R package for running large language models},
  journal = {Journal of Open Source Software},
  volume = {10},
  number = {105},
  pages = {7211},
  year = {2025},
  doi = {10.21105/joss.07211}
}

20 Session Information

Click to expand session info

sessionInfo()

R version 4.1.2 (2021-11-01)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Pop!_OS 22.04 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] httr2_1.2.1   ollamar_1.2.2

loaded via a namespace (and not attached):
 [1] knitr_1.50      magrittr_2.0.4  rappdirs_0.3.3  R6_2.6.1       
 [5] rlang_1.1.6     fastmap_1.2.0   tools_4.1.2     xfun_0.54      
 [9] cli_3.6.5       withr_3.0.2     jquerylib_0.1.4 htmltools_0.5.9
[13] yaml_2.3.11     digest_0.6.39   tibble_3.3.0    lifecycle_1.0.4
[17] crayon_1.5.3    sass_0.4.10     vctrs_0.6.5     curl_7.0.0     
[21] glue_1.8.0      cachem_1.1.0    evaluate_1.0.5  rmarkdown_2.30 
[25] compiler_4.1.2  bslib_0.9.0     pillar_1.11.1   jsonlite_2.0.0 
[29] pkgconfig_2.0.3

🎉 Thank You for Exploring!

Found this useful? Star the repository on GitHub!

github.com/ahjavid/R-Ollama-Lab

Local LLMs for Biostatistics: Practical Guide

Using Ollama + R for Statistical Analysis

Amir Javid

2025-12-18