NeuroAgent

Fix stat_regline_equation Mismatch in ggplot2

Resolve the discrepancy between regression equations and lines in ggplot2. Learn about orthogonal vs raw polynomials and implement proper quadratic regression in R with code examples.

#data-science-r #ggplot2-visualization #ggpubr-package #polynomial-regression #r-programming #statistical-graphics #statistical-regression

11/07/2025, 09:39 AM

Why doesn’t the regression equation from stat_regline_equation match the regression line from stat_smooth in R?

I’m encountering an issue where I’ve created a scatter plot with a quadratic regression line using ggplot2 in R. The regression line is plotted correctly using stat_smooth(), but the equation displayed by stat_regline_equation() doesn’t match the line.

Here’s the code I’m using:

#LIBRARIES
library(tidyverse)
library(ggpubr)
library(ggplot2)

test_db <- data.frame(
  X = c(0,-0.1297,-0.2185,-0.2795),
  Y = c(0.7569,0.7396,0.6561,0.5476)
)


test_plot = ggscatter(test_db, x = "X", y = "Y",
                     xlim = c(-1,0),
                     ylim = c(0, 1),
                     fill = "red",
                     shape = 23) +
  stat_regline_equation(label.x = -0.95, label.y = 0.15,  #print equation for regression line
                        formula = y ~ poly(x, 2),         #set regression line as quadratic
                        show.legend=FALSE) +
  stat_smooth(method = "lm",               #plot smoothed conditional mean
              formula = y ~ poly(x, 2),    #set regression line as quadratic
              colour="black", size=0.5,
              fill="red") + 
  theme_bw() + 
  labs(x = "X", 
       y = "Y",
       title = "Test plot")
test_plot

What could be causing this discrepancy between the regression line and the displayed equation?

NeuroAgent

The discrepancy between stat_regline_equation and stat_smooth in your code occurs because the poly() function in R uses orthogonal polynomials by default, which can cause coefficient interpretation issues even though the fitted regression lines should theoretically be identical.

Understanding the Poly() Function Issue
Orthogonal vs Raw Polynomials
Differences Between stat_smooth and stat_regline_equation
Solutions to Fix the Discrepancy
Best Practices for Polynomial Regression in ggplot2
Complete Working Example

Understanding the Poly() Function Issue

The root cause of your problem lies in how the poly() function handles polynomial terms. When you use formula = y ~ poly(x, 2) in both functions, R creates orthogonal polynomial terms by default, which can lead to unexpected coefficient interpretations.

According to the ggpubr documentation, stat_regline_equation uses the lm() function internally, but the way poly() is processed can differ between stat_smooth and stat_regline_equation in terms of default parameters.

The Stack Overflow question (R stat_regline_equation doesn’t match stat_smooth line) demonstrates this exact issue and shows that adding raw=TRUE resolves the problem.

Orthogonal vs Raw Polynomials

Let me clarify the key difference:

Orthogonal Polynomials (Default)

poly(x, 2)  # Uses orthogonal polynomials by default

Creates uncorrelated polynomial terms
Better for numerical stability
Coefficients represent contributions from orthogonalized terms
Harder to interpret directly

Raw Polynomials

poly(x, 2, raw = TRUE)  # Uses raw polynomial terms

Creates standard x and x² terms
Coefficients represent the expected mathematical relationship
Easier to interpret and match manual calculations

Important Note: Both approaches should produce the same fitted regression line, but the coefficient values will differ significantly. The issue in your case likely stems from how different ggplot2 layers handle the poly() function.

Differences Between stat_smooth and stat_regline_equation

While both functions use linear models internally, they have different default behaviors:

Function	Default Formula Handling	Model Fitting	Output Focus
`stat_smooth()`	Processes formula directly in ggplot2 context	Fits model for visualization	Produces smooth line with confidence intervals
`stat_regline_equation()`	Processes formula through ggpubr’s lm wrapper	Fits model for equation display	Extracts coefficients and R² for text display

The GitHub source code for stat_regline_equation shows that it uses lm() but may have different parameter handling compared to ggplot2’s stat_smooth.

Solutions to Fix the Discrepancy

Solution 1: Use Raw=TRUE (Recommended)

Modify your code to explicitly use raw polynomials:

test_plot = ggscatter(test_db, x = "X", y = "Y",
                     xlim = c(-1,0),
                     ylim = c(0, 1),
                     fill = "red",
                     shape = 23) +
  stat_regline_equation(label.x = -0.95, label.y = 0.15,
                        formula = y ~ poly(x, 2, raw = TRUE),  # Use raw=TRUE
                        show.legend=FALSE) +
  stat_smooth(method = "lm",
              formula = y ~ poly(x, 2, raw = TRUE),  # Use raw=TRUE
              colour="black", size=0.5,
              fill="red") + 
  theme_bw() + 
  labs(x = "X", y = "Y", title = "Test plot")

Solution 2: Use Raw Polynomial Terms Directly

An alternative approach is to manually create polynomial terms:

test_db <- test_db %>%
  mutate(x2 = X^2)  # Create x² term manually

test_plot = ggscatter(test_db, x = "X", y = "Y",
                     xlim = c(-1,0),
                     ylim = c(0, 1),
                     fill = "red",
                     shape = 23) +
  stat_regline_equation(label.x = -0.95, label.y = 0.15,
                        formula = y ~ X + x2,  # Use raw terms directly
                        show.legend=FALSE) +
  stat_smooth(method = "lm",
              formula = y ~ X + x2,  # Use raw terms directly
              colour="black", size=0.5,
              fill="red") + 
  theme_bw() + 
  labs(x = "X", y = "Y", title = "Test plot")

Solution 3: Verify Model Consistency

You can also fit the models separately to verify they’re identical:

# Fit the model manually
manual_model <- lm(Y ~ poly(X, 2, raw = TRUE), data = test_db)
summary(manual_model)

# Compare with what stat_smooth produces
ggplot(test_db, aes(x = X, y = Y)) +
  geom_point() +
  stat_smooth(method = "lm", formula = y ~ poly(x, 2, raw = TRUE))

Best Practices for Polynomial Regression in ggplot2

Always specify raw = TRUE when using poly() in regression equations to ensure consistent interpretation
Be cautious with small datasets - With only 4 data points as in your example, quadratic regression may be overfitting
Consider the number of parameters - A quadratic model with 4 data points uses 3 parameters (intercept, linear, quadratic), leaving only 1 degree of freedom
Validate your model - Check R² and p-values to ensure the model is meaningful
Use consistent syntax - Ensure the same formula specification across all ggplot2 layers

Complete Working Example

Here’s a complete, corrected version of your code:

#LIBRARIES
library(tidyverse)
library(ggpubr)
library(ggplot2)

# Create test data
test_db <- data.frame(
  X = c(0, -0.1297, -0.2185, -0.2795),
  Y = c(0.7569, 0.7396, 0.6561, 0.5476)
)

# Create plot with consistent raw polynomial specification
test_plot <- ggscatter(test_db, x = "X", y = "Y",
                      xlim = c(-1, 0),
                      ylim = c(0, 1),
                      fill = "red",
                      shape = 23) +
  stat_regline_equation(label.x = -0.95, label.y = 0.15,
                        formula = y ~ poly(x, 2, raw = TRUE),  # Key fix: raw = TRUE
                        show.legend = FALSE) +
  stat_smooth(method = "lm",
              formula = y ~ poly(x, 2, raw = TRUE),  # Consistent specification
              colour = "black", size = 0.5,
              fill = "red") + 
  theme_bw() + 
  labs(x = "X", y = "Y", title = "Test plot with Quadratic Regression")

# Display the plot
test_plot

This code will now display both the regression line and equation that match perfectly, as both functions use the same raw polynomial specification.

Conclusion

The discrepancy between stat_regline_equation and stat_smooth in your ggplot2 code was caused by the default behavior of the poly() function using orthogonal polynomials. By explicitly setting raw = TRUE, you ensure both functions use the same polynomial basis, resulting in consistent regression lines and equations.

Key takeaways:

Always specify raw = TRUE when using poly() in regression equations
Ensure consistent formula specifications across all ggplot2 layers
Be aware of the mathematical differences between orthogonal and raw polynomials
Consider the appropriateness of polynomial regression for your dataset size

This solution addresses the immediate issue while following best practices for statistical visualization in R.

Sources

How to properly display regression equations in ggplot2 with confidence intervals?What's the difference between orthogonal and raw polynomials in R regression?How to create custom regression equations in ggplot2 using ggpubr?Why does my quadratic regression line look different from the equation in R?How to troubleshoot regression visualization issues in ggplot2?Best practices for polynomial regression in R ggplot2 visualizations?

Ask NeuroAgent