Skip to contents

This vignette illustrates the use of the rrda package, dedicated to ridge redundancy analysis for high-dimensional omics data.

While conventional methods of RDA (or RRDA) are restricted to relatively small matrices, rrda extends these computations to high-dimensional data—including datasets with more than 100,000 features.

This capability makes it especially valuable in modern omics sciences and other fields where analyzing extremely large datasets is essential.

Requirements

The packages required for the analysis are `rrda`` plus some others for data manipulation and representation:

Mathematical background

Classic formula

The Ridge Redundancy Analysis model is represented as:

Let YY be an n×qn \times q matrix of response variables, and let XX be an n×pn \times p matrix of predictor variables. We assume that the columns of matrices YY and XX are centered to have means of 0.

The relation between XX and YY is given by

Y=XB+E Y = X B + E

where BB is the matrix of regression coefficients, and EE is an error matrix.

The ridge RDA aims to estimate BB under both rank and ridge restrictions. The corresponding optimization problem is

B̂(λ,r)=argminB{YXB2+λB2}subject torank(B)r \hat{B}(\lambda, r) = \arg\min_B \left\{ \| Y - X B \|^2 + \lambda \| B \|^2 \right\} \quad \text{subject to} \quad \text{rank}(B) \leq r

where A2=tr(AA)\| A \|^2 = \text{tr}(A'A) is the Frobenius norm and λ\lambda is the ridge regularization parameter.

For High-Dimensional Data

For Ridge Redundancy Analysis, the ridge estimator B̂(λ)\hat{B}(\lambda) can be decomposed as follows:

UB̂(λ)=V(D2+λI)1/2Ûλ,DB̂(λ)=D̂λ,VB̂(λ)=V̂λ U_{\hat{B}(\lambda)} = V (D^{2} + \lambda I)^{-1/2}\hat{U}_{\lambda}, \quad D_{\hat{B}(\lambda)} = \hat{D}_{\lambda}, \quad V_{\hat{B}(\lambda)} = \hat{V}_{\lambda}

where the Singular Value Decomposition (SVD) of XX is X=UDV X = U D V^{\prime} and Ûλ,D̂λ,V̂λ\hat{U}_{\lambda}, \hat{D}_{\lambda}, \hat{V}_{\lambda} are the SVD components of

Mλ=(D2+λI)1/2DUY. M_\lambda = (D^{2} + \lambda I)^{-1/2} D U^{\prime} Y.

Since MλM_\lambda depends on λ\lambda, its SVD must be computed for each value of λ\lambda.
However, efficiency can be gained because MλM_\lambda is the product of:

  • a diagonal matrix (D2+λI)1/2(D^{2} + \lambda I)^{-1/2}, and
  • a constant matrix UYU'Y (independent of λ\lambda).

Final Expression

The rank-restricted ridge estimator is:

B̂(λ,r)=V(D2+λI)1/2Ûλ[r]D̂λ[r]V̂λ[r]. \hat{B}(\lambda,r) = V(D^{2} + \lambda I)^{-1/2} \hat{U}_{\lambda}^{[r]} \hat{D}_{\lambda}^{[r]} \hat{V}_{\lambda}^{[r]\prime}.

Important Note 1: Ridge Redundancy Analysis (RRDA) has two parameters:
- Ridge penalty λ\lambda - Rank rr

Important Note 2: RRDA is efficiently rewritten with SVD of X.

Simulation

Data generation

Here we generate two matrices, X and Y, with a chosen rank constraint k.

set.seed(10)
simdata <- rdasim1(n = 50, p = 100, q = 100, k = 5)
X <- simdata$X
Y <- simdata$Y

Parameter tuning

cv_result <- rrda.cv(Y = Y, X = X, nfold = 5)
#> Call:
#> rrda.cv(Y = Y, X = X, lambda = 4.801 - 48010, maxrank = 15, nfold = 5)
rrda.summary(cv_result = cv_result)
#> === opt_min ===
#> MSE: 
#> [1] 1.218769
#> rank: 
#> [1] 5
#> lambda: 
#> [1] 66.71
#> 
#> === opt_lambda.1se ===
#> MSE: 
#> [1] 1.244403
#> rank: 
#> [1] 5
#> lambda: 
#> [1] 248.7
#> 
#> === opt_rank.1se ===
#> MSE: 
#> [1] 1.218769
#> rank: 
#> [1] 5
#> lambda: 
#> [1] 66.71

Visualize Results

Here we illustrate the parameter tuning process (regularization path), which helps identify the optimal parameter for maximizing prediction accuracy from X to Y.

p <- rrda.plot(cv_result) # cv result plot
print(p)

h <- rrda.heatmap(cv_result) # cv result heatmap
print(h)


best_lambda <- cv_result$opt_min$lambda  # selected parameter
best_rank <- cv_result$opt_min$rank # selected parameter

Bhat <- rrda.fit(Y = Y, X = X, nrank = best_rank, lambda = best_lambda) # fitting
Bhat_mat <- rrda.coef(Bhat)
#> rank:
#> [1] 5
#> lambda:
#> [1] 66.71
Yhat_mat <- rrda.predict(Bhat = Bhat, X = X) # prediction
Yhat <- Yhat_mat[[1]][[1]][[1]] # predicted values

cor_Y_Yhat <- diag(cor(Y,Yhat)) # correlation
summary(cor_Y_Yhat)
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>  0.4813  0.8754  0.9131  0.8964  0.9371  0.9813

plot(Y, Yhat)
abline(0, 1, col = "red") 
title(sub = "Fitted values vs observations")

Two-Dimensional Plot

Traditionally, (Ridge) RDA is visualized in two dimensions.
However, in omics studies, this approach can sometimes overload the plot with information, making interpretation challenging.

# You want to specify one lambda in `rrda.fit` to visualize

ud <- Bhat$Bhat_comp[[1]][[1]] # SVD component of B (UD)
v <- Bhat$Bhat_comp[[1]][[2]] # SVD component of B (V)

ud12 <- ud[, 1:2]
v12 <- v[, 1:2]

# Base plot: ud (e.g., site scores)
plot(v12, 
     xlab = "RRDA1", ylab = "RRDA2", 
     xlim = range(c(ud12[,1], v12[,1])) * 1.1, 
     ylim = range(c(ud12[,2], v12[,2])) * 1.1, 
     pch = 19, col = "darkgreen", 
     main = "RRDA")

# Add v (e.g., species scores) as arrows from origin
arrows(0, 0, ud12[,1], ud12[,2], col = "blue3", length = 0.1)

# Optionally add text labels
text(ud12, labels = paste0("X", 1:nrow(ud12)), pos = 3, col = "blue3", cex = 0.6)
text(v12, labels = paste0("Y", 1:nrow(v12)), pos = 3, col = "darkgreen", cex = 0.6)

Top Feature Heatmap

For better interpretability, we visualize the feature–feature matrix using a selected dimensionality, highlighting the most informative features.

best_lambda <- cv_result$opt_min$lambda  
best_rank <- cv_result$opt_min$rank
rrda.top(Y = Y, X = X, nrank = best_rank, lambda = best_lambda, mx = 20, my = 20)

References

Yoshioka H, Aubert J, Mary-Huard T (????). rrda: Ridge Redundancy Analysis for High-Dimensional Omics Data. R package version 0.2.3, https://cran.r-project.org/package=rrda

Yoshioka, H., Aubert, J., Iwata, H., and Mary-Huard, T., 2025. Ridge Redundancy Analysis for High-Dimensional Omics Data. bioRxiv, doi: 10.1101/2025.04.16.649138

@article {Yoshioka2025.04.16.649138,
    author = {Yoshioka, Hayato and Aubert, Julie and Iwata, Hiroyoshi and Mary-Huard, Tristan},
    title = {Ridge Redundancy Analysis for High-Dimensional Omics Data},
    elocation-id = {2025.04.16.649138},
    year = {2025},
    doi = {10.1101/2025.04.16.649138},
    URL = {https://www.biorxiv.org/content/early/2025/09/10/2025.04.16.649138},
    eprint = {https://www.biorxiv.org/content/early/2025/09/10/2025.04.16.649138.full.pdf},
    journal = {bioRxiv}
}