
Resende-Lab/EnvKernel


Package EnvKernel: Fast Rcpp Kernels

This repository hosts the R package EnvKernel. We created it because we needed several non-linear kernels for building relationship matrices from environmental covariates and could not find a package that provided them. The kernels are implemented in Rcpp (Eddelbuettel & Balamuta, 2018) for speed. Feel free to contact us or open an issue to suggest improvements.

Marco Peixoto and João Paulo Gusmão



Install it from GitHub with devtools:

```r
# install.packages("devtools")  # run once if devtools is not installed
devtools::install_github(repo = "Resende-Lab/EnvKernel")
```

Citation

If you use EnvKernel, please cite:

Peixoto, M., & Gusmão, J. P. (2026). EnvKernel: Environmental Kernel Methods in R. https://github.com/Resende-Lab/EnvKernel

Vignette

Example

```r
# Load the package and the example data
library(EnvKernel)
data("envMarks")

# Pick a kernel method and build the matrix; available methods:
meth <- c("ETK", "ELK", "EEK", "EGK", "EDK")

EnvCov <- EnvKernel::getKernel(envMarks, scale = TRUE, method = meth[1])

# Inspect the top-left corner of the matrix
EnvCov[1:5, 1:5]

# Plot
heatmap(EnvCov)
```

Kernel Equation Implementations

The package currently implements five kernels. All of them operate on a scaled data matrix
$W \in \mathbb{R}^{n \times p}$ (rows = observations/environments; columns = variables, e.g., environmental covariates or markers). I recommend scaling through the scale argument of the main function, getKernel().
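If you want to scale W yourself before building a kernel, base R's scale() centers each column and divides it by its standard deviation. This is a sketch under the assumption that getKernel(scale = TRUE) applies an equivalent column standardization; that has not been checked against the package source, and the data below are simulated.

```r
# Column-standardize a raw covariate matrix with base R's scale().
# Assumption: this mirrors getKernel(scale = TRUE); check the package docs.
set.seed(42)
W_raw <- matrix(rnorm(40, mean = 10, sd = 3), nrow = 8, ncol = 5)  # simulated: 8 environments x 5 covariates
W <- scale(W_raw, center = TRUE, scale = TRUE)

colMeans(W)      # each column mean is ~0 (up to rounding)
apply(W, 2, sd)  # each column standard deviation is 1
```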


1. ETK: Transposed (Gram) Kernel

$$ K =\ \frac{WW^T}{p} $$

  • Normalized by the number of columns $p$.
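A minimal base-R sketch of the ETK formula, useful for checking the math on a small matrix. The helper etk_sketch is hypothetical and is not the package's Rcpp implementation; W is assumed to be already scaled.

```r
# Hypothetical helper (not part of EnvKernel): ETK kernel K = W W^T / p
etk_sketch <- function(W) {
  W <- as.matrix(W)
  tcrossprod(W) / ncol(W)  # tcrossprod(W) computes W %*% t(W), an n x n matrix
}

set.seed(1)
W <- scale(matrix(rnorm(20), nrow = 4, ncol = 5))  # simulated: 4 environments x 5 covariates
K <- etk_sketch(W)
dim(K)  # 4 4
```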

2. ELK: Linear Kernel (Jarquín et al., 2014; Sorensen et al., 2001)

$$ K_{ij} =\ \frac{(WW^T)_{ij}} {\displaystyle \frac{1}{n}\sum_{k=1}^{n}(WW^T)_{kk}} $$

  • $WW^T$ is the Gram matrix (inner products of rows).
  • Normalized by the mean of the diagonal of $WW^T$, i.e., $\operatorname{tr}(WW^T)/n$.
  • $n$ is the number of rows in the matrix $W$.
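The ELK normalization can be sketched in base R as follows. elk_sketch is a hypothetical helper written only to illustrate the equation; it is not the package's Rcpp code.

```r
# Hypothetical helper (not part of EnvKernel): ELK kernel, the Gram matrix
# divided by the mean of its diagonal, i.e. trace(W W^T) / n
elk_sketch <- function(W) {
  W <- as.matrix(W)
  G <- tcrossprod(W)    # Gram matrix W W^T (n x n)
  G / mean(diag(G))     # mean(diag(G)) equals sum(diag(G)) / nrow(G)
}

set.seed(2)
W <- scale(matrix(rnorm(30), nrow = 6, ncol = 5))  # simulated: 6 environments x 5 covariates
K <- elk_sketch(W)
mean(diag(K))  # 1 (up to rounding), by construction
```

Because both kernels divide the same Gram matrix by a positive constant, ELK and ETK computed from the same W differ only by a scalar factor; ELK fixes that factor so the average diagonal element is 1.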

3. EGK: Gaussian / Radial Basis Function Kernel (Schölkopf & Smola, 2002)

$$ K_{ij} = \exp\Bigl( -\frac{\lVert x_i - x_j \rVert^2}{2\sigma^2} \Bigr) $$

  • $x_i$, $x_j$ are the $i$-th and $j$-th rows of $W$.
  • $\sigma$ is the positive bandwidth parameter.
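A base-R sketch of the Gaussian kernel, built from squared Euclidean distances between rows. egk_sketch is hypothetical; it takes sigma without a default because this README does not state the default bandwidth that getKernel() uses.

```r
# Hypothetical helper (not part of EnvKernel): Gaussian/RBF kernel
# K_ij = exp(-||x_i - x_j||^2 / (2 * sigma^2))
egk_sketch <- function(W, sigma) {
  stopifnot(is.numeric(sigma), length(sigma) == 1, sigma > 0)
  D2 <- as.matrix(dist(W, method = "euclidean"))^2  # squared Euclidean distances between rows
  exp(-D2 / (2 * sigma^2))
}

set.seed(3)
W <- scale(matrix(rnorm(30), nrow = 6, ncol = 5))  # simulated: 6 environments x 5 covariates
K <- egk_sketch(W, sigma = 1)
range(K)  # entries lie in (0, 1]; the diagonal is exactly 1
```

A smaller sigma makes similarity decay faster with distance, so the matrix moves toward the identity; a larger sigma pushes all entries toward 1.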

4. EEK: Exponential / Laplacian Kernel (Schölkopf & Smola, 2002; Genton, 2001)

$$ K_{ij} = \exp\Bigl( -\frac{\phi \cdot \lVert x_i - x_j \rVert}{\bar{D}} \Bigr) $$

  • $x_i$ and $x_j$ are the $i$-th and $j$-th rows of $W$.
  • $\lVert x_i - x_j \rVert$ is the Euclidean distance (not squared).
  • $\bar{D} = \frac{1}{n(n-1)}\sum_{k \neq l} \lVert x_k - x_l \rVert$ is the mean pairwise distance.
  • $\phi$ is the positive bandwidth scaling parameter (default = 1.0).
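The exponential kernel can be sketched in base R the same way, with distances rescaled by their mean. eek_sketch is a hypothetical helper, not the package's Rcpp code; its phi = 1 default follows the default stated above.

```r
# Hypothetical helper (not part of EnvKernel): exponential kernel
# K_ij = exp(-phi * ||x_i - x_j|| / Dbar), Dbar = mean pairwise distance
eek_sketch <- function(W, phi = 1) {
  stopifnot(is.numeric(phi), length(phi) == 1, phi > 0)
  D <- as.matrix(dist(W, method = "euclidean"))  # Euclidean distances (not squared)
  n <- nrow(D)
  Dbar <- sum(D) / (n * (n - 1))  # diagonal is 0, so this averages the n(n - 1) pairs k != l
  exp(-phi * D / Dbar)
}

# Two points at distance 5: the only pairwise distance equals Dbar,
# so the off-diagonal entry is exp(-phi) = exp(-1).
eek_sketch(rbind(c(0, 0), c(3, 4)))
```

Dividing by $\bar{D}$ makes $\phi$ unit-free: a pair at the average distance always gets similarity $e^{-\phi}$, whatever the scale of the covariates.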

5. EDK: Arc-Cosine (Deep) Kernel – First Order (Cho & Saul, 2009)

$$ K_{ij} = \frac{\lVert x_i \rVert \lVert x_j \rVert}{\pi} \Bigl[ \sin\theta_{ij} + \bigl(\pi-\theta_{ij}\bigr)\cos\theta_{ij} \Bigr] $$

where

$$ \theta_{ij} = \cos^{-1}\Bigl( \frac{x_i^{\top}x_j}{\lVert x_i \rVert \lVert x_j \rVert} \Bigr) $$

  • Mimics the computation of a single hidden layer with infinitely many ReLU units in a neural network (Cho & Saul, 2009).
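A base-R sketch of the first-order arc-cosine kernel. edk_sketch is a hypothetical helper, not the package's Rcpp code; it clamps the cosine to [-1, 1] before acos() because rounding can push it slightly outside that range, and it assumes no row of W is all zeros.

```r
# Hypothetical helper (not part of EnvKernel): first-order arc-cosine kernel
# K_ij = ||x_i|| ||x_j|| / pi * [sin(theta_ij) + (pi - theta_ij) * cos(theta_ij)]
edk_sketch <- function(W) {
  W <- as.matrix(W)
  G <- tcrossprod(W)                      # inner products x_i^T x_j
  norms <- sqrt(diag(G))                  # row norms ||x_i||
  stopifnot(all(norms > 0))               # theta is undefined for an all-zero row
  NN <- outer(norms, norms)               # ||x_i|| * ||x_j||
  cos_theta <- pmin(pmax(G / NN, -1), 1)  # clamp rounding error before acos()
  theta <- acos(cos_theta)
  (NN / pi) * (sin(theta) + (pi - theta) * cos(theta))
}

# Orthogonal rows: theta = pi/2, so K_12 = ||x_1|| ||x_2|| / pi = 2 / pi.
# On the diagonal theta = 0, so K_ii = ||x_i||^2.
edk_sketch(rbind(c(1, 0), c(0, 2)))
```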

Notation Summary

| Symbol | Meaning |
|---|---|
| $W$ | Data matrix ($n \times p$) |
| $n$ | Number of rows (observations/environments) |
| $p$ | Number of columns (variables) |
| $x_i$ | $i$-th row vector of $W$ |
| $\sigma$ | Bandwidth parameter (EGK) |
| $\phi$ | Bandwidth scaling parameter (EEK) |
| $\Sigma$ | Empirical covariance matrix |
| $\bar{D}$ | Mean pairwise distance (EEK) |

References

  • Bishop, C. M. (2006). Pattern Recognition and Machine Learning. New York: Springer.
  • Cho, Y., & Saul, L. K. (2009). Kernel methods for deep learning. Advances in Neural Information Processing Systems, 22.
  • Eddelbuettel, D., & Balamuta, J. (2018). Extending R with C++: A Brief Introduction to Rcpp. The American Statistician, 72(1), 28-36. https://doi.org/10.1080/00031305.2017.1375990
  • Genton, M. G. (2001). Classes of kernels for machine learning: A statistics perspective. Journal of Machine Learning Research, 2, 299-312.
  • Jarquín, D., Crossa, J., Lacaze, X., Du Cheyron, P., Daucourt, J., Lorgeou, J., ... & de los Campos, G. (2014). A reaction norm model for genomic selection using high-dimensional genomic and environmental data. Theoretical and Applied Genetics, 127, 595-607.
  • Schölkopf, B., & Smola, A. J. (2002). Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press.
  • Sorensen, D., Fernando, R., & Gianola, D. (2001). Inferring the trajectory of genetic variance in the course of artificial selection. Genetical Research, 77(1), 83-94.
