simrel-m

A simulation tool and its application

Raju Rimal
Supervisors:
Solve Sæbø
and
Trygve Almøy

11 June, 2017

Man is a tool-using animal. Without tools he is nothing, with tools he is all.
— Thomas Carlyle

simrel-m: A versatile tool for simulating multi-response linear model data

Why simrel-m


  • By changing a few parameters, we can simulate a wide range of linear model data. For example, we can:
    1. Control the degree of multicollinearity in the simulated data
    2. Specify which principal components are relevant for prediction
  • It is easy to use and widely applicable

The idea behind

[Figure: reduction model diagram, shown in successive build steps]

  • Based on the simrel package [1]
  • The predictor space (blue box)
  • A model defines the relationship between the predictor space and the response space (green box)
  • A subspace within each of these spaces (a reduced regression model) contains the information about this relationship
  • A set of orthogonal variables \((Z)\) spans the relevant predictor subspace (the predictor components)
  • A set of orthogonal variables \((W)\) spans the response subspace (the response components)
  • simrel-m uses this idea to construct the relevant covariance matrix and simulates data from it

    How it works

    • Gets the parameter settings from the user
    • Creates the covariance matrix
    • Creates the rotation matrix
    • Rotates the sampled latent variables
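
The four steps can be sketched in base R for a single response. This is a simplified illustration under stated assumptions, not the simrel-m implementation: the variable names are made up for the example, the eigenvalues are assumed to follow the form \(\lambda_i = e^{-\gamma(i-1)}\) from simrel [1], and the rotation matrix is drawn from a QR decomposition of a random matrix.

```r
set.seed(2017)

# 1. Parameter settings (illustrative names, not the package's arguments)
n      <- 100        # number of observations
p      <- 6          # number of predictors
gamma  <- 0.5        # decay of eigenvalues
relpos <- c(1, 3)    # positions of the relevant predictor components
rho2   <- 0.8        # coefficient of determination

# 2. Covariance matrix of the latent variables (w, z_1, ..., z_p)
lambda <- exp(-gamma * (seq_len(p) - 1))   # variances of the components
share  <- runif(length(relpos))
share  <- share / sum(share)               # split rho2 across relevant components
sigma_zw <- numeric(p)                     # zero for irrelevant components
sigma_zw[relpos] <- sqrt(rho2 * share * lambda[relpos])
Sigma <- diag(c(1, lambda))
Sigma[1, -1] <- sigma_zw
Sigma[-1, 1] <- sigma_zw

# 3. Random orthogonal rotation matrix from a QR decomposition
R <- qr.Q(qr(matrix(rnorm(p * p), p, p)))

# 4. Sample the latent variables and rotate the predictor components
latent <- matrix(rnorm(n * (p + 1)), n, p + 1) %*% chol(Sigma)
y <- latent[, 1]     # response (the single response component)
Z <- latent[, -1]    # predictor components
X <- Z %*% t(R)      # observed predictors
```

In this sketch the population covariance of `X` is `R %*% diag(lambda) %*% t(R)`, so the rotation changes the directions of the predictors but keeps the eigenvalues, and with them the degree of multicollinearity, unchanged.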

    A web interface

    How to get it

    Install simrel-m:

    devtools::install_github(
      "therimalaya/simulatr",
      quiet = TRUE
    )

    Run the shiny app:

    shiny::runGitHub(
      "AppSimulatr", 
      "therimalaya"
    )

    Documentation:

    https://therimalaya.github.io/simulatr/

    An example of comparison of estimation methods

    Design Properties

    Consider two datasets that share the following properties:

    Number of observations                                                 100
    Number of variables                                                    16
    Number of predictors relevant for each response component              5, 5, 5
    Number of response variables                                           5
    Positions of relevant predictor components per response component      1, 6; 2, 5; 3, 4
    Positions of response components to rotate together                    1, 4; 2, 5; 3
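
The position settings above use commas within a group and semicolons between groups. A minimal sketch of how such a string could be read into an R list follows; `parse_positions` is a hypothetical helper written for this illustration, not a function from simrel-m.

```r
# Hypothetical helper (not part of simrel-m): turn "1, 6; 2, 5; 3, 4"
# into a list with one integer vector per group.
parse_positions <- function(x) {
  groups <- strsplit(x, ";", fixed = TRUE)[[1]]
  lapply(groups, function(g) {
    as.integer(trimws(strsplit(g, ",", fixed = TRUE)[[1]]))
  })
}

relpos <- parse_positions("1, 6; 2, 5; 3, 4")  # relevant component positions
ypos   <- parse_positions("1, 4; 2, 5; 3")     # positions rotated together

length(ypos)        # number of groups, one per response component
sort(unlist(ypos))  # response variable indices covered by the groups
```

Here the three groups in `ypos` together cover positions 1 to 5, which matches the five response variables in the design.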

    The two datasets differ in the following properties:

                                           Design1          Design2
    Decay of eigenvalues \((\gamma)\)      0.2              0.8
    Coef. of determination \((\rho^2)\)    0.8, 0.8, 0.4    0.4, 0.4, 0.4
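
To see what the decay parameter does, the sketch below assumes the eigenvalue form used in simrel [1], \(\lambda_i = e^{-\gamma(i-1)}\), and computes the eigenvalues of the predictor components for both designs. It also assumes that the 16 variables in the design are all predictors; the variable names are illustrative.

```r
p <- 16               # number of predictors (assumed from the design table)
i <- seq_len(p)

# Assumed eigenvalue structure: lambda_i = exp(-gamma * (i - 1))
lambda_design1 <- exp(-0.2 * (i - 1))   # gamma = 0.2, slow decay
lambda_design2 <- exp(-0.8 * (i - 1))   # gamma = 0.8, fast decay

# Share of the total predictor variance carried by the first three components
share1 <- sum(lambda_design1[1:3]) / sum(lambda_design1)
share2 <- sum(lambda_design2[1:3]) / sum(lambda_design2)
round(c(Design1 = share1, Design2 = share2), 2)
```

A larger \(\gamma\) concentrates the variance in the first few components, which corresponds to stronger multicollinearity among the predictors.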

    Estimation Methods

    For comparison, consider the following estimation methods:

    • Ordinary Least Squares (ols)
    • Principal Component Regression (pcr)
    • Partial Least Squares (pls) [2]
    • Canonical Partial Least Squares (cpls) [3]
    • Envelope Estimation of predictor space (env) [4]
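
As a rough illustration of how two of these methods can be compared on held-out data, the sketch below fits OLS and PCR with base R only. The data are a stand-in generated inside the snippet, not simrel-m output, and the number of components `k` is an arbitrary choice. In practice the `pcr()` and `plsr()` functions of the pls package would be the more usual route.

```r
set.seed(1)
n <- 100
p <- 16

# Stand-in data: predictors with decaying variances, two relevant predictors
lambda <- exp(-0.8 * (seq_len(p) - 1))
X <- matrix(rnorm(n * p), n, p) %*% diag(sqrt(lambda))
beta <- c(1, 1, rep(0, p - 2))
y <- drop(X %*% beta) + rnorm(n, sd = 0.5)

train <- seq_len(n) <= 70   # first 70 rows for fitting, the rest for testing

# Ordinary least squares on all predictors
ols_fit  <- lm.fit(cbind(1, X[train, ]), y[train])
ols_pred <- drop(cbind(1, X[!train, ]) %*% ols_fit$coefficients)

# Principal component regression on the first k components
k <- 3
pca <- prcomp(X[train, ], center = TRUE, scale. = FALSE)
rot <- pca$rotation[, seq_len(k), drop = FALSE]
scores_train <- sweep(X[train, ], 2, pca$center) %*% rot
scores_test  <- sweep(X[!train, ], 2, pca$center) %*% rot
pcr_fit  <- lm.fit(cbind(1, scores_train), y[train])
pcr_pred <- drop(cbind(1, scores_test) %*% pcr_fit$coefficients)

# Root mean squared error of prediction on the test rows
rmsep <- function(obs, pred) sqrt(mean((obs - pred)^2))
c(ols = rmsep(y[!train], ols_pred), pcr = rmsep(y[!train], pcr_pred))
```

Repeating such a comparison over datasets with different \(\gamma\) and \(\rho^2\) settings is the kind of study the two designs above are meant to support.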

    A Comparison

    Some Cases

    Case I

    • Testing new estimation methods
    • Studying their properties
    • Studying their performance on data with various properties

    Case II

    • Educational use
    • Students can learn how a method such as variable selection removes irrelevant variables
    • Students can observe and study the loading weights on relevant and irrelevant principal components

    Case III

    • Comparing various methods (estimation methods, variable selection techniques)

    References

    [1] S. Sæbø, T. Almøy, I.S. Helland, Simrel—A versatile tool for linear model data simulation based on the concept of a relevant subspace and relevant predictors, Chemometrics and Intelligent Laboratory Systems 146 (2015) 128–35.

    [2] H. Wold, Partial least squares, Encyclopedia of Statistical Sciences (1985).

    [3] U.G. Indahl, K.H. Liland, T. Næs, Canonical partial least squares—a unified pls approach to classification and regression problems, Journal of Chemometrics 23(9) (2009) 495–504.

    [4] R.D. Cook, B. Li, F. Chiaromonte, Envelope models for parsimonious and efficient multivariate linear regression, Statistica Sinica (2010) 927–60.

    [5] I.S. Helland, Partial least squares regression and statistical models, Scandinavian Journal of Statistics (1990) 97–114.