# simrel-m

## A simulation tool and its application

Supervisors:
http://mathatistics.github.io/nsm-17

### 11 June, 2017

Man is a tool-using animal. Without tools he is nothing, with tools he is all.
— Thomas Carlyle

# simrel-m: A versatile tool for simulating multi-response linear model data

## Why simrel-m

• By changing a few parameters, we can simulate a wide range of linear model data. For example,
1. Controlling the degree of multicollinearity in the simulated data
2. Specifying the principal components relevant for prediction
• It is easy to use and has a wide range of applications

## The idea behind

• Based on the simrel package
• Predictor Space (Blue Box)
• A model defines its relationship with Response Space (Green Box)
• Subspaces within these spaces (a reduced regression model) contain the information for this relationship
• A set of orthogonal variables $$(Z)$$ spans the relevant predictor subspace (predictor components)
• A set of orthogonal variables $$(W)$$ spans the response subspace (response components)
• This idea is implemented by constructing the relevant covariance matrix and simulating from it

## How it works

• Gets parameter settings from users
• Creates Covariance matrix
• Creates Rotation Matrix
• Rotates the sampled Latent variables
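
The four steps above can be illustrated with a minimal base-R sketch. This is not the package's implementation: it assumes a single response, four predictor components, an exponential eigenvalue decay, and a covariance of 0.6 between the response component and the first predictor component, all chosen only for illustration.

```r
set.seed(2017)

# 1. Parameter settings (illustrative values, not package defaults)
n <- 100      # observations
p <- 4        # predictors
gamma <- 0.5  # eigenvalue decay

# 2. Covariance matrix of the latent variables (W, Z1, ..., Zp):
#    predictor components are uncorrelated with variances exp(-gamma * (i - 1)),
#    and only the first component is relevant for the response component W.
lambda <- exp(-gamma * (seq_len(p) - 1))
Sigma_latent <- diag(c(1, lambda))
Sigma_latent[1, 2] <- Sigma_latent[2, 1] <- 0.6

# 3. Rotation matrix: a random orthogonal matrix from a QR decomposition
R <- qr.Q(qr(matrix(rnorm(p * p), p, p)))

# 4. Sample the latent variables, then rotate the predictor components
latent <- matrix(rnorm(n * (p + 1)), n, p + 1) %*% chol(Sigma_latent)
y <- latent[, 1]            # response component used directly as the response
X <- latent[, -1] %*% t(R)  # observed predictors

# Population covariance of X implied by the rotation
Sigma_x <- R %*% diag(lambda) %*% t(R)
```

Because `R` is orthogonal, the eigenvalues of `Sigma_x` equal `lambda`, so the rotation hides the components inside correlated predictors without changing the eigenvalue structure.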

# How to get it

Install simrel-m:

    devtools::install_github(
      "therimalaya/simulatr",
      quiet = TRUE
    )

Run the shiny app:

    shiny::runGitHub(
      "AppSimulatr",
      "therimalaya"
    )

Documentation:

https://therimalaya.github.io/simulatr/

# An example of comparison of estimation methods

## Design Properties

Consider two sets of data that share the following properties:

| Property | Value |
|---|---|
| Number of observations | 100 |
| Number of variables | 16 |
| Number of predictors relevant for each response component | 5, 5, 5 |
| Number of response variables | 5 |
| Relevant position of response component | 1, 6; 2, 5; 3, 4 |
| Position of response components to rotate together | 1, 4; 2, 5; 3 |

The two designs differ in the following parameters:

| Parameter | Design 1 | Design 2 |
|---|---|---|
| Decay of eigenvalues $$(\gamma)$$ | 0.2 | 0.8 |
| Coef. of determination $$(\rho^2)$$ | 0.8, 0.8, 0.4 | 0.4, 0.4, 0.4 |
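
To make the two designs concrete, the sketch below stores them as plain R lists and shows what the decay parameter does. The list element names (`n`, `p`, `q`, `m`, `relpos`, `ypos`, `gamma`, `R2`) are illustrative labels chosen here, not a confirmed function signature, and the exponential form $$e^{-\gamma(i-1)}$$ for the eigenvalues is the one described for simrel by Sæbø et al.

```r
# Settings shared by both designs (element names are illustrative)
common <- list(
  n      = 100,                              # observations
  p      = 16,                               # variables
  q      = c(5, 5, 5),                       # relevant predictors per response component
  m      = 5,                                # response variables
  relpos = list(c(1, 6), c(2, 5), c(3, 4)),  # relevant positions
  ypos   = list(c(1, 4), c(2, 5), 3)         # response components rotated together
)

design1 <- c(common, list(gamma = 0.2, R2 = c(0.8, 0.8, 0.4)))
design2 <- c(common, list(gamma = 0.8, R2 = c(0.4, 0.4, 0.4)))

# Eigenvalues of the predictor components under exponential decay
decay <- function(design) exp(-design$gamma * (seq_len(design$p) - 1))

round(decay(design1)[1:4], 3)  # 1.000 0.819 0.670 0.549
round(decay(design2)[1:4], 3)  # 1.000 0.449 0.202 0.091
```

With the larger decay in Design 2, the eigenvalues fall off much faster, which corresponds to stronger multicollinearity among the predictors.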

## Estimation Methods

For comparison, let’s consider the following estimation methods,

• Ordinary Least Squares (ols)
• Principal Component Regression (pcr)
• Partial Least Squares (pls) 
• Canonical Partial Least Squares (cpls) 
• Envelope Estimation of predictor space (env) 
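
As a rough illustration of how two of these methods can be compared on prediction error, here is a base-R sketch of OLS against a hand-rolled principal component regression. It uses toy data generated on the spot rather than simrel-m output, and the number of components `k`, the train/test split, and the noise level are arbitrary assumptions.

```r
set.seed(1)

# Toy data: 100 observations, 16 correlated predictors, one response
n <- 100; p <- 16
scores <- matrix(rnorm(n * p), n, p) %*% diag(exp(-0.4 * (seq_len(p) - 1)))
X <- scores %*% qr.Q(qr(matrix(rnorm(p * p), p, p)))       # rotate the components
y <- drop(scores[, 1:2] %*% c(1, 0.5)) + rnorm(n, sd = 0.5)

train <- 1:70
test  <- 71:100

# OLS on all predictors
dat <- data.frame(y = y, X)
ols <- lm(y ~ ., data = dat[train, ])
ols_pred <- predict(ols, newdata = dat[test, ])

# PCR: regress y on the scores of the first k principal components
k <- 2
pca <- prcomp(X[train, ], center = TRUE, scale. = FALSE)
pcr_fit <- lm(y ~ ., data = data.frame(y = y[train], pca$x[, 1:k]))
pc_test <- data.frame(predict(pca, newdata = X[test, ])[, 1:k])
pcr_pred <- predict(pcr_fit, newdata = pc_test)

# Root mean squared error of prediction on the held-out observations
rmsep <- function(obs, pred) sqrt(mean((obs - pred)^2))
c(ols = rmsep(y[test], ols_pred), pcr = rmsep(y[test], pcr_pred))
```

The same pattern (fit on a training part, predict the rest, compare errors) extends to the other methods through their own fitting functions.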

## A Comparison

# Some Cases

Case I

• Testing new estimation Methods
• Studying its properties
• Studying its performance in data with various properties

Case II

• Educational use
• Students can learn how a method such as variable selection removes irrelevant variables
• Students can observe and study the loading weights on relevant and irrelevant principal components

Case III

• Comparing various methods (estimation methods, variable selection techniques)

## References

 S. Sæbø, T. Almøy, I.S. Helland, Simrel—A versatile tool for linear model data simulation based on the concept of a relevant subspace and relevant predictors, Chemometrics and Intelligent Laboratory Systems 146 (2015) 128–35.

 H. Wold, Partial least squares, Encyclopedia of Statistical Sciences (1985).

 U.G. Indahl, K.H. Liland, T. Næs, Canonical partial least squares: a unified PLS approach to classification and regression problems, Journal of Chemometrics 23(9) (2009) 495–504.

 R.D. Cook, B. Li, F. Chiaromonte, Envelope models for parsimonious and efficient multivariate linear regression, Statistica Sinica (2010) 927–60.

 I.S. Helland, Partial least squares regression and statistical models, Scandinavian Journal of Statistics (1990) 97–114.