
Regression Analysis

Regression

Definition

Regression Analysis captures the relationship between one or more response variables (the dependent/predicted variable, denoted by Y) and its predictor variables (independent/explanatory variables, denoted by X) using historical observations of both.

It estimates the functional relationship between a set of independent variables X1, X2, …, Xp and the response variable Y, choosing the estimate of the functional form that best fits the historical data.
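As a minimal sketch of fitting such a functional relationship, the example below generates synthetic "historical observations" (all numbers are illustrative assumptions, not data from the text) and recovers the coefficients by least squares with numpy:

```python
import numpy as np

# Synthetic historical observations: Y = a + b1*X1 + b2*X2 + noise.
# The "true" intercept and coefficients below are assumptions for the demo.
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 2))                 # two predictors X1, X2
true_coefs = np.array([1.5, -2.0])
y = 3.0 + X @ true_coefs + rng.normal(scale=0.1, size=n)

# Design matrix with an intercept column; least squares picks the
# parameter estimates that best fit the observations.
A = np.column_stack([np.ones(n), X])
params, *_ = np.linalg.lstsq(A, y, rcond=None)
intercept, b1, b2 = params
```

With enough observations and small noise, the estimated `intercept`, `b1`, and `b2` land close to the assumed true values (3.0, 1.5, and -2.0 here).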

Types of Regression

| Dependent Variable Type | Residual Distribution | Type of Regression | Details |
| --- | --- | --- | --- |
| Continuous | Normal (with constant variance) | Ordinary Least Squares | |
| Continuous | Normal (without constant variance) | Generalized Least Squares | |
| Binary | Binomial | Logistic Regression | |
| Discrete | Poisson | Poisson Regression | |
| Rational | Exponential family of distributions | Generalized Linear Models | |
| | | Simultaneous Equation Models | When both X and Y are dependent on each other |
| | | Structural Equation Models | Captures the inter-relations between the Xs (how the Xs affect each other before affecting Y) |
| | | Survival Analysis | Predicts a decay curve for the probability of an event |
| | | Hierarchical Bayesian | Estimates a non-linear equation |
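As a sketch of the binary row above: logistic regression can be fit by gradient descent on the negative log-likelihood. The data-generating numbers below (intercept 0.5, slope 2.0, learning rate, iteration count) are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
x = rng.normal(size=n)
# Assumed true log-odds: 0.5 + 2*x; the binary outcome is a Bernoulli draw.
p_true = 1.0 / (1.0 + np.exp(-(0.5 + 2.0 * x)))
y = (rng.uniform(size=n) < p_true).astype(float)

X = np.column_stack([np.ones(n), x])   # intercept column + predictor
w = np.zeros(2)
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-X @ w))   # predicted probabilities
    grad = X.T @ (p - y) / n           # gradient of the negative log-likelihood
    w -= 0.5 * grad                    # gradient-descent step
```

The fitted weights `w` approximate the assumed intercept and slope; in practice one would use a library such as statsmodels or scikit-learn rather than hand-rolled gradient descent.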

Linear Regression as a Model

  • It can be used for two distinct but related purposes:
    • Predicting certain events
    • Identifying the drivers of those events based on a set of explanatory variables
  • It isolates the individual effect of each driver and quantifies the magnitude of that driver's impact on the dependent variable
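A small sketch of the driver-identification use: fit an OLS model and rank predictors by the absolute size of their estimated effects. The effect sizes below are assumptions chosen so the ranking is unambiguous (this simple ranking assumes the predictors are on comparable scales):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 300
X = rng.normal(size=(n, 3))
# Hypothetical drivers: X2 has the largest effect, X3 almost none.
y = 1.0 + 0.5 * X[:, 0] + 3.0 * X[:, 1] + 0.05 * X[:, 2] \
    + rng.normal(scale=0.2, size=n)

A = np.column_stack([np.ones(n), X])
params, *_ = np.linalg.lstsq(A, y, rcond=None)
effects = np.abs(params[1:])           # isolated effect of each driver
ranking = np.argsort(effects)[::-1]    # strongest driver first
```

Here `ranking` puts the second predictor first and the third last, matching the assumed effect sizes.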

Ordinary Least Squares model assumptions

  • Linearity - Model is linear in parameters
    • Yi=a+b1X1i+b2X2i+…+bpXpi+ei
  • Spherical Errors - The error distribution is Normal with mean 0 and constant variance
    • ei ~ Normal(0, σ²)
  • Zero Expected Error - The expected value (or mean) of the errors is always zero
    • E(ei)=0 for all i
  • Homoskedasticity - The errors have constant variance
    • Variance(ei)=constant for all i
  • Non-Autocorrelation - The errors are statistically independent of one another; this holds, for example, when the data is a random sample of the population
    • corr(ei, ej)=0 for all i≠j
  • Non-Multicollinearity - No independent variable is an exact linear combination of the others
    • corr(Xi, Xj) ≠ ±1 for all i ≠ j
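The assumptions above can be checked empirically from the fitted residuals. The sketch below runs simple diagnostics on synthetic data that satisfies the assumptions by construction (thresholds and data sizes are illustrative assumptions, not formal tests):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 400
X = rng.normal(size=(n, 2))
y = 2.0 + X @ np.array([1.0, -1.0]) + rng.normal(scale=0.5, size=n)

A = np.column_stack([np.ones(n), X])
params, *_ = np.linalg.lstsq(A, y, rcond=None)
resid = y - A @ params

# Zero expected error: with an intercept, OLS residuals average to ~0.
mean_resid = resid.mean()

# Homoskedasticity: residual variance in the low- and high-fitted-value
# halves of the sample should be similar.
order = np.argsort(A @ params)
var_lo = resid[order[: n // 2]].var()
var_hi = resid[order[n // 2 :]].var()

# Non-autocorrelation: lag-1 correlation of residuals should be near 0.
lag1 = np.corrcoef(resid[:-1], resid[1:])[0, 1]

# Non-multicollinearity: predictor correlation should be far from ±1.
corr_x = np.corrcoef(X[:, 0], X[:, 1])[0, 1]
```

Formal counterparts of these checks (e.g. the Breusch-Pagan and Durbin-Watson tests, or variance inflation factors) are available in statistics libraries such as statsmodels.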