Unit 11: Multiple Regression and Correlation Analysis
11.2 Meaning of Multiple Regression
Multiple regression is a statistical technique that allows us to predict someone's score on one variable on the basis of their scores on several other variables. An example might help. Suppose we were interested in predicting how much an individual enjoys their job. Variables such as salary, extent of academic qualifications, age, sex, type of occupation, number of years in full-time employment and socioeconomic status might all contribute towards job satisfaction. If we collected data on all of these variables, perhaps by surveying a few hundred members of the public, we would be able to see how many, and which, of these variables gave rise to the most accurate prediction of job satisfaction. We might find that job satisfaction is most accurately predicted by type of occupation, salary and years in full-time employment, with the other variables not helping us to predict job satisfaction.
When using multiple regression in psychology, many researchers use the term "independent variables" to identify those variables that they think will influence some other "dependent variable". We prefer the term "predictor variables" for those variables that may be useful in predicting the scores on another variable, which we call the "criterion variable". Thus, in our example above, type of occupation, salary and years in full-time employment would emerge as significant predictor variables, which allow us to estimate the criterion variable: how satisfied someone is likely to be with their job. As we have pointed out before, human behaviour is inherently noisy, and it is therefore not possible to produce totally accurate predictions; but multiple regression allows us to identify a set of predictor variables which together provide a useful estimate of a participant's likely score on a criterion variable.
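To make this concrete, here is a minimal Python sketch (using numpy) that fits a multiple regression of a criterion on two predictors and then predicts a new score. The variable names and figures are invented for illustration and are not drawn from any real survey:

import numpy as np

# Hypothetical survey data: one row per respondent (illustrative values only).
salary = np.array([30.0, 45.0, 28.0, 60.0, 52.0, 38.0])    # in thousands
years_employed = np.array([2.0, 10.0, 1.0, 20.0, 15.0, 6.0])
satisfaction = np.array([4.1, 6.8, 3.9, 8.5, 7.6, 5.2])    # criterion variable

# Design matrix: a column of ones (for the constant) plus the predictors.
X = np.column_stack([np.ones_like(salary), salary, years_employed])

# Ordinary least squares: find the coefficients minimising the sum of
# squared differences between observed and predicted satisfaction scores.
coef, *_ = np.linalg.lstsq(X, satisfaction, rcond=None)

# Predict the criterion score for a new respondent (salary 40, 5 years employed).
new_respondent = np.array([1.0, 40.0, 5.0])
print("predicted satisfaction:", new_respondent @ coef)

In practice one would also test which predictors contribute significantly; that is how the "useful" subset of predictor variables described above would be identified.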
In the case of simple linear regression, one variable, say $X_1$, is affected by a linear combination of another variable $X_2$ (we shall use $X_1$ and $X_2$ instead of the $Y$ and $X$ used earlier). However, if $X_1$ is affected by a linear combination of more than one variable, the regression is termed a multiple linear regression.
Let there be $k$ variables $X_1, X_2, \ldots, X_k$, where one of these, say $X_j$, is affected by the remaining $k - 1$ variables. We write the typical regression equation as

$$X_{jc} = a_{j.1,2,\ldots,j-1,j+1,\ldots,k} + b_{j1.2,3,\ldots,j-1,j+1,\ldots,k}\, X_1 + b_{j2.1,3,\ldots,j-1,j+1,\ldots,k}\, X_2 + \cdots \qquad (j = 1, 2, \ldots, k).$$

Here $a_{j.1,2,\ldots}$, $b_{j1.2,3,\ldots}$, etc. are constants. The constant $a_{j.1,2,\ldots}$ is interpreted as the value of $X_j$ when $X_1, X_2, \ldots, X_{j-1}, X_{j+1}, \ldots, X_k$ are all equal to zero. Further, $b_{j1.2,3,\ldots,j-1,j+1,\ldots,k}$, $b_{j2.1,3,\ldots,j-1,j+1,\ldots,k}$, etc. are the $k - 1$ partial regression coefficients of the regression of $X_j$ on $X_1, X_2, \ldots, X_{j-1}, X_{j+1}, \ldots, X_k$.
For simplicity, we shall consider three variables $X_1$, $X_2$ and $X_3$. The three possible regression equations can be written as

$$X_{1c} = a_{1.23} + b_{12.3}\, X_2 + b_{13.2}\, X_3 \qquad \ldots (1)$$

$$X_{2c} = a_{2.13} + b_{21.3}\, X_1 + b_{23.1}\, X_3 \qquad \ldots (2)$$

$$X_{3c} = a_{3.12} + b_{31.2}\, X_1 + b_{32.1}\, X_2 \qquad \ldots (3)$$
Given $n$ observations on $X_1$, $X_2$ and $X_3$, we want to find such values of the constants of the regression equations that

$$\sum_{i=1}^{n} \left( X_{ij} - X_{ijc} \right)^2, \qquad j = 1, 2, 3,$$

is minimised.
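Spelling this out for equation (1), for instance (a step not written explicitly above), the quantity to be minimised is

$$S_1 = \sum_{i=1}^{n} \left( X_{i1} - a_{1.23} - b_{12.3}\, X_{i2} - b_{13.2}\, X_{i3} \right)^2,$$

and the constants $a_{1.23}$, $b_{12.3}$ and $b_{13.2}$ are found by setting the partial derivatives $\partial S_1 / \partial a_{1.23}$, $\partial S_1 / \partial b_{12.3}$ and $\partial S_1 / \partial b_{13.2}$ equal to zero.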
Caution: For convenience, we shall use regression equations expressed in terms of deviations of variables from their respective means.
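To see why this is convenient, note (a standard simplification, stated here ahead of the derivation): writing $x_1 = X_1 - \bar{X}_1$, $x_2 = X_2 - \bar{X}_2$ and $x_3 = X_3 - \bar{X}_3$, the least-squares constant in equation (1) satisfies $a_{1.23} = \bar{X}_1 - b_{12.3} \bar{X}_2 - b_{13.2} \bar{X}_3$, so that in deviation form the equation reduces to

$$x_{1c} = b_{12.3}\, x_2 + b_{13.2}\, x_3,$$

leaving only the two partial regression coefficients to be determined.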