Page 232 - DMGT209_QUANTITATIVE_TECHNIQUES_II

Unit 11: Multiple Regression and Correlation Analysis



11.2 Meaning of Multiple Regression


Multiple regression is a statistical technique that allows us to predict someone's score on one variable on the basis of their scores on several other variables. An example might help. Suppose we were interested in predicting how much an individual enjoys their job. Variables such as salary, extent of academic qualifications, age, sex, number of years in full-time employment and socioeconomic status might all contribute towards job satisfaction. If we collected data on all of these variables, perhaps by surveying a few hundred members of the public, we would be able to see how many and which of these variables gave rise to the most accurate prediction of job satisfaction. We might find that job satisfaction is most accurately predicted by type of occupation, salary and years in full-time employment, with the other variables not helping us to predict job satisfaction.
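The idea can be sketched with a small numerical experiment. The sketch below is illustrative only: the survey data are simulated, and the variable names (salary, years employed, age) are hypothetical stand-ins for the predictors discussed above. An ordinary least-squares fit then estimates one coefficient per predictor.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 300  # a few hundred simulated survey respondents

# Hypothetical predictor variables (names are illustrative only)
salary = rng.normal(50, 10, n)            # annual salary, in thousands
years_employed = rng.uniform(0, 30, n)    # years in full-time employment
age = rng.uniform(20, 60, n)

# Simulated criterion variable: satisfaction depends on salary and
# years employed, but (by construction) not on age
satisfaction = (2.0 + 0.05 * salary + 0.08 * years_employed
                + rng.normal(0, 0.5, n))

# Design matrix with a leading column of ones for the constant term
X = np.column_stack([np.ones(n), salary, years_employed, age])

# Least-squares fit: one constant plus one coefficient per predictor
coef, *_ = np.linalg.lstsq(X, satisfaction, rcond=None)
```

With enough data the estimated coefficient for age comes out near zero, mirroring the point above that some candidate variables turn out not to help the prediction.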

When using multiple regression in psychology, many researchers use the term "independent variables" to identify those variables that they think will influence some other "dependent variable". We prefer the term "predictor variables" for those variables that may be useful in predicting the scores on another variable, which we call the "criterion variable". Thus, in our example above, type of occupation, salary and years in full-time employment would emerge as significant predictor variables, which allow us to estimate the criterion variable, namely how satisfied someone is likely to be with their job. As we have pointed out before, human behaviour is inherently noisy and it is therefore not possible to produce totally accurate predictions, but multiple regression allows us to identify a set of predictor variables which together provide a useful estimate of a participant's likely score on a criterion variable.
In the case of simple linear regression, one variable, say X_1, is affected by a linear combination of another variable X_2 (we shall use X_1 and X_2 instead of the Y and X used earlier). However, if X_1 is affected by a linear combination of more than one variable, the regression is termed a multiple linear regression.
Let there be k variables X_1, X_2, ......, X_k, where one of these, say X_j, is affected by the remaining k − 1 variables. We write the typical regression equation as

    X_jc = a_{j.1,2,....,j−1,j+1,....,k} + b_{j1.2,3,....,j−1,j+1,....,k} X_1 + b_{j2.1,3,....,j−1,j+1,....,k} X_2 + ......    (j = 1, 2, ...., k)

Here a_{j.1,2,....}, b_{j1.2,3,....}, etc. are constants. The constant a_{j.1,2,....,j−1,j+1,....,k} is interpreted as the value of X_j when X_1, X_2, ....., X_{j−1}, X_{j+1}, ....., X_k are all equal to zero. Further, b_{j1.2,3,....,j−1,j+1,....,k}, b_{j2.1,3,....,j−1,j+1,....,k}, etc. are the (k − 1) partial regression coefficients of the regression of X_j on X_1, X_2, ......, X_{j−1}, X_{j+1}, ......, X_k.
For simplicity, we shall consider three variables X_1, X_2 and X_3. The three possible regression equations can be written as

    X_1c = a_1.23 + b_12.3 X_2 + b_13.2 X_3        .... (1)
    X_2c = a_2.13 + b_21.3 X_1 + b_23.1 X_3        .... (2)
    X_3c = a_3.12 + b_31.2 X_1 + b_32.1 X_2        .... (3)

Given n observations on X_1, X_2 and X_3, we want to find such values of the constants of the regression equations that

    ∑_{i=1}^{n} (X_ij − X_ijc)²,    j = 1, 2, 3,

is minimised.
Caution: For convenience, we shall use regression equations expressed in terms of deviations of the variables from their respective means.
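Why deviations are convenient can be checked numerically. In the sketch below (simulated data, illustrative names), centring each variable on its mean removes the constant term from the regression while leaving the partial regression coefficients unchanged; the constant of the raw-variable equation can then be recovered from the means.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
X2 = rng.normal(10, 2, n)
X3 = rng.normal(5, 1, n)
X1 = 3.0 + 0.7 * X2 - 0.4 * X3 + rng.normal(0, 0.2, n)

# Deviations of each variable from its respective mean
x1 = X1 - X1.mean()
x2 = X2 - X2.mean()
x3 = X3 - X3.mean()

# In deviation form the fitted equation passes through the origin, so no
# constant column is needed: x1c = b12.3 * x2 + b13.2 * x3
A = np.column_stack([x2, x3])
(b12_3, b13_2), *_ = np.linalg.lstsq(A, x1, rcond=None)

# The constant of the raw-variable equation is recovered from the means
a1_23 = X1.mean() - b12_3 * X2.mean() - b13_2 * X3.mean()
```

The slopes fitted on the centred data match those of a fit on the raw variables, which is exactly the simplification the deviation form buys.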






                                             LOVELY PROFESSIONAL UNIVERSITY                                  227