
Unit 9: Correlation and Regression




          bivariate data, there will always be two lines of regression. It will be shown later that these two
          lines are different, i.e., one cannot be derived from the other by mere transfer of terms, because
          the derivation of each line is dependent on a different set of assumptions.

          Line of Regression of Y on X

          The general form of the line of regression of Y on X is Y_Ci = a + bX_i, where Y_Ci denotes the
          average (or predicted, or calculated) value of Y for a given value of X = X_i. This line has two
          constants, a and b. The constant a is defined as the average value of Y when X = 0; geometrically,
          it is the intercept of the line on the Y-axis. Further, the constant b, which gives the average rate
          of change of Y per unit change in X, is known as the regression coefficient.
          The above line is known if the values of a and b are known. These values are estimated from the
          observed data (X_i, Y_i), i = 1, 2, ..., n.
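          For instance, with purely illustrative values a = 3 and b = 2, the line predicts Y_C = 3 + 2(5) = 13
          when X = 5, and every unit increase in X raises the predicted value of Y by 2 units on average.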



             Notes  It is important to distinguish between Y_Ci and Y_i. Whereas Y_i is the observed value,
             Y_Ci is a value calculated from the regression equation.
          Using the regression Y_Ci = a + bX_i, we can obtain Y_C1, Y_C2, ..., Y_Cn corresponding to the
          X values X_1, X_2, ..., X_n respectively. The difference between the observed and calculated value
          for a particular value of X, say X_i, is called the error in estimation of the i-th observation on the
          assumption of a particular line of regression. There will be similar errors for all the n observations.
          We denote by e_i = Y_i - Y_Ci (i = 1, 2, ..., n) the error in estimation of the i-th observation. As is
          obvious from Figure 9.4, e_i will be positive if the observed point lies above the line and negative
          if the observed point lies below the line. Therefore, in order to obtain a measure of total error, the
          e_i's are squared and added. Let S denote the sum of squares of these errors,
          i.e., S = Σ e_i² = Σ (Y_i - Y_Ci)², the summation being taken over i = 1 to n.
                                             Figure 9.4
          Figure 9.4: The line of regression Y_Ci = a + bX_i. The intercept a is shown on the Y-axis, and for
          a given X_i the calculated value Y_Ci lies on the line, while the observed value Y_i may lie above
          or below it.
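          To make e_i and S concrete, the short Python sketch below uses a small hypothetical data set and
          an arbitrarily chosen trial line (the values of X, Y, a and b are purely illustrative, not taken from
          the text) to compute the calculated values Y_Ci, the errors e_i = Y_i - Y_Ci and the sum of squared
          errors S.

              # Hypothetical bivariate data (X_i, Y_i), i = 1, ..., n
              X = [1, 2, 3, 4, 5]
              Y = [2.1, 3.9, 6.2, 7.8, 10.1]

              # An arbitrarily chosen trial line Y_C = a + bX
              a, b = 0.5, 1.8

              Y_C = [a + b * x for x in X]               # calculated values Y_Ci
              e = [y - yc for y, yc in zip(Y, Y_C)]      # errors e_i = Y_i - Y_Ci
              S = sum(err ** 2 for err in e)             # sum of squared errors

              print("errors:", [round(err, 2) for err in e])   # mix of positive and negative errors
              print("S =", round(S, 2))

          A different choice of a and b would give a different S; the discussion below asks which choice
          makes S smallest.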



          The regression line can, alternatively, be written in terms of the deviation of Y_i from Y_Ci, i.e.,
          Y_i - Y_Ci = e_i, or Y_i = Y_Ci + e_i, or Y_i = a + bX_i + e_i. The component a + bX_i is known as the
          deterministic component and e_i is the random component.
          The value of S will be different for different lines of regression. A different line of regression
          means a different pair of constants a and b. Thus, S is a function of a and b. We want to find those
          values of a and b for which S is minimum. This method of finding the values of a and b is known as
          the Method of Least Squares.
          Since Y_Ci = a + bX_i, the above equation can be rewritten as S = Σ (Y_i - a - bX_i)².
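          As a sketch of the Method of Least Squares in action, the following Python fragment evaluates S for
          the minimising pair (a, b) on the same hypothetical data. The closed-form expressions used for b
          and a are the standard least-squares estimates, assumed here without derivation.

              # Same hypothetical data as in the earlier sketch
              X = [1, 2, 3, 4, 5]
              Y = [2.1, 3.9, 6.2, 7.8, 10.1]
              n = len(X)

              x_bar = sum(X) / n
              y_bar = sum(Y) / n

              # Standard least-squares estimates (stated without derivation)
              b = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y)) / sum((x - x_bar) ** 2 for x in X)
              a = y_bar - b * x_bar

              S_min = sum((y - (a + b * x)) ** 2 for x, y in zip(X, Y))   # minimised sum of squared errors
              print("a =", round(a, 2), "b =", round(b, 2), "S =", round(S_min, 2))

          For these data the least-squares line gives S ≈ 0.11, noticeably smaller than the S ≈ 0.54 obtained
          from the arbitrary trial line above; no other pair (a, b) can produce a smaller S.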


