Page 223 - DMGT209_QUANTITATIVE_TECHNIQUES_II
Notes: It is important to note that correlation is not causation; two variables can be very strongly correlated, yet both may be caused by a third variable.


                                           Example: Consider two variables: (1) how much my grass grows per week, and (2) the
                                    average depth of the local reservoir. Both variables could be highly correlated because both are
                                    dependent upon a third variable – how much it rains.

                                    11.1 Regression Analysis

If the coefficient of correlation calculated for bivariate data (X_i, Y_i), i = 1, 2, ..., n, is reasonably high, and a cause-and-effect type of relation is also believed to exist between the variables, the next logical step is to obtain a functional relation between them. This functional relation is known in statistics as a regression equation. Since the coefficient of correlation is a measure of the degree of linear association of the variables, we shall discuss only the linear regression equation. This does not, however, imply the non-existence of non-linear regression equations.

Regression equations are useful for predicting the value of the dependent variable for a given value of the independent variable. As pointed out earlier, the nature of a regression equation is different from that of a mathematical equation: if Y = 10 + 2X is a mathematical equation, then Y is exactly equal to 20 when X = 5; however, if Y = 10 + 2X is a regression equation, then Y = 20 is the average value of Y when X = 5.
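This distinction can be checked numerically. A minimal sketch (the observations are invented for illustration) compares the regression prediction at X = 5 with the mean of the Y values observed at that X:

```python
# Hypothetical observations of Y recorded when X = 5 (invented data).
ys_at_x5 = [18, 21, 20, 19, 22]

# If Y = 10 + 2X is a regression equation, the value it gives at X = 5
# is interpreted as the average of Y for that X, not an exact value.
predicted = 10 + 2 * 5                     # 20
average = sum(ys_at_x5) / len(ys_at_x5)    # 20.0

print(predicted, average)
```

Individual observations scatter around 20, but the regression equation reproduces only their average.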
The term regression was introduced by Sir Francis Galton. In his study of the relationship between the heights of fathers and sons, he found that tall fathers were likely to have tall sons, and vice versa. However, the mean height of the sons of tall fathers was lower than the mean height of their fathers, and the mean height of the sons of short fathers was higher than the mean height of their fathers. In this way, a tendency of the human race to regress, or return, to a normal height was observed. Galton referred to this tendency of returning to the mean height of all men as regression in his research paper, "Regression towards mediocrity in hereditary stature". The term 'regression', which originated in this particular context, is now used in various fields of study, even where no regressive tendency exists.

                                    11.1.1 Simple Regression

For bivariate data (X_i, Y_i), i = 1, 2, ..., n, we can take either X or Y as the independent variable. If X is the independent variable, then we can estimate the average value of Y for a given value of X; the relation used for such estimation is called the regression of Y on X. If, on the other hand, Y is used for estimating the average value of X, the relation is called the regression of X on Y. For bivariate data, there are thus always two lines of regression. It will be shown later that these two lines are different, i.e., one cannot be derived from the other by a mere transfer of terms, because the derivation of each line rests on a different set of assumptions.
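That the two lines are genuinely different can be illustrated numerically. The sketch below applies the standard least-squares slope formulas to a small invented data set: rearranging the Y-on-X line algebraically gives a slope of 1/b_yx, which does not equal the X-on-Y slope b_xy unless the correlation is perfect.

```python
# Invented bivariate data for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 3, 5, 4, 6]

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
sxx = sum((x - mx) ** 2 for x in xs)
syy = sum((y - my) ** 2 for y in ys)

b_yx = sxy / sxx   # slope of the regression of Y on X
b_xy = sxy / syy   # slope of the regression of X on Y

# A mere transfer of terms in Y = a + b_yx * X would give X a slope
# of 1/b_yx, which differs from b_xy for imperfectly correlated data.
print(b_yx, b_xy, 1 / b_yx)
```

For this data both least-squares slopes are 0.9, while transferring terms would predict a slope of about 1.11, confirming that the two regression lines do not coincide.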

                                    Line of Regression of Y on X

The general form of the line of regression of Y on X is Y_Ci = a + bX_i, where Y_Ci denotes the average or predicted or calculated value of Y for a given value X = X_i. This line has two constants, a and b. The constant a is defined as the average value of Y when X = 0; geometrically, it is the intercept of the line on the Y-axis. The constant b, which gives the average rate of change of Y per unit change in X, is known as the regression coefficient.
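A minimal sketch of computing the two constants for an invented data set; the formulas b = Σ(X − X̄)(Y − Ȳ)/Σ(X − X̄)² and a = Ȳ − bX̄ are the standard least-squares results, which are not derived until later in the treatment:

```python
# Invented data for fitting Y_C = a + bX by least squares.
xs = [0, 1, 2, 3, 4]
ys = [1.2, 2.9, 5.1, 7.0, 9.1]

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n

# b: average rate of change of Y per unit change in X (regression coefficient)
b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
    / sum((x - mx) ** 2 for x in xs)

# a: average value of Y when X = 0; the fitted line passes
# through the point of means (X-bar, Y-bar).
a = my - b * mx

print(round(a, 3), round(b, 3))
```

The fitted line here is roughly Y_C = 1.08 + 1.99X, so each unit increase in X raises the average Y by about 1.99.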





            218                              LOVELY PROFESSIONAL UNIVERSITY