Page 223 - DMGT209_QUANTITATIVE_TECHNIQUES_II
Quantitative Techniques-II
Notes: It is important to note that correlation is not causation: two variables can be very
strongly correlated simply because both are caused by a third variable.
Example: Consider two variables: (1) how much my grass grows per week, and (2) the
average depth of the local reservoir. The two could be highly correlated because both
depend on a third variable: how much it rains.
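The grass and reservoir example can be imitated with a small simulation. This is only a sketch under assumed numbers: the rainfall range, the coefficients 0.1 and 0.2, and the noise levels are all hypothetical and chosen purely for illustration. Neither simulated variable influences the other, yet their correlation coefficient comes out high because both are driven by rain.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / (s_xx * s_yy) ** 0.5


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical weekly rainfall (mm): the common cause of both variables.
rain = [rng.uniform(0, 50) for _ in range(200)]

# Assumed relations: each variable depends on rain plus its own independent noise.
grass_growth = [0.1 * r + rng.gauss(0, 0.5) for r in rain]
reservoir_depth = [20 + 0.2 * r + rng.gauss(0, 1.0) for r in rain]

# Grass never enters the reservoir formula (or vice versa), yet r is high.
r_grass_reservoir = pearson_r(grass_growth, reservoir_depth)
print("r(grass, reservoir) =", round(r_grass_reservoir, 2))
```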
11.1 Regression Analysis
If the coefficient of correlation calculated for bivariate data (X_i, Y_i), i = 1, 2, ..., n, is
reasonably high and a cause-and-effect relation is also believed to exist between the variables,
the next logical step is to obtain a functional relation between them. This functional relation is
known in statistics as a regression equation. Since the coefficient of correlation is a measure of
the degree of linear association between the variables, we shall discuss only linear regression
equations. This does not, however, imply that non-linear regression equations do not exist.
The regression equations are useful for predicting the value of the dependent variable for a given
value of the independent variable. As pointed out earlier, the nature of a regression equation is
different from that of a mathematical equation. For example, if Y = 10 + 2X is a mathematical
equation, it implies that Y is exactly equal to 20 when X = 5. However, if Y = 10 + 2X is a
regression equation, then Y = 20 is only the average value of Y when X = 5.
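The distinction can be made concrete with a few numbers. In the sketch below, the five observed values of Y at X = 5 are hypothetical, invented only to illustrate the point: no single observation need equal 20 exactly, but their average agrees with the value given by the regression equation Y = 10 + 2X.

```python
def regression_estimate(x, a=10.0, b=2.0):
    """Average value of Y at a given x under the regression equation Y = a + bX."""
    return a + b * x


# Hypothetical observations of Y, all recorded at X = 5.
observed_y_at_x5 = [17.0, 19.0, 20.0, 21.0, 23.0]

mean_y_at_x5 = sum(observed_y_at_x5) / len(observed_y_at_x5)
estimate_at_x5 = regression_estimate(5)

# The individual values scatter around the estimate; only their mean matches it.
print("individual values:", observed_y_at_x5)
print("mean of observed Y:", mean_y_at_x5)
print("regression estimate:", estimate_at_x5)
```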
The term regression was first introduced by Sir Francis Galton in 1877. In his study of the
relationship between heights of fathers and sons, he found that tall fathers were likely to have
tall sons and short fathers were likely to have short sons. However, the mean height of sons of
tall fathers was lower than the mean height of their fathers, and the mean height of sons of short
fathers was higher than the mean height of their fathers. In this way, a tendency of the human
race to regress, or to return to a normal height, was observed. Sir Francis Galton referred to this
tendency of returning to the mean height of all men as regression in his research paper,
“Regression towards mediocrity in hereditary stature”. The term ‘regression’, which originated
in this particular context, is now used in various fields of study, even where no regressive
tendency exists.
11.1.1 Simple Regression
For bivariate data (X_i, Y_i), i = 1, 2, ..., n, we can take either X or Y as the independent
variable. If X is the independent variable, we can estimate the average values of Y for given
values of X. The relation used for such estimation is called the regression of Y on X. If, on the
other hand, Y is used for estimating the average values of X, the relation is called the regression
of X on Y. For bivariate data, there will always be two lines of regression. It will be shown later
that these two lines are different, i.e., one cannot be derived from the other by mere transfer of
terms, because the derivation of each line depends on a different set of assumptions.
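The claim that one line cannot be obtained from the other by transfer of terms can be checked numerically. The sketch below assumes the usual least-squares fitting formulas, which the text has not yet derived at this point, and uses a small hypothetical data set; the helper name least_squares_line is introduced here only for illustration.

```python
def least_squares_line(u, v):
    """Return (intercept, slope) of the least-squares line of v on u."""
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    s_uv = sum((ui - mean_u) * (vi - mean_v) for ui, vi in zip(u, v))
    s_uu = sum((ui - mean_u) ** 2 for ui in u)
    slope = s_uv / s_uu
    intercept = mean_v - slope * mean_u
    return intercept, slope


# Hypothetical bivariate data.
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]

a_yx, b_yx = least_squares_line(X, Y)  # regression of Y on X: Y = a_yx + b_yx * X
a_xy, b_xy = least_squares_line(Y, X)  # regression of X on Y: X = a_xy + b_xy * Y

# Rearranging the Y-on-X line by transfer of terms gives X = -a/b + (1/b) * Y.
rearranged_intercept = -a_yx / b_yx
rearranged_slope = 1 / b_yx

print("Y on X:             Y = %.3f + %.3f X" % (a_yx, b_yx))
print("X on Y:             X = %.3f + %.3f Y" % (a_xy, b_xy))
print("Y on X, rearranged: X = %.3f + %.3f Y" % (rearranged_intercept, rearranged_slope))
```

For this data the fitted X-on-Y line and the rearranged Y-on-X line have clearly different slopes and intercepts, so they are two distinct lines.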
Line of Regression of Y on X
The general form of the line of regression of Y on X is Y_Ci = a + bX_i, where Y_Ci denotes the
average (or predicted, or calculated) value of Y for a given value of X = X_i. This line has two
constants, a and b. The constant a is defined as the average value of Y when X = 0. Geometrically,
it is the intercept of the line on the Y-axis. The constant b, which gives the average rate of
change of Y per unit change in X, is known as the regression coefficient.
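The interpretations of a and b follow directly from the form of the line. As a brief worked check, writing Y_C(X) for the calculated value at a given X (a shorthand introduced here only for this check):

```latex
\begin{align*}
Y_C(X) &= a + bX \\
Y_C(0) &= a + b \cdot 0 = a \\
Y_C(X + 1) - Y_C(X) &= \bigl[a + b(X + 1)\bigr] - \bigl[a + bX\bigr] = b
\end{align*}
```

For instance, with the regression equation Y = 10 + 2X used earlier, a = 10 is the average value of Y at X = 0, and b = 2 means the average value of Y rises by 2 for each unit increase in X.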
218 LOVELY PROFESSIONAL UNIVERSITY