Page 297 - DMTH404_STATISTICS
Unit 22: Correlation
changes in a variable are due to changes in the other, i.e., whether a cause and effect type
relationship exists between them, are not answered by the study of correlation analysis. If there
is a correlation between two variables, it may be due to any of the following situations:
(i) One of the variables may be affecting the other: A correlation coefficient calculated from
the data on quantity demanded and corresponding price of tea would only reveal that the
degree of association between them is very high. It will not give us any idea about
whether price is affecting demand of tea or vice-versa. In order to know this, we need to
have some additional information apart from the study of correlation. For example if, on
the basis of some additional information, we say that the price of tea affects its demand,
then price will be the cause and quantity will be the effect. The causal variable is also
termed the independent variable, while the other variable is termed the dependent variable.
(ii) The two variables may act upon each other: Cause and effect relation exists in this case
also but it may be very difficult to find out which of the two variables is independent. For
example, if we have data on price of wheat and its cost of production, the correlation
between them may be very high because higher price of wheat may attract farmers to
produce more wheat and more production of wheat may mean higher cost of production,
assuming that it is an increasing cost industry. Further, the higher cost of production may
in turn raise the price of wheat. For the purpose of determining a relationship between the
two variables in such situations, we can take either of them as the independent variable.
(iii) The two variables may be acted upon by outside influences: In this case we might get a
high value of correlation between the two variables even though no cause and
effect type relation appears to exist between them. For example, the demands of the two
commodities, say X and Y, may be positively correlated because the incomes of the
consumers are rising. Coefficient of correlation obtained in such a situation is called a
spurious or nonsense correlation.
(iv) A high value of the correlation coefficient may be obtained due to sheer coincidence
(or pure chance): This is another situation of spurious correlation. Given the data on any
two variables, one may obtain a high value of correlation coefficient when in fact they do
not have any relationship. For example, a high value of correlation coefficient may be
obtained between the size of shoe and the income of persons of a locality.
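Situations (iii) and (iv) can be illustrated with a small simulation. The sketch below is only an illustration on made-up data: the income range, the coefficients 0.5 and 0.3, the noise levels, the sample sizes and the helper names pearson_r and share_of_high_r are all assumptions, and the coefficient is computed with the usual product-moment formula, which this section does not itself state. The first part lets income drive the demands of X and Y, then recomputes the coefficient among consumers with nearly equal income. The second part counts how often two unrelated variables show a coefficient above 0.8 in absolute value.

```python
import math
import random

def pearson_r(xs, ys):
    """Product-moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(0)  # fixed seed so that a rerun gives the same figures

# Situation (iii): income (assumed values) is the outside influence on both demands.
n = 5000
income = [rng.uniform(20, 100) for _ in range(n)]
demand_x = [0.5 * m + rng.gauss(0, 5) for m in income]
demand_y = [0.3 * m + rng.gauss(0, 3) for m in income]

r_xy = pearson_r(demand_x, demand_y)
r_yx = pearson_r(demand_y, demand_x)  # same value: r says nothing about direction

# Keep only consumers whose income is nearly the same (58 to 62).
band = [(x, y) for m, x, y in zip(income, demand_x, demand_y) if 58 <= m <= 62]
r_same_income = pearson_r([x for x, _ in band], [y for _, y in band])

# Situation (iv): two variables drawn independently of each other.
def share_of_high_r(sample_size, trials=2000, threshold=0.8):
    """Fraction of trials in which |r| exceeds the threshold by chance alone."""
    high = 0
    for _ in range(trials):
        a = [rng.gauss(0, 1) for _ in range(sample_size)]
        b = [rng.gauss(0, 1) for _ in range(sample_size)]
        if abs(pearson_r(a, b)) > threshold:
            high += 1
    return high / trials

share_small = share_of_high_r(5)    # only 5 pairs per sample
share_large = share_of_high_r(100)  # 100 pairs per sample

print("r between demands of X and Y:", round(r_xy, 3))
print("r among consumers of similar income:", round(r_same_income, 3))
print("share of samples with |r| > 0.8, n = 5:", share_small)
print("share of samples with |r| > 0.8, n = 100:", share_large)
```

In this setup the coefficient between the two demands should come out high, while it should nearly vanish once income is held roughly constant, which is what makes the first correlation spurious. Chance correlations of the kind described in (iv) should show up mainly in the very small samples.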
22.2 Scatter Diagram
Let the bivariate data be denoted by $(X_i, Y_i)$, where $i = 1, 2, \ldots, n$. In order to have some idea about
the extent of association between variables X and Y, each pair $(X_i, Y_i)$, $i = 1, 2, \ldots, n$, is plotted on
a graph. The diagram, thus obtained, is called a Scatter Diagram.
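The plotting step can be sketched in a few lines of Python. This is a rough text-only sketch, not a prescribed method: the function name ascii_scatter, the grid size and the eight data pairs are hypothetical, chosen only to show how each pair becomes one point with X measured across and Y measured upward.

```python
def ascii_scatter(xs, ys, width=40, height=12):
    """Return a text scatter diagram with one '*' per (x, y) pair."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    # Fall back to a span of 1 when all values on an axis are equal.
    x_span = (x_max - x_min) or 1.0
    y_span = (y_max - y_min) or 1.0
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        col = round((x - x_min) / x_span * (width - 1))
        row = round((y - y_min) / y_span * (height - 1))
        grid[height - 1 - row][col] = "*"  # grid row 0 is the top of the diagram
    body = "\n".join("|" + "".join(line) for line in grid)
    return body + "\n+" + "-" * width

# Hypothetical bivariate data (X_i, Y_i), i = 1, 2, ..., 8.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2, 4, 5, 4, 7, 8, 8, 10]

plot = ascii_scatter(xs, ys)
print(plot)
```

With these assumed values the points rise from the lower left to the upper right of the grid, the pattern one would expect when larger values of X tend to go with larger values of Y.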
Each pair of values $(X_i, Y_i)$ is denoted by a point on the graph. The set of such points (also known
as dots of the diagram) may cluster around a straight line or a curve or may not show any
tendency of association. Various possible situations are shown with the help of given diagrams: