Page 317 - DMTH404_STATISTICS
Unit 22: Correlation
This value indicates the presence of a very high positive correlation between the marks obtained in two papers.
Example 19:
Find the coefficient of correlation by the concurrent deviation method from the following information:
Number of pairs of deviations = 96
Number of concurrent deviations = 32
Solution.
We are given C = 32 and D = 96
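Now,
$$\frac{2C - D}{D} = \frac{64 - 96}{96} = -\frac{1}{3},$$
which is negative. Therefore,
$$r = -\sqrt{-\left(\frac{2C - D}{D}\right)} = -\sqrt{\frac{1}{3}} = -0.577$$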
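The arithmetic of Example 19 can be checked with a short sketch. It assumes the usual form of the coefficient of concurrent deviation, r = ±√(±(2C − n)/n), where n is the number of pairs of deviations (written D above) and the same sign is used inside and outside the root. The helper `count_concurrent` is hypothetical and assumes that each deviation is the change from the preceding value, with a zero change counted as non-concurrent; texts differ on how zero changes are treated.

```python
import math


def concurrent_deviation_r(c: int, n: int) -> float:
    """Coefficient of concurrent deviation, r = ±sqrt(±(2C - n) / n).

    c: number of concurrent deviations (C)
    n: number of pairs of deviations (written D in Example 19)
    The same sign is taken inside and outside the square root,
    so the quantity under the root is never negative.
    """
    if n <= 0 or not 0 <= c <= n:
        raise ValueError("need n > 0 and 0 <= c <= n")
    ratio = (2 * c - n) / n
    sign = 1.0 if ratio >= 0 else -1.0
    return sign * math.sqrt(sign * ratio)


def count_concurrent(xs, ys):
    """Return (C, n) for two equally long series (hypothetical helper).

    Assumption: each deviation is the change from the preceding value,
    and a pair is concurrent only when both changes have the same
    nonzero sign (a zero change is counted as non-concurrent).
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    c = 0
    for i in range(1, len(xs)):
        dx = xs[i] - xs[i - 1]
        dy = ys[i] - ys[i - 1]
        if dx * dy > 0:  # both rose or both fell
            c += 1
    return c, len(xs) - 1


# Example 19: C = 32 concurrent deviations out of D = 96 pairs
print(round(concurrent_deviation_r(32, 96), 3))  # -0.577
```

Because 2C − D is negative here, the function takes the negative sign both under and outside the root, which reproduces r = −0.577.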
22.10 Summary
One of the variables may be affecting the other: A correlation coefficient calculated from the data on quantity demanded and the corresponding price of tea would only reveal that the degree of association between them is very high. It will not tell us whether price is affecting the demand for tea or vice versa. To know this, we need some additional information apart from the study of correlation. For example, if on the basis of some additional information we say that the price of tea affects its demand, then price will be the cause and quantity will be the effect. The causal variable is also termed the independent variable, while the other variable is termed the dependent variable.
The two variables may act upon each other: A cause-and-effect relation exists in this case also, but it may be very difficult to find out which of the two variables is independent. For example, if we have data on the price of wheat and its cost of production, the correlation between them may be very high, because a higher price of wheat may attract farmers to produce more wheat, and more production of wheat may mean a higher cost of production, assuming that it is an increasing-cost industry. Further, the higher cost of production may in turn raise the price of wheat. For the purpose of determining a relationship between the two variables in such situations, we can take either of them as the independent variable.
The two variables may be acted upon by outside influences: In this case we might get a high value of correlation between the two variables even though no cause-and-effect relation appears to exist between them. For example, the demands for two commodities, say X and Y, may be positively correlated because the incomes of the consumers are rising. A coefficient of correlation obtained in such a situation is called a spurious or nonsense correlation.
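The income illustration can be simulated to show how an outside influence produces a high correlation between two variables that do not affect each other. In this sketch all figures are invented: income rises steadily, each demand is a multiple of income plus its own independent random part, and Pearson's coefficient is computed directly from its definition. In real data the "own" parts would not be observable; they are available here only because the data are constructed.

```python
import math
import random


def pearson_r(xs, ys):
    """Karl Pearson's coefficient of correlation for paired data."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical data: consumer income rises steadily over 200 periods.
income = [100 + t for t in range(200)]

# Parts of each demand that have nothing to do with income or with
# each other (independent random noise).
own_x = [rng.gauss(0, 5) for _ in income]
own_y = [rng.gauss(0, 5) for _ in income]

# Both demands respond to income, but neither depends on the other.
demand_x = [0.4 * inc + e for inc, e in zip(income, own_x)]
demand_y = [0.3 * inc + e for inc, e in zip(income, own_y)]

r_demands = pearson_r(demand_x, demand_y)  # high, driven by income
r_own = pearson_r(own_x, own_y)            # close to zero
print(f"r(X, Y) = {r_demands:.3f}, r(own parts) = {r_own:.3f}")
```

The high value of r(X, Y) comes entirely from the shared dependence on income, while the parts of demand specific to each commodity are essentially uncorrelated.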
A high value of the correlation coefficient may be obtained due to sheer coincidence (or pure chance): This is another situation of spurious correlation. Given the data on any two variables, one may obtain a high value of the correlation coefficient when in fact they do not have any relationship. For example, a high value of the correlation coefficient may be obtained between the shoe size and the income of persons in a locality.
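How often pure chance produces a high coefficient can be estimated by simulation. This sketch draws X and Y independently from a normal distribution, so they have no relationship at all, and counts how often |r| exceeds 0.8. The sample sizes, threshold, and number of trials are arbitrary choices for illustration. For samples of only five pairs, roughly 10 per cent of samples should exceed the threshold; with fifty pairs it should practically never happen.

```python
import math
import random


def pearson_r(xs, ys):
    """Karl Pearson's coefficient of correlation for paired data."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def share_of_high_r(n, threshold, trials, seed=0):
    """Fraction of samples of n unrelated pairs with |r| > threshold."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # X and Y are drawn independently, so they have no relationship.
        xs = [rng.gauss(0, 1) for _ in range(n)]
        ys = [rng.gauss(0, 1) for _ in range(n)]
        if abs(pearson_r(xs, ys)) > threshold:
            hits += 1
    return hits / trials


small = share_of_high_r(n=5, threshold=0.8, trials=10_000)
large = share_of_high_r(n=50, threshold=0.8, trials=2_000)
print(f"n = 5: {small:.3f}   n = 50: {large:.3f}")
```

The contrast suggests why a high coefficient computed from only a few observations deserves particular caution before any relationship is inferred.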
Let the bivariate data be denoted by $(X_i, Y_i)$, where $i = 1, 2, \ldots, n$. In order to have some idea about the extent of association between the variables X and Y, each pair $(X_i, Y_i)$, $i = 1, 2, \ldots, n$, is plotted on a graph. The diagram thus obtained is called a Scatter Diagram.
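The plotting step can be sketched without any graphics library by placing one mark per pair $(X_i, Y_i)$ on a character grid. In practice a plotting package would be used (for example, `matplotlib.pyplot.scatter(xs, ys)`); the function `text_scatter` and the marks data below are hypothetical and serve only to show how each pair becomes one point, with larger X further right and larger Y higher up.

```python
def text_scatter(pairs, width=40, height=12):
    """Return a rough character-grid scatter diagram of (X_i, Y_i) pairs."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in pairs:
        # Scale each value to a column/row index; guard against a zero range.
        col = (round((x - x_min) / (x_max - x_min) * (width - 1))
               if x_max > x_min else 0)
        row = (round((y - y_min) / (y_max - y_min) * (height - 1))
               if y_max > y_min else 0)
        grid[height - 1 - row][col] = "*"  # row 0 is printed at the top
    return "\n".join("".join(line) for line in grid)


# Hypothetical marks of five students in two papers
marks = [(35, 40), (48, 52), (60, 58), (72, 75), (85, 80)]
print(text_scatter(marks))
```

Points rising from the lower left to the upper right, as in this example, suggest a positive correlation; points falling from the upper left to the lower right suggest a negative one.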
LOVELY PROFESSIONAL UNIVERSITY 309