Page 299 - DMTH404_STATISTICS
Unit 22: Correlation
22.3 Karl Pearson's Coefficient of Linear Correlation
Let us assume, again, that we have data on two variables X and Y denoted by the pairs
$(X_i, Y_i)$, $i = 1, 2, \ldots, n$. Further, let the scatter diagram of the data be as shown in
Figure 22.3.
Let $\bar{X}$ and $\bar{Y}$ be the arithmetic means of X and Y respectively. Draw the two lines
$X = \bar{X}$ and $Y = \bar{Y}$ on the scatter diagram. These two lines, which intersect at the
point $(\bar{X}, \bar{Y})$ and are mutually perpendicular, divide the whole diagram into four
parts, termed quadrants I, II, III and IV, as shown.
Figure 22.3
As mentioned earlier, the correlation between X and Y will be positive if low (high) values of X
are associated with low (high) values of Y. In terms of the above figure, when values of X that
are greater (less) than $\bar{X}$ are generally associated with values of Y that are greater
(less) than $\bar{Y}$, the correlation between X and Y will be positive. This implies that the
points will tend to concentrate in quadrants I and III. Similarly, when the correlation between
X and Y is negative, the points of the scatter diagram will tend to concentrate in quadrants II
and IV.
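The quadrant argument can be checked numerically. The following is a minimal Python sketch, assuming a small hypothetical data set (the values of `xs` and `ys` are made up for illustration and do not come from the text). It counts how many points fall in each quadrant formed by the lines through the two means; the helper name `quadrant` is also an assumption.

```python
# Hypothetical data, chosen only to illustrate the quadrant idea.
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 1, 4, 3, 6, 5]

x_bar = sum(xs) / len(xs)  # arithmetic mean of X
y_bar = sum(ys) / len(ys)  # arithmetic mean of Y


def quadrant(x, y):
    """Return the quadrant of (x, y) relative to the lines X = x_bar and Y = y_bar."""
    if x > x_bar and y > y_bar:
        return "I"
    if x < x_bar and y > y_bar:
        return "II"
    if x < x_bar and y < y_bar:
        return "III"
    if x > x_bar and y < y_bar:
        return "IV"
    return None  # the point lies on one of the two dividing lines


counts = {"I": 0, "II": 0, "III": 0, "IV": 0}
for x, y in zip(xs, ys):
    q = quadrant(x, y)
    if q is not None:
        counts[q] += 1

print(counts)
```

For this data, four of the six points lie in quadrants I and III, which is the pattern described above for positive correlation.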
Further, if we consider deviations of values from their means, i.e., $(X_i - \bar{X})$ and
$(Y_i - \bar{Y})$, we note that:
(i) Both $(X_i - \bar{X})$ and $(Y_i - \bar{Y})$ will be positive for all points in quadrant I.
(ii) $(X_i - \bar{X})$ will be negative and $(Y_i - \bar{Y})$ will be positive for all points in quadrant II.
(iii) Both $(X_i - \bar{X})$ and $(Y_i - \bar{Y})$ will be negative for all points in quadrant III.
(iv) $(X_i - \bar{X})$ will be positive and $(Y_i - \bar{Y})$ will be negative for all points in quadrant IV.
It is obvious from the above that the product of deviations, i.e., $(X_i - \bar{X})(Y_i - \bar{Y})$,
will be positive for points in quadrants I and III and negative for points in quadrants II and IV.
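The sign rule for the product of deviations can be seen point by point. This is a minimal Python sketch, assuming hypothetical data (not taken from the text). It prints each deviation and the product for every observation.

```python
# Hypothetical observations (X_i, Y_i); the means work out to 3.5 and 3.5.
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 1, 4, 3, 6, 5]

x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)

products = []
for x, y in zip(xs, ys):
    dx = x - x_bar  # deviation of X_i from its mean
    dy = y - y_bar  # deviation of Y_i from its mean
    products.append(dx * dy)
    print(f"({x}, {y}): dx = {dx:+.1f}, dy = {dy:+.1f}, product = {dx * dy:+.2f}")
```

The points (3, 4) and (4, 3) lie in quadrants II and IV respectively, and these are the only two observations whose product of deviations is negative.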
Since, for positive correlation, the points will tend to concentrate more in quadrants I and III
than in II and IV, the sum of positive products of deviations will outweigh the sum of negative
products of deviations. Thus, $\sum (X_i - \bar{X})(Y_i - \bar{Y})$, taken over all the n
observations, will be positive.
Similarly, when correlation is negative, the points will tend to concentrate more in quadrants II
and IV than in I and III. Thus, the sum of negative products of deviations will outweigh the
sum of positive products and hence $\sum (X_i - \bar{X})(Y_i - \bar{Y})$, taken over all the n
observations, will be negative.
Further, if there is no correlation, the sum of positive products of deviations will be equal in
magnitude to the sum of negative products of deviations, so that
$\sum (X_i - \bar{X})(Y_i - \bar{Y})$ will be equal to zero.
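The three cases can be compared by computing the sum of products of deviations directly. This is a minimal Python sketch under stated assumptions: the function name `sum_of_products` and all three data sets are hypothetical, constructed only so that each case is easy to verify by hand.

```python
def sum_of_products(xs, ys):
    """Sum of (X_i - mean of X) * (Y_i - mean of Y) over all observations."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))


# Points mostly in quadrants I and III: the sum is positive.
positive = sum_of_products([1, 2, 3, 4, 5, 6], [2, 1, 4, 3, 6, 5])

# Points mostly in quadrants II and IV: the sum is negative.
negative = sum_of_products([1, 2, 3, 4, 5, 6], [5, 6, 3, 4, 1, 2])

# One point in each quadrant, with the products cancelling: the sum is zero.
balanced = sum_of_products([1, 2, 3, 4], [1, 2, 2, 1])

print(positive, negative, balanced)
```

In the balanced case the positive products (0.75 and 0.25) cancel the negative ones (-0.25 and -0.75) exactly. With real data showing no correlation, the sum would typically be close to zero rather than exactly zero.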
LOVELY PROFESSIONAL UNIVERSITY 291