Page 191 - DCAP303_MULTIMEDIA_SYSTEMS

Unit 11: Compression



11.4 Redundancies and Their Types

The term data compression refers to the process of reducing the amount of data required to
represent a given quantity of information. A clear distinction must be made between data and
information. They are not synonymous: data are the means by which information is conveyed,
and various amounts of data may be used to represent the same amount of information.
Such might be the case, for example, if a long-winded individual and someone who is short and
to the point were to relate the same story. Here, the information of interest is the story; words are
the data used to relate the information. If the two individuals use a different number of words to
tell the same basic story, two different versions of the story are created, and at least one includes
nonessential data. That is, it contains data (or words) that either provide no relevant information
or simply restate that which is already known. It is thus said to contain data redundancy. Data
redundancy is a central issue in digital image compression. It is not an abstract concept but a
mathematically quantifiable entity. If n_1 and n_2 denote the number of information-carrying units
in two data sets that represent the same information, the relative data redundancy R_D of the first
data set (the one characterized by n_1) can be defined as

                          R_D = 1 − 1/C_R

where C_R, commonly called the compression ratio, is

                          C_R = n_1/n_2
For the case n_1 = n_2, C_R = 1 and R_D = 0, indicating that (relative to the second data set) the first
representation of the information contains no redundant data. When n_2 << n_1, C_R → ∞ and
R_D → 1, implying significant compression and highly redundant data. Finally, when n_2 >> n_1,
C_R → 0 and R_D → −∞, indicating that the second data set contains much more data than the original
representation. This, of course, is the normally undesirable case of data expansion. In general,
C_R and R_D lie in the open intervals (0, ∞) and (−∞, 1), respectively. A practical compression ratio,
such as 10 (or 10 : 1), means that the first data set has 10 information-carrying units (say, bits) for
every 1 unit in the second or compressed data set. The corresponding redundancy of 0.9 implies
that 90% of the data in the first data set is redundant.
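The two definitions above can be checked numerically. The following sketch (the function name and the byte counts are illustrative, not from the text) computes C_R and R_D for a pair of data set sizes, reproducing the 10 : 1 example:

```python
def compression_metrics(n1, n2):
    """Compression ratio C_R = n1/n2 and relative redundancy R_D = 1 - 1/C_R.

    n1: information-carrying units (e.g. bits) in the first (original) data set.
    n2: units in the second (compressed) data set.
    """
    c_r = n1 / n2
    r_d = 1 - 1 / c_r
    return c_r, r_d

# A 10:1 ratio, as in the text: C_R = 10 and R_D = 0.9,
# i.e. 90% of the first data set is redundant.
c_r, r_d = compression_metrics(1_000_000, 100_000)
print(c_r, r_d)
```

Note that when n2 exceeds n1 the same function returns C_R < 1 and a negative R_D, the data-expansion case described above.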
In digital image compression, three basic data redundancies can be identified and exploited:
coding redundancy, interpixel redundancy, and psychovisual redundancy.
11.4.1 Coding Redundancy
Here we use a simple formulation to show how the grey-level histogram of an image can
provide a great deal of insight into the construction of codes that reduce the amount of data
used to represent it.
Let us assume, once again, that a discrete random variable r_k in the interval [0, 1] represents the
grey levels of an image and that each r_k occurs with probability p_r(r_k):

                          p_r(r_k) = n_k/n      k = 0, 1, 2, ..., L − 1

where L is the number of grey levels, n_k is the number of times that the k-th grey level appears in
the image, and n is the total number of pixels in the image. If the number of bits used to represent
each value of r_k is l(r_k), then the average number of bits required to represent each pixel is

                                   L − 1
                          L_avg =   ∑   l(r_k) p_r(r_k).
                                   k = 0
That is, the average length of the code words assigned to the various grey-level values is found by
summing the product of the number of bits used to represent each grey level and the probability
that the grey level occurs. Thus, the total number of bits required to code an M × N image is
MN·L_avg.
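The summation above can be sketched directly from a histogram. In the following example the 4-level histogram and the code lengths are hypothetical numbers chosen for illustration; they are not taken from the text. It compares a variable-length code, which assigns shorter code words to more probable grey levels, against the natural 2-bit fixed-length code:

```python
def average_code_length(counts, code_lengths):
    """L_avg = sum over k of l(r_k) * p_r(r_k), where p_r(r_k) = n_k / n."""
    n = sum(counts)  # total number of pixels in the image
    return sum(l * nk / n for nk, l in zip(counts, code_lengths))

# Hypothetical histogram n_k for L = 4 grey levels (n = 100 pixels).
counts = [40, 30, 20, 10]

variable = [1, 2, 3, 3]   # variable-length code: short words for common levels
fixed = [2, 2, 2, 2]      # natural 2-bit fixed-length code

print(average_code_length(counts, variable))  # fewer bits/pixel on average
print(average_code_length(counts, fixed))     # exactly 2 bits/pixel
```

With these numbers the variable-length code averages 1.9 bits per pixel versus 2 for the fixed-length code, so a 512 × 512 image would need MN·L_avg ≈ 498,074 bits instead of 524,288; the saving comes entirely from exploiting the coding redundancy in the fixed-length representation.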
Lovely Professional University