Page 394 - DCAP103_Principle of operating system

Unit 13: Input/Output and Security of Windows



            13.2.6 File Compression
            NTFS supports transparent file compression. A file can be created in compressed mode,
            which means that NTFS automatically tries to compress the blocks as they are written to disk
            and automatically decompresses them when they are read back. Processes that read or write
            compressed files are completely unaware that compression and decompression are
            going on.
            Compression works as follows. When NTFS writes a file marked for compression to disk, it
            examines the first 16 (logical) blocks in the file, irrespective of how many runs they occupy. It
            then runs a compression algorithm on them. If the resulting data can be stored in 15 or fewer
            blocks, the compressed data are written to the disk, preferably in one run. If the
            compressed data still take 16 blocks, the 16 blocks are written in uncompressed form. Then blocks
            16-31 are examined to see if they can be compressed to 15 or fewer blocks, and so on.
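
            The per-unit decision described above can be sketched as follows. This is only an
            illustration under stated assumptions: the 4096-byte block size is assumed, zlib stands in
            for whatever compressor NTFS really uses, and the names write_compressed and
            blocks_needed are hypothetical helpers, not NTFS interfaces.

```python
import random
import zlib

BLOCK_SIZE = 4096    # assumed block size in bytes (illustrative only)
UNIT_BLOCKS = 16     # blocks per compression unit, as described in the text


def blocks_needed(nbytes):
    """Number of whole blocks required to hold nbytes."""
    return (nbytes + BLOCK_SIZE - 1) // BLOCK_SIZE


def write_compressed(data):
    """Decide, unit by unit, whether to store compressed or original data.

    Returns a list of (is_compressed, blocks_used, payload) tuples.
    """
    unit_size = UNIT_BLOCKS * BLOCK_SIZE
    units = []
    for start in range(0, len(data), unit_size):
        chunk = data[start:start + unit_size]
        packed = zlib.compress(chunk)  # stand-in compressor (assumption)
        # Keep the compressed form only if it saves at least one block;
        # for a full unit this means it fits in 15 or fewer blocks.
        if blocks_needed(len(packed)) < blocks_needed(len(chunk)):
            units.append((True, blocks_needed(len(packed)), packed))
        else:
            units.append((False, blocks_needed(len(chunk)), chunk))
    return units


# Demo: three units; the middle one is pseudo-random and will not compress.
unit_size = UNIT_BLOCKS * BLOCK_SIZE
rng = random.Random(0)
noise = bytes(rng.getrandbits(8) for _ in range(unit_size))
data = b"\x00" * unit_size + noise + b"A" * unit_size

result = write_compressed(data)
for number, (is_compressed, used, _payload) in enumerate(result):
    state = "compressed" if is_compressed else "uncompressed"
    print(f"unit {number}: {state}, {used} blocks on disk")
```

            Each unit is judged on its own, so one incompressible stretch of a file does not prevent
            the units around it from being stored compressed.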
            Figure 13.11 shows a file in which the first 16 blocks have successfully compressed to eight
            blocks, the second 16 blocks failed to compress, and the third 16 blocks have also compressed
            by 50%. The three parts have been written as three runs and stored in the MFT record. The
            “missing” blocks are stored in the MFT entry with disk address 0 as shown in Figure 13.11.
            Here the header (0, 48) is followed by five pairs, two for the first (compressed) run, one for the
            uncompressed run, and two for the final (compressed) run.
            When the file is read back, NTFS has to know which runs are compressed and which are not.
            It can tell from the disk addresses: a pair with disk address 0 marks the final (missing) part
            of a 16-block unit that was stored compressed. To avoid ambiguity, disk block 0 may not be
            used for storing data. Since it contains the boot sector, using it for data is impossible anyway.
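
            As a rough illustration of how the (disk address, length) pairs can be interpreted, the
            sketch below walks a list of pairs and treats any pair with address 0 as the missing tail
            of the preceding compressed run. The run lengths (8, 8, 16, 8, 8) follow from the example
            in the text; the disk addresses 30, 40 and 85 are hypothetical values chosen for the
            demonstration, and classify_runs is an assumed helper name, not an NTFS routine.

```python
def classify_runs(pairs):
    """Interpret (disk_address, length) pairs in file order.

    Returns a list of (first_file_block, disk_address, stored_blocks,
    is_compressed) tuples, one per stored run.
    """
    runs = []
    file_block = 0
    i = 0
    while i < len(pairs):
        address, length = pairs[i]
        if address == 0:
            raise ValueError("a hole (address 0) must follow a stored run")
        next_is_hole = i + 1 < len(pairs) and pairs[i + 1][0] == 0
        if next_is_hole:
            # Stored run plus the "missing" blocks form one compressed unit.
            hole_length = pairs[i + 1][1]
            runs.append((file_block, address, length, True))
            file_block += length + hole_length
            i += 2
        else:
            runs.append((file_block, address, length, False))
            file_block += length
            i += 1
    return runs


# Five pairs as in the example: two for each compressed run, one for the
# uncompressed run. Disk addresses are made up for illustration.
pairs = [(30, 8), (0, 8), (40, 16), (85, 8), (0, 8)]
for first_block, address, stored, is_compressed in classify_runs(pairs):
    kind = "compressed" if is_compressed else "uncompressed"
    print(f"file blocks from {first_block}: {stored} blocks at disk {address} ({kind})")
```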


                   Figure 13.11: (a) An Example of a 48-Block File being Compressed to 32 Blocks
                             (b) The MFT Record for the File after Compression
























            Random access to compressed files is possible, but tricky. Suppose that a process does a seek
            to block 35 in Figure 13.11. How does NTFS locate block 35 in a compressed file? Block 35
            lies in the third 16-block unit (blocks 32-47), so NTFS has to read and decompress that entire
            run first. Then it knows where block 35 is and can pass it to any process that reads it. The
            choice of 16 blocks for the compression unit was a compromise. Making it shorter would have
            made the compression less effective. Making it longer would have made random access more
            expensive.
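
            The lookup for block 35 can be sketched as follows. The sketch assumes a 4096-byte block,
            uses zlib as a stand-in compressor, and keeps each 16-block unit in memory as a
            hypothetical (is_compressed, payload) tuple; read_block is an illustrative name, not
            an NTFS interface.

```python
import zlib

BLOCK_SIZE = 4096    # assumed block size in bytes (illustrative only)
UNIT_BLOCKS = 16     # blocks per compression unit, as described in the text


def read_block(units, block_no):
    """Return the contents of one logical block of the file.

    units is a list of (is_compressed, payload) tuples, one per
    16-block unit, in file order.
    """
    unit_index, offset = divmod(block_no, UNIT_BLOCKS)
    is_compressed, payload = units[unit_index]
    # The whole unit must be decompressed before the block can be found.
    raw = zlib.decompress(payload) if is_compressed else payload
    return raw[offset * BLOCK_SIZE:(offset + 1) * BLOCK_SIZE]


# Demo: a 48-block file in which block i is filled with the byte value i.
blocks = [bytes([i]) * BLOCK_SIZE for i in range(48)]
units = []
for u in range(3):
    raw_unit = b"".join(blocks[u * UNIT_BLOCKS:(u + 1) * UNIT_BLOCKS])
    if u == 1:
        units.append((False, raw_unit))                # stored uncompressed
    else:
        units.append((True, zlib.compress(raw_unit)))  # stored compressed

unit_index, offset = divmod(35, UNIT_BLOCKS)
block = read_block(units, 35)
print(f"block 35 is block {offset} of unit {unit_index}")
```

            The cost of a single-block read from a compressed unit grows with the unit size, since
            the whole unit is decompressed to reach one block. This is the random-access side of the
            compromise behind the choice of 16 blocks.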




                                             LOVELY PROFESSIONAL UNIVERSITY                                   387