Page 394 - DCAP103_Principle of operating system
Unit 13: Input/Output and Security of Windows
13.2.6 File Compression
NTFS supports transparent file compression. A file can be created in compressed mode,
which means that NTFS automatically tries to compress the blocks as they are written to disk
and automatically decompresses them when they are read back. Processes that read or write
compressed files are completely unaware that compression and decompression are
going on.
Compression works as follows. When NTFS writes a file marked for compression to disk, it
examines the first 16 (logical) blocks in the file, irrespective of how many runs they occupy. It
then runs a compression algorithm on them. If the resulting data can be stored in 15 or fewer
blocks, the compressed data are written to the disk, preferably in one run. If the
compressed data still take 16 blocks, the 16 blocks are written in uncompressed form. Then blocks
16-31 are examined to see if they can be compressed to 15 or fewer blocks, and so on.
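The per-unit decision described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: zlib stands in for the actual NTFS compression algorithm, the 4096-byte block size is assumed, and the names BLOCK_SIZE, blocks_needed, and compress_units are hypothetical.

```python
import zlib

BLOCK_SIZE = 4096    # assumed block size in bytes (not given in the text)
UNIT_BLOCKS = 16     # blocks examined at a time, as described in the text


def blocks_needed(nbytes):
    """Number of whole blocks needed to hold nbytes (ceiling division)."""
    return -(-nbytes // BLOCK_SIZE)


def compress_units(data):
    """Decide, for each 16-block unit of data, how it would be stored.

    Returns one (is_compressed, blocks_on_disk, payload) tuple per unit.
    """
    unit_bytes = UNIT_BLOCKS * BLOCK_SIZE
    stored = []
    for start in range(0, len(data), unit_bytes):
        unit = data[start:start + unit_bytes]
        packed = zlib.compress(unit)  # stand-in for the real algorithm
        # For a full unit this test means "fits in 15 or fewer blocks".
        # Applying it to a short final unit is an assumption of this sketch.
        if blocks_needed(len(packed)) < blocks_needed(len(unit)):
            stored.append((True, blocks_needed(len(packed)), packed))
        else:
            # No block saved, so the unit is kept in uncompressed form.
            stored.append((False, blocks_needed(len(unit)), unit))
    return stored
```

Each unit is judged independently, so one incompressible unit in the middle of a file does not prevent its neighbors from being compressed.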
Figure 13.11 shows a file in which the first 16 blocks have been compressed to eight
blocks, the second 16 blocks failed to compress, and the third 16 blocks have also been compressed
by 50%. The three parts have been written as three runs and recorded in the MFT record. The
“missing” blocks are represented in the MFT entry by pairs with disk address 0, as shown in Figure 13.11.
Here the header (0, 48) is followed by five pairs: two for the first (compressed) run, one for the
uncompressed run, and two for the final (compressed) run.
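As a rough illustration of how such a record could be assembled, the sketch below builds the header and the (disk address, block count) pairs from the number of blocks each 16-block unit occupies on disk. The function name build_run_pairs, the starting address 30, and the back-to-back allocation of runs are assumptions made for the example and are not taken from the figure.

```python
UNIT_BLOCKS = 16  # blocks per compression unit, as described in the text


def build_run_pairs(stored_blocks, first_free=30):
    """Build a header and run pairs for a compressed file.

    stored_blocks lists how many disk blocks each 16-block unit occupies.
    first_free is a hypothetical disk address; runs are assumed to be
    allocated back to back starting there.
    """
    pairs = []
    next_addr = first_free
    for used in stored_blocks:
        pairs.append((next_addr, used))      # the blocks actually on disk
        next_addr += used
        if used < UNIT_BLOCKS:
            # Placeholder pair with disk address 0 for the "missing" blocks.
            pairs.append((0, UNIT_BLOCKS - used))
    header = (0, UNIT_BLOCKS * len(stored_blocks))  # (first block, count)
    return header, pairs


header, pairs = build_run_pairs([8, 16, 8])
```

For the 48-block file of the text, with units stored in 8, 16, and 8 blocks, this yields the header (0, 48) followed by five pairs, matching the count given above.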
When the file is read back, NTFS has to know which runs are compressed and which are not.
It can tell from the disk addresses: a pair with disk address 0 marks the final part of a
16-block unit that was compressed. To avoid ambiguity, disk block 0 may not be used for storing
data. Since it contains the boot sector, using it for data is impossible anyway.
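The rule for telling compressed runs from uncompressed ones can be sketched as follows. The function classify_units is hypothetical, the disk addresses in the example are invented for illustration, and the sketch assumes that every data run holds exactly one 16-block unit.

```python
def classify_units(pairs):
    """Mark each data run as compressed or not, using the address-0 rule.

    pairs is a list of (disk_address, block_count) tuples.
    Returns a list of (disk_address, stored_blocks, is_compressed).
    """
    units = []
    for addr, count in pairs:
        if addr == 0:
            # A pair with address 0 holds no data. It only signals that the
            # run just before it is the compressed form of a 16-block unit.
            if not units:
                raise ValueError("address-0 pair with no preceding data run")
            prev_addr, prev_count, _ = units[-1]
            units[-1] = (prev_addr, prev_count, True)
        else:
            units.append((addr, count, False))
    return units


# Five pairs for a 48-block file; the addresses are made up for this example.
example = [(30, 8), (0, 8), (40, 16), (85, 8), (0, 8)]
layout = classify_units(example)
```

Because address 0 is reserved as the marker, no extra flag per run is needed in the record.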
Figure 13.11: (a) An Example of a 48-Block File being Compressed to 32 Blocks
(b) The MFT Record for the File after Compression
Random access to compressed files is possible, but tricky. Suppose that a process does a seek
to block 35 in Figure 13.11. How does NTFS locate block 35 in a compressed file? The answer
is that it first has to read and decompress the entire compressed run that contains it, here the
one holding blocks 32-47. Then it knows where block 35 is and
can pass it to any process that reads it. The choice of 16 blocks for the compression unit was
a compromise. Making it shorter would have made the compression less effective. Making it
longer would have made random access more expensive.
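The lookup for a seek can be sketched as below. As before, this is a simplified model rather than the NTFS implementation: zlib stands in for the real algorithm, the 4096-byte block size is assumed, and read_block together with its in-memory list of units is hypothetical.

```python
import zlib

BLOCK_SIZE = 4096    # assumed block size in bytes
UNIT_BLOCKS = 16     # blocks per compression unit, as in the text


def read_block(units, block_no):
    """Return the contents of logical block block_no.

    units holds one (is_compressed, payload) tuple per 16-block unit.
    """
    # Which unit holds the block, and where inside that unit it sits.
    unit_index, offset = divmod(block_no, UNIT_BLOCKS)
    is_compressed, payload = units[unit_index]
    if is_compressed:
        # The whole unit must be expanded before one block can be found.
        payload = zlib.decompress(payload)
    start = offset * BLOCK_SIZE
    return payload[start:start + BLOCK_SIZE]
```

For block 35, divmod(35, 16) gives unit 2 and offset 3, so only the third unit is decompressed. With a larger unit, each such read would have to expand more data, which is the cost side of the compromise.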
LOVELY PROFESSIONAL UNIVERSITY 387