The Traditional UNIX Way

Traditional Unix systems provide the select(2) and/or poll(2) system calls. With both of these you pass an array of FDs to the kernel, with an optional timeout. When there is activity, or when the call times out, the system call will return. The application must then scan the array to see which FDs are active. This scheme works well with small numbers of FDs, and is simple to use. Unfortunately, for thousands of FDs, this does not work so well.
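To make the scheme concrete, here is a minimal sketch (in C) of one pass through a select(2)-based loop; the function name and the caller-supplied FD array are assumptions made for the example, and error handling is omitted. Notice that the FD set has to be rebuilt before the call and scanned again after it on every iteration.

    /* One pass through a traditional select(2) polling loop.
     * "fds" and "nfds" are assumed to come from the application
     * (e.g. accepted sockets); error handling is trimmed. */
    #include <sys/select.h>
    #include <stdio.h>

    void poll_once_with_select(const int *fds, int nfds)
    {
        fd_set readfds;
        struct timeval timeout = { .tv_sec = 1, .tv_usec = 0 };
        int maxfd = -1;

        FD_ZERO(&readfds);
        for (int i = 0; i < nfds; i++) {      /* build the FD set */
            FD_SET(fds[i], &readfds);
            if (fds[i] > maxfd)
                maxfd = fds[i];
        }

        int ready = select(maxfd + 1, &readfds, NULL, NULL, &timeout);
        if (ready <= 0)
            return;                           /* timeout or error */

        for (int i = 0; i < nfds; i++)        /* scan for active FDs */
            if (FD_ISSET(fds[i], &readfds))
                printf("fd %d is readable\n", fds[i]);
    }

Note also that fd_set is limited to FD_SETSIZE descriptors (commonly 1024), which is one more reason this interface does not stretch to thousands of FDs; poll(2) avoids that particular limit but keeps the linear scans.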
The kernel has to scan your array of FDs and check which ones are active. This takes approximately 3 microseconds (3 us) per FD on a Pentium 100 running Linux 2.1.x. Now you might think that 3 us is quite fast, but consider an array of 1000 FDs. The scan now takes 3 milliseconds (3 ms), which is 30% of your timeslice (each timeslice is 10 ms). If it happens that there is initially no activity and you specified a timeout, the kernel will have to perform a second scan after some activity occurs or the syscall times out. Ouch! If you have an even bigger application (like a large http server), you can easily have 10000 FDs. The scan will then take 30 ms, which is three timeslices! This is just way too much.
You might say that 3 ms for 1000 FDs is not such a big deal: a user will hardly notice that. The problem is that the entire array of FDs is scanned each time you want to go back to your polling loop. The way these applications work is that after checking for activity on FDs, the application processes the activity (for example, reading data from active FDs). When all the activity has been processed, the application goes back to polling the OS for more FD activity. In many cases, only a small number of FDs are active at any one time (say during each timeslice), so it may only take a few milliseconds to process all the activity. High performance http servers can process hundreds or thousands of transactions per second. A server that takes 2 ms to process each active FD can process 500 transactions per second. If you add 3 ms for FD scanning in the kernel, you now have 5 ms per transaction. That only gives 200 transactions per second, a massive drop in performance.

There is another problem: the application needs to scan the “returned” FD array that the kernel has updated to see which FDs are active. This is yet another scan of a large array. It is not as costly as the kernel scan, for reasons I’ll get to later, but it is still a finite cost.
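The following sketch shows the same pass using poll(2) and highlights this second, user-space scan: after poll() returns, the application still walks every pollfd entry to find the few that are active. The function name is invented for the example, and the caller is assumed to have filled in each events field (for example with POLLIN).

    /* One pass through a poll(2)-based loop. After the call returns,
     * the application performs its own O(n) scan of the revents
     * fields, even if only a handful of FDs are actually active. */
    #include <poll.h>
    #include <stdio.h>

    void poll_once_with_poll(struct pollfd *pfds, nfds_t nfds)
    {
        int ready = poll(pfds, nfds, 1000);   /* 1000 ms timeout */
        if (ready <= 0)
            return;                           /* timeout or error */

        for (nfds_t i = 0; i < nfds; i++)     /* user-space scan */
            if (pfds[i].revents & POLLIN)
                printf("fd %d is readable\n", pfds[i].fd);
    }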

New POSIX Interfaces
A fairly simple proposal is to use the POSIX.4 Asynchronous I/O (AIO) interface (aio_read() and friends). Here we would call aio_read() for each FD, queueing thousands of asynchronous I/O requests. This model looks appealing, until we look under the hood of some aio_*() implementations. The Linux glibc implementation is a case in point: there is no kernel support. Instead, the C library (glibc 2.1) launches a thread per FD for which there are outstanding AIO requests (up to the maximum number of configured threads). In general, implementing this facility in the C library is reasonable, as it avoids kernel bloat. However, if you use this facility to start thousands of AIO requests, you may end up creating thousands of threads. This is no good, since threads are costly. The “obvious” solution, then, is to implement AIO in the Linux kernel. Another solution is to use userspace tricks to avoid the scalability problems (see the description of migrating FDs below). These solutions may be fine if you only want to run under Linux, but they are not much help if you want to run under another OS which also implements AIO using threads (and for which you do not have the source code, so you cannot change the implementation). The point here is that there appears to be no guarantee that aio_*() implementations are scalable across the platforms which support AIO.
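For reference, queuing one such request might look roughly like the following sketch; the helper name is invented for illustration and error handling is omitted. With a thread-based implementation such as the one in glibc, each outstanding request like this may end up being serviced by its own worker thread, which is exactly where the cost comes from.

    /* Hedged sketch: queue a single asynchronous read with POSIX.4 AIO.
     * Completion would later be checked with aio_error()/aio_return()
     * or signalled through the aio_sigevent member. */
    #include <aio.h>
    #include <string.h>

    int queue_async_read(int fd, char *buf, size_t len, struct aiocb *cb)
    {
        memset(cb, 0, sizeof(*cb));
        cb->aio_fildes = fd;     /* descriptor to read from */
        cb->aio_buf    = buf;    /* destination buffer */
        cb->aio_nbytes = len;    /* number of bytes requested */
        cb->aio_offset = 0;      /* starting offset (for regular files) */

        return aio_read(cb);     /* queue the request; returns at once */
    }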
It is also worth noting that POSIX.4 Asynchronous I/O is not necessarily available on all POSIX.4-compliant systems (facilities defined by POSIX.4 are optional). So even if you were prepared to limit your application to POSIX.4 systems, there is still no guarantee that AIO is available. Many or most implementations will be scalable, but we cannot be sure that all of them are, so we need an alternative.


