Page 280 - DCAP403_Operating System
P. 280
Unit 13: Case Study: Linux
Optimising existing UNIX Interfaces Notes
There are improvements we can make for the massive FD scanning problem. Firstly we can
optimise the way the scanning is done inside the kernel. Right now (2.1.106) the kernel has
to call the poll() method for each file structure. This is expensive. Back in the 2.1.5x kernels, I
coded a better implementation for the kernel which sped things up almost 3 times. While this
requires modifications to drivers to take advantage of this, it has the advantage of not changing
the semantics we expect from UNIX. Note one other interesting feature of this optimisation: it
centralises event notification, which in turn would make implementing I/O readiness queues
simpler. I’m not sure how closure of FDs before readiness events are read should be handled.
This could complicate their implementation.
Doing this optimisation does not solve our problem, though. It only pushes the problem away
for a while.
Making better use of existing UNIX Interfaces
Note that for my purposes, it is better to optimise the application so that it works well on many
OSes rather than optimising a single OS. Creating new interfaces for Linux is a last resort. Also
note that this section assumes that an OS of interest does not have an existing (preferably POSIX)
mechanism that supports FD management in a scalable way.
Another solution (which would also benefit from the kernel optimisation discussed above) is
for the application to divide the FD array into a number of smaller FD arrays, say 10. You then
create 10 threads, each of which has a polling loop using its smaller FD array. So each FD array is
now 100 entries long. While this doesn’t change the total number of FDs that must be scanned, it
does change when they have to be scanned. Since most FDs are inactive, not all the threads will
be woken up. Too see how this works, consider the example where, at any time (say during a
single timeslice of 10 ms), only 5 FDs are active. Assuming these FDs are randomly, uniformly
distributed, at most 5 threads will need to be woken up. These threads then process the activity
and go back to the start of their polling loops. Where we win is that only 5 threads had to go back
and call select(2) or poll(2). Since they each have 100 entry FD arrays, the kernel only has to scan
500 FDs. This has halved the amount of scanning required. The scanning load has gone from 30%
to 15% by this simple change. If you were to instead use 100 threads, you would still only have at
most 5 threads woken up for activity, and hence the total number of FDs scanned this timeslice
would be 50. This takes down the scanning load to 0.15%, which is negligible.
There is one thing to watch out for here: if you use select(2) in your polling loop, be aware that
the size of your FD array is equal to the value of your largest FD. This is because select(2) uses
a bitmask for its FD array. This means one of your threads will want to poll FDs 991 to 1000.
Unfortunately, your FD array is still 1000 long. What’s worse, the kernel still has to do a minimal
scan for all those 1000 FDs. The solution to this is to use poll(2) instead, where you only have to
pass as many FDs as you want to poll, and the kernel scans only those.
This solution sounds ideal: just create lots and lots of threads. At the extreme, you create one
thread per FD. There is a problem here, however, as each thread consumes system resources.
So you need to compromise between the number of threads and the FD scanning load. Also, the
more threads you have the more cache misses you induce, so this is something to avoid as well.
Fortunately in this case most threads will be running nearly the same code at the same time, so
cache pollution should not be a signifi cant problem.
A more advanced solution is to have dynamic migration of FDs depending on whether they are
mostly active or inactive. In the simplest case, you only have two threads. One which polls mostly
active FDs and the other polls mostly inactive FDs. The thread for active FDs will be woken up
very frequently, but on the other hand will have only a small number of FDs to scan. The other
thread will have to scan a large number of FDs, but it will only be woken up occasionally. For
LOVELY PROFESSIONAL UNIVERSITY 273