Everything you ever wanted to know about the 6x86 Variable-Size Paging Mechanism but never dared ask.
This page is preliminary. I have contacted Mike Jagdis, who wrote the Linux Cyrix patches, to discuss VSPM, and I am also trying to reach members of the 6x86 design team at Cyrix or other people who could help me discuss paging issues on x86 processors.
Mike has clearly stated that VSPM should not be used on Rev. 2.6 or lower 6x86's.
I shall not contact Linus since a) he has other priorities, b) he wouldn't even have the time to read my Email, and c) he possibly does not have a 6x86 box available for testing.
There are various good books* that explain what paging is, and paging as implemented on the i386 architecture is pretty standard. You won't find much about paging on the Web; I guess it's a complex subject that interests very few people. I found a good set of slides by Prof. Devadas at MIT, but that seems to be all.
Paging is an entirely different thing from segmentation (see Tanenbaum's book).
Small pages have to be frequently read from swap and may cause page thrashing on large data sets, resulting in huge performance penalties; this can in part be avoided with smarter paging algorithms, and with paging hardware caches. Large pages will waste (lots of) memory with small programs and data sets (again, see Tanenbaum's book), and there is no way to avoid this, as far as I know. VSPM is the ideal solution, at least in theory.
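The memory wasted by large pages can be illustrated with a few lines of Python. This is only a sketch of the rounding argument: the 10Kb program size is an arbitrary example, and the function name is mine, not taken from any kernel source.

```python
KB = 1024
MB = 1024 * KB

def wasted_bytes(size, page_size):
    """Bytes lost when `size` bytes are rounded up to a whole number of pages."""
    pages = -(-size // page_size)  # ceiling division
    return pages * page_size - size

# Hypothetical 10Kb program mapped with three different page sizes.
for page_size in (4 * KB, 64 * KB, 4 * MB):
    waste = wasted_bytes(10 * KB, page_size)
    print(f"{page_size // KB}Kb pages waste {waste // KB}Kb on a 10Kb program")
```

With variable-size pages, each region could in principle get a page size close to its real size, which is why the waste shrinks.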
On Intel CPUs (from the 386 to their latest chips), standard pages have a fixed 4Kb size. On the Alpha processor, they are 8Kb. This is rather small when we consider the size of present-day programs and typical data sets, but it is a compromise dating back to the 386 era (a better compromise nowadays would probably be 64Kb).
Bruno Haible commented that a simple rule of thumb is to take the square root of the typical memory size multiplied by the processor's cache line size. For early 386 systems, this would result in sqrt(4 bytes * 4Mb) = 4Kb. For present-day Pentiums, sqrt(16 bytes * 32Mb) is roughly 23Kb and sqrt(16 bytes * 64Mb) = 32Kb. The 6x86 L1 cache uses 32 byte lines, which pushes the result a little higher. So my 64Kb guess would be slightly above the optimum page size. Note that if we had 64Kb pages, we would only need a single 64K-entry page table to map the entire 4Gb virtual address space. This would simplify the Intel 4Kb paging mechanism, which walks a page directory and then a page table, and would save one memory access (~10 CPU clock cycles) on every uncached page access.
Bruno was unable to recall which OS design book presented this rule of thumb, so if anybody can explain the reasoning behind this, thanks.
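For readers who want to check the arithmetic, here is a small Python sketch of the rule of thumb as I understand it. The function names are mine, and the cache line and memory sizes are simply the ones quoted above; I cannot vouch for the rule itself.

```python
import math

KB = 1024
MB = 1024 * KB
GB = 1024 * MB

def rule_of_thumb_page_size(cache_line_bytes, memory_bytes):
    """Page size suggested by the rule of thumb: sqrt(line size * memory size)."""
    return math.sqrt(cache_line_bytes * memory_bytes)

def flat_table_entries(page_size, address_space=4 * GB):
    """Entries needed by a single-level table covering the whole address space."""
    return address_space // page_size

for label, line, mem in [
    ("early 386, 4 byte lines, 4Mb RAM", 4, 4 * MB),
    ("Pentium, 16 byte lines, 32Mb RAM", 16, 32 * MB),
    ("Pentium, 16 byte lines, 64Mb RAM", 16, 64 * MB),
    ("6x86, 32 byte lines, 32Mb RAM", 32, 32 * MB),
]:
    print(f"{label}: {rule_of_thumb_page_size(line, mem) / KB:.1f}Kb")

print("entries in a flat table of 64Kb pages:", flat_table_entries(64 * KB))
```

The 32Mb Pentium case comes out at about 22.6Kb, and a flat table of 64Kb pages needs 65536 (64K) entries to cover 4Gb.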
Intel recently documented the 4 Mb Page Extensions, a feature of Pentium CPUs that implements 4Mb fixed-size pages. The Linux 2.0.29 kernel has a few lines concerning this 4Mb paging mechanism, but I don't know if it is fully implemented (I don't think so). 4Mb paging can be enabled by defining USE_PENTIUM_MM in /linux/include/asm/i386/pgtable.h, but I can't test this feature since I don't have a Pentium CPU handy. One can have both the 4Mb pages and the 4Kb pages enabled simultaneously.
Mike Jagdis has confirmed that 4Mb paging has been implemented in the 2.1.x kernels. It is automatically enabled if a specific Pentium CPU flag signals that 4Mb page extensions are supported (so it won't ever get enabled on Cyrix 6x86 CPUs).
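As far as I know, the flag in question is the PSE (Page Size Extension) bit, bit 3 of the feature word returned in EDX by CPUID function 1. The Python sketch below only decodes such a feature word; it does not execute CPUID, it is not the kernel's actual code, and the two sample values are hypothetical.

```python
PSE_BIT = 3  # CPUID function 1, EDX bit 3: Page Size Extension (4Mb pages)

def supports_4mb_pages(cpuid_edx):
    """Return True if the given CPUID feature word advertises 4Mb pages."""
    return bool(cpuid_edx & (1 << PSE_BIT))

# Hypothetical feature words, for illustration only.
pentium_like_flags = 0x1BF  # bits 0-5, 7 and 8 set, so PSE is present
no_pse_flags = 0x001        # only bit 0 (FPU) set, so PSE is absent

print(supports_4mb_pages(pentium_like_flags))
print(supports_4mb_pages(no_pse_flags))
```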
Clearly, Cyrix has implemented a superior paging mechanism with the 6x86(L) CPUs, since VSPM allows page sizes in the range 4Kb to 4Gb (in powers of 2). Assuming you need a large, contiguous page, it is much simpler to set up a single large page with VSPM than hundreds or thousands of small 4Kb pages, and the whole paging mechanism is much simpler to understand (note, however, that Cyrix made it a little complicated to set up a page using VSPM, because one has to handle special TR registers). The VSPM mechanism is effectively a static TLB cache, since the paging information is stored in registers inside the CPU. In a similar fashion to Intel's 4Mb paging extensions, VSPM can be used together with traditional 4Kb paging.
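To make the "static TLB" idea concrete, here is a toy Python model of one variable-size page. It is a sketch under my own assumptions: the class and field names are invented, the requirement that both base addresses be aligned to the page size is assumed rather than taken from the data book, and nothing here reflects the real layout of the TR registers.

```python
from dataclasses import dataclass

KB = 1024
MB = 1024 * KB
GB = 1024 * MB

@dataclass
class VariablePage:
    virt_base: int  # virtual address where the page starts
    phys_base: int  # physical address it maps to
    size: int       # page size in bytes: a power of two, 4Kb to 4Gb

    def __post_init__(self):
        if not (4 * KB <= self.size <= 4 * GB) or self.size & (self.size - 1):
            raise ValueError("size must be a power of two between 4Kb and 4Gb")
        # Assumption: bases are naturally aligned to the page size.
        if self.virt_base % self.size or self.phys_base % self.size:
            raise ValueError("base addresses must be aligned to the page size")

    def translate(self, vaddr):
        """Return the physical address for vaddr, or None if it is outside the page."""
        mask = self.size - 1
        if (vaddr & ~mask) != self.virt_base:
            return None  # a real CPU would fall back to normal 4Kb paging
        return self.phys_base | (vaddr & mask)

# One 4Mb page at 0xc0000000 replaces 1024 ordinary 4Kb page table entries.
kernel_page = VariablePage(virt_base=0xC0000000, phys_base=0x0, size=4 * MB)
small_pages_needed = kernel_page.size // (4 * KB)

print(hex(kernel_page.translate(0xC0001234)), small_pages_needed)
```

An address that misses the page returns None here, which stands in for the fallback to traditional 4Kb paging.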
Sadly, all references to VSPM were eliminated in the 6x86MX data book, and the relevant bits in the control registers were allocated to other uses, so don't look for an undocumented feature - the physical implementation is simply not there anymore.
I have applied the VSPM patch to the Linux 2.0.29 kernel. This patch sets up a single VSPM page, mapping the kernel at the same virtual memory address for all processes (i.e. all processes see the kernel at virtual address 0xc0000000). It does not make use of the 8 "cells" or "slots" available inside the 6x86 CPU. Mike has also determined that only 4 cells are effectively available; the upper 4 cells seem to shadow the 4 lower ones.
The kernel page is set up once and for all, and hence does not need to be resized or modified once Linux has booted and virtual memory has been set up.
Compilation time of a Linux 2.0.0 kernel on my 6x86L PR200+ box (2 x 75 MHz core clock, 256Kb L2 cache, 32Mb RAM, 4.3Gb DMA mode 2 IDE hard disk) with its default config file was measured with and without the VSPM patch, using time make zImage. Since kernel compilation is a job that involves both a large program (gcc) and a large data set (the linux kernel), it should provide some evidence of VSPM performance by generating many page faults. Here are the results:
Inconclusive, isn't it? :-(
We got a large number of page faults as expected, but their number did not decrease significantly with VSPM. Bruno explains this by the fact that VSPM is only used for mapping kernel memory, hence it would not affect the number of page faults generated by a user process (i.e., gcc kernel compilation). He is correct, of course: the above test is inconclusive by design :-( Back to the drawing board...
Is there any way to make a user process use VSPM? Not really. The VSPM mechanism doesn't seem reliable enough (although it works perfectly well for a single kernel page), and it is not a general purpose mechanism either (8 pages may not be enough for some processes). Also note that Mike's VSPM Linux implementation does not need to patch the code in /linux/include/asm/i386/pgtable.h, which is the architecture-specific header for general paging, since VSPM is not used for user processes. VSPM is used once to set up the kernel page, and then is not touched anymore.
The basic reason why Cyrix decided to remove the VSPM feature on the 6x86MX and allocate the silicon to other purposes is that no OS ever implemented paging using VSPM (except Linux, and then only as a patch), and obviously Microsoft was not going to implement a feature that would not run on Intel CPUs. Besides, Mike described the VSPM on early 6x86's as a "minefield", because of its unpredictable behaviour.
IMHO, VSPM is an excellent idea but Cyrix did not do a perfect job at implementing it (too little, too late, and not enough documentation). Mike's patch works very well, but there is not much to gain from VSPM in terms of performance, since it is limited to mapping kernel memory.
Intel's 4Kb paging was adequate for the 386, but it is inadequate for present day CPUs, OS's, and large programs and data sets. The 4Mb fixed-size page extensions do not seem flexible enough to justify their implementation.
There is room for an alternative paging implementation in the x86 CPU market, as both Intel and Cyrix have demonstrated that such mechanisms can be made backward compatible. If such an alternative paging mechanism is ever implemented, it could be based on either fixed-size 64Kb pages or variable-size pages.
As seen from the above test, a medium-size compilation lasting around 400 seconds generated ~140,000 page faults. Most of these page faults were serviced from the kernel disk I/O buffers. If only 10% of these page faults had been serviced by fetching data from disk, at about 20 ms per disk access, the overhead would have been 10% * 140,000 * 20 ms = 280 seconds, or nearly 5 minutes!
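The same estimate, written out as a short Python sketch so the figures can be changed easily. The function name is mine, and the 10% disk share and 20 ms latency are the guesses made above, not measurements.

```python
def disk_fault_overhead_ms(page_faults, disk_percent, disk_latency_ms):
    """Total time spent waiting for disk if disk_percent of the faults hit the disk."""
    return page_faults * disk_percent * disk_latency_ms // 100

overhead_ms = disk_fault_overhead_ms(140_000, 10, 20)
overhead_s = overhead_ms / 1000

print(f"{overhead_s:.0f} s, or {overhead_s / 60:.1f} minutes")
print(f"that is {100 * overhead_s / 400:.0f}% of a 400 s compilation")
```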
The work-in-progress but already excellent book The Linux Kernel, by David A. Rusling. The draft version 0.1-13(19) is available on the LDP site in PostScript, LaTeX and DVI formats.
The Linux Kernel Hacker's Guide, also available from the LDP site, has two separate papers on Linux memory management. Both may be a bit outdated, but are recommended reading all the same.
An article by Robert Collins on the Pentium 4Mb Page Extensions, available from his site.
The On-Line Computing Dictionary for short explanations of MMU and paging mechanisms.
The Linux kernel source, version 2.0.29.
The Linux kernel Cyrix patch by Mike Jagdis, available from the Linux Mama site (see Docs page).
Many thanks to Bruno Haible and Mike Jagdis for their comments and help.
* My favorite book was written by Andrew S. Tanenbaum: Structured Computer Organization, Prentice Hall. And even though Tanenbaum had a small flame war with Linus, I think he is one of the best authors on the subject of OS design and implementation. See the Linux reading-list mini-HOWTO for other books.
Last updated on October 15, 1997.
Copyright 1997 Andrew D. Balsa