The Swap: A UNIX legacy

The concept of virtual memory was a big leap in operating systems. It gave programmers and system hackers (who hack above the kernel) a power that brought elegance and simplicity to a lot of userspace applications. With virtual memory, a process could be given a complete address space without worrying about how memory is being used by other processes. This greatly simplified the allocation of code text, heap and stack to programs, which made linkers and loaders simpler and the runtime more elegant. Virtual memory also allowed processes to memory map not just RAM but also files and devices, which allowed runtime optimization of the amount of physical RAM used by programs. For example, processes that run the same program share the RAM holding its code text. Copy-on-write made virtual memory a very simple and elegant system for managing memory with a mix of shared and private content between processes. However, I think that virtual memory can be used more powerfully than it is at present.
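To make the copy-on-write point concrete, here is a minimal sketch in C, assuming a POSIX system. It maps a small file with MAP_PRIVATE, writes to the mapping, and then checks that the file itself is unchanged. The function name and the scratch file it creates are made up for this illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical demo helper: creates a scratch file containing "hello",
 * maps it privately, overwrites the first byte through the mapping, and
 * returns the first byte as the file still holds it (or -1 on error). */
int private_write_leaves_file_intact(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    if (write(fd, "hello", 5) != 5) {
        close(fd);
        return -1;
    }

    /* MAP_PRIVATE: the page is shared with the file until we write to it. */
    char *p = mmap(NULL, 5, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }

    p[0] = 'J';            /* the write gives this process its own copy */
    assert(p[0] == 'J');   /* the change is visible through the mapping */

    /* Read the same byte back through the file descriptor. */
    char in_file = 0;
    ssize_t n = pread(fd, &in_file, 1, 0);

    munmap(p, 5);
    close(fd);
    unlink(path);
    return (n == 1) ? in_file : -1;
}
```

Calling private_write_leaves_file_intact("/tmp/cow_demo.tmp") should return 'h': the process sees its own modified page, while the file, and any other process mapping it, still sees the original content.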

The Disjoint between Swap and Storage

Let me start with a problem with the UNIX swap system that exposes some of the issues with virtual memory. Swap on UNIX, as we all know, is some space in secondary storage (typically the disk) which the OS uses to temporarily store the memory contents of processes when it is short on memory. This allows more programs to run simultaneously than would fit in physical RAM, since the RAM and the swap together make up the ‘memory’ that the OS has at its disposal. Consider, say, a text editor process in which the user is editing text. In it, one will typically find the following chunks of memory.

  • The text editor program’s code (called the text region)
  • The heap, which possibly contains the entire file data in memory
  • The stack

Of the above, large parts of the program text are already shared with other processes because of the use of shared libraries. The stack and the heap contain data representing the current state of the process. These cannot be shared (or picked directly from the disk, as with shared libraries or executables) as their contents depend on the runtime state of the program.

Almost all text editors support some kind of autosave feature, which makes sure that the text being edited still gets saved even if the user doesn’t save it regularly. This is useful if for some reason there is a crash. However, almost all filesystems get in its way by buffering the data before writing it out, so that performance is maximized. Hence, even though the file was ‘autosaved’, it was not actually committed to disk. In case of an OS crash the user has to deal not only with the potential loss of data since the last autosave, but also with the loss of data from before the autosave, as the filesystem might not have synced itself with the disk. Also, the buffered disk data sits in a separate part of RAM. What we basically see here is wastage of RAM, while achieving neither filesystem performance (as the text editor tries to autosave much more frequently) nor robust autosaving (as filesystem buffering gets in its way).

With this existing confusion between software components, let’s introduce the swap. Suppose that in the middle of the editing session, the user just left the program (without saving, of course) and went to play a game. This game pushes almost all other processes (including the text editor) out to the swap because it itself requires a large amount of RAM. Note that the editor’s stack and heap are banished to the swap, which means that the unsaved data in the text editor is now on the swap. As the game runs, the editor comes alive in the middle of the game to autosave the unsaved data. It fetches this data from the swap, and then writes it to the disk. The filesystem buffers it and finally puts it on disk. Thus, not only are two chunks of precious RAM used for the editor’s data, but the swap is used to temporarily store on the disk something which could have been permanently stored on the disk right away.
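The gap between ‘autosaved’ and ‘committed to disk’ can be sketched in C, assuming a POSIX system. The autosave() function below is a made-up name for this illustration, not taken from any real editor. A successful write() only hands the data to the OS buffers; it is the fsync() call that asks the OS to push those buffers to the storage device.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical autosave routine: writes len bytes of buf to path and asks
 * the OS to commit them to the device. Returns 0 on success, -1 on error. */
int autosave(const char *path, const char *buf, size_t len)
{
    assert(path != NULL && (buf != NULL || len == 0));

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    /* write() may accept fewer bytes than requested, so loop until done. */
    size_t done = 0;
    while (done < len) {
        ssize_t n = write(fd, buf + done, len - done);
        if (n < 0) {
            if (errno == EINTR)
                continue;          /* interrupted by a signal: retry */
            close(fd);
            return -1;
        }
        done += (size_t)n;
    }

    /* Up to here the data may live only in the OS buffers.
     * fsync() blocks until the OS reports it has reached the device. */
    if (fsync(fd) != 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```

A call such as autosave("/tmp/draft.txt", text, strlen(text)) returns 0 only after fsync() has succeeded. This is still a simplification: the drive may have a write cache of its own, and a careful editor would more likely write to a temporary file and rename it over the original rather than truncate in place.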

The example above gives us some insight into the disjoint between swap and storage that we have on today’s systems. It would have been so much better if, before swapping out the text editor process, the OS had somehow autosaved it. A quick and dirty approach would be for the OS to just notify the editor that it is about to be swapped. The editor would then save the file and release the memory used by the data (and would restore it once it is swapped in). However, this method has too many issues. First of all, in current systems, there is no such concept as a process being swapped out. Individual pages of memory that are probably not being used are swapped out, and the process continues to run. Secondly, it will be hard for the OS to figure out how much time to give the process to flush its data to the hard drive. If the OS takes a function hook from the process and executes it synchronously, then there is a chance that the hook never returns, effectively bringing the OS to a halt. A ‘timeout’ based mechanism seems rather clumsy.

Marrying Data Model with the VM

A good solution to the problem could be the following. In the same spirit as “memory mapping” portions of files to memory, the VM is extended to support a data model. Let us call this extension XVM. Alongside memory management functions like malloc() and free(), there would be separate allocation and deallocation functions for XVM. Besides the data type and size arguments, these would also take an argument naming the data in the permanent store. The XVM would know exactly how to translate between the memory and disk representations of the data types and would use RAM only as a cache for the content of the data type. Thus, the moment the programmer has to deal with data which is supposed to live in permanent storage, the programmer uses the XVM APIs to allocate memory for it. The OS takes care of associating the name in the permanent storage with the data content and of translating between the representations. This allows the OS to flush these parts of memory to disk when it needs physical RAM. Also, the XVM notifies the program whenever it flushes the data to disk, so that the program knows. Of course, the program itself has the power to ask for the data to be flushed to disk. However, the OS reserves the right to take this only as a hint.
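The XVM is only a proposal, so there is no real API to show. As a rough approximation of what such calls might look like, here is a sketch in C, assuming a POSIX system, built on mmap() with MAP_SHARED. The names xvm_region, xvm_alloc(), xvm_flush() and xvm_free() are all hypothetical. The sketch stores raw bytes only: it has no data types, no translation between memory and disk representations, and no flush notifications.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical handle for a named, persistent allocation. */
typedef struct {
    void  *addr;   /* where the data appears in the address space */
    size_t size;   /* size of the region in bytes */
} xvm_region;

/* Hypothetical xvm_alloc(): like malloc(), but also takes a name in the
 * permanent store (here simply a file path). Returns 0 on success. */
int xvm_alloc(xvm_region *r, const char *name, size_t size)
{
    assert(r != NULL && name != NULL && size > 0);

    int fd = open(name, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;
    /* Make the backing file exactly as large as the region;
     * existing content within that size is preserved. */
    if (ftruncate(fd, (off_t)size) != 0) {
        close(fd);
        return -1;
    }
    /* MAP_SHARED: RAM acts as a cache for the file, so the OS can write
     * these pages back to the file itself instead of to the swap. */
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                  /* the mapping remains valid after close */
    if (p == MAP_FAILED)
        return -1;

    r->addr = p;
    r->size = size;
    return 0;
}

/* Hypothetical xvm_flush(): asks for the region to be written to disk.
 * MS_SYNC waits for completion; MS_ASYNC would be closer to a hint. */
int xvm_flush(xvm_region *r)
{
    return msync(r->addr, r->size, MS_SYNC);
}

/* Hypothetical xvm_free(): removes the mapping but keeps the named data. */
int xvm_free(xvm_region *r)
{
    int rc = munmap(r->addr, r->size);
    r->addr = NULL;
    r->size = 0;
    return rc;
}
```

An editor could call xvm_alloc(&r, "/tmp/draft.xvm", 4096), edit the text in place through r.addr, and find the same bytes under the same name after xvm_free() and a later xvm_alloc(). There is then no separate autosave copy, and under memory pressure the pages go to the file rather than to the swap.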

There is one catch, though, in the above solution. The code which translates between the memory and disk representations has to be part of the OS. This is because the OS requires a guarantee that the code terminates in a short amount of time, and for that it has to use its own code to do the translation. This means that there will be no application-specific variety in the data model. This is not a very big problem, as a database model for persistent storage would be sufficient for most applications. An application with very specific needs is always free to use, as its XVM backend, a plain memory-mapped-style allocated space for its permanent store. This would be the application’s persistent store format. The application would itself keep this backend up to date according to the semantics of its actual data model. The key to the success of this scheme would be an efficient way of interacting with the persistent store format in the language of the in-memory data model.


2 Responses to The Swap: A UNIX legacy

  1. Pingback: On Symantic Storage « Defective Compass

  2. hajar says:

    Very good explanation, but not detailed.
