Garbage Collection

From SystemsResearch

Jump to: navigation, search

Automatic storage management is an essential property of high level programming systems, providing an error-free abstraction with which the programmer may manipulate space without regard to the inessential details of physical storage. The abstraction holds only until the available storage becomes full. Thus it is important for the storage management system to distinguish useful data from garbage, so that the space occupied by the garbage may be reused. Automatic storage managers typically equate "useful" with "reachable", via some chain of references from a root. The technique used to identify the unreachable space automatically is called garbage collection. This research is concerned with garbage collection in persistent systems and in distributed systems where each site in the system has its own local storage and may communicate with other sites only by passing messages.

Two new garbage collection algorithms for persistent stores and distributed object systems, called respectively PMOS (Persistent Mature Object Space) and DMOS (Distributed Mature Object Space) have been developed. They both derive from the MOS (Mature Object Space) collector, sometimes known as the Train Algorithm. MOS is a main memory collector designed to do a limited amount of work each time it is invoked (so it is non-disruptive) and to guarantee that each unreachable data object is collected eventually (and so it is complete). PMOS extends MOS to provide incrementality in a persistent context, while also limiting I/O overhead. DMOS builds upon MOS and PMOS to offer incremental collection for distributed (non-persistent) systems.

The contribution of PMOS is to provide an incremental garbage collector for a persistent system which is complete, independent of the recovery mechanism and which limits the I/O overhead.

The contribution of DMOS is its unique combination of desirable properties for a distributed collector that avoids global tracing; specifically, DMOS is:

  • Safe: it does not collect live (reachable) objects.
  • Complete: it reclaims all garbage, including cyclic garbage that spans sites, within a finite number of invocations.
  • Non-disruptive: it bounds the amount of collection work, thereby bounding the time and space requirements, for each invocation.
  • Incremental: it reclaims space incrementally without global knowledge of reachability.
  • Local: it initiates local collections at each site independently of other sites.
  • Independent: it is independent of the specific local collection algorithm employed at each site, though it imposes some requirements on the local collectors.
  • Decentralised: it uses no algorithms that rely on a single central site for processing or global synchronisation.
  • Asynchronous: it communicates via asynchronous messages, and the collector at a site need only synchronise with other sites in a few cases; application computation never need wait for such synchronisation.

From the above it can be seen that DMOS implementations have the prerequisites for scalablity of: incrementality, locality, decentralisation, and asynchrony.

The significance of the DMOS properties is that they are difficult to achieve in combination; in particular, it is difficult simultaneously to realise completeness, incrementality, and decentralisation. Previous collectors have achieved only two of the three at a time, and it is something of a folk theorem that completeness requires global tracing or global termination.

Collaborators: Eliot Moss: University of Massachusetts, Rick Hudson: Intel Corporation

Further Information

Personal tools