COEN 180
Memory Hierarchy
Contents: Storage Hierarchy; Caches;

J. von Neumann
We are therefore forced to recognize the possibility of constructing a hierarchy of memories, each of which has greater capacity than the preceding but which is less quickly accessible.

A. W. Burks, H. H. Goldstine, J. von Neumann: Preliminary Discussion of the Logical Design of an Electronic Computing Instrument, Part I, Vol. I, Report prepared for the U.S. Army Ord. Dept.
28 June 1946

Storage Hierarchy

There are three basic hardware modules (Bell, Newell: Computer Structures, 1971): Processors, Memory, and Communication. Memories store data and allow processors to read and to write the data. Given a memory address and some data, a memory write copies the data to that memory address; given a memory address, a memory read returns a copy of the data most recently written to that location.

Memory is evaluated according to the following (incomplete) list of criteria: capacity, access time, cost per byte, and volatility.

If we plot the size, speed, and cost of memory and storage devices, we obtain the following schematic picture:

Faster devices cost more and have lower capacity. Currently the largest-capacity storage devices are tapes, which still cost marginally less than hard drives, but tapes have access times measured in minutes. The most expensive memory is in-CPU memory such as registers and internal instruction queues. While it is difficult to calculate the price, it is clearly more than tens of dollars per kilobyte. This is the fastest memory available (its access time is essentially one clock cycle), but also one that cannot be provided in large quantities.

We can arrange the different types of memory in a pyramid. At the top is the fastest, smallest, and most expensive memory, the registers. At the bottom is the largest, cheapest, and slowest memory, off-line archival storage (e.g. a tape library). The upper levels of the memory hierarchy use electronic storage, the lower levels use block-addressed magnetic storage. It is customary to divide the memories into three different layers: primary memory (registers to main memory), characterized by volatility; secondary storage (hard drives) in on-line or near-line mode; and tertiary storage. Currently, we observe two phenomena: the continued emergence of specialty storage (optical RW, flash) and the vanishing of tertiary storage. (Next year or the year thereafter, a tape-based backup will become more expensive than a disk-based backup over IP.)

Exploiting the Memory Hierarchy: Caching

We exploit the memory hierarchy by placing data that is frequently accessed into the upper tiers of the memory hierarchy and using the lower tiers to store data that is only infrequently used. In addition, we use the lower tiers of the memory hierarchy to store data in nonvolatile form. We can choose to assign data to the upper tiers of the hierarchy explicitly. This technique is fraught with difficulties: How do we avoid overflowing the fast portions of the hierarchy? How do we adjudicate between competing data? How can we afford to make so many decisions? Instead, we can use the ubiquitous technique of caching. A cache is a faster member of the storage hierarchy built on top of a slower one. The cache contains copies of the data in the slower memory. A read request is sent first to the cache. If the data is found, it is returned to the requestor. Otherwise, the request is serviced by the slower memory. A write request is fulfilled by updating the copy in the cache and possibly updating the copy in the slower memory as well.
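As a rough illustration of this read and write flow, here is a minimal Python sketch of a cache sitting in front of a slower memory. The two dictionaries standing in for the memory tiers and the function names are assumptions made for this example, not part of any real system; the write path shown here updates both copies.

    # Minimal sketch: a cache in front of a slower memory (illustrative only).
    # Both tiers are modeled as Python dictionaries mapping addresses to data.
    slow_memory = {}   # stands in for the slower, larger tier
    cache = {}         # stands in for the faster, smaller tier

    def read(address):
        """Serve a read from the cache if possible, otherwise from slower memory."""
        if address in cache:                # hit: the data is in the fast tier
            return cache[address]
        data = slow_memory.get(address)     # miss: service from the slower memory
        cache[address] = data               # keep a copy for future accesses
        return data

    def write(address, data):
        """Update the copy in the cache and the copy in slower memory."""
        cache[address] = data
        slow_memory[address] = data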

The proportion of read accesses that can be fulfilled by the cache is called the hit rate h. The proportion of read accesses that cannot be fulfilled by the cache, and hence are fulfilled from slower memory, is called the miss rate m. By definition, h + m = 1. Call tC the time it takes to satisfy a read request from cache and tM the time it takes to satisfy a read request from slower memory. The access time for the (cache + slower memory) system is then:

tSystem = h·tC + m·tM.

If the hit rate is high, then the system read times are close to the cache read times. Of course, the capacity of the system is the capacity of the slower memory. (The additional capacity of the cache is not used to store more data.) Caching thus offers storage at roughly the speed of the faster device with the capacity of the slower device, provided the hit rate can be kept high.
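To get a feel for the formula, consider a small numerical example with assumed values: a 10 ns cache, a 100 ns slower memory, and a 95% hit rate. The short Python calculation below works out the effective access time.

    # Effective access time: t_System = h * t_C + m * t_M (assumed example values).
    h = 0.95          # hit rate
    m = 1 - h         # miss rate
    t_C = 10e-9       # cache access time: 10 ns (assumed)
    t_M = 100e-9      # slower-memory access time: 100 ns (assumed)

    t_system = h * t_C + m * t_M
    print(f"effective access time: {t_system * 1e9:.1f} ns")   # prints 14.5 ns

Even with a 95% hit rate, the misses dominate the average: the system is still about 45% slower than the cache alone, which is why hit rates need to be kept very high.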

High hit rates are based on the phenomenon of locality. Spatial locality is the tendency of data to be accessed if data nearby has been accessed before. Temporal locality is the tendency of data to be accessed if they have been accessed before. Both types of locality are observable phenomena in many workloads. For example, the execution of a program typically shows both types of locality strongly. Spatial locality results from the next instruction in memory being executed next unless the current instruction is an unconditional jump (e.g. a return statement) or a conditional branch that is taken. Most programs spend most of their time executing loops. The instructions in the loop are executed over and over, thus giving rise to temporal locality. On the other hand, database tables tend to have less locality. Spatial locality is hardly ever observed because records are accessed in a scattered manner. An exception would be a scan of the table that proceeds according to the storage location, e.g. because the table is organized as a B+ tree. Temporal locality is more frequently observed in a database, since some records become hot (i.e. are accessed frequently) until they cool down again.
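As a small illustration (the loop below is a hypothetical example, not taken from any particular workload), summing a list in Python touches successive elements stored near each other (spatial locality), while the loop instructions and the running total are reused on every iteration (temporal locality).

    # Both kinds of locality in one loop (illustrative example).
    data = list(range(1000))
    total = 0
    for x in data:     # the loop body is executed over and over: temporal locality
        total += x     # successive elements are stored near each other: spatial locality
    print(total)       # 499500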

Locality allows us to use an automatic rule to populate a cache. A data item enters a cache if it or a close neighbor is being accessed. Eventually, a cache fills up, and the problem arises which data item to replace (the replacement policy). Finding good replacement algorithms remains an active research topic, despite more than forty years of work. The complexity of the algorithm depends very much on the speed of the cache and the slower memory. For example, the cache between the processor and main memory cannot have a complicated replacement algorithm because it needs to be implemented in hardware. On the other hand, the cache between main memory and disk can use quite complicated software algorithms, since the several milliseconds that it takes the fastest disk to access data justify spending a large number of processor cycles to increase the hit rate.
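One simple and widely used replacement policy is least recently used (LRU), which evicts the item that has gone unreferenced the longest. The sketch below is one possible software illustration using Python's OrderedDict; the class name and the capacity of 4 entries are arbitrary choices for the example, not a description of any particular system.

    from collections import OrderedDict

    class LRUCache:
        """Toy LRU cache: evicts the least recently accessed entry when full."""

        def __init__(self, capacity=4):
            self.capacity = capacity
            self.entries = OrderedDict()   # ordered from least to most recently used

        def get(self, address):
            if address not in self.entries:
                return None                          # miss: caller fetches from slower memory
            self.entries.move_to_end(address)        # mark as most recently used
            return self.entries[address]

        def put(self, address, data):
            if address in self.entries:
                self.entries.move_to_end(address)
            self.entries[address] = data
            if len(self.entries) > self.capacity:
                self.entries.popitem(last=False)     # evict the least recently used entry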

Memory / storage systems also need to handle writes. A newly written piece of data is usually placed in the cache under the assumption that it is likely to be read (or overwritten) again soon. There are two main strategies for maintaining consistency between the cache and the slower memory: write-back, where the data is only written to the cache and then written to slower memory when the data is replaced in the cache, and write-through, which modifies the copies both in the cache and in the slower memory. Write-back is often implemented by maintaining a dirty bit with each datum in the cache. When the datum is changed in the cache, the dirty bit is set. When a datum is replaced in the cache, the dirty bit is used to decide whether the datum in the cache can just be overwritten or whether it first needs to be written to slower memory.
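The dirty-bit bookkeeping for write-back might look roughly like the following self-contained sketch; the dictionary-based tiers and the function names are again assumptions made purely for illustration.

    # Write-back sketch: each cache entry carries a dirty bit recording whether
    # the cached copy differs from the copy in slower memory (illustrative only).
    slow_memory = {}
    cache = {}          # maps address -> (data, dirty)

    def write(address, data):
        cache[address] = (data, True)    # update only the cache and set the dirty bit

    def read(address):
        if address in cache:
            return cache[address][0]
        data = slow_memory.get(address)
        cache[address] = (data, False)   # a clean copy enters the cache
        return data

    def evict(address):
        # Called when the entry is replaced: write back only if it is dirty.
        data, dirty = cache.pop(address)
        if dirty:
            slow_memory[address] = data  # the datum changed in cache; write it back
        # if clean, the cached copy can simply be discarded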

 

© 2003 Thomas Schwarz, S.J., COEN, SCU