COEN 180
Memory Cache Layouts
Overview       Direct Mapped Cache Design       Direct Mapped Cache Design with Blocks       Set Associative Caches       Fully Associative Caches


Main memory cache lie at the interface between fast SRAM and slower DRAM. The random access time differences are minute, therefore cache management needs to be done in hardware. For this reason, a datum stored in main memory will only be stored in one possible location in cache. Assume that the datum is located in memory at address A. We extract the least significant bits in order to determine the possible address of the item in the cache. Why the least significant bits? Because of spatial locality, data in main memory nearby should also be in cache. Their address differs in the least significant bits of the main memory address but not in the higher ones. Since a datum can be only placed in one position in the cache, we have to use the least significant bits to determine this location.

Direct Cache Design

Figure 1: Direct Mapped Cache

Figure 2: Address Split for Direct Mapped Cache

A simple cache design is that of a direct mapped cache (Figure 1). We divide the main memory address A of a datum into two portions. The high order bits are collected in the address tag, the low order bits in the index, which is the address of the cache line. The length of address, index, and tag are determined by the sizes of the components. For example, Figure 2 uses an address length of 27 bits. The memory size is therefore 227 = 128 M. But, what are we addressing? For historical purposes, and because the smallest used unit is the 8b character, memory is byte addressed. Thus, our main memory contains 128MB. However, access to memory is almost always in words. Indeed, the lowest two bits are not actually put on the address bus. Figure 2 lower half shows the resulting split of addresses: bits A1 and A0 specify the byte within the word (consisting of four bytes). Bits A11, ... , A2 specify the address in the cache, which can therefore store 210 words or 212 bytes for a total of 4KB. Address bits A27 to A12 form the 16 bit long tag.

When a word not present in the cache is requested by the processor, that word is moved into the cache and stored at the address in cache. The cache stores the word together with the tag. If the processor requests only a byte, the whole word is nevertheless moved to the cache and in parallel to the processor. The processor then extracts the requested byte out of the word. When a word is requested, the word is looked up first in the cache. The address of the request is split into the cache address and the tag. The look-up proceeds to this address in cache and compares the tag of the request with the tag stored in the cache. If they do not match, then the word in cache has a different address than the requested one; we have a cache miss. The requested word is fetched from main memory into the processor and a copy is stored in the cache. If during a look-up the tag portion of the address of the request and the tag stored at the cache location match, then the word is already in cache - we have a cache hit - and can be fetched from the cache.

If our cache design uses the write-through policy, then the implementation of writes is straight-forward. If the cache design uses write-back, then we need to store with the cache line a bit that tells us whether the word in the cache was modified. Assume a write-back policy. If we bring a word into cache, we first check whether the dirty bit at the cache is set. If yes, then we copy the word to main memory before we replace it with the new word. If no, then we just replace the contents of the cache line since the previous word has an exact copy in main memory. Figure 1 shows the dirty bit.

From the address split-up, we can calculate also the total size of the cache. Each cache line consists of 1 dirty bit, 16b tag, and 32b word, or of 47b. There are 210 or 1K cache lines, so that the cache needs to consists of 47K worth of SRAM. The overhead for this particular design is 17/32 = 53%. However, most caches are larger than the one in our example, so that the tags are smaller, leading to better overheads.

Direct Mapped Cache Design with Blocks

Figure 4: Direct Mapped Cache with Blocks

Figure 5: Address Splitting for a Direct Mapped Cache with Blocks

A direct mapped cache does not make good use of locality. Depending on the workload, access to a word predicts more or less well access to the following word. We can exploit this behavior by moving blocks of words into and out of the cache. In the example in Figures 4 and 5, a block consists of four words. Since blocks are contiguous, the least significant bits of the word address (bits 3 and 2 of the byte address) are used to address the word within the block. The next set of bits of the address form the address of the cache line where the block is stored in its totality. The highest order bits form the tags.

In the example, the cache has 211 lines, each containing a dirty bit, a tag of 13 bits, and four words of 32 bits each. The cache stores 215 bytes or 32KB. The overhead is now 14b/128b or 11%, not only a function of the smaller tag but also of the larger payload in the line. The existence of the dirty bit in the line shows that we are using write-back. If we write any of the words in the block, then we set the dirty bit and write all the words in the block back to main memory whenever we load a new block.

Set-Associative Cache Design

Figure 6: Set Associative Cache Design

Direct mapped caches place a block in only one possible position in the cache. This can lead to cache contention, where two (or more) popular items compete for the same position in the cache. The result of contention can be thrashing where an item is moved to cache replacing another, only to be replaced again by the first, and so on, leading to a high miss rate and high data transfer between cache and main memory.

Set associate cache (Figure 6) remedy contention, at least partially. A block is still only placed in a single cache line, determined by the index, but there are now 2, 4, 8, or even 16 slots for the block. Thus, if there is a contention for the same slot in cache under a directly mapped cache layout, now the two competing blocks can coexist in the same cache line.

In a set associative cache, the look-up is more complicated. The tag and the index is extracted from the requested address. The cache goes to the cache line determined by the index and compares in parallel the tags. If one of them fits, then the corresponding block contains the desired data. If none of the tags fits, then we have a miss. We now need to bring in the requested block into the cache. The question arises which of the blocks to replace. We can use a variety of strategies:

  1. Random: We randomly select a block to replace. This could be done by maintaining a modulo b (b the number of blocks per line) and let the counter determine the block to be replaced.
  2. LRU (Least Recently Used): We maintain in hardware a priority list that captures the data needed to implement least recently used. Since this means distirguishing 4! = 24 different orderings, we need a register of size at least 5b and some sophisticated hardware.
  3. LRU light: Instead of implementing full LRU, we implement an approximation. For example, we maintain the number of the two least frequently used blocks. When one of these blocks is touched, then we remove it from the list and randomly select another block into the list.

Fully Associative Caches

Associative memories are content addressable. Given the datum or part of the datum, we can retrieve the datum or the data that match the information. Fully addressable caches store a certain amount of data in the cache together with the full address. When given an address, then all the cache lines are searched for the address, and if found, then the associated datum is returned. Unfortunately, hardware constraints limit fully associative caches to small sizes.

Power Point Presentation on Caching

© 2003 Thomas Schwarz, S.J., COEN, SCU                                                                              SCU            COEN            T. Schwarz            COEN 180