Homework 3 Answers

Coen 1
Fall 2000

What is the advantage of having a family of physically different CPUs that all execute the same instructions?

The advantage is that a computer owner can move the same binary machine language programs from a slower CPU to a new, faster CPU without having to rewrite all the programs. This saves a lot of money and time. For the computer manufacturer, having the family of CPUs makes it much easier to sell a replacement, faster CPU to a customer because there is no need to reinvest in program development.

Explain why the performance improvement possible through pipelining is proportional to the number of stages in the pipeline.

The number of stages in the pipeline determines the number of instructions that can be in some phase of execution simultaneously. Two stages, two instructions; four stages, four instructions. With N stages, N instructions are in progress at the same time, so in the ideal case one instruction completes every stage time instead of every N stage times, and the program finishes in about 1/N of the time. Some small amount of time is required to fill the pipeline in the first place, and the maximum speedup is only achieved if the pipeline stays full all the time.
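As a rough check on this reasoning, the following Python sketch counts cycles for an idealized pipeline. It assumes every stage takes exactly one cycle and that the pipeline never stalls; the function names are made up for this illustration.

```python
def unpipelined_cycles(num_instructions: int, num_stages: int) -> int:
    """Cycles when each instruction finishes all stages before the next starts."""
    return num_instructions * num_stages


def pipelined_cycles(num_instructions: int, num_stages: int) -> int:
    """Cycles on an ideal pipeline (assumes num_instructions >= 1, no stalls)."""
    # The first instruction needs num_stages cycles to fill the pipeline;
    # each later instruction completes one cycle after the one before it.
    return num_stages + (num_instructions - 1)


def speedup(num_instructions: int, num_stages: int) -> float:
    """Ratio of unpipelined to pipelined execution time."""
    return unpipelined_cycles(num_instructions, num_stages) / pipelined_cycles(
        num_instructions, num_stages
    )


if __name__ == "__main__":
    # With 4 stages the speedup climbs toward 4 as the program gets longer.
    for count in (1, 10, 1000, 1_000_000):
        print(count, round(speedup(count, 4), 3))
```

For a single instruction there is no speedup at all, since the whole time is spent filling the pipeline. For long programs the fill time becomes negligible and the speedup approaches the number of stages.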

What characteristics about a program (the instructions the CPU is executing) can reduce the performance of a superscalar CPU?

A superscalar CPU only gains performance if it can issue more than one instruction simultaneously. If an instruction depends on the result of a preceding instruction, it can't begin execution until the preceding one is finished, which diminishes superscalar performance (the same is true for pipelined CPUs). Also, superscalar execution units are often restricted to either integer or floating point computation. If no instruction of a particular type is ready to be issued, that execution unit stays idle. For example, a long sequence of integer instructions would leave the floating point pipeline idle.
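Both effects can be seen in a toy model. The Python sketch below assumes a hypothetical in-order machine with one integer unit and one floating point unit, where every instruction takes one cycle and only read-after-write dependences are checked. The `Instr` type and `issue_cycles` function are invented for this illustration and do not describe any real CPU.

```python
from typing import NamedTuple, Sequence, Tuple


class Instr(NamedTuple):
    unit: str               # "int" or "fp": the execution unit required
    dest: str               # register written by the instruction
    srcs: Tuple[str, ...]   # registers read by the instruction


def issue_cycles(program: Sequence[Instr]) -> int:
    """Count cycles when at most two instructions can issue together."""
    cycles = 0
    i = 0
    while i < len(program):
        first = program[i]
        i += 1
        if i < len(program):
            second = program[i]
            needs_other_unit = second.unit != first.unit
            independent = first.dest not in second.srcs
            # The pair issues together only if it uses both units and the
            # second instruction does not read the first one's result.
            if needs_other_unit and independent:
                i += 1
        cycles += 1
    return cycles


mixed = [
    Instr("int", "r1", ("r2", "r3")),
    Instr("fp", "f1", ("f2", "f3")),
    Instr("int", "r4", ("r5", "r6")),
    Instr("fp", "f4", ("f5", "f6")),
]
all_integer = [Instr("int", f"r{n}", ("r8", "r9")) for n in range(4)]
dependent = [Instr("int", "r1", ("r2", "r3")), Instr("fp", "f1", ("r1",))]

if __name__ == "__main__":
    print(issue_cycles(mixed), issue_cycles(all_integer), issue_cycles(dependent))
```

In this model the mixed, independent program issues two instructions per cycle, the all-integer program issues only one per cycle because the floating point unit has nothing to do, and the dependent pair cannot issue together.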

What performance problem is a channel addressing?

A channel (or, more realistically, channels) is a mechanism to allow input/output operations (transfers between memory and peripheral devices like disks, tapes, printers, displays) to occur simultaneously with meaningful execution of instructions by the CPU. The channel frees the CPU from controlling I/O transfers; having the CPU control every transfer itself was a major performance problem in early computer designs.

Explain the difference between fully associative and direct mapped cache mapping approaches.

In a fully associative cache mapping, each block in main memory can be placed anywhere in the cache. Suppose the cache block size is 2ⁿ bytes, for some value n (typically 4 to 6). The low order n bits of the address of a memory reference select a byte within the block; they are removed, and the rest of the address is the tag field. The tag fields of all the blocks in the cache are compared against this tag in parallel to see if the corresponding main memory block is in the cache.

For a direct mapped cache mapping, each block in main memory can only go into one block in the cache. Again, the low order n bits of the address are removed. Part of the remaining address bits (typically the low order part) forms the index, which selects the one cache block where this main memory block would be stored. The rest of the remaining address bits form the tag, which is compared to the tag stored with the block in that one possible cache location to determine if the correct block is already in the cache.
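The two ways of dividing an address can be sketched in Python. This is an illustration under assumed parameters: the function names, the 5 offset bits (32-byte blocks), and the 8 index bits (256 cache blocks) are example choices, and the index is taken from the low order bits of the block number.

```python
def split_fully_associative(address: int, offset_bits: int):
    """Return (tag, offset): every bit above the block offset is tag."""
    offset = address & ((1 << offset_bits) - 1)
    tag = address >> offset_bits
    return tag, offset


def split_direct_mapped(address: int, offset_bits: int, index_bits: int):
    """Return (tag, index, offset) for a direct mapped cache."""
    offset = address & ((1 << offset_bits) - 1)
    block_number = address >> offset_bits
    # The low order bits of the block number pick the single cache block
    # this memory block may occupy; the remaining bits are the tag.
    index = block_number & ((1 << index_bits) - 1)
    tag = block_number >> index_bits
    return tag, index, offset


if __name__ == "__main__":
    address = 0x12345678
    tag, offset = split_fully_associative(address, 5)
    print(hex(tag), hex(offset))
    tag, index, offset = split_direct_mapped(address, 5, 8)
    print(hex(tag), hex(index), hex(offset))
```

For the address 0x12345678, the direct mapped split gives tag 0x91a2, index 0xb3, and offset 0x18. In the fully associative split the index bits simply become part of a longer tag.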

Fully associative mapping has much less potential for collisions between blocks trying to occupy the cache. That is, two or more main memory blocks may have to compete for the same cache block with direct mapping, but could go into different cache blocks with a fully (or set) associative mapping. However, fully associative mapping requires more hardware, both to decide which block in the cache to use and to search all the cache blocks in parallel to determine whether a memory access yields a hit.
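The collision effect can be demonstrated with a small simulation. The Python sketch below counts hits for a sequence of memory block numbers under each mapping. The function names are invented for this example, and the least-recently-used replacement policy in the fully associative version is an assumption, since the answer above does not name a policy.

```python
from collections import OrderedDict


def direct_mapped_hits(block_numbers, num_cache_blocks: int) -> int:
    """Count hits when each memory block has exactly one possible slot."""
    cache = [None] * num_cache_blocks
    hits = 0
    for block in block_numbers:
        index = block % num_cache_blocks
        if cache[index] == block:
            hits += 1
        else:
            cache[index] = block  # evict whatever occupied the slot
    return hits


def fully_associative_hits(block_numbers, num_cache_blocks: int) -> int:
    """Count hits when a block may go anywhere (LRU replacement assumed)."""
    cache = OrderedDict()  # keys are resident blocks, oldest use first
    hits = 0
    for block in block_numbers:
        if block in cache:
            hits += 1
            cache.move_to_end(block)  # mark as most recently used
        else:
            if len(cache) >= num_cache_blocks:
                cache.popitem(last=False)  # evict the least recently used
            cache[block] = True
    return hits


if __name__ == "__main__":
    # Blocks 0 and 8 both map to slot 0 of an 8-block direct mapped cache.
    accesses = [0, 8, 0, 8, 0, 8]
    print(direct_mapped_hits(accesses, 8), fully_associative_hits(accesses, 8))
```

With this access pattern the direct mapped cache never hits, because blocks 0 and 8 keep evicting each other from the same slot. The fully associative cache misses only on the first access to each block and then hits four times.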

Explain the components of disk access time.

There are two components of disk access time. The first is the seek time, which is the time it takes to move the head from the track it is currently over to the track wanted; this time varies depending on how many tracks apart the two are. The second is the latency time, which is how long it takes for the desired sector to rotate under the head; this time varies from 0 to a full revolution, but on average is half a revolution.
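As a worked example of the two components, the Python sketch below computes an average access time from an average seek time and a rotation speed. The function names and the sample figures (6000 RPM, 9 ms average seek) are assumptions chosen for easy arithmetic, not values from the course.

```python
def average_latency_ms(rpm: float) -> float:
    """Average rotational latency: the time for half a revolution."""
    ms_per_revolution = 60_000.0 / rpm  # 60,000 ms in one minute
    return ms_per_revolution / 2.0


def average_access_time_ms(average_seek_ms: float, rpm: float) -> float:
    """Average access time as seek time plus rotational latency."""
    return average_seek_ms + average_latency_ms(rpm)


if __name__ == "__main__":
    # At 6000 RPM one revolution takes 10 ms, so the average latency is 5 ms.
    print(average_latency_ms(6000))
    print(average_access_time_ms(9.0, 6000))
```

At 6000 RPM a full revolution takes 10 ms, so the average latency is 5 ms, and with a 9 ms average seek the average access time comes to 14 ms.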