• Miss penalty—The time required to process a miss, which includes replacing a
block in an upper level of memory, plus the additional time to deliver the
requested data to the processor. (The time to process a miss is typically significantly
larger than the time to process a hit.)
The memory hierarchy is illustrated in Figure 6.1. This is drawn as a pyramid to
help indicate the relative sizes of these various memories. Memories closer to the
top tend to be smaller, but these smaller memories offer better performance
and thus a higher cost (per bit) than memories found lower in the pyramid.
The numbers given to the left of the pyramid indicate typical access times.
For any given piece of data, the processor sends its request to the fastest, smallest
partition of memory (typically cache, because registers tend to be more special-purpose).
If the data is found in cache, it can be loaded quickly into the CPU. If it is
not resident in cache, the request is forwarded to the next lower level of the hierarchy,
and this search process begins again. If the data is found at this level, the
whole block in which the data resides is transferred into cache. If the data is not
found at this level, the request is forwarded to the next lower level, and so on.
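The search-and-promote process described above can be sketched in a few lines. This is a hypothetical model, not a real cache implementation: each level is represented as a dictionary mapping block addresses to blocks of data, and the names (`lookup`, `block_of`, `BLOCK_SIZE`) and the block size of 4 words are illustrative assumptions.

```python
# Hypothetical sketch of the hierarchy search process. Each level is a dict
# {block_address: [words]}; levels[0] is the fastest (cache), levels[-1] the
# slowest. BLOCK_SIZE and all names are assumptions for illustration.

BLOCK_SIZE = 4  # words per block (assumed)

def block_of(addr):
    """Address of the block containing addr."""
    return addr - (addr % BLOCK_SIZE)

def lookup(levels, addr):
    """Search the levels fastest-first. On a hit at level i, transfer the
    whole block into every faster level, then return the requested word."""
    b = block_of(addr)
    for i, level in enumerate(levels):
        if b in level:
            for faster in levels[:i]:
                faster[b] = level[b]  # promote the entire block upward
            return level[b][addr - b]
    raise KeyError("address not resident at any level")

# Usage: cache starts empty; "main memory" holds two blocks.
cache = {}
memory = {0: ["w0", "w1", "w2", "w3"], 4: ["w4", "w5", "w6", "w7"]}
hierarchy = [cache, memory]

lookup(hierarchy, 2)  # misses in cache; block 0 is copied into cache
```

After the first access, the whole containing block is resident in the cache, so later accesses to nearby addresses hit at the fastest level.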
The key idea is that when the lower (slower, larger, and cheaper) levels of the
hierarchy respond to a request from higher levels for the content of location X,
they also send, at the same time, the data located at addresses X + 1, X + 2, . . . ,
thus returning an entire block of data to the higher-level memory. The hope is that
this extra data will be referenced in the near future, which, in most cases, it is.
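To see why returning the data at X + 1, X + 2, . . . usually pays off, consider a sequential scan through memory: only the first access to each block misses, and the remaining accesses to that block hit. The following sketch simply counts hits and misses under an assumed block size of 4 words and a cache that starts empty; all names are made up for illustration.

```python
# Illustrative hit/miss count for a sequential scan, assuming a block size
# of 4 words and an initially empty cache (names are hypothetical).

BLOCK_SIZE = 4
cached_blocks = set()
hits = misses = 0

for addr in range(16):            # sequential access: X, X+1, X+2, ...
    block = addr - (addr % BLOCK_SIZE)
    if block in cached_blocks:
        hits += 1
    else:
        misses += 1               # one miss per block...
        cached_blocks.add(block)  # ...then the whole block is resident

# 16 sequential accesses -> 4 misses (one per block) and 12 hits.
```

Because the whole block arrives on the first miss, three of every four accesses in this scan are serviced at the fast level, which is the locality payoff the text describes.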
The memory hierarchy is functional because programs tend to exhibit a property
known as locality, which often allows the processor to access the data returned
for addresses X + 1, X + 2, and so on. Thus, although there is one miss to, say