3.4. Cache Misses

When a program accesses a memory location that is not in the cache, the access is called a cache miss. Since the processor then has to wait for the data to be fetched from the next cache level or from main memory before it can continue to execute, cache misses directly affect the performance of the application.

It is hard to tell from the number of misses alone whether cache misses are causing performance problems in an application. The same number of misses causes a much greater relative slowdown in a short-running application than in a long-running one.

A more useful metric is the cache miss ratio, that is, the fraction of memory accesses that cause a cache miss. From the miss ratio you can usually tell whether cache misses may be a performance problem in an application.
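
As a minimal illustration, the miss ratio can be computed from a miss count and a total access count, for example as reported by hardware performance counters. The counter values below are hypothetical and only serve to show the arithmetic.

    #include <stdio.h>

    int main(void)
    {
        /* Hypothetical counter values, for example read from hardware
         * performance counters or reported by a profiling tool. */
        unsigned long long memory_accesses = 250000000ULL;
        unsigned long long cache_misses    =   5000000ULL;

        /* Miss ratio: the fraction of memory accesses that miss the cache. */
        double miss_ratio = (double)cache_misses / (double)memory_accesses;

        printf("miss ratio: %.2f%%\n", miss_ratio * 100.0);  /* prints 2.00% */
        return 0;
    }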

The cache miss ratio of an application depends on the size of the cache. A larger cache can hold more cache lines and is therefore expected to get fewer misses.

The performance impact of a cache miss depends on the latency of fetching the data from the next cache level or from main memory. For example, assume a processor with two cache levels. A miss in the L1 cache causes data to be fetched from the L2 cache, which has a relatively low latency, so even a fairly high L1 miss ratio can be acceptable. A miss in the L2 cache, on the other hand, causes a long stall while data is fetched from main memory, so only a much lower L2 miss ratio is acceptable.
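
The following sketch estimates the average cost of a memory access from per-level latencies and miss ratios. All numbers are illustrative assumptions, not measurements of any particular processor, and the miss ratios are taken relative to the total number of memory accesses.

    #include <stdio.h>

    int main(void)
    {
        /* Illustrative latencies in cycles; actual values vary between processors. */
        const double l1_latency     = 4.0;    /* L1 hit */
        const double l2_latency     = 12.0;   /* fetch from L2 on an L1 miss */
        const double memory_latency = 200.0;  /* fetch from main memory on an L2 miss */

        /* Assumed miss ratios, as fractions of all memory accesses. */
        const double l1_miss_ratio = 0.05;    /* 5% of accesses miss the L1 cache */
        const double l2_miss_ratio = 0.01;    /* 1% of accesses miss the L2 cache */

        /* Every access pays the L1 latency, L1 misses additionally pay the
         * L2 latency, and L2 misses additionally pay the memory latency. */
        double average_latency = l1_latency
                               + l1_miss_ratio * l2_latency
                               + l2_miss_ratio * memory_latency;

        printf("average access latency: %.1f cycles\n", average_latency);
        return 0;
    }

With these assumed numbers the 5% L1 miss ratio adds only 0.6 cycles per access on average, while the much smaller 1% L2 miss ratio adds 2 cycles, illustrating why a far lower miss ratio is needed in the last cache level.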

A special case is cache misses caused by prefetch instructions (see Section 3.6.1, "Software Prefetching"). Unlike other cache misses, these do not cause any stalls, but instead trigger a fetch of the requested data so that later accesses do not experience a cache miss. In fact, a prefetch instruction should ideally have a high miss ratio, since that means the prefetch instruction is doing useful work.
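
As a sketch of what such a prefetch looks like in code, the loop below uses the GCC/Clang __builtin_prefetch builtin to request data a fixed distance ahead of where it is used; the prefetch distance of 16 elements is an arbitrary assumption, and the right value depends on memory latency and the work done per iteration.

    #include <stddef.h>

    /* Sum an array while prefetching data a fixed distance ahead.
     * The prefetches may miss the cache, but they do not stall the
     * processor; they only start fetching data that later iterations
     * will then (ideally) find in the cache. */
    double sum_with_prefetch(const double *data, size_t n)
    {
        const size_t distance = 16;  /* assumed prefetch distance in elements */
        double sum = 0.0;

        for (size_t i = 0; i < n; i++) {
            if (i + distance < n)
                __builtin_prefetch(&data[i + distance], 0, 3);  /* read, high locality */
            sum += data[i];
        }
        return sum;
    }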

Freja therefore does not include misses caused by prefetch instructions when calculating statistics for instruction groups, loops or the entire application.