Freja can identify three types of issues concerning software prefetch instructions: prefetch unnecessary, prefetch too distant, and prefetch too close.
The information presented about prefetch issues differs a lot from that of other issues, and it uses none of the standard sections. The different types of prefetch advice only have one section in common:
A prefetch unnecessary issue is reported when Freja finds a prefetch instruction that nearly always hits in the cache. The fetch ratio that Freja presents for the instruction shows how often the prefetch actually has to fetch data; a low fetch ratio means that it nearly always hits. Prefetch instructions that almost always hit in the cache consume execution resources without providing any benefit.
The prefetch unnecessary issue has the following sections:
Fetch ratio of the prefetch instruction
The fetch ratio of the prefetch instruction.
Instructions using the prefetched data before the prefetch
Lists the instructions that last touched the data before the prefetch instruction. This is useful when trying to understand why the data is already in the cache, for example, to check whether the data is brought into the cache by some part of the application that you did not expect.
A reasonable rule of thumb is that the fetch ratio of a prefetch instruction should be above 10% for prefetches from the L2 cache or above 1% for prefetches from RAM. If the fetch ratio is lower than that, the data that the instruction prefetches is almost always already in the cache, so the prefetch instruction is not doing useful work and may instead decrease performance by consuming execution resources.
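The rule of thumb above can be written down as a small check. This is only a sketch: the names prefetch_source, min_useful_fetch_ratio and prefetch_looks_unnecessary are made up for illustration and are not part of Freja, and fetch ratios are assumed to be given as fractions (0.10 for 10%).

```c
#include <stdbool.h>

/* Hypothetical labels for where the prefetched data is fetched from. */
enum prefetch_source { FROM_L2, FROM_RAM };

/* Rule-of-thumb minimum fetch ratios from the text, as fractions:
   10% for prefetches from the L2 cache, 1% for prefetches from RAM. */
static double min_useful_fetch_ratio(enum prefetch_source src)
{
    return src == FROM_L2 ? 0.10 : 0.01;
}

/* Returns true when the fetch ratio is not above the rule-of-thumb level,
   meaning the prefetch mostly hits in the cache and is worth reviewing. */
static bool prefetch_looks_unnecessary(double fetch_ratio,
                                       enum prefetch_source src)
{
    return fetch_ratio <= min_useful_fetch_ratio(src);
}
```

For example, a fetch ratio of 5% would be flagged for a prefetch from the L2 cache but not for a prefetch from RAM.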
Check that the data is not brought into the cache by some part of the application that you did not expect. Also make sure that you do not prefetch the same cache line multiple times. If neither explains the low fetch ratio, consider removing the prefetch.
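One common way to end up prefetching the same cache line multiple times is to issue a prefetch for every element of an array. The sketch below issues roughly one prefetch per cache line instead. It assumes a GCC or Clang compiler (for __builtin_prefetch), 64-byte cache lines, and an arbitrary look-ahead of 256 elements; the function sum_array and both constants are illustrative only.

```c
#include <stddef.h>

#define CACHE_LINE_BYTES 64   /* assumption: 64-byte cache lines */
#define PREFETCH_AHEAD   256  /* assumption: look-ahead in elements, needs tuning */

/* Sums n doubles while prefetching ahead, one prefetch per cache line. */
double sum_array(const double *a, size_t n)
{
    const size_t per_line = CACHE_LINE_BYTES / sizeof(double);
    double sum = 0.0;

    for (size_t i = 0; i < n; i++) {
        /* Prefetch only when i starts a new group of per_line elements,
           and only while the prefetched element is still inside the array. */
        if (i % per_line == 0 && i + PREFETCH_AHEAD < n)
            __builtin_prefetch(&a[i + PREFETCH_AHEAD]);
        sum += a[i];
    }
    return sum;
}
```

If the array does not start on a cache line boundary, the groups do not line up exactly with cache lines, but the number of prefetches per line still stays close to one. On many processors a plain sequential scan like this is also covered by hardware prefetching, so the loop only illustrates the pattern.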
A prefetch too distant issue is reported when Freja finds a prefetch instruction whose data is evicted from the cache again before it is used. The prefetch instruction may be placed too far ahead of the instructions that use the prefetched data, or the data may not be used as expected. Such a prefetch consumes execution resources and memory bandwidth without providing any benefit.
The prefetch too distant issue has the following sections:
Average fetch ratio of the instructions using the prefetched data
The fetch ratio of the instructions using the prefetched data. Tells you to what degree the prefetched data is evicted again before it is used.
Instructions using the prefetched data after the prefetch
Lists the instructions that use the prefetched data.
Consider reducing the distance between the prefetch and the instructions using the prefetched data. For example, if you are prefetching data a number of iterations ahead in a loop, consider reducing that number of iterations.
This issue can also indicate that the data fetched by the prefetch instruction is actually never used as intended. Check that it is the intended instructions that use the data.
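When the prefetch sits in a loop, keeping the distance in a single parameter makes it easy to adjust. The following is a minimal sketch, assuming a GCC or Clang compiler (for __builtin_prefetch); the function gather_sum and its parameter ahead are hypothetical names chosen for this example.

```c
#include <stddef.h>

/* Sums table[idx[i]] for i in [0, n).
   'ahead' is the prefetch distance, counted in loop iterations. */
long gather_sum(const long *table, const size_t *idx, size_t n, size_t ahead)
{
    long sum = 0;

    for (size_t i = 0; i < n; i++) {
        /* Prefetch the element that will be used 'ahead' iterations from now,
           as long as that iteration exists. */
        if (i + ahead < n)
            __builtin_prefetch(&table[idx[i + ahead]]);
        sum += table[idx[i]];
    }
    return sum;
}
```

If Freja reports the prefetch as too distant, try a smaller value of ahead and sample the application again. The result of the function does not depend on ahead, so only the timing changes.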
A prefetch too close issue is reported when Freja finds a prefetch instruction that is too close to the instructions using the data. When a prefetch instruction is too close, the prefetched data does not have time to arrive from the next cache level or main memory before it is needed. The instruction using the data still stalls for some time, and you do not get the full benefit of the prefetch.
The prefetch too close issue has the following sections:
Median number of memory accesses to the instructions using the prefetched data
The median number of memory accesses between the prefetch instruction and the instructions using the prefetched data.
Instructions using the prefetched data after the prefetch
Lists the instructions that use the prefetched data.
Freja presents the distance as the median number of memory accesses before the next use of the prefetched data. It also presents which instructions are the next to touch the data. The required distance depends on the latency of the cache level or main memory the data is fetched from. A reasonable rule of thumb is that the distance should be at least 3 accesses for prefetches from the L2 cache or at least 30 accesses for prefetches from RAM.
Consider increasing the distance between the prefetch and the instructions using the prefetched data. For example, if you are prefetching data a number of iterations ahead in a loop, consider increasing that number of iterations.
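To turn the rule-of-thumb distances into a number of loop iterations, you can divide the required number of memory accesses by the number of memory accesses each iteration performs, rounding up. The sketch below assumes that every iteration performs the same, known number of memory accesses; the function min_iterations_ahead and the two constants are illustrative names, not something Freja provides.

```c
#include <stddef.h>

/* Rule-of-thumb minimum distances from the text, in memory accesses. */
#define MIN_DISTANCE_L2   3
#define MIN_DISTANCE_RAM 30

/* Smallest number of iterations to prefetch ahead so that at least
   min_accesses memory accesses separate the prefetch from the use.
   accesses_per_iteration must be greater than zero. */
size_t min_iterations_ahead(size_t min_accesses, size_t accesses_per_iteration)
{
    /* Integer division rounded up. */
    return (min_accesses + accesses_per_iteration - 1) / accesses_per_iteration;
}
```

With 4 memory accesses per iteration, for instance, min_iterations_ahead(MIN_DISTANCE_RAM, 4) gives 8 iterations. Treat the result as a starting point and confirm it by sampling the application again.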