Very complex code may not lend itself to transformation, or the code may already be optimal and still suffer from cache misses. The remaining misses can then be reduced by adding software prefetch instructions.
Once software prefetch instructions are added to the program, Freja will evaluate their efficiency.
To be efficient, the prefetch instruction needs to fulfill three conditions:
The GNU compiler suite has built-in support for software prefetch instructions through __builtin_prefetch. When using it, you also need to tell the compiler to generate code for a processor model that supports prefetch instructions.
Other compilers may have their own intrinsic functions, or the programmer may need to resort to writing inline assembly code.
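As a rough illustration of the GNU built-in mentioned above, the sketch below prefetches the target of an indirect array access a fixed number of iterations ahead. The function name sum_indirect and the constant PREFETCH_DISTANCE are hypothetical and chosen only for this example; a suitable distance depends on the memory latency and on the work done per iteration, and has to be found by measurement. With GCC the processor model is typically selected with a -march option; the right value depends on the target machine.

```c
#include <stddef.h>

/* Hypothetical prefetch distance, in loop iterations. Tune by measurement. */
#define PREFETCH_DISTANCE 16

/* Sums data[index[i]] for i in [0, n). The accesses to data[] are
 * irregular, so the element needed PREFETCH_DISTANCE iterations from
 * now is requested early with a software prefetch. */
long sum_indirect(const long *data, const size_t *index, size_t n)
{
    long sum = 0;

    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DISTANCE < n) {
            /* Second argument 0: prefetch for reading.
             * Third argument 1: low temporal locality.
             * Both must be compile-time constants. */
            __builtin_prefetch(&data[index[i + PREFETCH_DISTANCE]], 0, 1);
        }
        sum += data[index[i]];
    }
    return sum;
}
```

The prefetch does not change the result of the function; it only hints to the processor that the cache line will be needed soon. Whether it actually removes misses has to be verified by measurement.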
Cache misses that occur close together in time can often be overlapped on x86 architectures (so-called memory-level parallelism, MLP). An alternative to prefetching is therefore to move the accesses that miss in the cache closer together (and hope that the compiler keeps them that way...).
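One way to picture moving misses closer together is a traversal of two independent linked lists. The sketch below is only an illustration under assumed names (struct node, sum_sequential and sum_interleaved are hypothetical). Walking the lists one after the other leaves at most one pointer-chasing miss outstanding at a time, while interleaving the two walks places two independent misses next to each other so that the hardware may be able to overlap them. Whether this happens in practice depends on the processor and on the code the compiler generates.

```c
#include <stddef.h>

/* Hypothetical list node used only for this illustration. */
struct node {
    struct node *next;
    long value;
};

/* Baseline: the lists are walked one after the other. Each load of
 * ->next depends on the previous one, so misses are serialized. */
long sum_sequential(const struct node *a, const struct node *b)
{
    long sum = 0;

    for (; a != NULL; a = a->next)
        sum += a->value;
    for (; b != NULL; b = b->next)
        sum += b->value;
    return sum;
}

/* Interleaved: one step of each list per iteration. The accesses to
 * a and b are independent of each other, so their misses sit close
 * together in time and may be overlapped. */
long sum_interleaved(const struct node *a, const struct node *b)
{
    long sum = 0;

    while (a != NULL && b != NULL) {
        sum += a->value;
        sum += b->value;
        a = a->next;
        b = b->next;
    }
    /* Finish whichever list is longer. */
    for (; a != NULL; a = a->next)
        sum += a->value;
    for (; b != NULL; b = b->next)
        sum += b->value;
    return sum;
}
```

Both functions return the same sum; only the order of the memory accesses differs.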