Just as data layout should be optimized for efficient cache and bandwidth usage, multithreaded applications should arrange their data for efficient inter-cache communication.
Whenever one thread uses data that a different thread has written, the caches have to communicate: their contents must be synchronized, and a notion of the current owner of the data must be maintained. As with transfers to and from memory, synchronization and ownership are managed in cache-line-sized chunks.
If the consuming thread does not use every byte in the communicated cache line, part of that transfer is wasted. It is better to organize the data so that cache lines are filled completely before the consumer starts reading them.
| Tip |
|---|
| Add complete cache lines' worth of data to shared memory buffers before letting the consumer thread start reading data. |
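
The tip above can be illustrated with a minimal producer/consumer sketch. It is only one possible arrangement, not a definitive implementation. The 64-byte line size and the names `Line`, `SharedBuffer`, `produce`, `consume` and `run_pipeline` are assumptions made for this example. The producer fills one aligned, cache-line-sized batch completely and only then advances the published counter, so the consumer never pulls over a half-written line.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>

// Assumption: 64-byte cache lines. Check the target CPU before relying on this.
constexpr std::size_t kCacheLine = 64;
constexpr std::size_t kItemsPerLine = kCacheLine / sizeof(std::uint32_t);
constexpr std::size_t kLines = 256;

// One batch occupies exactly one aligned cache line.
struct alignas(kCacheLine) Line {
    std::uint32_t items[kItemsPerLine];
};
static_assert(sizeof(Line) == kCacheLine, "a batch must fill one cache line");

// Hypothetical shared buffer. The counter sits on its own cache line so that
// publishing progress does not disturb the payload lines.
struct SharedBuffer {
    Line lines[kLines];
    alignas(kCacheLine) std::atomic<std::size_t> lines_published{0};
};

// Producer: write a whole line first, then make it visible to the consumer.
void produce(SharedBuffer& buf) {
    for (std::size_t l = 0; l < kLines; ++l) {
        for (std::size_t i = 0; i < kItemsPerLine; ++i) {
            buf.lines[l].items[i] =
                static_cast<std::uint32_t>(l * kItemsPerLine + i);
        }
        buf.lines_published.store(l + 1, std::memory_order_release);
    }
}

// Consumer: read only lines that have been published in full.
std::uint64_t consume(const SharedBuffer& buf) {
    std::uint64_t sum = 0;
    std::size_t next = 0;
    while (next < kLines) {
        const std::size_t ready =
            buf.lines_published.load(std::memory_order_acquire);
        if (next == ready) {
            std::this_thread::yield();  // nothing new yet
            continue;
        }
        for (; next < ready; ++next) {
            for (std::uint32_t v : buf.lines[next].items) sum += v;
        }
    }
    return sum;
}

// Runs one producer and one consumer and returns the consumer's checksum.
std::uint64_t run_pipeline() {
    SharedBuffer buf;
    std::uint64_t sum = 0;
    std::thread consumer([&] { sum = consume(buf); });
    produce(buf);
    consumer.join();
    assert(buf.lines_published.load() == kLines);
    return sum;
}
```

Calling `run_pipeline()` sums the values 0 through 4095. Publishing per item instead of per line would still be correct, but the consumer could then fetch the same cache line several times while the producer is still writing to it.
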
| Tip |
|---|
| When performing matrix calculations, align the calculation frontier along cache lines instead of across cache lines. |
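
One way to read this tip, sketched below under stated assumptions: in a row-major matrix whose rows are aligned to, and a whole multiple of, the cache line size, giving each thread a contiguous band of rows puts the boundary between two threads' work on a cache line boundary, so no line is shared by two writers. The 64-byte line size, the matrix dimensions, and the names `Row`, `Matrix`, `scale_rows` and `demo_scale` are all hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Assumption: 64-byte cache lines.
constexpr std::size_t kLineBytes = 64;
constexpr std::size_t kCols = 64;   // 64 floats = 4 full cache lines per row
constexpr std::size_t kRows = 128;

// Each row starts on a cache line boundary and covers whole lines only.
struct alignas(kLineBytes) Row {
    float v[kCols];
};
static_assert(sizeof(Row) % kLineBytes == 0, "rows must cover whole lines");

struct Matrix {
    Row rows[kRows];
};

// Splits the matrix into contiguous bands of rows, one band per thread.
// The frontier between two threads is a row boundary, which is also a
// cache line boundary, so no cache line is written by two threads.
void scale_rows(Matrix& m, float factor, unsigned num_threads) {
    if (num_threads == 0) num_threads = 1;
    const std::size_t band = (kRows + num_threads - 1) / num_threads;
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin = std::min(kRows, t * band);
        const std::size_t end = std::min(kRows, begin + band);
        workers.emplace_back([&m, factor, begin, end] {
            for (std::size_t r = begin; r < end; ++r) {
                for (float& x : m.rows[r].v) x *= factor;
            }
        });
    }
    for (std::thread& w : workers) w.join();
}

// Small self-check: fill with ones, double everything, verify the result.
bool demo_scale() {
    Matrix m;
    for (Row& row : m.rows) {
        for (float& x : row.v) x = 1.0f;
    }
    scale_rows(m, 2.0f, 4);
    for (const Row& row : m.rows) {
        for (float x : row.v) {
            if (x != 2.0f) return false;
        }
    }
    return true;
}
```

Splitting the same matrix by columns, or interleaving single elements between threads, would place the frontier across cache lines: neighbouring threads would then write to the same lines and force them to move back and forth between caches.
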